Sample records for proposed statistical method

  1. General Framework for Meta-analysis of Rare Variants in Sequencing Association Studies

    PubMed Central

    Lee, Seunggeun; Teslovich, Tanya M.; Boehnke, Michael; Lin, Xihong

    2013-01-01

    We propose a general statistical framework for meta-analysis of gene- or region-based multimarker rare variant association tests in sequencing association studies. In genome-wide association studies, single-marker meta-analysis has been widely used to increase statistical power by combining results via regression coefficients and standard errors from different studies. In analysis of rare variants in sequencing studies, region-based multimarker tests are often used to increase power. We propose meta-analysis methods for commonly used gene- or region-based rare variants tests, such as burden tests and variance component tests. Because estimation of regression coefficients of individual rare variants is often unstable or not feasible, the proposed method avoids this difficulty by calculating score statistics instead that only require fitting the null model for each study and then aggregating these score statistics across studies. Our proposed meta-analysis rare variant association tests are conducted based on study-specific summary statistics, specifically score statistics for each variant and between-variant covariance-type (linkage disequilibrium) relationship statistics for each gene or region. The proposed methods are able to incorporate different levels of heterogeneity of genetic effects across studies and are applicable to meta-analysis of multiple ancestry groups. We show that the proposed methods are essentially as powerful as joint analysis by directly pooling individual level genotype data. We conduct extensive simulations to evaluate the performance of our methods by varying levels of heterogeneity across studies, and we apply the proposed methods to meta-analysis of rare variant effects in a multicohort study of the genetics of blood lipid levels. PMID:23768515

  2. A spatial scan statistic for multiple clusters.

    PubMed

    Li, Xiao-Zhou; Wang, Jin-Feng; Yang, Wei-Zhong; Li, Zhong-Jie; Lai, Sheng-Jie

    2011-10-01

    Spatial scan statistics are commonly used for geographical disease surveillance and cluster detection. While there are multiple clusters coexisting in the study area, they become difficult to detect because of clusters' shadowing effect to each other. The recently proposed sequential method showed its better power for detecting the second weaker cluster, but did not improve the ability of detecting the first stronger cluster which is more important than the second one. We propose a new extension of the spatial scan statistic which could be used to detect multiple clusters. Through constructing two or more clusters in the alternative hypothesis, our proposed method accounts for other coexisting clusters in the detecting and evaluating process. The performance of the proposed method is compared to the sequential method through an intensive simulation study, in which our proposed method shows better power in terms of both rejecting the null hypothesis and accurately detecting the coexisting clusters. In the real study of hand-foot-mouth disease data in Pingdu city, a true cluster town is successfully detected by our proposed method, which cannot be evaluated to be statistically significant by the standard method due to another cluster's shadowing effect. Copyright © 2011 Elsevier Inc. All rights reserved.

  3. Autoregressive statistical pattern recognition algorithms for damage detection in civil structures

    NASA Astrophysics Data System (ADS)

    Yao, Ruigen; Pakzad, Shamim N.

    2012-08-01

    Statistical pattern recognition has recently emerged as a promising set of complementary methods to system identification for automatic structural damage assessment. Its essence is to use well-known concepts in statistics for boundary definition of different pattern classes, such as those for damaged and undamaged structures. In this paper, several statistical pattern recognition algorithms using autoregressive models, including statistical control charts and hypothesis testing, are reviewed as potentially competitive damage detection techniques. To enhance the performance of statistical methods, new feature extraction techniques using model spectra and residual autocorrelation, together with resampling-based threshold construction methods, are proposed. Subsequently, simulated acceleration data from a multi degree-of-freedom system is generated to test and compare the efficiency of the existing and proposed algorithms. Data from laboratory experiments conducted on a truss and a large-scale bridge slab model are then used to further validate the damage detection methods and demonstrate the superior performance of proposed algorithms.

  4. A note on the kappa statistic for clustered dichotomous data.

    PubMed

    Zhou, Ming; Yang, Zhao

    2014-06-30

    The kappa statistic is widely used to assess the agreement between two raters. Motivated by a simulation-based cluster bootstrap method to calculate the variance of the kappa statistic for clustered physician-patients dichotomous data, we investigate its special correlation structure and develop a new simple and efficient data generation algorithm. For the clustered physician-patients dichotomous data, based on the delta method and its special covariance structure, we propose a semi-parametric variance estimator for the kappa statistic. An extensive Monte Carlo simulation study is performed to evaluate the performance of the new proposal and five existing methods with respect to the empirical coverage probability, root-mean-square error, and average width of the 95% confidence interval for the kappa statistic. The variance estimator ignoring the dependence within a cluster is generally inappropriate, and the variance estimators from the new proposal, bootstrap-based methods, and the sampling-based delta method perform reasonably well for at least a moderately large number of clusters (e.g., the number of clusters K ⩾50). The new proposal and sampling-based delta method provide convenient tools for efficient computations and non-simulation-based alternatives to the existing bootstrap-based methods. Moreover, the new proposal has acceptable performance even when the number of clusters is as small as K = 25. To illustrate the practical application of all the methods, one psychiatric research data and two simulated clustered physician-patients dichotomous data are analyzed. Copyright © 2014 John Wiley & Sons, Ltd.

  5. Monte Carlo based statistical power analysis for mediation models: methods and software.

    PubMed

    Zhang, Zhiyong

    2014-12-01

    The existing literature on statistical power analysis for mediation models often assumes data normality and is based on a less powerful Sobel test instead of the more powerful bootstrap test. This study proposes to estimate statistical power to detect mediation effects on the basis of the bootstrap method through Monte Carlo simulation. Nonnormal data with excessive skewness and kurtosis are allowed in the proposed method. A free R package called bmem is developed to conduct the power analysis discussed in this study. Four examples, including a simple mediation model, a multiple-mediator model with a latent mediator, a multiple-group mediation model, and a longitudinal mediation model, are provided to illustrate the proposed method.

  6. Statistical methods to estimate treatment effects from multichannel electroencephalography (EEG) data in clinical trials.

    PubMed

    Ma, Junshui; Wang, Shubing; Raubertas, Richard; Svetnik, Vladimir

    2010-07-15

    With the increasing popularity of using electroencephalography (EEG) to reveal the treatment effect in drug development clinical trials, the vast volume and complex nature of EEG data compose an intriguing, but challenging, topic. In this paper the statistical analysis methods recommended by the EEG community, along with methods frequently used in the published literature, are first reviewed. A straightforward adjustment of the existing methods to handle multichannel EEG data is then introduced. In addition, based on the spatial smoothness property of EEG data, a new category of statistical methods is proposed. The new methods use a linear combination of low-degree spherical harmonic (SPHARM) basis functions to represent a spatially smoothed version of the EEG data on the scalp, which is close to a sphere in shape. In total, seven statistical methods, including both the existing and the newly proposed methods, are applied to two clinical datasets to compare their power to detect a drug effect. Contrary to the EEG community's recommendation, our results suggest that (1) the nonparametric method does not outperform its parametric counterpart; and (2) including baseline data in the analysis does not always improve the statistical power. In addition, our results recommend that (3) simple paired statistical tests should be avoided due to their poor power; and (4) the proposed spatially smoothed methods perform better than their unsmoothed versions. Copyright 2010 Elsevier B.V. All rights reserved.

  7. A powerful approach for association analysis incorporating imprinting effects

    PubMed Central

    Xia, Fan; Zhou, Ji-Yuan; Fung, Wing Kam

    2011-01-01

    Motivation: For a diallelic marker locus, the transmission disequilibrium test (TDT) is a simple and powerful design for genetic studies. The TDT was originally proposed for use in families with both parents available (complete nuclear families) and has further been extended to 1-TDT for use in families with only one of the parents available (incomplete nuclear families). Currently, the increasing interest of the influence of parental imprinting on heritability indicates the importance of incorporating imprinting effects into the mapping of association variants. Results: In this article, we extend the TDT-type statistics to incorporate imprinting effects and develop a series of new test statistics in a general two-stage framework for association studies. Our test statistics enjoy the nature of family-based designs that need no assumption of Hardy–Weinberg equilibrium. Also, the proposed methods accommodate complete and incomplete nuclear families with one or more affected children. In the simulation study, we verify the validity of the proposed test statistics under various scenarios, and compare the powers of the proposed statistics with some existing test statistics. It is shown that our methods greatly improve the power for detecting association in the presence of imprinting effects. We further demonstrate the advantage of our methods by the application of the proposed test statistics to a rheumatoid arthritis dataset. Contact: wingfung@hku.hk Supplementary information: Supplementary data are available at Bioinformatics online. PMID:21798962

  8. A powerful approach for association analysis incorporating imprinting effects.

    PubMed

    Xia, Fan; Zhou, Ji-Yuan; Fung, Wing Kam

    2011-09-15

    For a diallelic marker locus, the transmission disequilibrium test (TDT) is a simple and powerful design for genetic studies. The TDT was originally proposed for use in families with both parents available (complete nuclear families) and has further been extended to 1-TDT for use in families with only one of the parents available (incomplete nuclear families). Currently, the increasing interest of the influence of parental imprinting on heritability indicates the importance of incorporating imprinting effects into the mapping of association variants. In this article, we extend the TDT-type statistics to incorporate imprinting effects and develop a series of new test statistics in a general two-stage framework for association studies. Our test statistics enjoy the nature of family-based designs that need no assumption of Hardy-Weinberg equilibrium. Also, the proposed methods accommodate complete and incomplete nuclear families with one or more affected children. In the simulation study, we verify the validity of the proposed test statistics under various scenarios, and compare the powers of the proposed statistics with some existing test statistics. It is shown that our methods greatly improve the power for detecting association in the presence of imprinting effects. We further demonstrate the advantage of our methods by the application of the proposed test statistics to a rheumatoid arthritis dataset. wingfung@hku.hk Supplementary data are available at Bioinformatics online.

  9. 78 FR 70059 - Agency Information Collection Activities: Proposed Collection; Comment Request

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-11-22

    ... (as opposed to quantitative statistical methods). In consultation with research experts, we have... qualitative interviews (as opposed to quantitative statistical methods). In consultation with research experts... utilization of qualitative interviews (as opposed to quantitative statistical methods). In consultation with...

  10. Local statistics adaptive entropy coding method for the improvement of H.26L VLC coding

    NASA Astrophysics Data System (ADS)

    Yoo, Kook-yeol; Kim, Jong D.; Choi, Byung-Sun; Lee, Yung Lyul

    2000-05-01

    In this paper, we propose an adaptive entropy coding method to improve the VLC coding efficiency of H.26L TML-1 codec. First of all, we will show that the VLC coding presented in TML-1 does not satisfy the sibling property of entropy coding. Then, we will modify the coding method into the local statistics adaptive one to satisfy the property. The proposed method based on the local symbol statistics dynamically changes the mapping relationship between symbol and bit pattern in the VLC table according to sibling property. Note that the codewords in the VLC table of TML-1 codec is not changed. Since this changed mapping relationship also derived in the decoder side by using the decoded symbols, the proposed VLC coding method does not require any overhead information. The simulation results show that the proposed method gives about 30% and 37% reduction in average bit rate for MB type and CBP information, respectively.

  11. Finding differentially expressed genes in high dimensional data: Rank based test statistic via a distance measure.

    PubMed

    Mathur, Sunil; Sadana, Ajit

    2015-12-01

    We present a rank-based test statistic for the identification of differentially expressed genes using a distance measure. The proposed test statistic is highly robust against extreme values and does not assume the distribution of parent population. Simulation studies show that the proposed test is more powerful than some of the commonly used methods, such as paired t-test, Wilcoxon signed rank test, and significance analysis of microarray (SAM) under certain non-normal distributions. The asymptotic distribution of the test statistic, and the p-value function are discussed. The application of proposed method is shown using a real-life data set. © The Author(s) 2011.

  12. No-Reference Video Quality Assessment Based on Statistical Analysis in 3D-DCT Domain.

    PubMed

    Li, Xuelong; Guo, Qun; Lu, Xiaoqiang

    2016-05-13

    It is an important task to design models for universal no-reference video quality assessment (NR-VQA) in multiple video processing and computer vision applications. However, most existing NR-VQA metrics are designed for specific distortion types which are not often aware in practical applications. A further deficiency is that the spatial and temporal information of videos is hardly considered simultaneously. In this paper, we propose a new NR-VQA metric based on the spatiotemporal natural video statistics (NVS) in 3D discrete cosine transform (3D-DCT) domain. In the proposed method, a set of features are firstly extracted based on the statistical analysis of 3D-DCT coefficients to characterize the spatiotemporal statistics of videos in different views. These features are used to predict the perceived video quality via the efficient linear support vector regression (SVR) model afterwards. The contributions of this paper are: 1) we explore the spatiotemporal statistics of videos in 3DDCT domain which has the inherent spatiotemporal encoding advantage over other widely used 2D transformations; 2) we extract a small set of simple but effective statistical features for video visual quality prediction; 3) the proposed method is universal for multiple types of distortions and robust to different databases. The proposed method is tested on four widely used video databases. Extensive experimental results demonstrate that the proposed method is competitive with the state-of-art NR-VQA metrics and the top-performing FR-VQA and RR-VQA metrics.

  13. Robustness of S1 statistic with Hodges-Lehmann for skewed distributions

    NASA Astrophysics Data System (ADS)

    Ahad, Nor Aishah; Yahaya, Sharipah Soaad Syed; Yin, Lee Ping

    2016-10-01

    Analysis of variance (ANOVA) is a common use parametric method to test the differences in means for more than two groups when the populations are normally distributed. ANOVA is highly inefficient under the influence of non- normal and heteroscedastic settings. When the assumptions are violated, researchers are looking for alternative such as Kruskal-Wallis under nonparametric or robust method. This study focused on flexible method, S1 statistic for comparing groups using median as the location estimator. S1 statistic was modified by substituting the median with Hodges-Lehmann and the default scale estimator with the variance of Hodges-Lehmann and MADn to produce two different test statistics for comparing groups. Bootstrap method was used for testing the hypotheses since the sampling distributions of these modified S1 statistics are unknown. The performance of the proposed statistic in terms of Type I error was measured and compared against the original S1 statistic, ANOVA and Kruskal-Wallis. The propose procedures show improvement compared to the original statistic especially under extremely skewed distribution.

  14. A spatial scan statistic for survival data based on Weibull distribution.

    PubMed

    Bhatt, Vijaya; Tiwari, Neeraj

    2014-05-20

    The spatial scan statistic has been developed as a geographical cluster detection analysis tool for different types of data sets such as Bernoulli, Poisson, ordinal, normal and exponential. We propose a scan statistic for survival data based on Weibull distribution. It may also be used for other survival distributions, such as exponential, gamma, and log normal. The proposed method is applied on the survival data of tuberculosis patients for the years 2004-2005 in Nainital district of Uttarakhand, India. Simulation studies reveal that the proposed method performs well for different survival distribution functions. Copyright © 2013 John Wiley & Sons, Ltd.

  15. Statistical Modeling of Retinal Optical Coherence Tomography.

    PubMed

    Amini, Zahra; Rabbani, Hossein

    2016-06-01

    In this paper, a new model for retinal Optical Coherence Tomography (OCT) images is proposed. This statistical model is based on introducing a nonlinear Gaussianization transform to convert the probability distribution function (pdf) of each OCT intra-retinal layer to a Gaussian distribution. The retina is a layered structure and in OCT each of these layers has a specific pdf which is corrupted by speckle noise, therefore a mixture model for statistical modeling of OCT images is proposed. A Normal-Laplace distribution, which is a convolution of a Laplace pdf and Gaussian noise, is proposed as the distribution of each component of this model. The reason for choosing Laplace pdf is the monotonically decaying behavior of OCT intensities in each layer for healthy cases. After fitting a mixture model to the data, each component is gaussianized and all of them are combined by Averaged Maximum A Posterior (AMAP) method. To demonstrate the ability of this method, a new contrast enhancement method based on this statistical model is proposed and tested on thirteen healthy 3D OCTs taken by the Topcon 3D OCT and five 3D OCTs from Age-related Macular Degeneration (AMD) patients, taken by Zeiss Cirrus HD-OCT. Comparing the results with two contending techniques, the prominence of the proposed method is demonstrated both visually and numerically. Furthermore, to prove the efficacy of the proposed method for a more direct and specific purpose, an improvement in the segmentation of intra-retinal layers using the proposed contrast enhancement method as a preprocessing step, is demonstrated.

  16. Statistical Method to Overcome Overfitting Issue in Rational Function Models

    NASA Astrophysics Data System (ADS)

    Alizadeh Moghaddam, S. H.; Mokhtarzade, M.; Alizadeh Naeini, A.; Alizadeh Moghaddam, S. A.

    2017-09-01

    Rational function models (RFMs) are known as one of the most appealing models which are extensively applied in geometric correction of satellite images and map production. Overfitting is a common issue, in the case of terrain dependent RFMs, that degrades the accuracy of RFMs-derived geospatial products. This issue, resulting from the high number of RFMs' parameters, leads to ill-posedness of the RFMs. To tackle this problem, in this study, a fast and robust statistical approach is proposed and compared to Tikhonov regularization (TR) method, as a frequently-used solution to RFMs' overfitting. In the proposed method, a statistical test, namely, significance test is applied to search for the RFMs' parameters that are resistant against overfitting issue. The performance of the proposed method was evaluated for two real data sets of Cartosat-1 satellite images. The obtained results demonstrate the efficiency of the proposed method in term of the achievable level of accuracy. This technique, indeed, shows an improvement of 50-80% over the TR.

  17. The Utility of Robust Means in Statistics

    ERIC Educational Resources Information Center

    Goodwyn, Fara

    2012-01-01

    Location estimates calculated from heuristic data were examined using traditional and robust statistical methods. The current paper demonstrates the impact outliers have on the sample mean and proposes robust methods to control for outliers in sample data. Traditional methods fail because they rely on the statistical assumptions of normality and…

  18. Linear regression models and k-means clustering for statistical analysis of fNIRS data.

    PubMed

    Bonomini, Viola; Zucchelli, Lucia; Re, Rebecca; Ieva, Francesca; Spinelli, Lorenzo; Contini, Davide; Paganoni, Anna; Torricelli, Alessandro

    2015-02-01

    We propose a new algorithm, based on a linear regression model, to statistically estimate the hemodynamic activations in fNIRS data sets. The main concern guiding the algorithm development was the minimization of assumptions and approximations made on the data set for the application of statistical tests. Further, we propose a K-means method to cluster fNIRS data (i.e. channels) as activated or not activated. The methods were validated both on simulated and in vivo fNIRS data. A time domain (TD) fNIRS technique was preferred because of its high performances in discriminating cortical activation and superficial physiological changes. However, the proposed method is also applicable to continuous wave or frequency domain fNIRS data sets.

  19. Linear regression models and k-means clustering for statistical analysis of fNIRS data

    PubMed Central

    Bonomini, Viola; Zucchelli, Lucia; Re, Rebecca; Ieva, Francesca; Spinelli, Lorenzo; Contini, Davide; Paganoni, Anna; Torricelli, Alessandro

    2015-01-01

    We propose a new algorithm, based on a linear regression model, to statistically estimate the hemodynamic activations in fNIRS data sets. The main concern guiding the algorithm development was the minimization of assumptions and approximations made on the data set for the application of statistical tests. Further, we propose a K-means method to cluster fNIRS data (i.e. channels) as activated or not activated. The methods were validated both on simulated and in vivo fNIRS data. A time domain (TD) fNIRS technique was preferred because of its high performances in discriminating cortical activation and superficial physiological changes. However, the proposed method is also applicable to continuous wave or frequency domain fNIRS data sets. PMID:25780751

  20. On the Spike Train Variability Characterized by Variance-to-Mean Power Relationship.

    PubMed

    Koyama, Shinsuke

    2015-07-01

    We propose a statistical method for modeling the non-Poisson variability of spike trains observed in a wide range of brain regions. Central to our approach is the assumption that the variance and the mean of interspike intervals are related by a power function characterized by two parameters: the scale factor and exponent. It is shown that this single assumption allows the variability of spike trains to have an arbitrary scale and various dependencies on the firing rate in the spike count statistics, as well as in the interval statistics, depending on the two parameters of the power function. We also propose a statistical model for spike trains that exhibits the variance-to-mean power relationship. Based on this, a maximum likelihood method is developed for inferring the parameters from rate-modulated spike trains. The proposed method is illustrated on simulated and experimental spike trains.

  1. Test Statistics and Confidence Intervals to Establish Noninferiority between Treatments with Ordinal Categorical Data.

    PubMed

    Zhang, Fanghong; Miyaoka, Etsuo; Huang, Fuping; Tanaka, Yutaka

    2015-01-01

    The problem for establishing noninferiority is discussed between a new treatment and a standard (control) treatment with ordinal categorical data. A measure of treatment effect is used and a method of specifying noninferiority margin for the measure is provided. Two Z-type test statistics are proposed where the estimation of variance is constructed under the shifted null hypothesis using U-statistics. Furthermore, the confidence interval and the sample size formula are given based on the proposed test statistics. The proposed procedure is applied to a dataset from a clinical trial. A simulation study is conducted to compare the performance of the proposed test statistics with that of the existing ones, and the results show that the proposed test statistics are better in terms of the deviation from nominal level and the power.

  2. Statistical inference based on the nonparametric maximum likelihood estimator under double-truncation.

    PubMed

    Emura, Takeshi; Konno, Yoshihiko; Michimae, Hirofumi

    2015-07-01

    Doubly truncated data consist of samples whose observed values fall between the right- and left- truncation limits. With such samples, the distribution function of interest is estimated using the nonparametric maximum likelihood estimator (NPMLE) that is obtained through a self-consistency algorithm. Owing to the complicated asymptotic distribution of the NPMLE, the bootstrap method has been suggested for statistical inference. This paper proposes a closed-form estimator for the asymptotic covariance function of the NPMLE, which is computationally attractive alternative to bootstrapping. Furthermore, we develop various statistical inference procedures, such as confidence interval, goodness-of-fit tests, and confidence bands to demonstrate the usefulness of the proposed covariance estimator. Simulations are performed to compare the proposed method with both the bootstrap and jackknife methods. The methods are illustrated using the childhood cancer dataset.

  3. Statistical Methods in Psychology Journals.

    ERIC Educational Resources Information Center

    Willkinson, Leland

    1999-01-01

    Proposes guidelines for revising the American Psychological Association (APA) publication manual or other APA materials to clarify the application of statistics in research reports. The guidelines are intended to induce authors and editors to recognize the thoughtless application of statistical methods. Contains 54 references. (SLD)

  4. A spatial scan statistic for nonisotropic two-level risk cluster.

    PubMed

    Li, Xiao-Zhou; Wang, Jin-Feng; Yang, Wei-Zhong; Li, Zhong-Jie; Lai, Sheng-Jie

    2012-01-30

    Spatial scan statistic methods are commonly used for geographical disease surveillance and cluster detection. The standard spatial scan statistic does not model any variability in the underlying risks of subregions belonging to a detected cluster. For a multilevel risk cluster, the isotonic spatial scan statistic could model a centralized high-risk kernel in the cluster. Because variations in disease risks are anisotropic owing to different social, economical, or transport factors, the real high-risk kernel will not necessarily take the central place in a whole cluster area. We propose a spatial scan statistic for a nonisotropic two-level risk cluster, which could be used to detect a whole cluster and a noncentralized high-risk kernel within the cluster simultaneously. The performance of the three methods was evaluated through an intensive simulation study. Our proposed nonisotropic two-level method showed better power and geographical precision with two-level risk cluster scenarios, especially for a noncentralized high-risk kernel. Our proposed method is illustrated using the hand-foot-mouth disease data in Pingdu City, Shandong, China in May 2009, compared with two other methods. In this practical study, the nonisotropic two-level method is the only way to precisely detect a high-risk area in a detected whole cluster. Copyright © 2011 John Wiley & Sons, Ltd.

  5. Structural damage detection based on stochastic subspace identification and statistical pattern recognition: I. Theory

    NASA Astrophysics Data System (ADS)

    Ren, W. X.; Lin, Y. Q.; Fang, S. E.

    2011-11-01

    One of the key issues in vibration-based structural health monitoring is to extract the damage-sensitive but environment-insensitive features from sampled dynamic response measurements and to carry out the statistical analysis of these features for structural damage detection. A new damage feature is proposed in this paper by using the system matrices of the forward innovation model based on the covariance-driven stochastic subspace identification of a vibrating system. To overcome the variations of the system matrices, a non-singularity transposition matrix is introduced so that the system matrices are normalized to their standard forms. For reducing the effects of modeling errors, noise and environmental variations on measured structural responses, a statistical pattern recognition paradigm is incorporated into the proposed method. The Mahalanobis and Euclidean distance decision functions of the damage feature vector are adopted by defining a statistics-based damage index. The proposed structural damage detection method is verified against one numerical signal and two numerical beams. It is demonstrated that the proposed statistics-based damage index is sensitive to damage and shows some robustness to the noise and false estimation of the system ranks. The method is capable of locating damage of the beam structures under different types of excitations. The robustness of the proposed damage detection method to the variations in environmental temperature is further validated in a companion paper by a reinforced concrete beam tested in the laboratory and a full-scale arch bridge tested in the field.

  6. A nonparametric spatial scan statistic for continuous data.

    PubMed

    Jung, Inkyung; Cho, Ho Jin

    2015-10-20

    Spatial scan statistics are widely used for spatial cluster detection, and several parametric models exist. For continuous data, a normal-based scan statistic can be used. However, the performance of the model has not been fully evaluated for non-normal data. We propose a nonparametric spatial scan statistic based on the Wilcoxon rank-sum test statistic and compared the performance of the method with parametric models via a simulation study under various scenarios. The nonparametric method outperforms the normal-based scan statistic in terms of power and accuracy in almost all cases under consideration in the simulation study. The proposed nonparametric spatial scan statistic is therefore an excellent alternative to the normal model for continuous data and is especially useful for data following skewed or heavy-tailed distributions.

  7. Statistical testing of association between menstruation and migraine.

    PubMed

    Barra, Mathias; Dahl, Fredrik A; Vetvik, Kjersti G

    2015-02-01

    To repair and refine a previously proposed method for statistical analysis of association between migraine and menstruation. Menstrually related migraine (MRM) affects about 20% of female migraineurs in the general population. The exact pathophysiological link from menstruation to migraine is hypothesized to be through fluctuations in female reproductive hormones, but the exact mechanisms remain unknown. Therefore, the main diagnostic criterion today is concurrency of migraine attacks with menstruation. Methods aiming to exclude spurious associations are wanted, so that further research into these mechanisms can be performed on a population with a true association. The statistical method is based on a simple two-parameter null model of MRM (which allows for simulation modeling), and Fisher's exact test (with mid-p correction) applied to standard 2 × 2 contingency tables derived from the patients' headache diaries. Our method is a corrected version of a previously published flawed framework. To our best knowledge, no other published methods for establishing a menstruation-migraine association by statistical means exist today. The probabilistic methodology shows good performance when subjected to receiver operator characteristic curve analysis. Quick reference cutoff values for the clinical setting were tabulated for assessing association given a patient's headache history. In this paper, we correct a proposed method for establishing association between menstruation and migraine by statistical methods. We conclude that the proposed standard of 3-cycle observations prior to setting an MRM diagnosis should be extended with at least one perimenstrual window to obtain sufficient information for statistical processing. © 2014 American Headache Society.

  8. Density-based empirical likelihood procedures for testing symmetry of data distributions and K-sample comparisons.

    PubMed

    Vexler, Albert; Tanajian, Hovig; Hutson, Alan D

    In practice, parametric likelihood-ratio techniques are powerful statistical tools. In this article, we propose and examine novel and simple distribution-free test statistics that efficiently approximate parametric likelihood ratios to analyze and compare distributions of K groups of observations. Using the density-based empirical likelihood methodology, we develop a Stata package that applies to a test for symmetry of data distributions and compares K -sample distributions. Recognizing that recent statistical software packages do not sufficiently address K -sample nonparametric comparisons of data distributions, we propose a new Stata command, vxdbel, to execute exact density-based empirical likelihood-ratio tests using K samples. To calculate p -values of the proposed tests, we use the following methods: 1) a classical technique based on Monte Carlo p -value evaluations; 2) an interpolation technique based on tabulated critical values; and 3) a new hybrid technique that combines methods 1 and 2. The third, cutting-edge method is shown to be very efficient in the context of exact-test p -value computations. This Bayesian-type method considers tabulated critical values as prior information and Monte Carlo generations of test statistic values as data used to depict the likelihood function. In this case, a nonparametric Bayesian method is proposed to compute critical values of exact tests.

  9. Noise removing in encrypted color images by statistical analysis

    NASA Astrophysics Data System (ADS)

    Islam, N.; Puech, W.

    2012-03-01

    Cryptographic techniques are used to secure confidential data from unauthorized access but these techniques are very sensitive to noise. A single bit change in encrypted data can have catastrophic impact over the decrypted data. This paper addresses the problem of removing bit error in visual data which are encrypted using AES algorithm in the CBC mode. In order to remove the noise, a method is proposed which is based on the statistical analysis of each block during the decryption. The proposed method exploits local statistics of the visual data and confusion/diffusion properties of the encryption algorithm to remove the errors. Experimental results show that the proposed method can be used at the receiving end for the possible solution for noise removing in visual data in encrypted domain.

  10. A Fast Framework for Abrupt Change Detection Based on Binary Search Trees and Kolmogorov Statistic

    PubMed Central

    Qi, Jin-Peng; Qi, Jie; Zhang, Qing

    2016-01-01

    Change-Point (CP) detection has attracted considerable attention in the fields of data mining and statistics; it is very meaningful to discuss how to quickly and efficiently detect abrupt change from large-scale bioelectric signals. Currently, most of the existing methods, like Kolmogorov-Smirnov (KS) statistic and so forth, are time-consuming, especially for large-scale datasets. In this paper, we propose a fast framework for abrupt change detection based on binary search trees (BSTs) and a modified KS statistic, named BSTKS (binary search trees and Kolmogorov statistic). In this method, first, two binary search trees, termed as BSTcA and BSTcD, are constructed by multilevel Haar Wavelet Transform (HWT); second, three search criteria are introduced in terms of the statistic and variance fluctuations in the diagnosed time series; last, an optimal search path is detected from the root to leaf nodes of two BSTs. The studies on both the synthetic time series samples and the real electroencephalograph (EEG) recordings indicate that the proposed BSTKS can detect abrupt change more quickly and efficiently than KS, t-statistic (t), and Singular-Spectrum Analyses (SSA) methods, with the shortest computation time, the highest hit rate, the smallest error, and the highest accuracy out of four methods. This study suggests that the proposed BSTKS is very helpful for useful information inspection on all kinds of bioelectric time series signals. PMID:27413364

  11. A Fast Framework for Abrupt Change Detection Based on Binary Search Trees and Kolmogorov Statistic.

    PubMed

    Qi, Jin-Peng; Qi, Jie; Zhang, Qing

    2016-01-01

    Change-Point (CP) detection has attracted considerable attention in the fields of data mining and statistics; it is very meaningful to discuss how to quickly and efficiently detect abrupt change from large-scale bioelectric signals. Currently, most of the existing methods, like Kolmogorov-Smirnov (KS) statistic and so forth, are time-consuming, especially for large-scale datasets. In this paper, we propose a fast framework for abrupt change detection based on binary search trees (BSTs) and a modified KS statistic, named BSTKS (binary search trees and Kolmogorov statistic). In this method, first, two binary search trees, termed as BSTcA and BSTcD, are constructed by multilevel Haar Wavelet Transform (HWT); second, three search criteria are introduced in terms of the statistic and variance fluctuations in the diagnosed time series; last, an optimal search path is detected from the root to leaf nodes of two BSTs. The studies on both the synthetic time series samples and the real electroencephalograph (EEG) recordings indicate that the proposed BSTKS can detect abrupt change more quickly and efficiently than KS, t-statistic (t), and Singular-Spectrum Analyses (SSA) methods, with the shortest computation time, the highest hit rate, the smallest error, and the highest accuracy out of four methods. This study suggests that the proposed BSTKS is very helpful for useful information inspection on all kinds of bioelectric time series signals.

  12. Cluster mass inference via random field theory.

    PubMed

    Zhang, Hui; Nichols, Thomas E; Johnson, Timothy D

    2009-01-01

    Cluster extent and voxel intensity are two widely used statistics in neuroimaging inference. Cluster extent is sensitive to spatially extended signals while voxel intensity is better for intense but focal signals. In order to leverage strength from both statistics, several nonparametric permutation methods have been proposed to combine the two methods. Simulation studies have shown that of the different cluster permutation methods, the cluster mass statistic is generally the best. However, to date, there is no parametric cluster mass inference available. In this paper, we propose a cluster mass inference method based on random field theory (RFT). We develop this method for Gaussian images, evaluate it on Gaussian and Gaussianized t-statistic images and investigate its statistical properties via simulation studies and real data. Simulation results show that the method is valid under the null hypothesis and demonstrate that it can be more powerful than the cluster extent inference method. Further, analyses with a single subject and a group fMRI dataset demonstrate better power than traditional cluster size inference, and good accuracy relative to a gold-standard permutation test.

  13. Nakagami-based total variation method for speckle reduction in thyroid ultrasound images.

    PubMed

    Koundal, Deepika; Gupta, Savita; Singh, Sukhwinder

    2016-02-01

    A good statistical model is necessary for the reduction in speckle noise. The Nakagami model is more general than the Rayleigh distribution for statistical modeling of speckle in ultrasound images. In this article, the Nakagami-based noise removal method is presented to enhance thyroid ultrasound images and to improve clinical diagnosis. The statistics of log-compressed image are derived from the Nakagami distribution following a maximum a posteriori estimation framework. The minimization problem is solved by optimizing an augmented Lagrange and Chambolle's projection method. The proposed method is evaluated on both artificial speckle-simulated and real ultrasound images. The experimental findings reveal the superiority of the proposed method both quantitatively and qualitatively in comparison with other speckle reduction methods reported in the literature. The proposed method yields an average signal-to-noise ratio gain of more than 2.16 dB over the non-convex regularizer-based speckle noise removal method, 3.83 dB over the Aubert-Aujol model, 1.71 dB over the Shi-Osher model and 3.21 dB over the Rudin-Lions-Osher model on speckle-simulated synthetic images. Furthermore, visual evaluation of the despeckled images shows that the proposed method suppresses speckle noise well while preserving the textures and fine details. © IMechE 2015.

  14. Weather related continuity and completeness on Deep Space Ka-band links: statistics and forecasting

    NASA Technical Reports Server (NTRS)

    Shambayati, Shervin

    2006-01-01

    In this paper the concept of link 'stability' as means of measuring the continuity of the link is introduced and through it, along with the distributions of 'good' periods and 'bad' periods, the performance of the proposed Ka-band link design method using both forecasting and long-term statistics has been analyzed. The results indicate that the proposed link design method has relatively good continuity and completeness characteristics even when only long-term statistics are used and that the continuity performance further improves when forecasting is employed. .

  15. An efficient genome-wide association test for mixed binary and continuous phenotypes with applications to substance abuse research.

    PubMed

    Buu, Anne; Williams, L Keoki; Yang, James J

    2018-03-01

    We propose a new genome-wide association test for mixed binary and continuous phenotypes that uses an efficient numerical method to estimate the empirical distribution of the Fisher's combination statistic under the null hypothesis. Our simulation study shows that the proposed method controls the type I error rate and also maintains its power at the level of the permutation method. More importantly, the computational efficiency of the proposed method is much higher than the one of the permutation method. The simulation results also indicate that the power of the test increases when the genetic effect increases, the minor allele frequency increases, and the correlation between responses decreases. The statistical analysis on the database of the Study of Addiction: Genetics and Environment demonstrates that the proposed method combining multiple phenotypes can increase the power of identifying markers that may not be, otherwise, chosen using marginal tests.

  16. a Method of Time-Series Change Detection Using Full Polsar Images from Different Sensors

    NASA Astrophysics Data System (ADS)

    Liu, W.; Yang, J.; Zhao, J.; Shi, H.; Yang, L.

    2018-04-01

    Most of the existing change detection methods using full polarimetric synthetic aperture radar (PolSAR) are limited to detecting change between two points in time. In this paper, a novel method was proposed to detect the change based on time-series data from different sensors. Firstly, the overall difference image of a time-series PolSAR was calculated by ominous statistic test. Secondly, difference images between any two images in different times ware acquired by Rj statistic test. Generalized Gaussian mixture model (GGMM) was used to obtain time-series change detection maps in the last step for the proposed method. To verify the effectiveness of the proposed method, we carried out the experiment of change detection by using the time-series PolSAR images acquired by Radarsat-2 and Gaofen-3 over the city of Wuhan, in China. Results show that the proposed method can detect the time-series change from different sensors.

  17. Testing the Difference of Correlated Agreement Coefficients for Statistical Significance

    ERIC Educational Resources Information Center

    Gwet, Kilem L.

    2016-01-01

    This article addresses the problem of testing the difference between two correlated agreement coefficients for statistical significance. A number of authors have proposed methods for testing the difference between two correlated kappa coefficients, which require either the use of resampling methods or the use of advanced statistical modeling…

  18. Statistical methods to detect novel genetic variants using publicly available GWAS summary data.

    PubMed

    Guo, Bin; Wu, Baolin

    2018-03-01

    We propose statistical methods to detect novel genetic variants using only genome-wide association studies (GWAS) summary data without access to raw genotype and phenotype data. With more and more summary data being posted for public access in the post GWAS era, the proposed methods are practically very useful to identify additional interesting genetic variants and shed lights on the underlying disease mechanism. We illustrate the utility of our proposed methods with application to GWAS meta-analysis results of fasting glucose from the international MAGIC consortium. We found several novel genome-wide significant loci that are worth further study. Copyright © 2018 Elsevier Ltd. All rights reserved.

  19. Using protein-protein interactions for refining gene networks estimated from microarray data by Bayesian networks.

    PubMed

    Nariai, N; Kim, S; Imoto, S; Miyano, S

    2004-01-01

    We propose a statistical method to estimate gene networks from DNA microarray data and protein-protein interactions. Because physical interactions between proteins or multiprotein complexes are likely to regulate biological processes, using only mRNA expression data is not sufficient for estimating a gene network accurately. Our method adds knowledge about protein-protein interactions to the estimation method of gene networks under a Bayesian statistical framework. In the estimated gene network, a protein complex is modeled as a virtual node based on principal component analysis. We show the effectiveness of the proposed method through the analysis of Saccharomyces cerevisiae cell cycle data. The proposed method improves the accuracy of the estimated gene networks, and successfully identifies some biological facts.

  20. A REVIEW OF STATISTICAL METHODS FOR THE METEOROLOGICAL ADJUSTMENT OF TROPOSPHERIC OZONE

    EPA Science Inventory

    A variety of statistical methods for meteorological adjustment of ozone have been proposed in the literature over the last decade for purposes of forecasting, estimating ozone time trends, or investigating underlying mechanisms from an empirical perspective. The methods can be...

  1. Blind image quality assessment based on aesthetic and statistical quality-aware features

    NASA Astrophysics Data System (ADS)

    Jenadeleh, Mohsen; Masaeli, Mohammad Masood; Moghaddam, Mohsen Ebrahimi

    2017-07-01

    The main goal of image quality assessment (IQA) methods is the emulation of human perceptual image quality judgments. Therefore, the correlation between objective scores of these methods with human perceptual scores is considered as their performance metric. Human judgment of the image quality implicitly includes many factors when assessing perceptual image qualities such as aesthetics, semantics, context, and various types of visual distortions. The main idea of this paper is to use a host of features that are commonly employed in image aesthetics assessment in order to improve blind image quality assessment (BIQA) methods accuracy. We propose an approach that enriches the features of BIQA methods by integrating a host of aesthetics image features with the features of natural image statistics derived from multiple domains. The proposed features have been used for augmenting five different state-of-the-art BIQA methods, which use statistical natural scene statistics features. Experiments were performed on seven benchmark image quality databases. The experimental results showed significant improvement of the accuracy of the methods.

  2. A Statistical Method of Evaluating the Pronunciation Proficiency/Intelligibility of English Presentations by Japanese Speakers

    ERIC Educational Resources Information Center

    Kibishi, Hiroshi; Hirabayashi, Kuniaki; Nakagawa, Seiichi

    2015-01-01

    In this paper, we propose a statistical evaluation method of pronunciation proficiency and intelligibility for presentations made in English by native Japanese speakers. We statistically analyzed the actual utterances of speakers to find combinations of acoustic and linguistic features with high correlation between the scores estimated by the…

  3. A feature refinement approach for statistical interior CT reconstruction

    NASA Astrophysics Data System (ADS)

    Hu, Zhanli; Zhang, Yunwan; Liu, Jianbo; Ma, Jianhua; Zheng, Hairong; Liang, Dong

    2016-07-01

    Interior tomography is clinically desired to reduce the radiation dose rendered to patients. In this work, a new statistical interior tomography approach for computed tomography is proposed. The developed design focuses on taking into account the statistical nature of local projection data and recovering fine structures which are lost in the conventional total-variation (TV)—minimization reconstruction. The proposed method falls within the compressed sensing framework of TV minimization, which only assumes that the interior ROI is piecewise constant or polynomial and does not need any additional prior knowledge. To integrate the statistical distribution property of projection data, the objective function is built under the criteria of penalized weighed least-square (PWLS-TV). In the implementation of the proposed method, the interior projection extrapolation based FBP reconstruction is first used as the initial guess to mitigate truncation artifacts and also provide an extended field-of-view. Moreover, an interior feature refinement step, as an important processing operation is performed after each iteration of PWLS-TV to recover the desired structure information which is lost during the TV minimization. Here, a feature descriptor is specifically designed and employed to distinguish structure from noise and noise-like artifacts. A modified steepest descent algorithm is adopted to minimize the associated objective function. The proposed method is applied to both digital phantom and in vivo Micro-CT datasets, and compared to FBP, ART-TV and PWLS-TV. The reconstruction results demonstrate that the proposed method performs better than other conventional methods in suppressing noise, reducing truncated and streak artifacts, and preserving features. The proposed approach demonstrates its potential usefulness for feature preservation of interior tomography under truncated projection measurements.

  4. A feature refinement approach for statistical interior CT reconstruction.

    PubMed

    Hu, Zhanli; Zhang, Yunwan; Liu, Jianbo; Ma, Jianhua; Zheng, Hairong; Liang, Dong

    2016-07-21

    Interior tomography is clinically desired to reduce the radiation dose rendered to patients. In this work, a new statistical interior tomography approach for computed tomography is proposed. The developed design focuses on taking into account the statistical nature of local projection data and recovering fine structures which are lost in the conventional total-variation (TV)-minimization reconstruction. The proposed method falls within the compressed sensing framework of TV minimization, which only assumes that the interior ROI is piecewise constant or polynomial and does not need any additional prior knowledge. To integrate the statistical distribution property of projection data, the objective function is built under the criteria of penalized weighed least-square (PWLS-TV). In the implementation of the proposed method, the interior projection extrapolation based FBP reconstruction is first used as the initial guess to mitigate truncation artifacts and also provide an extended field-of-view. Moreover, an interior feature refinement step, as an important processing operation is performed after each iteration of PWLS-TV to recover the desired structure information which is lost during the TV minimization. Here, a feature descriptor is specifically designed and employed to distinguish structure from noise and noise-like artifacts. A modified steepest descent algorithm is adopted to minimize the associated objective function. The proposed method is applied to both digital phantom and in vivo Micro-CT datasets, and compared to FBP, ART-TV and PWLS-TV. The reconstruction results demonstrate that the proposed method performs better than other conventional methods in suppressing noise, reducing truncated and streak artifacts, and preserving features. The proposed approach demonstrates its potential usefulness for feature preservation of interior tomography under truncated projection measurements.

  5. Quality evaluation of no-reference MR images using multidirectional filters and image statistics.

    PubMed

    Jang, Jinseong; Bang, Kihun; Jang, Hanbyol; Hwang, Dosik

    2018-09-01

    This study aimed to develop a fully automatic, no-reference image-quality assessment (IQA) method for MR images. New quality-aware features were obtained by applying multidirectional filters to MR images and examining the feature statistics. A histogram of these features was then fitted to a generalized Gaussian distribution function for which the shape parameters yielded different values depending on the type of distortion in the MR image. Standard feature statistics were established through a training process based on high-quality MR images without distortion. Subsequently, the feature statistics of a test MR image were calculated and compared with the standards. The quality score was calculated as the difference between the shape parameters of the test image and the undistorted standard images. The proposed IQA method showed a >0.99 correlation with the conventional full-reference assessment methods; accordingly, this proposed method yielded the best performance among no-reference IQA methods for images containing six types of synthetic, MR-specific distortions. In addition, for authentically distorted images, the proposed method yielded the highest correlation with subjective assessments by human observers, thus demonstrating its superior performance over other no-reference IQAs. Our proposed IQA was designed to consider MR-specific features and outperformed other no-reference IQAs designed mainly for photographic images. Magn Reson Med 80:914-924, 2018. © 2018 International Society for Magnetic Resonance in Medicine. © 2018 International Society for Magnetic Resonance in Medicine.

  6. Time-of-Flight Measurements as a Possible Method to Observe Anyonic Statistics

    NASA Astrophysics Data System (ADS)

    Umucalılar, R. O.; Macaluso, E.; Comparin, T.; Carusotto, I.

    2018-06-01

    We propose a standard time-of-flight experiment as a method for observing the anyonic statistics of quasiholes in a fractional quantum Hall state of ultracold atoms. The quasihole states can be stably prepared by pinning the quasiholes with localized potentials and a measurement of the mean square radius of the freely expanding cloud, which is related to the average total angular momentum of the initial state, offers direct signatures of the statistical phase. Our proposed method is validated by Monte Carlo calculations for ν =1 /2 and 1 /3 fractional quantum Hall liquids containing a realistic number of particles. Extensions to quantum Hall liquids of light and to non-Abelian anyons are briefly discussed.

  7. A fast algorithm for determining bounds and accurate approximate p-values of the rank product statistic for replicate experiments.

    PubMed

    Heskes, Tom; Eisinga, Rob; Breitling, Rainer

    2014-11-21

    The rank product method is a powerful statistical technique for identifying differentially expressed molecules in replicated experiments. A critical issue in molecule selection is accurate calculation of the p-value of the rank product statistic to adequately address multiple testing. Both exact calculation and permutation and gamma approximations have been proposed to determine molecule-level significance. These current approaches have serious drawbacks as they are either computationally burdensome or provide inaccurate estimates in the tail of the p-value distribution. We derive strict lower and upper bounds to the exact p-value along with an accurate approximation that can be used to assess the significance of the rank product statistic in a computationally fast manner. The bounds and the proposed approximation are shown to provide far better accuracy over existing approximate methods in determining tail probabilities, with the slightly conservative upper bound protecting against false positives. We illustrate the proposed method in the context of a recently published analysis on transcriptomic profiling performed in blood. We provide a method to determine upper bounds and accurate approximate p-values of the rank product statistic. The proposed algorithm provides an order of magnitude increase in throughput as compared with current approaches and offers the opportunity to explore new application domains with even larger multiple testing issue. The R code is published in one of the Additional files and is available at http://www.ru.nl/publish/pages/726696/rankprodbounds.zip .

  8. [Road Extraction in Remote Sensing Images Based on Spectral and Edge Analysis].

    PubMed

    Zhao, Wen-zhi; Luo, Li-qun; Guo, Zhou; Yue, Jun; Yu, Xue-ying; Liu, Hui; Wei, Jing

    2015-10-01

    Roads are typically man-made objects in urban areas. Road extraction from high-resolution images has important applications for urban planning and transportation development. However, due to the confusion of spectral characteristic, it is difficult to distinguish roads from other objects by merely using traditional classification methods that mainly depend on spectral information. Edge is an important feature for the identification of linear objects (e. g. , roads). The distribution patterns of edges vary greatly among different objects. It is crucial to merge edge statistical information into spectral ones. In this study, a new method that combines spectral information and edge statistical features has been proposed. First, edge detection is conducted by using self-adaptive mean-shift algorithm on the panchromatic band, which can greatly reduce pseudo-edges and noise effects. Then, edge statistical features are obtained from the edge statistical model, which measures the length and angle distribution of edges. Finally, by integrating the spectral and edge statistical features, SVM algorithm is used to classify the image and roads are ultimately extracted. A series of experiments are conducted and the results show that the overall accuracy of proposed method is 93% comparing with only 78% overall accuracy of the traditional. The results demonstrate that the proposed method is efficient and valuable for road extraction, especially on high-resolution images.

  9. Statistical and Machine Learning forecasting methods: Concerns and ways forward

    PubMed Central

    Makridakis, Spyros; Assimakopoulos, Vassilios

    2018-01-01

    Machine Learning (ML) methods have been proposed in the academic literature as alternatives to statistical ones for time series forecasting. Yet, scant evidence is available about their relative performance in terms of accuracy and computational requirements. The purpose of this paper is to evaluate such performance across multiple forecasting horizons using a large subset of 1045 monthly time series used in the M3 Competition. After comparing the post-sample accuracy of popular ML methods with that of eight traditional statistical ones, we found that the former are dominated across both accuracy measures used and for all forecasting horizons examined. Moreover, we observed that their computational requirements are considerably greater than those of statistical methods. The paper discusses the results, explains why the accuracy of ML models is below that of statistical ones and proposes some possible ways forward. The empirical results found in our research stress the need for objective and unbiased ways to test the performance of forecasting methods that can be achieved through sizable and open competitions allowing meaningful comparisons and definite conclusions. PMID:29584784

  10. Testing high SPF sunscreens: a demonstration of the accuracy and reproducibility of the results of testing high SPF formulations by two methods and at different testing sites.

    PubMed

    Agin, Patricia Poh; Edmonds, Susan H

    2002-08-01

    The goals of this study were (i) to demonstrate that existing and widely used sun protection factor (SPF) test methodologies can produce accurate and reproducible results for high SPF formulations and (ii) to provide data on the number of test-subjects needed, the variability of the data, and the appropriate exposure increments needed for testing high SPF formulations. Three high SPF formulations were tested, according to the Food and Drug Administration's (FDA) 1993 tentative final monograph (TFM) 'very water resistant' test method and/or the 1978 proposed monograph 'waterproof' test method, within one laboratory. A fourth high SPF formulation was tested at four independent SPF testing laboratories, using the 1978 waterproof SPF test method. All laboratories utilized xenon arc solar simulators. The data illustrate that the testing conducted within one laboratory, following either the 1978 proposed or the 1993 TFM SPF test method, was able to reproducibly determine the SPFs of the formulations tested, using either the statistical analysis method in the proposed monograph or the statistical method described in the TFM. When one formulation was tested at four different laboratories, the anticipated variation in the data owing to the equipment and other operational differences was minimized through the use of the statistical method described in the 1993 monograph. The data illustrate that either the 1978 proposed monograph SPF test method or the 1993 TFM SPF test method can provide accurate and reproducible results for high SPF formulations. Further, these results can be achieved with panels of 20-25 subjects with an acceptable level of variability. Utilization of the statistical controls from the 1993 sunscreen monograph can help to minimize lab-to-lab variability for well-formulated products.

  11. Across-cohort QC analyses of GWAS summary statistics from complex traits.

    PubMed

    Chen, Guo-Bo; Lee, Sang Hong; Robinson, Matthew R; Trzaskowski, Maciej; Zhu, Zhi-Xiang; Winkler, Thomas W; Day, Felix R; Croteau-Chonka, Damien C; Wood, Andrew R; Locke, Adam E; Kutalik, Zoltán; Loos, Ruth J F; Frayling, Timothy M; Hirschhorn, Joel N; Yang, Jian; Wray, Naomi R; Visscher, Peter M

    2016-01-01

    Genome-wide association studies (GWASs) have been successful in discovering SNP trait associations for many quantitative traits and common diseases. Typically, the effect sizes of SNP alleles are very small and this requires large genome-wide association meta-analyses (GWAMAs) to maximize statistical power. A trend towards ever-larger GWAMA is likely to continue, yet dealing with summary statistics from hundreds of cohorts increases logistical and quality control problems, including unknown sample overlap, and these can lead to both false positive and false negative findings. In this study, we propose four metrics and visualization tools for GWAMA, using summary statistics from cohort-level GWASs. We propose methods to examine the concordance between demographic information, and summary statistics and methods to investigate sample overlap. (I) We use the population genetics F st statistic to verify the genetic origin of each cohort and their geographic location, and demonstrate using GWAMA data from the GIANT Consortium that geographic locations of cohorts can be recovered and outlier cohorts can be detected. (II) We conduct principal component analysis based on reported allele frequencies, and are able to recover the ancestral information for each cohort. (III) We propose a new statistic that uses the reported allelic effect sizes and their standard errors to identify significant sample overlap or heterogeneity between pairs of cohorts. (IV) To quantify unknown sample overlap across all pairs of cohorts, we propose a method that uses randomly generated genetic predictors that does not require the sharing of individual-level genotype data and does not breach individual privacy.

  12. Across-cohort QC analyses of GWAS summary statistics from complex traits

    PubMed Central

    Chen, Guo-Bo; Lee, Sang Hong; Robinson, Matthew R; Trzaskowski, Maciej; Zhu, Zhi-Xiang; Winkler, Thomas W; Day, Felix R; Croteau-Chonka, Damien C; Wood, Andrew R; Locke, Adam E; Kutalik, Zoltán; Loos, Ruth J F; Frayling, Timothy M; Hirschhorn, Joel N; Yang, Jian; Wray, Naomi R; Visscher, Peter M

    2017-01-01

    Genome-wide association studies (GWASs) have been successful in discovering SNP trait associations for many quantitative traits and common diseases. Typically, the effect sizes of SNP alleles are very small and this requires large genome-wide association meta-analyses (GWAMAs) to maximize statistical power. A trend towards ever-larger GWAMA is likely to continue, yet dealing with summary statistics from hundreds of cohorts increases logistical and quality control problems, including unknown sample overlap, and these can lead to both false positive and false negative findings. In this study, we propose four metrics and visualization tools for GWAMA, using summary statistics from cohort-level GWASs. We propose methods to examine the concordance between demographic information, and summary statistics and methods to investigate sample overlap. (I) We use the population genetics Fst statistic to verify the genetic origin of each cohort and their geographic location, and demonstrate using GWAMA data from the GIANT Consortium that geographic locations of cohorts can be recovered and outlier cohorts can be detected. (II) We conduct principal component analysis based on reported allele frequencies, and are able to recover the ancestral information for each cohort. (III) We propose a new statistic that uses the reported allelic effect sizes and their standard errors to identify significant sample overlap or heterogeneity between pairs of cohorts. (IV) To quantify unknown sample overlap across all pairs of cohorts, we propose a method that uses randomly generated genetic predictors that does not require the sharing of individual-level genotype data and does not breach individual privacy. PMID:27552965

  13. An Unsupervised Change Detection Method Using Time-Series of PolSAR Images from Radarsat-2 and GaoFen-3.

    PubMed

    Liu, Wensong; Yang, Jie; Zhao, Jinqi; Shi, Hongtao; Yang, Le

    2018-02-12

    The traditional unsupervised change detection methods based on the pixel level can only detect the changes between two different times with same sensor, and the results are easily affected by speckle noise. In this paper, a novel method is proposed to detect change based on time-series data from different sensors. Firstly, the overall difference image of the time-series PolSAR is calculated by omnibus test statistics, and difference images between any two images in different times are acquired by R j test statistics. Secondly, the difference images are segmented with a Generalized Statistical Region Merging (GSRM) algorithm which can suppress the effect of speckle noise. Generalized Gaussian Mixture Model (GGMM) is then used to obtain the time-series change detection maps in the final step of the proposed method. To verify the effectiveness of the proposed method, we carried out the experiment of change detection using time-series PolSAR images acquired by Radarsat-2 and Gaofen-3 over the city of Wuhan, in China. Results show that the proposed method can not only detect the time-series change from different sensors, but it can also better suppress the influence of speckle noise and improve the overall accuracy and Kappa coefficient.

  14. Research of facial feature extraction based on MMC

    NASA Astrophysics Data System (ADS)

    Xue, Donglin; Zhao, Jiufen; Tang, Qinhong; Shi, Shaokun

    2017-07-01

    Based on the maximum margin criterion (MMC), a new algorithm of statistically uncorrelated optimal discriminant vectors and a new algorithm of orthogonal optimal discriminant vectors for feature extraction were proposed. The purpose of the maximum margin criterion is to maximize the inter-class scatter while simultaneously minimizing the intra-class scatter after the projection. Compared with original MMC method and principal component analysis (PCA) method, the proposed methods are better in terms of reducing or eliminating the statistically correlation between features and improving recognition rate. The experiment results on Olivetti Research Laboratory (ORL) face database shows that the new feature extraction method of statistically uncorrelated maximum margin criterion (SUMMC) are better in terms of recognition rate and stability. Besides, the relations between maximum margin criterion and Fisher criterion for feature extraction were revealed.

  15. Multiple Phenotype Association Tests Using Summary Statistics in Genome-Wide Association Studies

    PubMed Central

    Liu, Zhonghua; Lin, Xihong

    2017-01-01

    Summary We study in this paper jointly testing the associations of a genetic variant with correlated multiple phenotypes using the summary statistics of individual phenotype analysis from Genome-Wide Association Studies (GWASs). We estimated the between-phenotype correlation matrix using the summary statistics of individual phenotype GWAS analyses, and developed genetic association tests for multiple phenotypes by accounting for between-phenotype correlation without the need to access individual-level data. Since genetic variants often affect multiple phenotypes differently across the genome and the between-phenotype correlation can be arbitrary, we proposed robust and powerful multiple phenotype testing procedures by jointly testing a common mean and a variance component in linear mixed models for summary statistics. We computed the p-values of the proposed tests analytically. This computational advantage makes our methods practically appealing in large-scale GWASs. We performed simulation studies to show that the proposed tests maintained correct type I error rates, and to compare their powers in various settings with the existing methods. We applied the proposed tests to a GWAS Global Lipids Genetics Consortium summary statistics data set and identified additional genetic variants that were missed by the original single-trait analysis. PMID:28653391

  16. Multiple phenotype association tests using summary statistics in genome-wide association studies.

    PubMed

    Liu, Zhonghua; Lin, Xihong

    2018-03-01

    We study in this article jointly testing the associations of a genetic variant with correlated multiple phenotypes using the summary statistics of individual phenotype analysis from Genome-Wide Association Studies (GWASs). We estimated the between-phenotype correlation matrix using the summary statistics of individual phenotype GWAS analyses, and developed genetic association tests for multiple phenotypes by accounting for between-phenotype correlation without the need to access individual-level data. Since genetic variants often affect multiple phenotypes differently across the genome and the between-phenotype correlation can be arbitrary, we proposed robust and powerful multiple phenotype testing procedures by jointly testing a common mean and a variance component in linear mixed models for summary statistics. We computed the p-values of the proposed tests analytically. This computational advantage makes our methods practically appealing in large-scale GWASs. We performed simulation studies to show that the proposed tests maintained correct type I error rates, and to compare their powers in various settings with the existing methods. We applied the proposed tests to a GWAS Global Lipids Genetics Consortium summary statistics data set and identified additional genetic variants that were missed by the original single-trait analysis. © 2017, The International Biometric Society.

  17. Using the Bootstrap Method for a Statistical Significance Test of Differences between Summary Histograms

    NASA Technical Reports Server (NTRS)

    Xu, Kuan-Man

    2006-01-01

    A new method is proposed to compare statistical differences between summary histograms, which are the histograms summed over a large ensemble of individual histograms. It consists of choosing a distance statistic for measuring the difference between summary histograms and using a bootstrap procedure to calculate the statistical significance level. Bootstrapping is an approach to statistical inference that makes few assumptions about the underlying probability distribution that describes the data. Three distance statistics are compared in this study. They are the Euclidean distance, the Jeffries-Matusita distance and the Kuiper distance. The data used in testing the bootstrap method are satellite measurements of cloud systems called cloud objects. Each cloud object is defined as a contiguous region/patch composed of individual footprints or fields of view. A histogram of measured values over footprints is generated for each parameter of each cloud object and then summary histograms are accumulated over all individual histograms in a given cloud-object size category. The results of statistical hypothesis tests using all three distances as test statistics are generally similar, indicating the validity of the proposed method. The Euclidean distance is determined to be most suitable after comparing the statistical tests of several parameters with distinct probability distributions among three cloud-object size categories. Impacts on the statistical significance levels resulting from differences in the total lengths of satellite footprint data between two size categories are also discussed.

  18. Design and analysis of multiple diseases genome-wide association studies without controls.

    PubMed

    Chen, Zhongxue; Huang, Hanwen; Ng, Hon Keung Tony

    2012-11-15

    In genome-wide association studies (GWAS), multiple diseases with shared controls is one of the case-control study designs. If data obtained from these studies are appropriately analyzed, this design can have several advantages such as improving statistical power in detecting associations and reducing the time and cost in the data collection process. In this paper, we propose a study design for GWAS which involves multiple diseases but without controls. We also propose corresponding statistical data analysis strategy for GWAS with multiple diseases but no controls. Through a simulation study, we show that the statistical association test with the proposed study design is more powerful than the test with single disease sharing common controls, and it has comparable power to the overall test based on the whole dataset including the controls. We also apply the proposed method to a real GWAS dataset to illustrate the methodologies and the advantages of the proposed design. Some possible limitations of this study design and testing method and their solutions are also discussed. Our findings indicate that the proposed study design and statistical analysis strategy could be more efficient than the usual case-control GWAS as well as those with shared controls. Copyright © 2012 Elsevier B.V. All rights reserved.

  19. DISTMIX: direct imputation of summary statistics for unmeasured SNPs from mixed ethnicity cohorts.

    PubMed

    Lee, Donghyung; Bigdeli, T Bernard; Williamson, Vernell S; Vladimirov, Vladimir I; Riley, Brien P; Fanous, Ayman H; Bacanu, Silviu-Alin

    2015-10-01

    To increase the signal resolution for large-scale meta-analyses of genome-wide association studies, genotypes at unmeasured single nucleotide polymorphisms (SNPs) are commonly imputed using large multi-ethnic reference panels. However, the ever increasing size and ethnic diversity of both reference panels and cohorts makes genotype imputation computationally challenging for moderately sized computer clusters. Moreover, genotype imputation requires subject-level genetic data, which unlike summary statistics provided by virtually all studies, is not publicly available. While there are much less demanding methods which avoid the genotype imputation step by directly imputing SNP statistics, e.g. Directly Imputing summary STatistics (DIST) proposed by our group, their implicit assumptions make them applicable only to ethnically homogeneous cohorts. To decrease computational and access requirements for the analysis of cosmopolitan cohorts, we propose DISTMIX, which extends DIST capabilities to the analysis of mixed ethnicity cohorts. The method uses a relevant reference panel to directly impute unmeasured SNP statistics based only on statistics at measured SNPs and estimated/user-specified ethnic proportions. Simulations show that the proposed method adequately controls the Type I error rates. The 1000 Genomes panel imputation of summary statistics from the ethnically diverse Psychiatric Genetic Consortium Schizophrenia Phase 2 suggests that, when compared to genotype imputation methods, DISTMIX offers comparable imputation accuracy for only a fraction of computational resources. DISTMIX software, its reference population data, and usage examples are publicly available at http://code.google.com/p/distmix. dlee4@vcu.edu Supplementary Data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.

  20. A flexibly shaped space-time scan statistic for disease outbreak detection and monitoring.

    PubMed

    Takahashi, Kunihiko; Kulldorff, Martin; Tango, Toshiro; Yih, Katherine

    2008-04-11

    Early detection of disease outbreaks enables public health officials to implement disease control and prevention measures at the earliest possible time. A time periodic geographical disease surveillance system based on a cylindrical space-time scan statistic has been used extensively for disease surveillance along with the SaTScan software. In the purely spatial setting, many different methods have been proposed to detect spatial disease clusters. In particular, some spatial scan statistics are aimed at detecting irregularly shaped clusters which may not be detected by the circular spatial scan statistic. Based on the flexible purely spatial scan statistic, we propose a flexibly shaped space-time scan statistic for early detection of disease outbreaks. The performance of the proposed space-time scan statistic is compared with that of the cylindrical scan statistic using benchmark data. In order to compare their performances, we have developed a space-time power distribution by extending the purely spatial bivariate power distribution. Daily syndromic surveillance data in Massachusetts, USA, are used to illustrate the proposed test statistic. The flexible space-time scan statistic is well suited for detecting and monitoring disease outbreaks in irregularly shaped areas.

  1. The Profile of Creativity and Proposing Statistical Problem Quality Level Reviewed From Cognitive Style

    NASA Astrophysics Data System (ADS)

    Awi; Ahmar, A. S.; Rahman, A.; Minggi, I.; Mulbar, U.; Asdar; Ruslan; Upu, H.; Alimuddin; Hamda; Rosidah; Sutamrin; Tiro, M. A.; Rusli

    2018-01-01

    This research aims to reveal the profile about the level of creativity and the ability to propose statistical problem of students at Mathematics Education 2014 Batch in the State University of Makassar in terms of their cognitive style. This research uses explorative qualitative method by giving meta-cognitive scaffolding at the time of research. The hypothesis of research is that students who have field independent (FI) cognitive style in statistics problem posing from the provided information already able to propose the statistical problem that can be solved and create new data and the problem is already been included as a high quality statistical problem, while students who have dependent cognitive field (FD) commonly are still limited in statistics problem posing that can be finished and do not load new data and the problem is included as medium quality statistical problem.

  2. Towards sound epistemological foundations of statistical methods for high-dimensional biology.

    PubMed

    Mehta, Tapan; Tanik, Murat; Allison, David B

    2004-09-01

    A sound epistemological foundation for biological inquiry comes, in part, from application of valid statistical procedures. This tenet is widely appreciated by scientists studying the new realm of high-dimensional biology, or 'omic' research, which involves multiplicity at unprecedented scales. Many papers aimed at the high-dimensional biology community describe the development or application of statistical techniques. The validity of many of these is questionable, and a shared understanding about the epistemological foundations of the statistical methods themselves seems to be lacking. Here we offer a framework in which the epistemological foundation of proposed statistical methods can be evaluated.

  3. A flexible spatial scan statistic with a restricted likelihood ratio for detecting disease clusters.

    PubMed

    Tango, Toshiro; Takahashi, Kunihiko

    2012-12-30

    Spatial scan statistics are widely used tools for detection of disease clusters. Especially, the circular spatial scan statistic proposed by Kulldorff (1997) has been utilized in a wide variety of epidemiological studies and disease surveillance. However, as it cannot detect noncircular, irregularly shaped clusters, many authors have proposed different spatial scan statistics, including the elliptic version of Kulldorff's scan statistic. The flexible spatial scan statistic proposed by Tango and Takahashi (2005) has also been used for detecting irregularly shaped clusters. However, this method sets a feasible limitation of a maximum of 30 nearest neighbors for searching candidate clusters because of heavy computational load. In this paper, we show a flexible spatial scan statistic implemented with a restricted likelihood ratio proposed by Tango (2008) to (1) eliminate the limitation of 30 nearest neighbors and (2) to have surprisingly much less computational time than the original flexible spatial scan statistic. As a side effect, it is shown to be able to detect clusters with any shape reasonably well as the relative risk of the cluster becomes large via Monte Carlo simulation. We illustrate the proposed spatial scan statistic with data on mortality from cerebrovascular disease in the Tokyo Metropolitan area, Japan. Copyright © 2012 John Wiley & Sons, Ltd.

  4. Polygenic scores via penalized regression on summary statistics.

    PubMed

    Mak, Timothy Shin Heng; Porsch, Robert Milan; Choi, Shing Wan; Zhou, Xueya; Sham, Pak Chung

    2017-09-01

    Polygenic scores (PGS) summarize the genetic contribution of a person's genotype to a disease or phenotype. They can be used to group participants into different risk categories for diseases, and are also used as covariates in epidemiological analyses. A number of possible ways of calculating PGS have been proposed, and recently there is much interest in methods that incorporate information available in published summary statistics. As there is no inherent information on linkage disequilibrium (LD) in summary statistics, a pertinent question is how we can use LD information available elsewhere to supplement such analyses. To answer this question, we propose a method for constructing PGS using summary statistics and a reference panel in a penalized regression framework, which we call lassosum. We also propose a general method for choosing the value of the tuning parameter in the absence of validation data. In our simulations, we showed that pseudovalidation often resulted in prediction accuracy that is comparable to using a dataset with validation phenotype and was clearly superior to the conservative option of setting the tuning parameter of lassosum to its lowest value. We also showed that lassosum achieved better prediction accuracy than simple clumping and P-value thresholding in almost all scenarios. It was also substantially faster and more accurate than the recently proposed LDpred. © 2017 WILEY PERIODICALS, INC.

  5. SEGMENTING CT PROSTATE IMAGES USING POPULATION AND PATIENT-SPECIFIC STATISTICS FOR RADIOTHERAPY.

    PubMed

    Feng, Qianjin; Foskey, Mark; Tang, Songyuan; Chen, Wufan; Shen, Dinggang

    2009-08-07

    This paper presents a new deformable model using both population and patient-specific statistics to segment the prostate from CT images. There are two novelties in the proposed method. First, a modified scale invariant feature transform (SIFT) local descriptor, which is more distinctive than general intensity and gradient features, is used to characterize the image features. Second, an online training approach is used to build the shape statistics for accurately capturing intra-patient variation, which is more important than inter-patient variation for prostate segmentation in clinical radiotherapy. Experimental results show that the proposed method is robust and accurate, suitable for clinical application.

  6. SEGMENTING CT PROSTATE IMAGES USING POPULATION AND PATIENT-SPECIFIC STATISTICS FOR RADIOTHERAPY

    PubMed Central

    Feng, Qianjin; Foskey, Mark; Tang, Songyuan; Chen, Wufan; Shen, Dinggang

    2010-01-01

    This paper presents a new deformable model using both population and patient-specific statistics to segment the prostate from CT images. There are two novelties in the proposed method. First, a modified scale invariant feature transform (SIFT) local descriptor, which is more distinctive than general intensity and gradient features, is used to characterize the image features. Second, an online training approach is used to build the shape statistics for accurately capturing intra-patient variation, which is more important than inter-patient variation for prostate segmentation in clinical radiotherapy. Experimental results show that the proposed method is robust and accurate, suitable for clinical application. PMID:21197416

  7. Identification of natural images and computer-generated graphics based on statistical and textural features.

    PubMed

    Peng, Fei; Li, Jiao-ting; Long, Min

    2015-03-01

    To discriminate the acquisition pipelines of digital images, a novel scheme for the identification of natural images and computer-generated graphics is proposed based on statistical and textural features. First, the differences between them are investigated from the view of statistics and texture, and 31 dimensions of feature are acquired for identification. Then, LIBSVM is used for the classification. Finally, the experimental results are presented. The results show that it can achieve an identification accuracy of 97.89% for computer-generated graphics, and an identification accuracy of 97.75% for natural images. The analyses also demonstrate the proposed method has excellent performance, compared with some existing methods based only on statistical features or other features. The method has a great potential to be implemented for the identification of natural images and computer-generated graphics. © 2014 American Academy of Forensic Sciences.

  8. Statistical Irreversible Thermodynamics in the Framework of Zubarev's Nonequilibrium Statistical Operator Method

    NASA Astrophysics Data System (ADS)

    Luzzi, R.; Vasconcellos, A. R.; Ramos, J. G.; Rodrigues, C. G.

    2018-01-01

    We describe the formalism of statistical irreversible thermodynamics constructed based on Zubarev's nonequilibrium statistical operator (NSO) method, which is a powerful and universal tool for investigating the most varied physical phenomena. We present brief overviews of the statistical ensemble formalism and statistical irreversible thermodynamics. The first can be constructed either based on a heuristic approach or in the framework of information theory in the Jeffreys-Jaynes scheme of scientific inference; Zubarev and his school used both approaches in formulating the NSO method. We describe the main characteristics of statistical irreversible thermodynamics and discuss some particular considerations of several authors. We briefly describe how Rosenfeld, Bohr, and Prigogine proposed to derive a thermodynamic uncertainty principle.

  9. A model and variance reduction method for computing statistical outputs of stochastic elliptic partial differential equations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Vidal-Codina, F., E-mail: fvidal@mit.edu; Nguyen, N.C., E-mail: cuongng@mit.edu; Giles, M.B., E-mail: mike.giles@maths.ox.ac.uk

    We present a model and variance reduction method for the fast and reliable computation of statistical outputs of stochastic elliptic partial differential equations. Our method consists of three main ingredients: (1) the hybridizable discontinuous Galerkin (HDG) discretization of elliptic partial differential equations (PDEs), which allows us to obtain high-order accurate solutions of the governing PDE; (2) the reduced basis method for a new HDG discretization of the underlying PDE to enable real-time solution of the parameterized PDE in the presence of stochastic parameters; and (3) a multilevel variance reduction method that exploits the statistical correlation among the different reduced basismore » approximations and the high-fidelity HDG discretization to accelerate the convergence of the Monte Carlo simulations. The multilevel variance reduction method provides efficient computation of the statistical outputs by shifting most of the computational burden from the high-fidelity HDG approximation to the reduced basis approximations. Furthermore, we develop a posteriori error estimates for our approximations of the statistical outputs. Based on these error estimates, we propose an algorithm for optimally choosing both the dimensions of the reduced basis approximations and the sizes of Monte Carlo samples to achieve a given error tolerance. We provide numerical examples to demonstrate the performance of the proposed method.« less

  10. Methods for Assessment of Memory Reactivation.

    PubMed

    Liu, Shizhao; Grosmark, Andres D; Chen, Zhe

    2018-04-13

    It has been suggested that reactivation of previously acquired experiences or stored information in declarative memories in the hippocampus and neocortex contributes to memory consolidation and learning. Understanding memory consolidation depends crucially on the development of robust statistical methods for assessing memory reactivation. To date, several statistical methods have seen established for assessing memory reactivation based on bursts of ensemble neural spike activity during offline states. Using population-decoding methods, we propose a new statistical metric, the weighted distance correlation, to assess hippocampal memory reactivation (i.e., spatial memory replay) during quiet wakefulness and slow-wave sleep. The new metric can be combined with an unsupervised population decoding analysis, which is invariant to latent state labeling and allows us to detect statistical dependency beyond linearity in memory traces. We validate the new metric using two rat hippocampal recordings in spatial navigation tasks. Our proposed analysis framework may have a broader impact on assessing memory reactivations in other brain regions under different behavioral tasks.

  11. Propensity score to detect baseline imbalance in cluster randomized trials: the role of the c-statistic.

    PubMed

    Leyrat, Clémence; Caille, Agnès; Foucher, Yohann; Giraudeau, Bruno

    2016-01-22

    Despite randomization, baseline imbalance and confounding bias may occur in cluster randomized trials (CRTs). Covariate imbalance may jeopardize the validity of statistical inferences if they occur on prognostic factors. Thus, the diagnosis of a such imbalance is essential to adjust statistical analysis if required. We developed a tool based on the c-statistic of the propensity score (PS) model to detect global baseline covariate imbalance in CRTs and assess the risk of confounding bias. We performed a simulation study to assess the performance of the proposed tool and applied this method to analyze the data from 2 published CRTs. The proposed method had good performance for large sample sizes (n =500 per arm) and when the number of unbalanced covariates was not too small as compared with the total number of baseline covariates (≥40% of unbalanced covariates). We also provide a strategy for pre selection of the covariates needed to be included in the PS model to enhance imbalance detection. The proposed tool could be useful in deciding whether covariate adjustment is required before performing statistical analyses of CRTs.

  12. Design and Test of Pseudorandom Number Generator Using a Star Network of Lorenz Oscillators

    NASA Astrophysics Data System (ADS)

    Cho, Kenichiro; Miyano, Takaya

    We have recently developed a chaos-based stream cipher based on augmented Lorenz equations as a star network of Lorenz subsystems. In our method, the augmented Lorenz equations are used as a pseudorandom number generator. In this study, we propose a new method based on the augmented Lorenz equations for generating binary pseudorandom numbers and evaluate its security using the statistical tests of SP800-22 published by the National Institute for Standards and Technology in comparison with the performances of other chaotic dynamical models used as binary pseudorandom number generators. We further propose a faster version of the proposed method and evaluate its security using the statistical tests of TestU01 published by L’Ecuyer and Simard.

  13. Generalizing Terwilliger's likelihood approach: a new score statistic to test for genetic association.

    PubMed

    el Galta, Rachid; Uitte de Willige, Shirley; de Visser, Marieke C H; Helmer, Quinta; Hsu, Li; Houwing-Duistermaat, Jeanine J

    2007-09-24

    In this paper, we propose a one degree of freedom test for association between a candidate gene and a binary trait. This method is a generalization of Terwilliger's likelihood ratio statistic and is especially powerful for the situation of one associated haplotype. As an alternative to the likelihood ratio statistic, we derive a score statistic, which has a tractable expression. For haplotype analysis, we assume that phase is known. By means of a simulation study, we compare the performance of the score statistic to Pearson's chi-square statistic and the likelihood ratio statistic proposed by Terwilliger. We illustrate the method on three candidate genes studied in the Leiden Thrombophilia Study. We conclude that the statistic follows a chi square distribution under the null hypothesis and that the score statistic is more powerful than Terwilliger's likelihood ratio statistic when the associated haplotype has frequency between 0.1 and 0.4 and has a small impact on the studied disorder. With regard to Pearson's chi-square statistic, the score statistic has more power when the associated haplotype has frequency above 0.2 and the number of variants is above five.

  14. Gene Level Meta-Analysis of Quantitative Traits by Functional Linear Models.

    PubMed

    Fan, Ruzong; Wang, Yifan; Boehnke, Michael; Chen, Wei; Li, Yun; Ren, Haobo; Lobach, Iryna; Xiong, Momiao

    2015-08-01

    Meta-analysis of genetic data must account for differences among studies including study designs, markers genotyped, and covariates. The effects of genetic variants may differ from population to population, i.e., heterogeneity. Thus, meta-analysis of combining data of multiple studies is difficult. Novel statistical methods for meta-analysis are needed. In this article, functional linear models are developed for meta-analyses that connect genetic data to quantitative traits, adjusting for covariates. The models can be used to analyze rare variants, common variants, or a combination of the two. Both likelihood-ratio test (LRT) and F-distributed statistics are introduced to test association between quantitative traits and multiple variants in one genetic region. Extensive simulations are performed to evaluate empirical type I error rates and power performance of the proposed tests. The proposed LRT and F-distributed statistics control the type I error very well and have higher power than the existing methods of the meta-analysis sequence kernel association test (MetaSKAT). We analyze four blood lipid levels in data from a meta-analysis of eight European studies. The proposed methods detect more significant associations than MetaSKAT and the P-values of the proposed LRT and F-distributed statistics are usually much smaller than those of MetaSKAT. The functional linear models and related test statistics can be useful in whole-genome and whole-exome association studies. Copyright © 2015 by the Genetics Society of America.

  15. Dissolution curve comparisons through the F(2) parameter, a Bayesian extension of the f(2) statistic.

    PubMed

    Novick, Steven; Shen, Yan; Yang, Harry; Peterson, John; LeBlond, Dave; Altan, Stan

    2015-01-01

    Dissolution (or in vitro release) studies constitute an important aspect of pharmaceutical drug development. One important use of such studies is for justifying a biowaiver for post-approval changes which requires establishing equivalence between the new and old product. We propose a statistically rigorous modeling approach for this purpose based on the estimation of what we refer to as the F2 parameter, an extension of the commonly used f2 statistic. A Bayesian test procedure is proposed in relation to a set of composite hypotheses that capture the similarity requirement on the absolute mean differences between test and reference dissolution profiles. Several examples are provided to illustrate the application. Results of our simulation study comparing the performance of f2 and the proposed method show that our Bayesian approach is comparable to or in many cases superior to the f2 statistic as a decision rule. Further useful extensions of the method, such as the use of continuous-time dissolution modeling, are considered.

  16. A Seasonal Time-Series Model Based on Gene Expression Programming for Predicting Financial Distress

    PubMed Central

    2018-01-01

    The issue of financial distress prediction plays an important and challenging research topic in the financial field. Currently, there have been many methods for predicting firm bankruptcy and financial crisis, including the artificial intelligence and the traditional statistical methods, and the past studies have shown that the prediction result of the artificial intelligence method is better than the traditional statistical method. Financial statements are quarterly reports; hence, the financial crisis of companies is seasonal time-series data, and the attribute data affecting the financial distress of companies is nonlinear and nonstationary time-series data with fluctuations. Therefore, this study employed the nonlinear attribute selection method to build a nonlinear financial distress prediction model: that is, this paper proposed a novel seasonal time-series gene expression programming model for predicting the financial distress of companies. The proposed model has several advantages including the following: (i) the proposed model is different from the previous models lacking the concept of time series; (ii) the proposed integrated attribute selection method can find the core attributes and reduce high dimensional data; and (iii) the proposed model can generate the rules and mathematical formulas of financial distress for providing references to the investors and decision makers. The result shows that the proposed method is better than the listing classifiers under three criteria; hence, the proposed model has competitive advantages in predicting the financial distress of companies. PMID:29765399

  17. A Seasonal Time-Series Model Based on Gene Expression Programming for Predicting Financial Distress.

    PubMed

    Cheng, Ching-Hsue; Chan, Chia-Pang; Yang, Jun-He

    2018-01-01

    The issue of financial distress prediction plays an important and challenging research topic in the financial field. Currently, there have been many methods for predicting firm bankruptcy and financial crisis, including the artificial intelligence and the traditional statistical methods, and the past studies have shown that the prediction result of the artificial intelligence method is better than the traditional statistical method. Financial statements are quarterly reports; hence, the financial crisis of companies is seasonal time-series data, and the attribute data affecting the financial distress of companies is nonlinear and nonstationary time-series data with fluctuations. Therefore, this study employed the nonlinear attribute selection method to build a nonlinear financial distress prediction model: that is, this paper proposed a novel seasonal time-series gene expression programming model for predicting the financial distress of companies. The proposed model has several advantages including the following: (i) the proposed model is different from the previous models lacking the concept of time series; (ii) the proposed integrated attribute selection method can find the core attributes and reduce high dimensional data; and (iii) the proposed model can generate the rules and mathematical formulas of financial distress for providing references to the investors and decision makers. The result shows that the proposed method is better than the listing classifiers under three criteria; hence, the proposed model has competitive advantages in predicting the financial distress of companies.

  18. Machine Learning Methods for Attack Detection in the Smart Grid.

    PubMed

    Ozay, Mete; Esnaola, Inaki; Yarman Vural, Fatos Tunay; Kulkarni, Sanjeev R; Poor, H Vincent

    2016-08-01

    Attack detection problems in the smart grid are posed as statistical learning problems for different attack scenarios in which the measurements are observed in batch or online settings. In this approach, machine learning algorithms are used to classify measurements as being either secure or attacked. An attack detection framework is provided to exploit any available prior knowledge about the system and surmount constraints arising from the sparse structure of the problem in the proposed approach. Well-known batch and online learning algorithms (supervised and semisupervised) are employed with decision- and feature-level fusion to model the attack detection problem. The relationships between statistical and geometric properties of attack vectors employed in the attack scenarios and learning algorithms are analyzed to detect unobservable attacks using statistical learning methods. The proposed algorithms are examined on various IEEE test systems. Experimental analyses show that machine learning algorithms can detect attacks with performances higher than attack detection algorithms that employ state vector estimation methods in the proposed attack detection framework.

  19. Weighted analysis methods for mapped plot forest inventory data: Tables, regressions, maps and graphs

    Treesearch

    Paul C. Van Deusen; Linda S. Heath

    2010-01-01

    Weighted estimation methods for analysis of mapped plot forest inventory data are discussed. The appropriate weighting scheme can vary depending on the type of analysis and graphical display. Both statistical issues and user expectations need to be considered in these methods. A weighting scheme is proposed that balances statistical considerations and the logical...

  20. Level set method with automatic selective local statistics for brain tumor segmentation in MR images.

    PubMed

    Thapaliya, Kiran; Pyun, Jae-Young; Park, Chun-Su; Kwon, Goo-Rak

    2013-01-01

    The level set approach is a powerful tool for segmenting images. This paper proposes a method for segmenting brain tumor images from MR images. A new signed pressure function (SPF) that can efficiently stop the contours at weak or blurred edges is introduced. The local statistics of the different objects present in the MR images were calculated. Using local statistics, the tumor objects were identified among different objects. In this level set method, the calculation of the parameters is a challenging task. The calculations of different parameters for different types of images were automatic. The basic thresholding value was updated and adjusted automatically for different MR images. This thresholding value was used to calculate the different parameters in the proposed algorithm. The proposed algorithm was tested on the magnetic resonance images of the brain for tumor segmentation and its performance was evaluated visually and quantitatively. Numerical experiments on some brain tumor images highlighted the efficiency and robustness of this method. Crown Copyright © 2013. Published by Elsevier Ltd. All rights reserved.

  1. Intelligent Condition Diagnosis Method Based on Adaptive Statistic Test Filter and Diagnostic Bayesian Network

    PubMed Central

    Li, Ke; Zhang, Qiuju; Wang, Kun; Chen, Peng; Wang, Huaqing

    2016-01-01

    A new fault diagnosis method for rotating machinery based on adaptive statistic test filter (ASTF) and Diagnostic Bayesian Network (DBN) is presented in this paper. ASTF is proposed to obtain weak fault features under background noise, ASTF is based on statistic hypothesis testing in the frequency domain to evaluate similarity between reference signal (noise signal) and original signal, and remove the component of high similarity. The optimal level of significance α is obtained using particle swarm optimization (PSO). To evaluate the performance of the ASTF, evaluation factor Ipq is also defined. In addition, a simulation experiment is designed to verify the effectiveness and robustness of ASTF. A sensitive evaluation method using principal component analysis (PCA) is proposed to evaluate the sensitiveness of symptom parameters (SPs) for condition diagnosis. By this way, the good SPs that have high sensitiveness for condition diagnosis can be selected. A three-layer DBN is developed to identify condition of rotation machinery based on the Bayesian Belief Network (BBN) theory. Condition diagnosis experiment for rolling element bearings demonstrates the effectiveness of the proposed method. PMID:26761006

  2. Intelligent Condition Diagnosis Method Based on Adaptive Statistic Test Filter and Diagnostic Bayesian Network.

    PubMed

    Li, Ke; Zhang, Qiuju; Wang, Kun; Chen, Peng; Wang, Huaqing

    2016-01-08

    A new fault diagnosis method for rotating machinery based on adaptive statistic test filter (ASTF) and Diagnostic Bayesian Network (DBN) is presented in this paper. ASTF is proposed to obtain weak fault features under background noise, ASTF is based on statistic hypothesis testing in the frequency domain to evaluate similarity between reference signal (noise signal) and original signal, and remove the component of high similarity. The optimal level of significance α is obtained using particle swarm optimization (PSO). To evaluate the performance of the ASTF, evaluation factor Ipq is also defined. In addition, a simulation experiment is designed to verify the effectiveness and robustness of ASTF. A sensitive evaluation method using principal component analysis (PCA) is proposed to evaluate the sensitiveness of symptom parameters (SPs) for condition diagnosis. By this way, the good SPs that have high sensitiveness for condition diagnosis can be selected. A three-layer DBN is developed to identify condition of rotation machinery based on the Bayesian Belief Network (BBN) theory. Condition diagnosis experiment for rolling element bearings demonstrates the effectiveness of the proposed method.

  3. Valid statistical inference methods for a case-control study with missing data.

    PubMed

    Tian, Guo-Liang; Zhang, Chi; Jiang, Xuejun

    2018-04-01

    The main objective of this paper is to derive the valid sampling distribution of the observed counts in a case-control study with missing data under the assumption of missing at random by employing the conditional sampling method and the mechanism augmentation method. The proposed sampling distribution, called the case-control sampling distribution, can be used to calculate the standard errors of the maximum likelihood estimates of parameters via the Fisher information matrix and to generate independent samples for constructing small-sample bootstrap confidence intervals. Theoretical comparisons of the new case-control sampling distribution with two existing sampling distributions exhibit a large difference. Simulations are conducted to investigate the influence of the three different sampling distributions on statistical inferences. One finding is that the conclusion by the Wald test for testing independency under the two existing sampling distributions could be completely different (even contradictory) from the Wald test for testing the equality of the success probabilities in control/case groups under the proposed distribution. A real cervical cancer data set is used to illustrate the proposed statistical methods.

  4. Statistical analysis of loopy belief propagation in random fields

    NASA Astrophysics Data System (ADS)

    Yasuda, Muneki; Kataoka, Shun; Tanaka, Kazuyuki

    2015-10-01

    Loopy belief propagation (LBP), which is equivalent to the Bethe approximation in statistical mechanics, is a message-passing-type inference method that is widely used to analyze systems based on Markov random fields (MRFs). In this paper, we propose a message-passing-type method to analytically evaluate the quenched average of LBP in random fields by using the replica cluster variation method. The proposed analytical method is applicable to general pairwise MRFs with random fields whose distributions differ from each other and can give the quenched averages of the Bethe free energies over random fields, which are consistent with numerical results. The order of its computational cost is equivalent to that of standard LBP. In the latter part of this paper, we describe the application of the proposed method to Bayesian image restoration, in which we observed that our theoretical results are in good agreement with the numerical results for natural images.

  5. A global goodness-of-fit statistic for Cox regression models.

    PubMed

    Parzen, M; Lipsitz, S R

    1999-06-01

    In this paper, a global goodness-of-fit test statistic for a Cox regression model, which has an approximate chi-squared distribution when the model has been correctly specified, is proposed. Our goodness-of-fit statistic is global and has power to detect if interactions or higher order powers of covariates in the model are needed. The proposed statistic is similar to the Hosmer and Lemeshow (1980, Communications in Statistics A10, 1043-1069) goodness-of-fit statistic for binary data as well as Schoenfeld's (1980, Biometrika 67, 145-153) statistic for the Cox model. The methods are illustrated using data from a Mayo Clinic trial in primary billiary cirrhosis of the liver (Fleming and Harrington, 1991, Counting Processes and Survival Analysis), in which the outcome is the time until liver transplantation or death. The are 17 possible covariates. Two Cox proportional hazards models are fit to the data, and the proposed goodness-of-fit statistic is applied to the fitted models.

  6. The Novel Quantitative Technique for Assessment of Gait Symmetry Using Advanced Statistical Learning Algorithm

    PubMed Central

    Wu, Jianning; Wu, Bin

    2015-01-01

    The accurate identification of gait asymmetry is very beneficial to the assessment of at-risk gait in the clinical applications. This paper investigated the application of classification method based on statistical learning algorithm to quantify gait symmetry based on the assumption that the degree of intrinsic change in dynamical system of gait is associated with the different statistical distributions between gait variables from left-right side of lower limbs; that is, the discrimination of small difference of similarity between lower limbs is considered the reorganization of their different probability distribution. The kinetic gait data of 60 participants were recorded using a strain gauge force platform during normal walking. The classification method is designed based on advanced statistical learning algorithm such as support vector machine algorithm for binary classification and is adopted to quantitatively evaluate gait symmetry. The experiment results showed that the proposed method could capture more intrinsic dynamic information hidden in gait variables and recognize the right-left gait patterns with superior generalization performance. Moreover, our proposed techniques could identify the small significant difference between lower limbs when compared to the traditional symmetry index method for gait. The proposed algorithm would become an effective tool for early identification of the elderly gait asymmetry in the clinical diagnosis. PMID:25705672

  7. The novel quantitative technique for assessment of gait symmetry using advanced statistical learning algorithm.

    PubMed

    Wu, Jianning; Wu, Bin

    2015-01-01

    The accurate identification of gait asymmetry is very beneficial to the assessment of at-risk gait in the clinical applications. This paper investigated the application of classification method based on statistical learning algorithm to quantify gait symmetry based on the assumption that the degree of intrinsic change in dynamical system of gait is associated with the different statistical distributions between gait variables from left-right side of lower limbs; that is, the discrimination of small difference of similarity between lower limbs is considered the reorganization of their different probability distribution. The kinetic gait data of 60 participants were recorded using a strain gauge force platform during normal walking. The classification method is designed based on advanced statistical learning algorithm such as support vector machine algorithm for binary classification and is adopted to quantitatively evaluate gait symmetry. The experiment results showed that the proposed method could capture more intrinsic dynamic information hidden in gait variables and recognize the right-left gait patterns with superior generalization performance. Moreover, our proposed techniques could identify the small significant difference between lower limbs when compared to the traditional symmetry index method for gait. The proposed algorithm would become an effective tool for early identification of the elderly gait asymmetry in the clinical diagnosis.

  8. A comparison of performance of automatic cloud coverage assessment algorithm for Formosat-2 image using clustering-based and spatial thresholding methods

    NASA Astrophysics Data System (ADS)

    Hsu, Kuo-Hsien

    2012-11-01

    Formosat-2 image is a kind of high-spatial-resolution (2 meters GSD) remote sensing satellite data, which includes one panchromatic band and four multispectral bands (Blue, Green, Red, near-infrared). An essential sector in the daily processing of received Formosat-2 image is to estimate the cloud statistic of image using Automatic Cloud Coverage Assessment (ACCA) algorithm. The information of cloud statistic of image is subsequently recorded as an important metadata for image product catalog. In this paper, we propose an ACCA method with two consecutive stages: preprocessing and post-processing analysis. For pre-processing analysis, the un-supervised K-means classification, Sobel's method, thresholding method, non-cloudy pixels reexamination, and cross-band filter method are implemented in sequence for cloud statistic determination. For post-processing analysis, Box-Counting fractal method is implemented. In other words, the cloud statistic is firstly determined via pre-processing analysis, the correctness of cloud statistic of image of different spectral band is eventually cross-examined qualitatively and quantitatively via post-processing analysis. The selection of an appropriate thresholding method is very critical to the result of ACCA method. Therefore, in this work, We firstly conduct a series of experiments of the clustering-based and spatial thresholding methods that include Otsu's, Local Entropy(LE), Joint Entropy(JE), Global Entropy(GE), and Global Relative Entropy(GRE) method, for performance comparison. The result shows that Otsu's and GE methods both perform better than others for Formosat-2 image. Additionally, our proposed ACCA method by selecting Otsu's method as the threshoding method has successfully extracted the cloudy pixels of Formosat-2 image for accurate cloud statistic estimation.

  9. Multi-classification of cell deformation based on object alignment and run length statistic.

    PubMed

    Li, Heng; Liu, Zhiwen; An, Xing; Shi, Yonggang

    2014-01-01

    Cellular morphology is widely applied in digital pathology and is essential for improving our understanding of the basic physiological processes of organisms. One of the main issues of application is to develop efficient methods for cell deformation measurement. We propose an innovative indirect approach to analyze dynamic cell morphology in image sequences. The proposed approach considers both the cellular shape change and cytoplasm variation, and takes each frame in the image sequence into account. The cell deformation is measured by the minimum energy function of object alignment, which is invariant to object pose. Then an indirect analysis strategy is employed to overcome the limitation of gradual deformation by run length statistic. We demonstrate the power of the proposed approach with one application: multi-classification of cell deformation. Experimental results show that the proposed method is sensitive to the morphology variation and performs better than standard shape representation methods.

  10. A REVIEW OF STATISTICAL METHODS FOR THE METEOROLOGICAL ADJUSTMENT OF TROPOSPHERIC OZONE. (R825173)

    EPA Science Inventory

    Abstract

    A variety of statistical methods for meteorological adjustment of ozone have been proposed in the literature over the last decade for purposes of forecasting, estimating ozone time trends, or investigating underlying mechanisms from an empirical perspective. T...

  11. Statistical image-domain multimaterial decomposition for dual-energy CT.

    PubMed

    Xue, Yi; Ruan, Ruoshui; Hu, Xiuhua; Kuang, Yu; Wang, Jing; Long, Yong; Niu, Tianye

    2017-03-01

    Dual-energy CT (DECT) enhances tissue characterization because of its basis material decomposition capability. In addition to conventional two-material decomposition from DECT measurements, multimaterial decomposition (MMD) is required in many clinical applications. To solve the ill-posed problem of reconstructing multi-material images from dual-energy measurements, additional constraints are incorporated into the formulation, including volume and mass conservation and the assumptions that there are at most three materials in each pixel and various material types among pixels. The recently proposed flexible image-domain MMD method decomposes pixels sequentially into multiple basis materials using a direct inversion scheme which leads to magnified noise in the material images. In this paper, we propose a statistical image-domain MMD method for DECT to suppress the noise. The proposed method applies penalized weighted least-square (PWLS) reconstruction with a negative log-likelihood term and edge-preserving regularization for each material. The statistical weight is determined by a data-based method accounting for the noise variance of high- and low-energy CT images. We apply the optimization transfer principles to design a serial of pixel-wise separable quadratic surrogates (PWSQS) functions which monotonically decrease the cost function. The separability in each pixel enables the simultaneous update of all pixels. The proposed method is evaluated on a digital phantom, Catphan©600 phantom and three patients (pelvis, head, and thigh). We also implement the direct inversion and low-pass filtration methods for a comparison purpose. Compared with the direct inversion method, the proposed method reduces noise standard deviation (STD) in soft tissue by 95.35% in the digital phantom study, by 88.01% in the Catphan©600 phantom study, by 92.45% in the pelvis patient study, by 60.21% in the head patient study, and by 81.22% in the thigh patient study, respectively. The overall volume fraction accuracy is improved by around 6.85%. Compared with the low-pass filtration method, the root-mean-square percentage error (RMSE(%)) of electron densities in the Catphan©600 phantom is decreased by 20.89%. As modulation transfer function (MTF) magnitude decreased to 50%, the proposed method increases the spatial resolution by an overall factor of 1.64 on the digital phantom, and 2.16 on the Catphan©600 phantom. The overall volume fraction accuracy is increased by 6.15%. We proposed a statistical image-domain MMD method using DECT measurements. The method successfully suppresses the magnified noise while faithfully retaining the quantification accuracy and anatomical structure in the decomposed material images. The proposed method is practical and promising for advanced clinical applications using DECT imaging. © 2017 American Association of Physicists in Medicine.

  12. History of Science and Statistical Education: Examples from Fisherian and Pearsonian Schools

    ERIC Educational Resources Information Center

    Yu, Chong Ho

    2004-01-01

    Many students share a popular misconception that statistics is a subject-free methodology derived from invariant and timeless mathematical axioms. It is proposed that statistical education should include the aspect of history/philosophy of science. This article will discuss how statistical methods developed by Karl Pearson and R. A. Fisher are…

  13. Statistical Methods for Generalized Linear Models with Covariates Subject to Detection Limits.

    PubMed

    Bernhardt, Paul W; Wang, Huixia J; Zhang, Daowen

    2015-05-01

    Censored observations are a common occurrence in biomedical data sets. Although a large amount of research has been devoted to estimation and inference for data with censored responses, very little research has focused on proper statistical procedures when predictors are censored. In this paper, we consider statistical methods for dealing with multiple predictors subject to detection limits within the context of generalized linear models. We investigate and adapt several conventional methods and develop a new multiple imputation approach for analyzing data sets with predictors censored due to detection limits. We establish the consistency and asymptotic normality of the proposed multiple imputation estimator and suggest a computationally simple and consistent variance estimator. We also demonstrate that the conditional mean imputation method often leads to inconsistent estimates in generalized linear models, while several other methods are either computationally intensive or lead to parameter estimates that are biased or more variable compared to the proposed multiple imputation estimator. In an extensive simulation study, we assess the bias and variability of different approaches within the context of a logistic regression model and compare variance estimation methods for the proposed multiple imputation estimator. Lastly, we apply several methods to analyze the data set from a recently-conducted GenIMS study.

  14. The Mantel-Haenszel procedure revisited: models and generalizations.

    PubMed

    Fidler, Vaclav; Nagelkerke, Nico

    2013-01-01

    Several statistical methods have been developed for adjusting the Odds Ratio of the relation between two dichotomous variables X and Y for some confounders Z. With the exception of the Mantel-Haenszel method, commonly used methods, notably binary logistic regression, are not symmetrical in X and Y. The classical Mantel-Haenszel method however only works for confounders with a limited number of discrete strata, which limits its utility, and appears to have no basis in statistical models. Here we revisit the Mantel-Haenszel method and propose an extension to continuous and vector valued Z. The idea is to replace the observed cell entries in strata of the Mantel-Haenszel procedure by subject specific classification probabilities for the four possible values of (X,Y) predicted by a suitable statistical model. For situations where X and Y can be treated symmetrically we propose and explore the multinomial logistic model. Under the homogeneity hypothesis, which states that the odds ratio does not depend on Z, the logarithm of the odds ratio estimator can be expressed as a simple linear combination of three parameters of this model. Methods for testing the homogeneity hypothesis are proposed. The relationship between this method and binary logistic regression is explored. A numerical example using survey data is presented.

  15. The Mantel-Haenszel Procedure Revisited: Models and Generalizations

    PubMed Central

    Fidler, Vaclav; Nagelkerke, Nico

    2013-01-01

    Several statistical methods have been developed for adjusting the Odds Ratio of the relation between two dichotomous variables X and Y for some confounders Z. With the exception of the Mantel-Haenszel method, commonly used methods, notably binary logistic regression, are not symmetrical in X and Y. The classical Mantel-Haenszel method however only works for confounders with a limited number of discrete strata, which limits its utility, and appears to have no basis in statistical models. Here we revisit the Mantel-Haenszel method and propose an extension to continuous and vector valued Z. The idea is to replace the observed cell entries in strata of the Mantel-Haenszel procedure by subject specific classification probabilities for the four possible values of (X,Y) predicted by a suitable statistical model. For situations where X and Y can be treated symmetrically we propose and explore the multinomial logistic model. Under the homogeneity hypothesis, which states that the odds ratio does not depend on Z, the logarithm of the odds ratio estimator can be expressed as a simple linear combination of three parameters of this model. Methods for testing the homogeneity hypothesis are proposed. The relationship between this method and binary logistic regression is explored. A numerical example using survey data is presented. PMID:23516463

  16. Spatial Statistics for Tumor Cell Counting and Classification

    NASA Astrophysics Data System (ADS)

    Wirjadi, Oliver; Kim, Yoo-Jin; Breuel, Thomas

    To count and classify cells in histological sections is a standard task in histology. One example is the grading of meningiomas, benign tumors of the meninges, which requires to assess the fraction of proliferating cells in an image. As this process is very time consuming when performed manually, automation is required. To address such problems, we propose a novel application of Markov point process methods in computer vision, leading to algorithms for computing the locations of circular objects in images. In contrast to previous algorithms using such spatial statistics methods in image analysis, the present one is fully trainable. This is achieved by combining point process methods with statistical classifiers. Using simulated data, the method proposed in this paper will be shown to be more accurate and more robust to noise than standard image processing methods. On the publicly available SIMCEP benchmark for cell image analysis algorithms, the cell count performance of the present paper is significantly more accurate than results published elsewhere, especially when cells form dense clusters. Furthermore, the proposed system performs as well as a state-of-the-art algorithm for the computer-aided histological grading of meningiomas when combined with a simple k-nearest neighbor classifier for identifying proliferating cells.

  17. A Third Moment Adjusted Test Statistic for Small Sample Factor Analysis.

    PubMed

    Lin, Johnny; Bentler, Peter M

    2012-01-01

    Goodness of fit testing in factor analysis is based on the assumption that the test statistic is asymptotically chi-square; but this property may not hold in small samples even when the factors and errors are normally distributed in the population. Robust methods such as Browne's asymptotically distribution-free method and Satorra Bentler's mean scaling statistic were developed under the presumption of non-normality in the factors and errors. This paper finds new application to the case where factors and errors are normally distributed in the population but the skewness of the obtained test statistic is still high due to sampling error in the observed indicators. An extension of Satorra Bentler's statistic is proposed that not only scales the mean but also adjusts the degrees of freedom based on the skewness of the obtained test statistic in order to improve its robustness under small samples. A simple simulation study shows that this third moment adjusted statistic asymptotically performs on par with previously proposed methods, and at a very small sample size offers superior Type I error rates under a properly specified model. Data from Mardia, Kent and Bibby's study of students tested for their ability in five content areas that were either open or closed book were used to illustrate the real-world performance of this statistic.

  18. A scan statistic to extract causal gene clusters from case-control genome-wide rare CNV data.

    PubMed

    Nishiyama, Takeshi; Takahashi, Kunihiko; Tango, Toshiro; Pinto, Dalila; Scherer, Stephen W; Takami, Satoshi; Kishino, Hirohisa

    2011-05-26

    Several statistical tests have been developed for analyzing genome-wide association data by incorporating gene pathway information in terms of gene sets. Using these methods, hundreds of gene sets are typically tested, and the tested gene sets often overlap. This overlapping greatly increases the probability of generating false positives, and the results obtained are difficult to interpret, particularly when many gene sets show statistical significance. We propose a flexible statistical framework to circumvent these problems. Inspired by spatial scan statistics for detecting clustering of disease occurrence in the field of epidemiology, we developed a scan statistic to extract disease-associated gene clusters from a whole gene pathway. Extracting one or a few significant gene clusters from a global pathway limits the overall false positive probability, which results in increased statistical power, and facilitates the interpretation of test results. In the present study, we applied our method to genome-wide association data for rare copy-number variations, which have been strongly implicated in common diseases. Application of our method to a simulated dataset demonstrated the high accuracy of this method in detecting disease-associated gene clusters in a whole gene pathway. The scan statistic approach proposed here shows a high level of accuracy in detecting gene clusters in a whole gene pathway. This study has provided a sound statistical framework for analyzing genome-wide rare CNV data by incorporating topological information on the gene pathway.

  19. Multi-Reader ROC studies with Split-Plot Designs: A Comparison of Statistical Methods

    PubMed Central

    Obuchowski, Nancy A.; Gallas, Brandon D.; Hillis, Stephen L.

    2012-01-01

    Rationale and Objectives Multi-reader imaging trials often use a factorial design, where study patients undergo testing with all imaging modalities and readers interpret the results of all tests for all patients. A drawback of the design is the large number of interpretations required of each reader. Split-plot designs have been proposed as an alternative, in which one or a subset of readers interprets all images of a sample of patients, while other readers interpret the images of other samples of patients. In this paper we compare three methods of analysis for the split-plot design. Materials and Methods Three statistical methods are presented: Obuchowski-Rockette method modified for the split-plot design, a newly proposed marginal-mean ANOVA approach, and an extension of the three-sample U-statistic method. A simulation study using the Roe-Metz model was performed to compare the type I error rate, power and confidence interval coverage of the three test statistics. Results The type I error rates for all three methods are close to the nominal level but tend to be slightly conservative. The statistical power is nearly identical for the three methods. The coverage of 95% CIs fall close to the nominal coverage for small and large sample sizes. Conclusions The split-plot MRMC study design can be statistically efficient compared with the factorial design, reducing the number of interpretations required per reader. Three methods of analysis, shown to have nominal type I error rate, similar power, and nominal CI coverage, are available for this study design. PMID:23122570

  20. Nonparametric estimation and testing of fixed effects panel data models

    PubMed Central

    Henderson, Daniel J.; Carroll, Raymond J.; Li, Qi

    2009-01-01

    In this paper we consider the problem of estimating nonparametric panel data models with fixed effects. We introduce an iterative nonparametric kernel estimator. We also extend the estimation method to the case of a semiparametric partially linear fixed effects model. To determine whether a parametric, semiparametric or nonparametric model is appropriate, we propose test statistics to test between the three alternatives in practice. We further propose a test statistic for testing the null hypothesis of random effects against fixed effects in a nonparametric panel data regression model. Simulations are used to examine the finite sample performance of the proposed estimators and the test statistics. PMID:19444335

  1. Quantitative evaluation of cross correlation between two finite-length time series with applications to single-molecule FRET.

    PubMed

    Hanson, Jeffery A; Yang, Haw

    2008-11-06

    The statistical properties of the cross correlation between two time series has been studied. An analytical expression for the cross correlation function's variance has been derived. On the basis of these results, a statistically robust method has been proposed to detect the existence and determine the direction of cross correlation between two time series. The proposed method has been characterized by computer simulations. Applications to single-molecule fluorescence spectroscopy are discussed. The results may also find immediate applications in fluorescence correlation spectroscopy (FCS) and its variants.

  2. A nonparametric smoothing method for assessing GEE models with longitudinal binary data.

    PubMed

    Lin, Kuo-Chin; Chen, Yi-Ju; Shyr, Yu

    2008-09-30

    Studies involving longitudinal binary responses are widely applied in the health and biomedical sciences research and frequently analyzed by generalized estimating equations (GEE) method. This article proposes an alternative goodness-of-fit test based on the nonparametric smoothing approach for assessing the adequacy of GEE fitted models, which can be regarded as an extension of the goodness-of-fit test of le Cessie and van Houwelingen (Biometrics 1991; 47:1267-1282). The expectation and approximate variance of the proposed test statistic are derived. The asymptotic distribution of the proposed test statistic in terms of a scaled chi-squared distribution and the power performance of the proposed test are discussed by simulation studies. The testing procedure is demonstrated by two real data. Copyright (c) 2008 John Wiley & Sons, Ltd.

  3. The assignment of scores procedure for ordinal categorical data.

    PubMed

    Chen, Han-Ching; Wang, Nae-Sheng

    2014-01-01

    Ordinal data are the most frequently encountered type of data in the social sciences. Many statistical methods can be used to process such data. One common method is to assign scores to the data, convert them into interval data, and further perform statistical analysis. There are several authors who have recently developed assigning score methods to assign scores to ordered categorical data. This paper proposes an approach that defines an assigning score system for an ordinal categorical variable based on underlying continuous latent distribution with interpretation by using three case study examples. The results show that the proposed score system is well for skewed ordinal categorical data.

  4. Brain tissues volume measurements from 2D MRI using parametric approach

    NASA Astrophysics Data System (ADS)

    L'vov, A. A.; Toropova, O. A.; Litovka, Yu. V.

    2018-04-01

    The purpose of the paper is to propose a fully automated method of volume assessment of structures within human brain. Our statistical approach uses maximum interdependency principle for decision making process of measurements consistency and unequal observations. Detecting outliers performed using maximum normalized residual test. We propose a statistical model which utilizes knowledge of tissues distribution in human brain and applies partial data restoration for precision improvement. The approach proposes completed computationally efficient and independent from segmentation algorithm used in the application.

  5. Kappa statistic for clustered matched-pair data.

    PubMed

    Yang, Zhao; Zhou, Ming

    2014-07-10

    Kappa statistic is widely used to assess the agreement between two procedures in the independent matched-pair data. For matched-pair data collected in clusters, on the basis of the delta method and sampling techniques, we propose a nonparametric variance estimator for the kappa statistic without within-cluster correlation structure or distributional assumptions. The results of an extensive Monte Carlo simulation study demonstrate that the proposed kappa statistic provides consistent estimation and the proposed variance estimator behaves reasonably well for at least a moderately large number of clusters (e.g., K ≥50). Compared with the variance estimator ignoring dependence within a cluster, the proposed variance estimator performs better in maintaining the nominal coverage probability when the intra-cluster correlation is fair (ρ ≥0.3), with more pronounced improvement when ρ is further increased. To illustrate the practical application of the proposed estimator, we analyze two real data examples of clustered matched-pair data. Copyright © 2014 John Wiley & Sons, Ltd.

  6. Scene-based nonuniformity correction and enhancement: pixel statistics and subpixel motion.

    PubMed

    Zhao, Wenyi; Zhang, Chao

    2008-07-01

    We propose a framework for scene-based nonuniformity correction (NUC) and nonuniformity correction and enhancement (NUCE) that is required for focal-plane array-like sensors to obtain clean and enhanced-quality images. The core of the proposed framework is a novel registration-based nonuniformity correction super-resolution (NUCSR) method that is bootstrapped by statistical scene-based NUC methods. Based on a comprehensive imaging model and an accurate parametric motion estimation, we are able to remove severe/structured nonuniformity and in the presence of subpixel motion to simultaneously improve image resolution. One important feature of our NUCSR method is the adoption of a parametric motion model that allows us to (1) handle many practical scenarios where parametric motions are present and (2) carry out perfect super-resolution in principle by exploring available subpixel motions. Experiments with real data demonstrate the efficiency of the proposed NUCE framework and the effectiveness of the NUCSR method.

  7. Statistical methods for investigating quiescence and other temporal seismicity patterns

    USGS Publications Warehouse

    Matthews, M.V.; Reasenberg, P.A.

    1988-01-01

    We propose a statistical model and a technique for objective recognition of one of the most commonly cited seismicity patterns:microearthquake quiescence. We use a Poisson process model for seismicity and define a process with quiescence as one with a particular type of piece-wise constant intensity function. From this model, we derive a statistic for testing stationarity against a 'quiescence' alternative. The large-sample null distribution of this statistic is approximated from simulated distributions of appropriate functionals applied to Brownian bridge processes. We point out the restrictiveness of the particular model we propose and of the quiescence idea in general. The fact that there are many point processes which have neither constant nor quiescent rate functions underscores the need to test for and describe nonuniformity thoroughly. We advocate the use of the quiescence test in conjunction with various other tests for nonuniformity and with graphical methods such as density estimation. ideally these methods may promote accurate description of temporal seismicity distributions and useful characterizations of interesting patterns. ?? 1988 Birkha??user Verlag.

  8. Multi-reader ROC studies with split-plot designs: a comparison of statistical methods.

    PubMed

    Obuchowski, Nancy A; Gallas, Brandon D; Hillis, Stephen L

    2012-12-01

    Multireader imaging trials often use a factorial design, in which study patients undergo testing with all imaging modalities and readers interpret the results of all tests for all patients. A drawback of this design is the large number of interpretations required of each reader. Split-plot designs have been proposed as an alternative, in which one or a subset of readers interprets all images of a sample of patients, while other readers interpret the images of other samples of patients. In this paper, the authors compare three methods of analysis for the split-plot design. Three statistical methods are presented: the Obuchowski-Rockette method modified for the split-plot design, a newly proposed marginal-mean analysis-of-variance approach, and an extension of the three-sample U-statistic method. A simulation study using the Roe-Metz model was performed to compare the type I error rate, power, and confidence interval coverage of the three test statistics. The type I error rates for all three methods are close to the nominal level but tend to be slightly conservative. The statistical power is nearly identical for the three methods. The coverage of 95% confidence intervals falls close to the nominal coverage for small and large sample sizes. The split-plot multireader, multicase study design can be statistically efficient compared to the factorial design, reducing the number of interpretations required per reader. Three methods of analysis, shown to have nominal type I error rates, similar power, and nominal confidence interval coverage, are available for this study design. Copyright © 2012 AUR. All rights reserved.

  9. Probability of identification: a statistical model for the validation of qualitative botanical identification methods.

    PubMed

    LaBudde, Robert A; Harnly, James M

    2012-01-01

    A qualitative botanical identification method (BIM) is an analytical procedure that returns a binary result (1 = Identified, 0 = Not Identified). A BIM may be used by a buyer, manufacturer, or regulator to determine whether a botanical material being tested is the same as the target (desired) material, or whether it contains excessive nontarget (undesirable) material. The report describes the development and validation of studies for a BIM based on the proportion of replicates identified, or probability of identification (POI), as the basic observed statistic. The statistical procedures proposed for data analysis follow closely those of the probability of detection, and harmonize the statistical concepts and parameters between quantitative and qualitative method validation. Use of POI statistics also harmonizes statistical concepts for botanical, microbiological, toxin, and other analyte identification methods that produce binary results. The POI statistical model provides a tool for graphical representation of response curves for qualitative methods, reporting of descriptive statistics, and application of performance requirements. Single collaborator and multicollaborative study examples are given.

  10. New heterogeneous test statistics for the unbalanced fixed-effect nested design.

    PubMed

    Guo, Jiin-Huarng; Billard, L; Luh, Wei-Ming

    2011-05-01

    When the underlying variances are unknown or/and unequal, using the conventional F test is problematic in the two-factor hierarchical data structure. Prompted by the approximate test statistics (Welch and Alexander-Govern methods), the authors develop four new heterogeneous test statistics to test factor A and factor B nested within A for the unbalanced fixed-effect two-stage nested design under variance heterogeneity. The actual significance levels and statistical power of the test statistics were compared in a simulation study. The results show that the proposed procedures maintain better Type I error rate control and have greater statistical power than those obtained by the conventional F test in various conditions. Therefore, the proposed test statistics are recommended in terms of robustness and easy implementation. ©2010 The British Psychological Society.

  11. Spectrophotometric methods for simultaneous determination of betamethasone valerate and fusidic acid in their binary mixture.

    PubMed

    Lotfy, Hayam Mahmoud; Salem, Hesham; Abdelkawy, Mohammad; Samir, Ahmed

    2015-04-05

    Five spectrophotometric methods were successfully developed and validated for the determination of betamethasone valerate and fusidic acid in their binary mixture. Those methods are isoabsorptive point method combined with the first derivative (ISO Point--D1) and the recently developed and well established methods namely ratio difference (RD) and constant center coupled with spectrum subtraction (CC) methods, in addition to derivative ratio (1DD) and mean centering of ratio spectra (MCR). New enrichment technique called spectrum addition technique was used instead of traditional spiking technique. The proposed spectrophotometric procedures do not require any separation steps. Accuracy, precision and linearity ranges of the proposed methods were determined and the specificity was assessed by analyzing synthetic mixtures of both drugs. They were applied to their pharmaceutical formulation and the results obtained were statistically compared to that of official methods. The statistical comparison showed that there is no significant difference between the proposed methods and the official ones regarding both accuracy and precision. Copyright © 2015 Elsevier B.V. All rights reserved.

  12. Critical appraisal of fundamental items in approved clinical trial research proposals in Mashhad University of Medical Sciences

    PubMed Central

    Shakeri, Mohammad-Taghi; Taghipour, Ali; Sadeghi, Masoumeh; Nezami, Hossein; Amirabadizadeh, Ali-Reza; Bonakchi, Hossein

    2017-01-01

    Background: Writing, designing, and conducting a clinical trial research proposal has an important role in achieving valid and reliable findings. Thus, this study aimed at critically appraising fundamental information in approved clinical trial research proposals in Mashhad University of Medical Sciences (MUMS) from 2008 to 2014. Methods: This cross-sectional study was conducted on all 935 approved clinical trial research proposals in MUMS from 2008 to 2014. A valid and reliable as well as comprehensive, simple, and usable checklist in sessions with biostatisticians and methodologists, consisting of 11 main items as research tool, were used. Agreement rate between the reviewers of the proposals, who were responsible for data collection, was assessed during 3 sessions, and Kappa statistics was calculated at the last session as 97%. Results: More than 60% of the research proposals had a methodologist consultant, moreover, type of study or study design had been specified in almost all of them (98%). Appropriateness of study aims with hypotheses was not observed in a significant number of research proposals (585 proposals, 62.6%). The required sample size for 66.8% of the approved proposals was based on a sample size formula; however, in 25% of the proposals, sample size formula was not in accordance with the study design. Data collection tool was not selected appropriately in 55.2% of the approved research proposals. Type and method of randomization were unknown in 21% of the proposals and dealing with missing data had not been described in most of them (98%). Inclusion and exclusion criteria were (92%) fully and adequately explained. Moreover, 44% and 31% of the research proposals were moderate and weak in rank, respectively, with respect to the correctness of the statistical analysis methods. Conclusion: Findings of the present study revealed that a large portion of the approved proposals were highly biased or ambiguous with respect to randomization, blinding, dealing with missing data, data collection tool, sampling methods, and statistical analysis. Thus, it is essential to consult and collaborate with a methodologist in all parts of a proposal to control the possible and specific biases in clinical trials. PMID:29445703

  13. Critical appraisal of fundamental items in approved clinical trial research proposals in Mashhad University of Medical Sciences.

    PubMed

    Shakeri, Mohammad-Taghi; Taghipour, Ali; Sadeghi, Masoumeh; Nezami, Hossein; Amirabadizadeh, Ali-Reza; Bonakchi, Hossein

    2017-01-01

    Background: Writing, designing, and conducting a clinical trial research proposal has an important role in achieving valid and reliable findings. Thus, this study aimed at critically appraising fundamental information in approved clinical trial research proposals in Mashhad University of Medical Sciences (MUMS) from 2008 to 2014. Methods: This cross-sectional study was conducted on all 935 approved clinical trial research proposals in MUMS from 2008 to 2014. A valid and reliable as well as comprehensive, simple, and usable checklist in sessions with biostatisticians and methodologists, consisting of 11 main items as research tool, were used. Agreement rate between the reviewers of the proposals, who were responsible for data collection, was assessed during 3 sessions, and Kappa statistics was calculated at the last session as 97%. Results: More than 60% of the research proposals had a methodologist consultant, moreover, type of study or study design had been specified in almost all of them (98%). Appropriateness of study aims with hypotheses was not observed in a significant number of research proposals (585 proposals, 62.6%). The required sample size for 66.8% of the approved proposals was based on a sample size formula; however, in 25% of the proposals, sample size formula was not in accordance with the study design. Data collection tool was not selected appropriately in 55.2% of the approved research proposals. Type and method of randomization were unknown in 21% of the proposals and dealing with missing data had not been described in most of them (98%). Inclusion and exclusion criteria were (92%) fully and adequately explained. Moreover, 44% and 31% of the research proposals were moderate and weak in rank, respectively, with respect to the correctness of the statistical analysis methods. Conclusion: Findings of the present study revealed that a large portion of the approved proposals were highly biased or ambiguous with respect to randomization, blinding, dealing with missing data, data collection tool, sampling methods, and statistical analysis. Thus, it is essential to consult and collaborate with a methodologist in all parts of a proposal to control the possible and specific biases in clinical trials.

  14. The intervals method: a new approach to analyse finite element outputs using multivariate statistics

    PubMed Central

    De Esteban-Trivigno, Soledad; Püschel, Thomas A.; Fortuny, Josep

    2017-01-01

    Background In this paper, we propose a new method, named the intervals’ method, to analyse data from finite element models in a comparative multivariate framework. As a case study, several armadillo mandibles are analysed, showing that the proposed method is useful to distinguish and characterise biomechanical differences related to diet/ecomorphology. Methods The intervals’ method consists of generating a set of variables, each one defined by an interval of stress values. Each variable is expressed as a percentage of the area of the mandible occupied by those stress values. Afterwards these newly generated variables can be analysed using multivariate methods. Results Applying this novel method to the biological case study of whether armadillo mandibles differ according to dietary groups, we show that the intervals’ method is a powerful tool to characterize biomechanical performance and how this relates to different diets. This allows us to positively discriminate between specialist and generalist species. Discussion We show that the proposed approach is a useful methodology not affected by the characteristics of the finite element mesh. Additionally, the positive discriminating results obtained when analysing a difficult case study suggest that the proposed method could be a very useful tool for comparative studies in finite element analysis using multivariate statistical approaches. PMID:29043107

  15. Control chart pattern recognition using RBF neural network with new training algorithm and practical features.

    PubMed

    Addeh, Abdoljalil; Khormali, Aminollah; Golilarz, Noorbakhsh Amiri

    2018-05-04

    The control chart patterns are the most commonly used statistical process control (SPC) tools to monitor process changes. When a control chart produces an out-of-control signal, this means that the process has been changed. In this study, a new method based on optimized radial basis function neural network (RBFNN) is proposed for control chart patterns (CCPs) recognition. The proposed method consists of four main modules: feature extraction, feature selection, classification and learning algorithm. In the feature extraction module, shape and statistical features are used. Recently, various shape and statistical features have been presented for the CCPs recognition. In the feature selection module, the association rules (AR) method has been employed to select the best set of the shape and statistical features. In the classifier section, RBFNN is used and finally, in RBFNN, learning algorithm has a high impact on the network performance. Therefore, a new learning algorithm based on the bees algorithm has been used in the learning module. Most studies have considered only six patterns: Normal, Cyclic, Increasing Trend, Decreasing Trend, Upward Shift and Downward Shift. Since three patterns namely Normal, Stratification, and Systematic are very similar to each other and distinguishing them is very difficult, in most studies Stratification and Systematic have not been considered. Regarding to the continuous monitoring and control over the production process and the exact type detection of the problem encountered during the production process, eight patterns have been investigated in this study. The proposed method is tested on a dataset containing 1600 samples (200 samples from each pattern) and the results showed that the proposed method has a very good performance. Copyright © 2018 ISA. Published by Elsevier Ltd. All rights reserved.

  16. Evaluation of Fuzzy-Logic Framework for Spatial Statistics Preserving Methods for Estimation of Missing Precipitation Data

    NASA Astrophysics Data System (ADS)

    El Sharif, H.; Teegavarapu, R. S.

    2012-12-01

    Spatial interpolation methods used for estimation of missing precipitation data at a site seldom check for their ability to preserve site and regional statistics. Such statistics are primarily defined by spatial correlations and other site-to-site statistics in a region. Preservation of site and regional statistics represents a means of assessing the validity of missing precipitation estimates at a site. This study evaluates the efficacy of a fuzzy-logic methodology for infilling missing historical daily precipitation data in preserving site and regional statistics. Rain gauge sites in the state of Kentucky, USA, are used as a case study for evaluation of this newly proposed method in comparison to traditional data infilling techniques. Several error and performance measures will be used to evaluate the methods and trade-offs in accuracy of estimation and preservation of site and regional statistics.

  17. Blind identification of image manipulation type using mixed statistical moments

    NASA Astrophysics Data System (ADS)

    Jeong, Bo Gyu; Moon, Yong Ho; Eom, Il Kyu

    2015-01-01

    We present a blind identification of image manipulation types such as blurring, scaling, sharpening, and histogram equalization. Motivated by the fact that image manipulations can change the frequency characteristics of an image, we introduce three types of feature vectors composed of statistical moments. The proposed statistical moments are generated from separated wavelet histograms, the characteristic functions of the wavelet variance, and the characteristic functions of the spatial image. Our method can solve the n-class classification problem. Through experimental simulations, we demonstrate that our proposed method can achieve high performance in manipulation type detection. The average rate of the correctly identified manipulation types is as high as 99.22%, using 10,800 test images and six manipulation types including the authentic image.

  18. [Methods of statistical analysis in differential diagnostics of the degree of brain glioma anaplasia during preoperative stage].

    PubMed

    Glavatskiĭ, A Ia; Guzhovskaia, N V; Lysenko, S N; Kulik, A V

    2005-12-01

    The authors proposed a possible preoperative diagnostics of the degree of supratentorial brain gliom anaplasia using statistical analysis methods. It relies on a complex examination of 934 patients with I-IV degree anaplasias, which had been treated in the Institute of Neurosurgery from 1990 to 2004. The use of statistical analysis methods for differential diagnostics of the degree of brain gliom anaplasia may optimize a diagnostic algorithm, increase reliability of obtained data and in some cases avoid carrying out irrational operative intrusions. Clinically important signs for the use of statistical analysis methods directed to preoperative diagnostics of brain gliom anaplasia have been defined

  19. SDGs and Geospatial Frameworks: Data Integration in the United States

    NASA Astrophysics Data System (ADS)

    Trainor, T.

    2016-12-01

    Responding to the need to monitor a nation's progress towards meeting the Sustainable Development Goals (SDG) outlined in the 2030 U.N. Agenda requires the integration of earth observations with statistical information. The urban agenda proposed in SDG 11 challenges the global community to find a geospatial approach to monitor and measure inclusive, safe, resilient, and sustainable cities and communities. Target 11.7 identifies public safety, accessibility to green and public spaces, and the most vulnerable populations (i.e., women and children, older persons, and persons with disabilities) as the most important priorities of this goal. A challenge for both national statistical organizations and earth observation agencies in addressing SDG 11 is the requirement for detailed statistics at a sufficient spatial resolution to provide the basis for meaningful analysis of the urban population and city environments. Using an example for the city of Pittsburgh, this presentation proposes data and methods to illustrate how earth science and statistical data can be integrated to respond to Target 11.7. Finally, a preliminary series of data initiatives are proposed for extending this method to other global cities.

  20. Rank-based testing of equal survivorship based on cross-sectional survival data with or without prospective follow-up.

    PubMed

    Chan, Kwun Chuen Gary; Qin, Jing

    2015-10-01

    Existing linear rank statistics cannot be applied to cross-sectional survival data without follow-up since all subjects are essentially censored. However, partial survival information are available from backward recurrence times and are frequently collected from health surveys without prospective follow-up. Under length-biased sampling, a class of linear rank statistics is proposed based only on backward recurrence times without any prospective follow-up. When follow-up data are available, the proposed rank statistic and a conventional rank statistic that utilizes follow-up information from the same sample are shown to be asymptotically independent. We discuss four ways to combine these two statistics when follow-up is present. Simulations show that all combined statistics have substantially improved power compared with conventional rank statistics, and a Mantel-Haenszel test performed the best among the proposal statistics. The method is applied to a cross-sectional health survey without follow-up and a study of Alzheimer's disease with prospective follow-up. © The Author 2015. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  1. Water Quality Sensing and Spatio-Temporal Monitoring Structure with Autocorrelation Kernel Methods.

    PubMed

    Vizcaíno, Iván P; Carrera, Enrique V; Muñoz-Romero, Sergio; Cumbal, Luis H; Rojo-Álvarez, José Luis

    2017-10-16

    Pollution on water resources is usually analyzed with monitoring campaigns, which consist of programmed sampling, measurement, and recording of the most representative water quality parameters. These campaign measurements yields a non-uniform spatio-temporal sampled data structure to characterize complex dynamics phenomena. In this work, we propose an enhanced statistical interpolation method to provide water quality managers with statistically interpolated representations of spatial-temporal dynamics. Specifically, our proposal makes efficient use of the a priori available information of the quality parameter measurements through Support Vector Regression (SVR) based on Mercer's kernels. The methods are benchmarked against previously proposed methods in three segments of the Machángara River and one segment of the San Pedro River in Ecuador, and their different dynamics are shown by statistically interpolated spatial-temporal maps. The best interpolation performance in terms of mean absolute error was the SVR with Mercer's kernel given by either the Mahalanobis spatial-temporal covariance matrix or by the bivariate estimated autocorrelation function. In particular, the autocorrelation kernel provides with significant improvement of the estimation quality, consistently for all the six water quality variables, which points out the relevance of including a priori knowledge of the problem.

  2. Water Quality Sensing and Spatio-Temporal Monitoring Structure with Autocorrelation Kernel Methods

    PubMed Central

    Vizcaíno, Iván P.; Muñoz-Romero, Sergio; Cumbal, Luis H.

    2017-01-01

    Pollution on water resources is usually analyzed with monitoring campaigns, which consist of programmed sampling, measurement, and recording of the most representative water quality parameters. These campaign measurements yields a non-uniform spatio-temporal sampled data structure to characterize complex dynamics phenomena. In this work, we propose an enhanced statistical interpolation method to provide water quality managers with statistically interpolated representations of spatial-temporal dynamics. Specifically, our proposal makes efficient use of the a priori available information of the quality parameter measurements through Support Vector Regression (SVR) based on Mercer’s kernels. The methods are benchmarked against previously proposed methods in three segments of the Machángara River and one segment of the San Pedro River in Ecuador, and their different dynamics are shown by statistically interpolated spatial-temporal maps. The best interpolation performance in terms of mean absolute error was the SVR with Mercer’s kernel given by either the Mahalanobis spatial-temporal covariance matrix or by the bivariate estimated autocorrelation function. In particular, the autocorrelation kernel provides with significant improvement of the estimation quality, consistently for all the six water quality variables, which points out the relevance of including a priori knowledge of the problem. PMID:29035333

  3. Leads Detection Using Mixture Statistical Distribution Based CRF Algorithm from Sentinel-1 Dual Polarization SAR Imagery

    NASA Astrophysics Data System (ADS)

    Zhang, Yu; Li, Fei; Zhang, Shengkai; Zhu, Tingting

    2017-04-01

    Synthetic Aperture Radar (SAR) is significantly important for polar remote sensing since it can provide continuous observations in all days and all weather. SAR can be used for extracting the surface roughness information characterized by the variance of dielectric properties and different polarization channels, which make it possible to observe different ice types and surface structure for deformation analysis. In November, 2016, Chinese National Antarctic Research Expedition (CHINARE) 33rd cruise has set sails in sea ice zone in Antarctic. Accurate leads spatial distribution in sea ice zone for routine planning of ship navigation is essential. In this study, the semantic relationship between leads and sea ice categories has been described by the Conditional Random Fields (CRF) model, and leads characteristics have been modeled by statistical distributions in SAR imagery. In the proposed algorithm, a mixture statistical distribution based CRF is developed by considering the contexture information and the statistical characteristics of sea ice for improving leads detection in Sentinel-1A dual polarization SAR imagery. The unary potential and pairwise potential in CRF model is constructed by integrating the posteriori probability estimated from statistical distributions. For mixture statistical distribution parameter estimation, Method of Logarithmic Cumulants (MoLC) is exploited for single statistical distribution parameters estimation. The iteration based Expectation Maximal (EM) algorithm is investigated to calculate the parameters in mixture statistical distribution based CRF model. In the posteriori probability inference, graph-cut energy minimization method is adopted in the initial leads detection. The post-processing procedures including aspect ratio constrain and spatial smoothing approaches are utilized to improve the visual result. The proposed method is validated on Sentinel-1A SAR C-band Extra Wide Swath (EW) Ground Range Detected (GRD) imagery with a pixel spacing of 40 meters near Prydz Bay area, East Antarctica. Main work is listed as follows: 1) A mixture statistical distribution based CRF algorithm has been developed for leads detection from Sentinel-1A dual polarization images. 2) The assessment of the proposed mixture statistical distribution based CRF method and single distribution based CRF algorithm has been presented. 3) The preferable parameters sets including statistical distributions, the aspect ratio threshold and spatial smoothing window size have been provided. In the future, the proposed algorithm will be developed for the operational Sentinel series data sets processing due to its less time consuming cost and high accuracy in leads detection.

  4. Scalable privacy-preserving data sharing methodology for genome-wide association studies.

    PubMed

    Yu, Fei; Fienberg, Stephen E; Slavković, Aleksandra B; Uhler, Caroline

    2014-08-01

    The protection of privacy of individual-level information in genome-wide association study (GWAS) databases has been a major concern of researchers following the publication of "an attack" on GWAS data by Homer et al. (2008). Traditional statistical methods for confidentiality and privacy protection of statistical databases do not scale well to deal with GWAS data, especially in terms of guarantees regarding protection from linkage to external information. The more recent concept of differential privacy, introduced by the cryptographic community, is an approach that provides a rigorous definition of privacy with meaningful privacy guarantees in the presence of arbitrary external information, although the guarantees may come at a serious price in terms of data utility. Building on such notions, Uhler et al. (2013) proposed new methods to release aggregate GWAS data without compromising an individual's privacy. We extend the methods developed in Uhler et al. (2013) for releasing differentially-private χ(2)-statistics by allowing for arbitrary number of cases and controls, and for releasing differentially-private allelic test statistics. We also provide a new interpretation by assuming the controls' data are known, which is a realistic assumption because some GWAS use publicly available data as controls. We assess the performance of the proposed methods through a risk-utility analysis on a real data set consisting of DNA samples collected by the Wellcome Trust Case Control Consortium and compare the methods with the differentially-private release mechanism proposed by Johnson and Shmatikov (2013). Copyright © 2014 Elsevier Inc. All rights reserved.

  5. Feature extraction and classification algorithms for high dimensional data

    NASA Technical Reports Server (NTRS)

    Lee, Chulhee; Landgrebe, David

    1993-01-01

    Feature extraction and classification algorithms for high dimensional data are investigated. Developments with regard to sensors for Earth observation are moving in the direction of providing much higher dimensional multispectral imagery than is now possible. In analyzing such high dimensional data, processing time becomes an important factor. With large increases in dimensionality and the number of classes, processing time will increase significantly. To address this problem, a multistage classification scheme is proposed which reduces the processing time substantially by eliminating unlikely classes from further consideration at each stage. Several truncation criteria are developed and the relationship between thresholds and the error caused by the truncation is investigated. Next an approach to feature extraction for classification is proposed based directly on the decision boundaries. It is shown that all the features needed for classification can be extracted from decision boundaries. A characteristic of the proposed method arises by noting that only a portion of the decision boundary is effective in discriminating between classes, and the concept of the effective decision boundary is introduced. The proposed feature extraction algorithm has several desirable properties: it predicts the minimum number of features necessary to achieve the same classification accuracy as in the original space for a given pattern recognition problem; and it finds the necessary feature vectors. The proposed algorithm does not deteriorate under the circumstances of equal means or equal covariances as some previous algorithms do. In addition, the decision boundary feature extraction algorithm can be used both for parametric and non-parametric classifiers. Finally, some problems encountered in analyzing high dimensional data are studied and possible solutions are proposed. First, the increased importance of the second order statistics in analyzing high dimensional data is recognized. By investigating the characteristics of high dimensional data, the reason why the second order statistics must be taken into account in high dimensional data is suggested. Recognizing the importance of the second order statistics, there is a need to represent the second order statistics. A method to visualize statistics using a color code is proposed. By representing statistics using color coding, one can easily extract and compare the first and the second statistics.

  6. A Third Moment Adjusted Test Statistic for Small Sample Factor Analysis

    PubMed Central

    Lin, Johnny; Bentler, Peter M.

    2012-01-01

    Goodness of fit testing in factor analysis is based on the assumption that the test statistic is asymptotically chi-square; but this property may not hold in small samples even when the factors and errors are normally distributed in the population. Robust methods such as Browne’s asymptotically distribution-free method and Satorra Bentler’s mean scaling statistic were developed under the presumption of non-normality in the factors and errors. This paper finds new application to the case where factors and errors are normally distributed in the population but the skewness of the obtained test statistic is still high due to sampling error in the observed indicators. An extension of Satorra Bentler’s statistic is proposed that not only scales the mean but also adjusts the degrees of freedom based on the skewness of the obtained test statistic in order to improve its robustness under small samples. A simple simulation study shows that this third moment adjusted statistic asymptotically performs on par with previously proposed methods, and at a very small sample size offers superior Type I error rates under a properly specified model. Data from Mardia, Kent and Bibby’s study of students tested for their ability in five content areas that were either open or closed book were used to illustrate the real-world performance of this statistic. PMID:23144511

  7. Numerical solutions for patterns statistics on Markov chains.

    PubMed

    Nuel, Gregory

    2006-01-01

    We propose here a review of the methods available to compute pattern statistics on text generated by a Markov source. Theoretical, but also numerical aspects are detailed for a wide range of techniques (exact, Gaussian, large deviations, binomial and compound Poisson). The SPatt package (Statistics for Pattern, free software available at http://stat.genopole.cnrs.fr/spatt) implementing all these methods is then used to compare all these approaches in terms of computational time and reliability in the most complete pattern statistics benchmark available at the present time.

  8. EFFECTS OF LASER RADIATION ON MATTER. LASER PLASMA: Feasibility of investigation of optical breakdown statistics using multifrequency lasers

    NASA Astrophysics Data System (ADS)

    Ulanov, S. F.

    1990-06-01

    A method proposed for investigating the statistics of bulk optical breakdown relies on multifrequency lasers, which eliminates the influence of the laser radiation intensity statistics. The method is based on preliminary recording of the peak intensity statistics of multifrequency laser radiation pulses at the caustic using the optical breakdown threshold of K8 glass. The probability density distribution function was obtained at the focus for the peak intensities of the radiation pulses of a multifrequency laser. This method may be used to study the self-interaction under conditions of bulk optical breakdown of transparent dielectrics.

  9. Fusion of Local Statistical Parameters for Buried Underwater Mine Detection in Sonar Imaging

    NASA Astrophysics Data System (ADS)

    Maussang, F.; Rombaut, M.; Chanussot, J.; Hétet, A.; Amate, M.

    2008-12-01

    Detection of buried underwater objects, and especially mines, is a current crucial strategic task. Images provided by sonar systems allowing to penetrate in the sea floor, such as the synthetic aperture sonars (SASs), are of great interest for the detection and classification of such objects. However, the signal-to-noise ratio is fairly low and advanced information processing is required for a correct and reliable detection of the echoes generated by the objects. The detection method proposed in this paper is based on a data-fusion architecture using the belief theory. The input data of this architecture are local statistical characteristics extracted from SAS data corresponding to the first-, second-, third-, and fourth-order statistical properties of the sonar images, respectively. The interest of these parameters is derived from a statistical model of the sonar data. Numerical criteria are also proposed to estimate the detection performances and to validate the method.

  10. Empirical comparison study of approximate methods for structure selection in binary graphical models.

    PubMed

    Viallon, Vivian; Banerjee, Onureena; Jougla, Eric; Rey, Grégoire; Coste, Joel

    2014-03-01

    Looking for associations among multiple variables is a topical issue in statistics due to the increasing amount of data encountered in biology, medicine, and many other domains involving statistical applications. Graphical models have recently gained popularity for this purpose in the statistical literature. In the binary case, however, exact inference is generally very slow or even intractable because of the form of the so-called log-partition function. In this paper, we review various approximate methods for structure selection in binary graphical models that have recently been proposed in the literature and compare them through an extensive simulation study. We also propose a modification of one existing method, that is shown to achieve good performance and to be generally very fast. We conclude with an application in which we search for associations among causes of death recorded on French death certificates. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  11. High-Density Signal Interface Electromagnetic Radiation Prediction for Electromagnetic Compatibility Evaluation.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Halligan, Matthew

    Radiated power calculation approaches for practical scenarios of incomplete high- density interface characterization information and incomplete incident power information are presented. The suggested approaches build upon a method that characterizes power losses through the definition of power loss constant matrices. Potential radiated power estimates include using total power loss information, partial radiated power loss information, worst case analysis, and statistical bounding analysis. A method is also proposed to calculate radiated power when incident power information is not fully known for non-periodic signals at the interface. Incident data signals are modeled from a two-state Markov chain where bit state probabilities aremore » derived. The total spectrum for windowed signals is postulated as the superposition of spectra from individual pulses in a data sequence. Statistical bounding methods are proposed as a basis for the radiated power calculation due to the statistical calculation complexity to find a radiated power probability density function.« less

  12. Frequentist Model Averaging in Structural Equation Modelling.

    PubMed

    Jin, Shaobo; Ankargren, Sebastian

    2018-06-04

    Model selection from a set of candidate models plays an important role in many structural equation modelling applications. However, traditional model selection methods introduce extra randomness that is not accounted for by post-model selection inference. In the current study, we propose a model averaging technique within the frequentist statistical framework. Instead of selecting an optimal model, the contributions of all candidate models are acknowledged. Valid confidence intervals and a [Formula: see text] test statistic are proposed. A simulation study shows that the proposed method is able to produce a robust mean-squared error, a better coverage probability, and a better goodness-of-fit test compared to model selection. It is an interesting compromise between model selection and the full model.

  13. Localized Statistics for DW-MRI Fiber Bundle Segmentation

    PubMed Central

    Lankton, Shawn; Melonakos, John; Malcolm, James; Dambreville, Samuel; Tannenbaum, Allen

    2013-01-01

    We describe a method for segmenting neural fiber bundles in diffusion-weighted magnetic resonance images (DWMRI). As these bundles traverse the brain to connect regions, their local orientation of diffusion changes drastically, hence a constant global model is inaccurate. We propose a method to compute localized statistics on orientation information and use it to drive a variational active contour segmentation that accurately models the non-homogeneous orientation information present along the bundle. Initialized from a single fiber path, the proposed method proceeds to capture the entire bundle. We demonstrate results using the technique to segment the cingulum bundle and describe several extensions making the technique applicable to a wide range of tissues. PMID:23652079

  14. Novel spectrophotometric methods for simultaneous determination of timolol and dorzolamide in their binary mixture.

    PubMed

    Lotfy, Hayam Mahmoud; Hegazy, Maha A; Rezk, Mamdouh R; Omran, Yasmin Rostom

    2014-05-21

    Two smart and novel spectrophotometric methods namely; absorbance subtraction (AS) and amplitude modulation (AM) were developed and validated for the determination of a binary mixture of timolol maleate (TIM) and dorzolamide hydrochloride (DOR) in presence of benzalkonium chloride without prior separation, using unified regression equation. Additionally, simple, specific, accurate and precise spectrophotometric methods manipulating ratio spectra were developed and validated for simultaneous determination of the binary mixture namely; simultaneous ratio subtraction (SRS), ratio difference (RD), ratio subtraction (RS) coupled with extended ratio subtraction (EXRS), constant multiplication method (CM) and mean centering of ratio spectra (MCR). The proposed spectrophotometric procedures do not require any separation steps. Accuracy, precision and linearity ranges of the proposed methods were determined and the specificity was assessed by analyzing synthetic mixtures of both drugs. They were applied to their pharmaceutical formulation and the results obtained were statistically compared to that of a reported spectrophotometric method. The statistical comparison showed that there is no significant difference between the proposed methods and the reported one regarding both accuracy and precision. Copyright © 2014 Elsevier B.V. All rights reserved.

  15. Cocaine Dependence Treatment Data: Methods for Measurement Error Problems With Predictors Derived From Stationary Stochastic Processes

    PubMed Central

    Guan, Yongtao; Li, Yehua; Sinha, Rajita

    2011-01-01

    In a cocaine dependence treatment study, we use linear and nonlinear regression models to model posttreatment cocaine craving scores and first cocaine relapse time. A subset of the covariates are summary statistics derived from baseline daily cocaine use trajectories, such as baseline cocaine use frequency and average daily use amount. These summary statistics are subject to estimation error and can therefore cause biased estimators for the regression coefficients. Unlike classical measurement error problems, the error we encounter here is heteroscedastic with an unknown distribution, and there are no replicates for the error-prone variables or instrumental variables. We propose two robust methods to correct for the bias: a computationally efficient method-of-moments-based method for linear regression models and a subsampling extrapolation method that is generally applicable to both linear and nonlinear regression models. Simulations and an application to the cocaine dependence treatment data are used to illustrate the efficacy of the proposed methods. Asymptotic theory and variance estimation for the proposed subsampling extrapolation method and some additional simulation results are described in the online supplementary material. PMID:21984854

  16. Level set method for image segmentation based on moment competition

    NASA Astrophysics Data System (ADS)

    Min, Hai; Wang, Xiao-Feng; Huang, De-Shuang; Jin, Jing; Wang, Hong-Zhi; Li, Hai

    2015-05-01

    We propose a level set method for image segmentation which introduces the moment competition and weakly supervised information into the energy functional construction. Different from the region-based level set methods which use force competition, the moment competition is adopted to drive the contour evolution. Here, a so-called three-point labeling scheme is proposed to manually label three independent points (weakly supervised information) on the image. Then the intensity differences between the three points and the unlabeled pixels are used to construct the force arms for each image pixel. The corresponding force is generated from the global statistical information of a region-based method and weighted by the force arm. As a result, the moment can be constructed and incorporated into the energy functional to drive the evolving contour to approach the object boundary. In our method, the force arm can take full advantage of the three-point labeling scheme to constrain the moment competition. Additionally, the global statistical information and weakly supervised information are successfully integrated, which makes the proposed method more robust than traditional methods for initial contour placement and parameter setting. Experimental results with performance analysis also show the superiority of the proposed method on segmenting different types of complicated images, such as noisy images, three-phase images, images with intensity inhomogeneity, and texture images.

  17. Online Denoising Based on the Second-Order Adaptive Statistics Model.

    PubMed

    Yi, Sheng-Lun; Jin, Xue-Bo; Su, Ting-Li; Tang, Zhen-Yun; Wang, Fa-Fa; Xiang, Na; Kong, Jian-Lei

    2017-07-20

    Online denoising is motivated by real-time applications in the industrial process, where the data must be utilizable soon after it is collected. Since the noise in practical process is usually colored, it is quite a challenge for denoising techniques. In this paper, a novel online denoising method was proposed to achieve the processing of the practical measurement data with colored noise, and the characteristics of the colored noise were considered in the dynamic model via an adaptive parameter. The proposed method consists of two parts within a closed loop: the first one is to estimate the system state based on the second-order adaptive statistics model and the other is to update the adaptive parameter in the model using the Yule-Walker algorithm. Specifically, the state estimation process was implemented via the Kalman filter in a recursive way, and the online purpose was therefore attained. Experimental data in a reinforced concrete structure test was used to verify the effectiveness of the proposed method. Results show the proposed method not only dealt with the signals with colored noise, but also achieved a tradeoff between efficiency and accuracy.

  18. Change Detection in Rough Time Series

    DTIC Science & Technology

    2014-09-01

    Business Statistics : An Inferential Approach, Dellen: San Francisco. [18] Winston, W. (1997) Operations Research Applications and Algorithms, Duxbury...distribution that can present significant challenges to conventional statistical tracking techniques. To address this problem the proposed method...applies hybrid fuzzy statistical techniques to series granules instead of to individual measures. Three examples demonstrated the robust nature of the

  19. On a Calculus-Based Statistics Course for Life Science Students

    ERIC Educational Resources Information Center

    Watkins, Joseph C.

    2010-01-01

    The choice of pedagogy in statistics should take advantage of the quantitative capabilities and scientific background of the students. In this article, we propose a model for a statistics course that assumes student competency in calculus and a broadening knowledge in biology. We illustrate our methods and practices through examples from the…

  20. Representation of Probability Density Functions from Orbit Determination using the Particle Filter

    NASA Technical Reports Server (NTRS)

    Mashiku, Alinda K.; Garrison, James; Carpenter, J. Russell

    2012-01-01

    Statistical orbit determination enables us to obtain estimates of the state and the statistical information of its region of uncertainty. In order to obtain an accurate representation of the probability density function (PDF) that incorporates higher order statistical information, we propose the use of nonlinear estimation methods such as the Particle Filter. The Particle Filter (PF) is capable of providing a PDF representation of the state estimates whose accuracy is dependent on the number of particles or samples used. For this method to be applicable to real case scenarios, we need a way of accurately representing the PDF in a compressed manner with little information loss. Hence we propose using the Independent Component Analysis (ICA) as a non-Gaussian dimensional reduction method that is capable of maintaining higher order statistical information obtained using the PF. Methods such as the Principal Component Analysis (PCA) are based on utilizing up to second order statistics, hence will not suffice in maintaining maximum information content. Both the PCA and the ICA are applied to two scenarios that involve a highly eccentric orbit with a lower apriori uncertainty covariance and a less eccentric orbit with a higher a priori uncertainty covariance, to illustrate the capability of the ICA in relation to the PCA.

  1. DISSCO: direct imputation of summary statistics allowing covariates

    PubMed Central

    Xu, Zheng; Duan, Qing; Yan, Song; Chen, Wei; Li, Mingyao; Lange, Ethan; Li, Yun

    2015-01-01

    Background: Imputation of individual level genotypes at untyped markers using an external reference panel of genotyped or sequenced individuals has become standard practice in genetic association studies. Direct imputation of summary statistics can also be valuable, for example in meta-analyses where individual level genotype data are not available. Two methods (DIST and ImpG-Summary/LD), that assume a multivariate Gaussian distribution for the association summary statistics, have been proposed for imputing association summary statistics. However, both methods assume that the correlations between association summary statistics are the same as the correlations between the corresponding genotypes. This assumption can be violated in the presence of confounding covariates. Methods: We analytically show that in the absence of covariates, correlation among association summary statistics is indeed the same as that among the corresponding genotypes, thus serving as a theoretical justification for the recently proposed methods. We continue to prove that in the presence of covariates, correlation among association summary statistics becomes the partial correlation of the corresponding genotypes controlling for covariates. We therefore develop direct imputation of summary statistics allowing covariates (DISSCO). Results: We consider two real-life scenarios where the correlation and partial correlation likely make practical difference: (i) association studies in admixed populations; (ii) association studies in presence of other confounding covariate(s). Application of DISSCO to real datasets under both scenarios shows at least comparable, if not better, performance compared with existing correlation-based methods, particularly for lower frequency variants. For example, DISSCO can reduce the absolute deviation from the truth by 3.9–15.2% for variants with minor allele frequency <5%. Availability and implementation: http://www.unc.edu/∼yunmli/DISSCO. Contact: yunli@med.unc.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25810429

  2. GEE-based SNP set association test for continuous and discrete traits in family-based association studies.

    PubMed

    Wang, Xuefeng; Lee, Seunggeun; Zhu, Xiaofeng; Redline, Susan; Lin, Xihong

    2013-12-01

    Family-based genetic association studies of related individuals provide opportunities to detect genetic variants that complement studies of unrelated individuals. Most statistical methods for family association studies for common variants are single marker based, which test one SNP a time. In this paper, we consider testing the effect of an SNP set, e.g., SNPs in a gene, in family studies, for both continuous and discrete traits. Specifically, we propose a generalized estimating equations (GEEs) based kernel association test, a variance component based testing method, to test for the association between a phenotype and multiple variants in an SNP set jointly using family samples. The proposed approach allows for both continuous and discrete traits, where the correlation among family members is taken into account through the use of an empirical covariance estimator. We derive the theoretical distribution of the proposed statistic under the null and develop analytical methods to calculate the P-values. We also propose an efficient resampling method for correcting for small sample size bias in family studies. The proposed method allows for easily incorporating covariates and SNP-SNP interactions. Simulation studies show that the proposed method properly controls for type I error rates under both random and ascertained sampling schemes in family studies. We demonstrate through simulation studies that our approach has superior performance for association mapping compared to the single marker based minimum P-value GEE test for an SNP-set effect over a range of scenarios. We illustrate the application of the proposed method using data from the Cleveland Family GWAS Study. © 2013 WILEY PERIODICALS, INC.

  3. Paroxysmal atrial fibrillation prediction method with shorter HRV sequences.

    PubMed

    Boon, K H; Khalil-Hani, M; Malarvili, M B; Sia, C W

    2016-10-01

    This paper proposes a method that predicts the onset of paroxysmal atrial fibrillation (PAF), using heart rate variability (HRV) segments that are shorter than those applied in existing methods, while maintaining good prediction accuracy. PAF is a common cardiac arrhythmia that increases the health risk of a patient, and the development of an accurate predictor of the onset of PAF is clinical important because it increases the possibility to stabilize (electrically) and prevent the onset of atrial arrhythmias with different pacing techniques. We investigate the effect of HRV features extracted from different lengths of HRV segments prior to PAF onset with the proposed PAF prediction method. The pre-processing stage of the predictor includes QRS detection, HRV quantification and ectopic beat correction. Time-domain, frequency-domain, non-linear and bispectrum features are then extracted from the quantified HRV. In the feature selection, the HRV feature set and classifier parameters are optimized simultaneously using an optimization procedure based on genetic algorithm (GA). Both full feature set and statistically significant feature subset are optimized by GA respectively. For the statistically significant feature subset, Mann-Whitney U test is used to filter non-statistical significance features that cannot pass the statistical test at 20% significant level. The final stage of our predictor is the classifier that is based on support vector machine (SVM). A 10-fold cross-validation is applied in performance evaluation, and the proposed method achieves 79.3% prediction accuracy using 15-minutes HRV segment. This accuracy is comparable to that achieved by existing methods that use 30-minutes HRV segments, most of which achieves accuracy of around 80%. More importantly, our method significantly outperforms those that applied segments shorter than 30 minutes. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  4. Explicit optimization of plan quality measures in intensity-modulated radiation therapy treatment planning.

    PubMed

    Engberg, Lovisa; Forsgren, Anders; Eriksson, Kjell; Hårdemark, Björn

    2017-06-01

    To formulate convex planning objectives of treatment plan multicriteria optimization with explicit relationships to the dose-volume histogram (DVH) statistics used in plan quality evaluation. Conventional planning objectives are designed to minimize the violation of DVH statistics thresholds using penalty functions. Although successful in guiding the DVH curve towards these thresholds, conventional planning objectives offer limited control of the individual points on the DVH curve (doses-at-volume) used to evaluate plan quality. In this study, we abandon the usual penalty-function framework and propose planning objectives that more closely relate to DVH statistics. The proposed planning objectives are based on mean-tail-dose, resulting in convex optimization. We also demonstrate how to adapt a standard optimization method to the proposed formulation in order to obtain a substantial reduction in computational cost. We investigated the potential of the proposed planning objectives as tools for optimizing DVH statistics through juxtaposition with the conventional planning objectives on two patient cases. Sets of treatment plans with differently balanced planning objectives were generated using either the proposed or the conventional approach. Dominance in the sense of better distributed doses-at-volume was observed in plans optimized within the proposed framework. The initial computational study indicates that the DVH statistics are better optimized and more efficiently balanced using the proposed planning objectives than using the conventional approach. © 2017 American Association of Physicists in Medicine.

  5. Ship detection using STFT sea background statistical modeling for large-scale oceansat remote sensing image

    NASA Astrophysics Data System (ADS)

    Wang, Lixia; Pei, Jihong; Xie, Weixin; Liu, Jinyuan

    2018-03-01

    Large-scale oceansat remote sensing images cover a big area sea surface, which fluctuation can be considered as a non-stationary process. Short-Time Fourier Transform (STFT) is a suitable analysis tool for the time varying nonstationary signal. In this paper, a novel ship detection method using 2-D STFT sea background statistical modeling for large-scale oceansat remote sensing images is proposed. First, the paper divides the large-scale oceansat remote sensing image into small sub-blocks, and 2-D STFT is applied to each sub-block individually. Second, the 2-D STFT spectrum of sub-blocks is studied and the obvious different characteristic between sea background and non-sea background is found. Finally, the statistical model for all valid frequency points in the STFT spectrum of sea background is given, and the ship detection method based on the 2-D STFT spectrum modeling is proposed. The experimental result shows that the proposed algorithm can detect ship targets with high recall rate and low missing rate.

  6. Comparing and combining biomarkers as principle surrogates for time-to-event clinical endpoints.

    PubMed

    Gabriel, Erin E; Sachs, Michael C; Gilbert, Peter B

    2015-02-10

    Principal surrogate endpoints are useful as targets for phase I and II trials. In many recent trials, multiple post-randomization biomarkers are measured. However, few statistical methods exist for comparison of or combination of biomarkers as principal surrogates, and none of these methods to our knowledge utilize time-to-event clinical endpoint information. We propose a Weibull model extension of the semi-parametric estimated maximum likelihood method that allows for the inclusion of multiple biomarkers in the same risk model as multivariate candidate principal surrogates. We propose several methods for comparing candidate principal surrogates and evaluating multivariate principal surrogates. These include the time-dependent and surrogate-dependent true and false positive fraction, the time-dependent and the integrated standardized total gain, and the cumulative distribution function of the risk difference. We illustrate the operating characteristics of our proposed methods in simulations and outline how these statistics can be used to evaluate and compare candidate principal surrogates. We use these methods to investigate candidate surrogates in the Diabetes Control and Complications Trial. Copyright © 2014 John Wiley & Sons, Ltd.

  7. Differential gene expression detection and sample classification using penalized linear regression models.

    PubMed

    Wu, Baolin

    2006-02-15

    Differential gene expression detection and sample classification using microarray data have received much research interest recently. Owing to the large number of genes p and small number of samples n (p > n), microarray data analysis poses big challenges for statistical analysis. An obvious problem owing to the 'large p small n' is over-fitting. Just by chance, we are likely to find some non-differentially expressed genes that can classify the samples very well. The idea of shrinkage is to regularize the model parameters to reduce the effects of noise and produce reliable inferences. Shrinkage has been successfully applied in the microarray data analysis. The SAM statistics proposed by Tusher et al. and the 'nearest shrunken centroid' proposed by Tibshirani et al. are ad hoc shrinkage methods. Both methods are simple, intuitive and prove to be useful in empirical studies. Recently Wu proposed the penalized t/F-statistics with shrinkage by formally using the (1) penalized linear regression models for two-class microarray data, showing good performance. In this paper we systematically discussed the use of penalized regression models for analyzing microarray data. We generalize the two-class penalized t/F-statistics proposed by Wu to multi-class microarray data. We formally derive the ad hoc shrunken centroid used by Tibshirani et al. using the (1) penalized regression models. And we show that the penalized linear regression models provide a rigorous and unified statistical framework for sample classification and differential gene expression detection.

  8. An Adaptive Association Test for Multiple Phenotypes with GWAS Summary Statistics.

    PubMed

    Kim, Junghi; Bai, Yun; Pan, Wei

    2015-12-01

    We study the problem of testing for single marker-multiple phenotype associations based on genome-wide association study (GWAS) summary statistics without access to individual-level genotype and phenotype data. For most published GWASs, because obtaining summary data is substantially easier than accessing individual-level phenotype and genotype data, while often multiple correlated traits have been collected, the problem studied here has become increasingly important. We propose a powerful adaptive test and compare its performance with some existing tests. We illustrate its applications to analyses of a meta-analyzed GWAS dataset with three blood lipid traits and another with sex-stratified anthropometric traits, and further demonstrate its potential power gain over some existing methods through realistic simulation studies. We start from the situation with only one set of (possibly meta-analyzed) genome-wide summary statistics, then extend the method to meta-analysis of multiple sets of genome-wide summary statistics, each from one GWAS. We expect the proposed test to be useful in practice as more powerful than or complementary to existing methods. © 2015 WILEY PERIODICALS, INC.

  9. A statistical method (cross-validation) for bone loss region detection after spaceflight

    PubMed Central

    Zhao, Qian; Li, Wenjun; Li, Caixia; Chu, Philip W.; Kornak, John; Lang, Thomas F.

    2010-01-01

    Astronauts experience bone loss after the long spaceflight missions. Identifying specific regions that undergo the greatest losses (e.g. the proximal femur) could reveal information about the processes of bone loss in disuse and disease. Methods for detecting such regions, however, remains an open problem. This paper focuses on statistical methods to detect such regions. We perform statistical parametric mapping to get t-maps of changes in images, and propose a new cross-validation method to select an optimum suprathreshold for forming clusters of pixels. Once these candidate clusters are formed, we use permutation testing of longitudinal labels to derive significant changes. PMID:20632144

  10. Categorical data processing for real estate objects valuation using statistical analysis

    NASA Astrophysics Data System (ADS)

    Parygin, D. S.; Malikov, V. P.; Golubev, A. V.; Sadovnikova, N. P.; Petrova, T. M.; Finogeev, A. G.

    2018-05-01

    Theoretical and practical approaches to the use of statistical methods for studying various properties of infrastructure objects are analyzed in the paper. Methods of forecasting the value of objects are considered. A method for coding categorical variables describing properties of real estate objects is proposed. The analysis of the results of modeling the price of real estate objects using regression analysis and an algorithm based on a comparative approach is carried out.

  11. Ice Water Classification Using Statistical Distribution Based Conditional Random Fields in RADARSAT-2 Dual Polarization Imagery

    NASA Astrophysics Data System (ADS)

    Zhang, Y.; Li, F.; Zhang, S.; Hao, W.; Zhu, T.; Yuan, L.; Xiao, F.

    2017-09-01

    In this paper, Statistical Distribution based Conditional Random Fields (STA-CRF) algorithm is exploited for improving marginal ice-water classification. Pixel level ice concentration is presented as the comparison of methods based on CRF. Furthermore, in order to explore the effective statistical distribution model to be integrated into STA-CRF, five statistical distribution models are investigated. The STA-CRF methods are tested on 2 scenes around Prydz Bay and Adélie Depression, where contain a variety of ice types during melt season. Experimental results indicate that the proposed method can resolve sea ice edge well in Marginal Ice Zone (MIZ) and show a robust distinction of ice and water.

  12. On a Calculus-based Statistics Course for Life Science Students

    PubMed Central

    2010-01-01

    The choice of pedagogy in statistics should take advantage of the quantitative capabilities and scientific background of the students. In this article, we propose a model for a statistics course that assumes student competency in calculus and a broadening knowledge in biology. We illustrate our methods and practices through examples from the curriculum. PMID:20810962

  13. Defect detection of castings in radiography images using a robust statistical feature.

    PubMed

    Zhao, Xinyue; He, Zaixing; Zhang, Shuyou

    2014-01-01

    One of the most commonly used optical methods for defect detection is radiographic inspection. Compared with methods that extract defects directly from the radiography image, model-based methods deal with the case of an object with complex structure well. However, detection of small low-contrast defects in nonuniformly illuminated images is still a major challenge for them. In this paper, we present a new method based on the grayscale arranging pairs (GAP) feature to detect casting defects in radiography images automatically. First, a model is built using pixel pairs with a stable intensity relationship based on the GAP feature from previously acquired images. Second, defects can be extracted by comparing the difference of intensity-difference signs between the input image and the model statistically. The robustness of the proposed method to noise and illumination variations has been verified on casting radioscopic images with defects. The experimental results showed that the average computation time of the proposed method in the testing stage is 28 ms per image on a computer with a Pentium Core 2 Duo 3.00 GHz processor. For the comparison, we also evaluated the performance of the proposed method as well as that of the mixture-of-Gaussian-based and crossing line profile methods. The proposed method achieved 2.7% and 2.0% false negative rates in the noise and illumination variation experiments, respectively.

  14. Assay of potency of the proposed Fifth International Standard for Gas-Gangrene Antitoxin (Perfringens)

    PubMed Central

    Prigge, R.; Micke, H.; Krüger, J.

    1963-01-01

    As part of a collaborative assay of the proposed Fifth International Standard for Gas-Gangrene Antitoxin (Perfringens), five ampoules of the proposed replacement material were assayed in the authors' laboratory against the then current Fourth International Standard. Both in vitro and in vivo methods were used. This paper presents the results and their statistical analysis. The two methods yielded different results which were not likely to have been due to chance, but exact statistical comparison is not possible. It is thought, however, that the differences may be due, at least in part, to differences in the relative proportions of zeta-antitoxin and alpha-antitoxin in the Fourth and Fifth International Standards and the consequent different reactions with the test toxin that was used for titration. PMID:14107746

  15. Gaussian process regression for sensor networks under localization uncertainty

    USGS Publications Warehouse

    Jadaliha, M.; Xu, Yunfei; Choi, Jongeun; Johnson, N.S.; Li, Weiming

    2013-01-01

    In this paper, we formulate Gaussian process regression with observations under the localization uncertainty due to the resource-constrained sensor networks. In our formulation, effects of observations, measurement noise, localization uncertainty, and prior distributions are all correctly incorporated in the posterior predictive statistics. The analytically intractable posterior predictive statistics are proposed to be approximated by two techniques, viz., Monte Carlo sampling and Laplace's method. Such approximation techniques have been carefully tailored to our problems and their approximation error and complexity are analyzed. Simulation study demonstrates that the proposed approaches perform much better than approaches without considering the localization uncertainty properly. Finally, we have applied the proposed approaches on the experimentally collected real data from a dye concentration field over a section of a river and a temperature field of an outdoor swimming pool to provide proof of concept tests and evaluate the proposed schemes in real situations. In both simulation and experimental results, the proposed methods outperform the quick-and-dirty solutions often used in practice.

  16. Bayesian models based on test statistics for multiple hypothesis testing problems.

    PubMed

    Ji, Yuan; Lu, Yiling; Mills, Gordon B

    2008-04-01

    We propose a Bayesian method for the problem of multiple hypothesis testing that is routinely encountered in bioinformatics research, such as the differential gene expression analysis. Our algorithm is based on modeling the distributions of test statistics under both null and alternative hypotheses. We substantially reduce the complexity of the process of defining posterior model probabilities by modeling the test statistics directly instead of modeling the full data. Computationally, we apply a Bayesian FDR approach to control the number of rejections of null hypotheses. To check if our model assumptions for the test statistics are valid for various bioinformatics experiments, we also propose a simple graphical model-assessment tool. Using extensive simulations, we demonstrate the performance of our models and the utility of the model-assessment tool. In the end, we apply the proposed methodology to an siRNA screening and a gene expression experiment.

  17. An Efficient and Reliable Statistical Method for Estimating Functional Connectivity in Large Scale Brain Networks Using Partial Correlation

    PubMed Central

    Wang, Yikai; Kang, Jian; Kemmer, Phebe B.; Guo, Ying

    2016-01-01

    Currently, network-oriented analysis of fMRI data has become an important tool for understanding brain organization and brain networks. Among the range of network modeling methods, partial correlation has shown great promises in accurately detecting true brain network connections. However, the application of partial correlation in investigating brain connectivity, especially in large-scale brain networks, has been limited so far due to the technical challenges in its estimation. In this paper, we propose an efficient and reliable statistical method for estimating partial correlation in large-scale brain network modeling. Our method derives partial correlation based on the precision matrix estimated via Constrained L1-minimization Approach (CLIME), which is a recently developed statistical method that is more efficient and demonstrates better performance than the existing methods. To help select an appropriate tuning parameter for sparsity control in the network estimation, we propose a new Dens-based selection method that provides a more informative and flexible tool to allow the users to select the tuning parameter based on the desired sparsity level. Another appealing feature of the Dens-based method is that it is much faster than the existing methods, which provides an important advantage in neuroimaging applications. Simulation studies show that the Dens-based method demonstrates comparable or better performance with respect to the existing methods in network estimation. We applied the proposed partial correlation method to investigate resting state functional connectivity using rs-fMRI data from the Philadelphia Neurodevelopmental Cohort (PNC) study. Our results show that partial correlation analysis removed considerable between-module marginal connections identified by full correlation analysis, suggesting these connections were likely caused by global effects or common connection to other nodes. Based on partial correlation, we find that the most significant direct connections are between homologous brain locations in the left and right hemisphere. When comparing partial correlation derived under different sparse tuning parameters, an important finding is that the sparse regularization has more shrinkage effects on negative functional connections than on positive connections, which supports previous findings that many of the negative brain connections are due to non-neurophysiological effects. An R package “DensParcorr” can be downloaded from CRAN for implementing the proposed statistical methods. PMID:27242395

  18. An Efficient and Reliable Statistical Method for Estimating Functional Connectivity in Large Scale Brain Networks Using Partial Correlation.

    PubMed

    Wang, Yikai; Kang, Jian; Kemmer, Phebe B; Guo, Ying

    2016-01-01

    Currently, network-oriented analysis of fMRI data has become an important tool for understanding brain organization and brain networks. Among the range of network modeling methods, partial correlation has shown great promises in accurately detecting true brain network connections. However, the application of partial correlation in investigating brain connectivity, especially in large-scale brain networks, has been limited so far due to the technical challenges in its estimation. In this paper, we propose an efficient and reliable statistical method for estimating partial correlation in large-scale brain network modeling. Our method derives partial correlation based on the precision matrix estimated via Constrained L1-minimization Approach (CLIME), which is a recently developed statistical method that is more efficient and demonstrates better performance than the existing methods. To help select an appropriate tuning parameter for sparsity control in the network estimation, we propose a new Dens-based selection method that provides a more informative and flexible tool to allow the users to select the tuning parameter based on the desired sparsity level. Another appealing feature of the Dens-based method is that it is much faster than the existing methods, which provides an important advantage in neuroimaging applications. Simulation studies show that the Dens-based method demonstrates comparable or better performance with respect to the existing methods in network estimation. We applied the proposed partial correlation method to investigate resting state functional connectivity using rs-fMRI data from the Philadelphia Neurodevelopmental Cohort (PNC) study. Our results show that partial correlation analysis removed considerable between-module marginal connections identified by full correlation analysis, suggesting these connections were likely caused by global effects or common connection to other nodes. Based on partial correlation, we find that the most significant direct connections are between homologous brain locations in the left and right hemisphere. When comparing partial correlation derived under different sparse tuning parameters, an important finding is that the sparse regularization has more shrinkage effects on negative functional connections than on positive connections, which supports previous findings that many of the negative brain connections are due to non-neurophysiological effects. An R package "DensParcorr" can be downloaded from CRAN for implementing the proposed statistical methods.

  19. A New Method for Automated Identification and Morphometry of Myelinated Fibers Through Light Microscopy Image Analysis.

    PubMed

    Novas, Romulo Bourget; Fazan, Valeria Paula Sassoli; Felipe, Joaquim Cezar

    2016-02-01

    Nerve morphometry is known to produce relevant information for the evaluation of several phenomena, such as nerve repair, regeneration, implant, transplant, aging, and different human neuropathies. Manual morphometry is laborious, tedious, time consuming, and subject to many sources of error. Therefore, in this paper, we propose a new method for the automated morphometry of myelinated fibers in cross-section light microscopy images. Images from the recurrent laryngeal nerve of adult rats and the vestibulocochlear nerve of adult guinea pigs were used herein. The proposed pipeline for fiber segmentation is based on the techniques of competitive clustering and concavity analysis. The evaluation of the proposed method for segmentation of images was done by comparing the automatic segmentation with the manual segmentation. To further evaluate the proposed method considering morphometric features extracted from the segmented images, the distributions of these features were tested for statistical significant difference. The method achieved a high overall sensitivity and very low false-positive rates per image. We detect no statistical difference between the distribution of the features extracted from the manual and the pipeline segmentations. The method presented a good overall performance, showing widespread potential in experimental and clinical settings allowing large-scale image analysis and, thus, leading to more reliable results.

  20. Detecting associated single-nucleotide polymorphisms on the X chromosome in case control genome-wide association studies.

    PubMed

    Chen, Zhongxue; Ng, Hon Keung Tony; Li, Jing; Liu, Qingzhong; Huang, Hanwen

    2017-04-01

    In the past decade, hundreds of genome-wide association studies have been conducted to detect the significant single-nucleotide polymorphisms that are associated with certain diseases. However, most of the data from the X chromosome were not analyzed and only a few significant associated single-nucleotide polymorphisms from the X chromosome have been identified from genome-wide association studies. This is mainly due to the lack of powerful statistical tests. In this paper, we propose a novel statistical approach that combines the information of single-nucleotide polymorphisms on the X chromosome from both males and females in an efficient way. The proposed approach avoids the need of making strong assumptions about the underlying genetic models. Our proposed statistical test is a robust method that only makes the assumption that the risk allele is the same for both females and males if the single-nucleotide polymorphism is associated with the disease for both genders. Through simulation study and a real data application, we show that the proposed procedure is robust and have excellent performance compared to existing methods. We expect that many more associated single-nucleotide polymorphisms on the X chromosome will be identified if the proposed approach is applied to current available genome-wide association studies data.

  1. Gene- and pathway-based association tests for multiple traits with GWAS summary statistics.

    PubMed

    Kwak, Il-Youp; Pan, Wei

    2017-01-01

    To identify novel genetic variants associated with complex traits and to shed new insights on underlying biology, in addition to the most popular single SNP-single trait association analysis, it would be useful to explore multiple correlated (intermediate) traits at the gene- or pathway-level by mining existing single GWAS or meta-analyzed GWAS data. For this purpose, we present an adaptive gene-based test and a pathway-based test for association analysis of multiple traits with GWAS summary statistics. The proposed tests are adaptive at both the SNP- and trait-levels; that is, they account for possibly varying association patterns (e.g. signal sparsity levels) across SNPs and traits, thus maintaining high power across a wide range of situations. Furthermore, the proposed methods are general: they can be applied to mixed types of traits, and to Z-statistics or P-values as summary statistics obtained from either a single GWAS or a meta-analysis of multiple GWAS. Our numerical studies with simulated and real data demonstrated the promising performance of the proposed methods. The methods are implemented in R package aSPU, freely and publicly available at: https://cran.r-project.org/web/packages/aSPU/ CONTACT: weip@biostat.umn.eduSupplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  2. The forecasting of menstruation based on a state-space modeling of basal body temperature time series.

    PubMed

    Fukaya, Keiichi; Kawamori, Ai; Osada, Yutaka; Kitazawa, Masumi; Ishiguro, Makio

    2017-09-20

    Women's basal body temperature (BBT) shows a periodic pattern that associates with menstrual cycle. Although this fact suggests a possibility that daily BBT time series can be useful for estimating the underlying phase state as well as for predicting the length of current menstrual cycle, little attention has been paid to model BBT time series. In this study, we propose a state-space model that involves the menstrual phase as a latent state variable to explain the daily fluctuation of BBT and the menstruation cycle length. Conditional distributions of the phase are obtained by using sequential Bayesian filtering techniques. A predictive distribution of the next menstruation day can be derived based on this conditional distribution and the model, leading to a novel statistical framework that provides a sequentially updated prediction for upcoming menstruation day. We applied this framework to a real data set of women's BBT and menstruation days and compared prediction accuracy of the proposed method with that of previous methods, showing that the proposed method generally provides a better prediction. Because BBT can be obtained with relatively small cost and effort, the proposed method can be useful for women's health management. Potential extensions of this framework as the basis of modeling and predicting events that are associated with the menstrual cycles are discussed. © 2017 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd. © 2017 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.

  3. Easy and accurate variance estimation of the nonparametric estimator of the partial area under the ROC curve and its application.

    PubMed

    Yu, Jihnhee; Yang, Luge; Vexler, Albert; Hutson, Alan D

    2016-06-15

    The receiver operating characteristic (ROC) curve is a popular technique with applications, for example, investigating an accuracy of a biomarker to delineate between disease and non-disease groups. A common measure of accuracy of a given diagnostic marker is the area under the ROC curve (AUC). In contrast with the AUC, the partial area under the ROC curve (pAUC) looks into the area with certain specificities (i.e., true negative rate) only, and it can be often clinically more relevant than examining the entire ROC curve. The pAUC is commonly estimated based on a U-statistic with the plug-in sample quantile, making the estimator a non-traditional U-statistic. In this article, we propose an accurate and easy method to obtain the variance of the nonparametric pAUC estimator. The proposed method is easy to implement for both one biomarker test and the comparison of two correlated biomarkers because it simply adapts the existing variance estimator of U-statistics. In this article, we show accuracy and other advantages of the proposed variance estimation method by broadly comparing it with previously existing methods. Further, we develop an empirical likelihood inference method based on the proposed variance estimator through a simple implementation. In an application, we demonstrate that, depending on the inferences by either the AUC or pAUC, we can make a different decision on a prognostic ability of a same set of biomarkers. Copyright © 2016 John Wiley & Sons, Ltd. Copyright © 2016 John Wiley & Sons, Ltd.

  4. A Novel Bit-level Image Encryption Method Based on Chaotic Map and Dynamic Grouping

    NASA Astrophysics Data System (ADS)

    Zhang, Guo-Ji; Shen, Yan

    2012-10-01

    In this paper, a novel bit-level image encryption method based on dynamic grouping is proposed. In the proposed method, the plain-image is divided into several groups randomly, then permutation-diffusion process on bit level is carried out. The keystream generated by logistic map is related to the plain-image, which confuses the relationship between the plain-image and the cipher-image. The computer simulation results of statistical analysis, information entropy analysis and sensitivity analysis show that the proposed encryption method is secure and reliable enough to be used for communication application.

  5. A log-Weibull spatial scan statistic for time to event data.

    PubMed

    Usman, Iram; Rosychuk, Rhonda J

    2018-06-13

    Spatial scan statistics have been used for the identification of geographic clusters of elevated numbers of cases of a condition such as disease outbreaks. These statistics accompanied by the appropriate distribution can also identify geographic areas with either longer or shorter time to events. Other authors have proposed the spatial scan statistics based on the exponential and Weibull distributions. We propose the log-Weibull as an alternative distribution for the spatial scan statistic for time to events data and compare and contrast the log-Weibull and Weibull distributions through simulation studies. The effect of type I differential censoring and power have been investigated through simulated data. Methods are also illustrated on time to specialist visit data for discharged patients presenting to emergency departments for atrial fibrillation and flutter in Alberta during 2010-2011. We found northern regions of Alberta had longer times to specialist visit than other areas. We proposed the spatial scan statistic for the log-Weibull distribution as a new approach for detecting spatial clusters for time to event data. The simulation studies suggest that the test performs well for log-Weibull data.

  6. A new statistical method for design and analyses of component tolerance

    NASA Astrophysics Data System (ADS)

    Movahedi, Mohammad Mehdi; Khounsiavash, Mohsen; Otadi, Mahmood; Mosleh, Maryam

    2017-03-01

    Tolerancing conducted by design engineers to meet customers' needs is a prerequisite for producing high-quality products. Engineers use handbooks to conduct tolerancing. While use of statistical methods for tolerancing is not something new, engineers often use known distributions, including the normal distribution. Yet, if the statistical distribution of the given variable is unknown, a new statistical method will be employed to design tolerance. In this paper, we use generalized lambda distribution for design and analyses component tolerance. We use percentile method (PM) to estimate the distribution parameters. The findings indicated that, when the distribution of the component data is unknown, the proposed method can be used to expedite the design of component tolerance. Moreover, in the case of assembled sets, more extensive tolerance for each component with the same target performance can be utilized.

  7. Color edges extraction using statistical features and automatic threshold technique: application to the breast cancer cells.

    PubMed

    Ben Chaabane, Salim; Fnaiech, Farhat

    2014-01-23

    Color image segmentation has been so far applied in many areas; hence, recently many different techniques have been developed and proposed. In the medical imaging area, the image segmentation may be helpful to provide assistance to doctor in order to follow-up the disease of a certain patient from the breast cancer processed images. The main objective of this work is to rebuild and also to enhance each cell from the three component images provided by an input image. Indeed, from an initial segmentation obtained using the statistical features and histogram threshold techniques, the resulting segmentation may represent accurately the non complete and pasted cells and enhance them. This allows real help to doctors, and consequently, these cells become clear and easy to be counted. A novel method for color edges extraction based on statistical features and automatic threshold is presented. The traditional edge detector, based on the first and the second order neighborhood, describing the relationship between the current pixel and its neighbors, is extended to the statistical domain. Hence, color edges in an image are obtained by combining the statistical features and the automatic threshold techniques. Finally, on the obtained color edges with specific primitive color, a combination rule is used to integrate the edge results over the three color components. Breast cancer cell images were used to evaluate the performance of the proposed method both quantitatively and qualitatively. Hence, a visual and a numerical assessment based on the probability of correct classification (PC), the false classification (Pf), and the classification accuracy (Sens(%)) are presented and compared with existing techniques. The proposed method shows its superiority in the detection of points which really belong to the cells, and also the facility of counting the number of the processed cells. Computer simulations highlight that the proposed method substantially enhances the segmented image with smaller error rates better than other existing algorithms under the same settings (patterns and parameters). Moreover, it provides high classification accuracy, reaching the rate of 97.94%. Additionally, the segmentation method may be extended to other medical imaging types having similar properties.

  8. Adaptive interference cancel filter for evoked potential using high-order cumulants.

    PubMed

    Lin, Bor-Shyh; Lin, Bor-Shing; Chong, Fok-Ching; Lai, Feipei

    2004-01-01

    This paper is to present evoked potential (EP) processing using adaptive interference cancel (AIC) filter with second and high order cumulants. In conventional ensemble averaging method, people have to conduct repetitively experiments to record the required data. Recently, the use of AIC structure with second statistics in processing EP has proved more efficiency than traditional averaging method, but it is sensitive to both of the reference signal statistics and the choice of step size. Thus, we proposed higher order statistics-based AIC method to improve these disadvantages. This study was experimented in somatosensory EP corrupted with EEG. Gradient type algorithm is used in AIC method. Comparisons with AIC filter on second, third, fourth order statistics are also presented in this paper. We observed that AIC filter with third order statistics has better convergent performance for EP processing and is not sensitive to the selection of step size and reference input.

  9. DISSCO: direct imputation of summary statistics allowing covariates.

    PubMed

    Xu, Zheng; Duan, Qing; Yan, Song; Chen, Wei; Li, Mingyao; Lange, Ethan; Li, Yun

    2015-08-01

    Imputation of individual level genotypes at untyped markers using an external reference panel of genotyped or sequenced individuals has become standard practice in genetic association studies. Direct imputation of summary statistics can also be valuable, for example in meta-analyses where individual level genotype data are not available. Two methods (DIST and ImpG-Summary/LD), that assume a multivariate Gaussian distribution for the association summary statistics, have been proposed for imputing association summary statistics. However, both methods assume that the correlations between association summary statistics are the same as the correlations between the corresponding genotypes. This assumption can be violated in the presence of confounding covariates. We analytically show that in the absence of covariates, correlation among association summary statistics is indeed the same as that among the corresponding genotypes, thus serving as a theoretical justification for the recently proposed methods. We continue to prove that in the presence of covariates, correlation among association summary statistics becomes the partial correlation of the corresponding genotypes controlling for covariates. We therefore develop direct imputation of summary statistics allowing covariates (DISSCO). We consider two real-life scenarios where the correlation and partial correlation likely make practical difference: (i) association studies in admixed populations; (ii) association studies in presence of other confounding covariate(s). Application of DISSCO to real datasets under both scenarios shows at least comparable, if not better, performance compared with existing correlation-based methods, particularly for lower frequency variants. For example, DISSCO can reduce the absolute deviation from the truth by 3.9-15.2% for variants with minor allele frequency <5%. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  10. A variational approach to liver segmentation using statistics from multiple sources

    NASA Astrophysics Data System (ADS)

    Zheng, Shenhai; Fang, Bin; Li, Laquan; Gao, Mingqi; Wang, Yi

    2018-01-01

    Medical image segmentation plays an important role in digital medical research, and therapy planning and delivery. However, the presence of noise and low contrast renders automatic liver segmentation an extremely challenging task. In this study, we focus on a variational approach to liver segmentation in computed tomography scan volumes in a semiautomatic and slice-by-slice manner. In this method, one slice is selected and its connected component liver region is determined manually to initialize the subsequent automatic segmentation process. From this guiding slice, we execute the proposed method downward to the last one and upward to the first one, respectively. A segmentation energy function is proposed by combining the statistical shape prior, global Gaussian intensity analysis, and enforced local statistical feature under the level set framework. During segmentation, the shape of the liver shape is estimated by minimization of this function. The improved Chan-Vese model is used to refine the shape to capture the long and narrow regions of the liver. The proposed method was verified on two independent public databases, the 3D-IRCADb and the SLIVER07. Among all the tested methods, our method yielded the best volumetric overlap error (VOE) of 6.5 +/- 2.8 % , the best root mean square symmetric surface distance (RMSD) of 2.1 +/- 0.8 mm, the best maximum symmetric surface distance (MSD) of 18.9 +/- 8.3 mm in 3D-IRCADb dataset, and the best average symmetric surface distance (ASD) of 0.8 +/- 0.5 mm, the best RMSD of 1.5 +/- 1.1 mm in SLIVER07 dataset, respectively. The results of the quantitative comparison show that the proposed liver segmentation method achieves competitive segmentation performance with state-of-the-art techniques.

  11. First arrival time picking for microseismic data based on DWSW algorithm

    NASA Astrophysics Data System (ADS)

    Li, Yue; Wang, Yue; Lin, Hongbo; Zhong, Tie

    2018-03-01

    The first arrival time picking is a crucial step in microseismic data processing. When the signal-to-noise ratio (SNR) is low, however, it is difficult to get the first arrival time accurately with traditional methods. In this paper, we propose the double-sliding-window SW (DWSW) method based on the Shapiro-Wilk (SW) test. The DWSW method is used to detect the first arrival time by making full use of the differences between background noise and effective signals in the statistical properties. Specifically speaking, we obtain the moment corresponding to the maximum as the first arrival time of microseismic data when the statistic of our method reaches its maximum. Hence, in our method, there is no need to select the threshold, which makes the algorithm more facile when the SNR of microseismic data is low. To verify the reliability of the proposed method, a series of experiments is performed on both synthetic and field microseismic data. Our method is compared with the traditional short-time and long-time average (STA/LTA) method, the Akaike information criterion, and the kurtosis method. Analysis results indicate that the accuracy rate of the proposed method is superior to that of the other three methods when the SNR is as low as - 10 dB.

  12. K2 and K2*: efficient alignment-free sequence similarity measurement based on Kendall statistics.

    PubMed

    Lin, Jie; Adjeroh, Donald A; Jiang, Bing-Hua; Jiang, Yue

    2018-05-15

    Alignment-free sequence comparison methods can compute the pairwise similarity between a huge number of sequences much faster than sequence-alignment based methods. We propose a new non-parametric alignment-free sequence comparison method, called K2, based on the Kendall statistics. Comparing to the other state-of-the-art alignment-free comparison methods, K2 demonstrates competitive performance in generating the phylogenetic tree, in evaluating functionally related regulatory sequences, and in computing the edit distance (similarity/dissimilarity) between sequences. Furthermore, the K2 approach is much faster than the other methods. An improved method, K2*, is also proposed, which is able to determine the appropriate algorithmic parameter (length) automatically, without first considering different values. Comparative analysis with the state-of-the-art alignment-free sequence similarity methods demonstrates the superiority of the proposed approaches, especially with increasing sequence length, or increasing dataset sizes. The K2 and K2* approaches are implemented in the R language as a package and is freely available for open access (http://community.wvu.edu/daadjeroh/projects/K2/K2_1.0.tar.gz). yueljiang@163.com. Supplementary data are available at Bioinformatics online.

  13. Simple method for quick estimation of aquifer hydrogeological parameters

    NASA Astrophysics Data System (ADS)

    Ma, C.; Li, Y. Y.

    2017-08-01

    Development of simple and accurate methods to determine the aquifer hydrogeological parameters was of importance for groundwater resources assessment and management. Aiming at the present issue of estimating aquifer parameters based on some data of the unsteady pumping test, a fitting function of Theis well function was proposed using fitting optimization method and then a unitary linear regression equation was established. The aquifer parameters could be obtained by solving coefficients of the regression equation. The application of the proposed method was illustrated, using two published data sets. By the error statistics and analysis on the pumping drawdown, it showed that the method proposed in this paper yielded quick and accurate estimates of the aquifer parameters. The proposed method could reliably identify the aquifer parameters from long distance observed drawdowns and early drawdowns. It was hoped that the proposed method in this paper would be helpful for practicing hydrogeologists and hydrologists.

  14. A statistical method for measuring activation of gene regulatory networks.

    PubMed

    Esteves, Gustavo H; Reis, Luiz F L

    2018-06-13

    Gene expression data analysis is of great importance for modern molecular biology, given our ability to measure the expression profiles of thousands of genes and enabling studies rooted in systems biology. In this work, we propose a simple statistical model for the activation measuring of gene regulatory networks, instead of the traditional gene co-expression networks. We present the mathematical construction of a statistical procedure for testing hypothesis regarding gene regulatory network activation. The real probability distribution for the test statistic is evaluated by a permutation based study. To illustrate the functionality of the proposed methodology, we also present a simple example based on a small hypothetical network and the activation measuring of two KEGG networks, both based on gene expression data collected from gastric and esophageal samples. The two KEGG networks were also analyzed for a public database, available through NCBI-GEO, presented as Supplementary Material. This method was implemented in an R package that is available at the BioConductor project website under the name maigesPack.

  15. Testing for independence in J×K contingency tables with complex sample survey data.

    PubMed

    Lipsitz, Stuart R; Fitzmaurice, Garrett M; Sinha, Debajyoti; Hevelone, Nathanael; Giovannucci, Edward; Hu, Jim C

    2015-09-01

    The test of independence of row and column variables in a (J×K) contingency table is a widely used statistical test in many areas of application. For complex survey samples, use of the standard Pearson chi-squared test is inappropriate due to correlation among units within the same cluster. Rao and Scott (1981, Journal of the American Statistical Association 76, 221-230) proposed an approach in which the standard Pearson chi-squared statistic is multiplied by a design effect to adjust for the complex survey design. Unfortunately, this test fails to exist when one of the observed cell counts equals zero. Even with the large samples typical of many complex surveys, zero cell counts can occur for rare events, small domains, or contingency tables with a large number of cells. Here, we propose Wald and score test statistics for independence based on weighted least squares estimating equations. In contrast to the Rao-Scott test statistic, the proposed Wald and score test statistics always exist. In simulations, the score test is found to perform best with respect to type I error. The proposed method is motivated by, and applied to, post surgical complications data from the United States' Nationwide Inpatient Sample (NIS) complex survey of hospitals in 2008. © 2015, The International Biometric Society.

  16. Person Fit Based on Statistical Process Control in an Adaptive Testing Environment. Research Report 98-13.

    ERIC Educational Resources Information Center

    van Krimpen-Stoop, Edith M. L. A.; Meijer, Rob R.

    Person-fit research in the context of paper-and-pencil tests is reviewed, and some specific problems regarding person fit in the context of computerized adaptive testing (CAT) are discussed. Some new methods are proposed to investigate person fit in a CAT environment. These statistics are based on Statistical Process Control (SPC) theory. A…

  17. Automated segmentation of ultrasonic breast lesions using statistical texture classification and active contour based on probability distance.

    PubMed

    Liu, Bo; Cheng, H D; Huang, Jianhua; Tian, Jiawei; Liu, Jiafeng; Tang, Xianglong

    2009-08-01

    Because of its complicated structure, low signal/noise ratio, low contrast and blurry boundaries, fully automated segmentation of a breast ultrasound (BUS) image is a difficult task. In this paper, a novel segmentation method for BUS images without human intervention is proposed. Unlike most published approaches, the proposed method handles the segmentation problem by using a two-step strategy: ROI generation and ROI segmentation. First, a well-trained texture classifier categorizes the tissues into different classes, and the background knowledge rules are used for selecting the regions of interest (ROIs) from them. Second, a novel probability distance-based active contour model is applied for segmenting the ROIs and finding the accurate positions of the breast tumors. The active contour model combines both global statistical information and local edge information, using a level set approach. The proposed segmentation method was performed on 103 BUS images (48 benign and 55 malignant). To validate the performance, the results were compared with the corresponding tumor regions marked by an experienced radiologist. Three error metrics, true-positive ratio (TP), false-negative ratio (FN) and false-positive ratio (FP) were used for measuring the performance of the proposed method. The final results (TP = 91.31%, FN = 8.69% and FP = 7.26%) demonstrate that the proposed method can segment BUS images efficiently, quickly and automatically.

  18. FARVATX: FAmily-based Rare Variant Association Test for X-linked genes

    PubMed Central

    Choi, Sungkyoung; Lee, Sungyoung; Qiao, Dandi; Hardin, Megan; Cho, Michael H.; Silverman, Edwin K; Park, Taesung; Won, Sungho

    2016-01-01

    Although the X chromosome has many genes that are functionally related to human diseases, the complicated biological properties of the X chromosome have prevented efficient genetic association analyses, and only a few significantly associated X-linked variants have been reported for complex traits. For instance, dosage compensation of X-linked genes is often achieved via the inactivation of one allele in each X-linked variant in females; however, some X-linked variants can escape this X chromosome inactivation. Efficient genetic analyses cannot be conducted without prior knowledge about the gene expression process of X-linked variants, and misspecified information can lead to power loss. In this report, we propose new statistical methods for rare X-linked variant genetic association analysis of dichotomous phenotypes with family-based samples. The proposed methods are computationally efficient and can complete X-linked analyses within a few hours. Simulation studies demonstrate the statistical efficiency of the proposed methods, which were then applied to rare-variant association analysis of the X chromosome in chronic obstructive pulmonary disease (COPD). Some promising significant X-linked genes were identified, illustrating the practical importance of the proposed methods. PMID:27325607

  19. FARVATX: Family-Based Rare Variant Association Test for X-Linked Genes.

    PubMed

    Choi, Sungkyoung; Lee, Sungyoung; Qiao, Dandi; Hardin, Megan; Cho, Michael H; Silverman, Edwin K; Park, Taesung; Won, Sungho

    2016-09-01

    Although the X chromosome has many genes that are functionally related to human diseases, the complicated biological properties of the X chromosome have prevented efficient genetic association analyses, and only a few significantly associated X-linked variants have been reported for complex traits. For instance, dosage compensation of X-linked genes is often achieved via the inactivation of one allele in each X-linked variant in females; however, some X-linked variants can escape this X chromosome inactivation. Efficient genetic analyses cannot be conducted without prior knowledge about the gene expression process of X-linked variants, and misspecified information can lead to power loss. In this report, we propose new statistical methods for rare X-linked variant genetic association analysis of dichotomous phenotypes with family-based samples. The proposed methods are computationally efficient and can complete X-linked analyses within a few hours. Simulation studies demonstrate the statistical efficiency of the proposed methods, which were then applied to rare-variant association analysis of the X chromosome in chronic obstructive pulmonary disease. Some promising significant X-linked genes were identified, illustrating the practical importance of the proposed methods. © 2016 WILEY PERIODICALS, INC.

  20. A Comparison of Methods for Estimating the Determinant of High-Dimensional Covariance Matrix.

    PubMed

    Hu, Zongliang; Dong, Kai; Dai, Wenlin; Tong, Tiejun

    2017-09-21

    The determinant of the covariance matrix for high-dimensional data plays an important role in statistical inference and decision. It has many real applications including statistical tests and information theory. Due to the statistical and computational challenges with high dimensionality, little work has been proposed in the literature for estimating the determinant of high-dimensional covariance matrix. In this paper, we estimate the determinant of the covariance matrix using some recent proposals for estimating high-dimensional covariance matrix. Specifically, we consider a total of eight covariance matrix estimation methods for comparison. Through extensive simulation studies, we explore and summarize some interesting comparison results among all compared methods. We also provide practical guidelines based on the sample size, the dimension, and the correlation of the data set for estimating the determinant of high-dimensional covariance matrix. Finally, from a perspective of the loss function, the comparison study in this paper may also serve as a proxy to assess the performance of the covariance matrix estimation.

  1. The applications of statistical quantification techniques in nanomechanics and nanoelectronics.

    PubMed

    Mai, Wenjie; Deng, Xinwei

    2010-10-08

    Although nanoscience and nanotechnology have been developing for approximately two decades and have achieved numerous breakthroughs, the experimental results from nanomaterials with a higher noise level and poorer repeatability than those from bulk materials still remain as a practical issue, and challenge many techniques of quantification of nanomaterials. This work proposes a physical-statistical modeling approach and a global fitting statistical method to use all the available discrete data or quasi-continuous curves to quantify a few targeted physical parameters, which can provide more accurate, efficient and reliable parameter estimates, and give reasonable physical explanations. In the resonance method for measuring the elastic modulus of ZnO nanowires (Zhou et al 2006 Solid State Commun. 139 222-6), our statistical technique gives E = 128.33 GPa instead of the original E = 108 GPa, and unveils a negative bias adjustment f(0). The causes are suggested by the systematic bias in measuring the length of the nanowires. In the electronic measurement of the resistivity of a Mo nanowire (Zach et al 2000 Science 290 2120-3), the proposed new method automatically identified the importance of accounting for the Ohmic contact resistance in the model of the Ohmic behavior in nanoelectronics experiments. The 95% confidence interval of resistivity in the proposed one-step procedure is determined to be 3.57 +/- 0.0274 x 10( - 5) ohm cm, which should be a more reliable and precise estimate. The statistical quantification technique should find wide applications in obtaining better estimations from various systematic errors and biased effects that become more significant at the nanoscale.

  2. Trend extraction using empirical mode decomposition and statistical empirical mode decomposition: Case study: Kuala Lumpur stock market

    NASA Astrophysics Data System (ADS)

    Jaber, Abobaker M.

    2014-12-01

    Two nonparametric methods for prediction and modeling of financial time series signals are proposed. The proposed techniques are designed to handle non-stationary and non-linearity behave and to extract meaningful signals for reliable prediction. Due to Fourier Transform (FT), the methods select significant decomposed signals that will be employed for signal prediction. The proposed techniques developed by coupling Holt-winter method with Empirical Mode Decomposition (EMD) and it is Extending the scope of empirical mode decomposition by smoothing (SEMD). To show performance of proposed techniques, we analyze daily closed price of Kuala Lumpur stock market index.

  3. Testing the statistical compatibility of independent data sets

    NASA Astrophysics Data System (ADS)

    Maltoni, M.; Schwetz, T.

    2003-08-01

    We discuss a goodness-of-fit method which tests the compatibility between statistically independent data sets. The method gives sensible results even in cases where the χ2 minima of the individual data sets are very low or when several parameters are fitted to a large number of data points. In particular, it avoids the problem that a possible disagreement between data sets becomes diluted by data points which are insensitive to the crucial parameters. A formal derivation of the probability distribution function for the proposed test statistics is given, based on standard theorems of statistics. The application of the method is illustrated on data from neutrino oscillation experiments, and its complementarity to the standard goodness-of-fit is discussed.

  4. An adaptive multi-feature segmentation model for infrared image

    NASA Astrophysics Data System (ADS)

    Zhang, Tingting; Han, Jin; Zhang, Yi; Bai, Lianfa

    2016-04-01

    Active contour models (ACM) have been extensively applied to image segmentation, conventional region-based active contour models only utilize global or local single feature information to minimize the energy functional to drive the contour evolution. Considering the limitations of original ACMs, an adaptive multi-feature segmentation model is proposed to handle infrared images with blurred boundaries and low contrast. In the proposed model, several essential local statistic features are introduced to construct a multi-feature signed pressure function (MFSPF). In addition, we draw upon the adaptive weight coefficient to modify the level set formulation, which is formed by integrating MFSPF with local statistic features and signed pressure function with global information. Experimental results demonstrate that the proposed method can make up for the inadequacy of the original method and get desirable results in segmenting infrared images.

  5. A Geometrical-Statistical Approach to Outlier Removal for TDOA Measurements

    NASA Astrophysics Data System (ADS)

    Compagnoni, Marco; Pini, Alessia; Canclini, Antonio; Bestagini, Paolo; Antonacci, Fabio; Tubaro, Stefano; Sarti, Augusto

    2017-08-01

    The curse of outlier measurements in estimation problems is a well known issue in a variety of fields. Therefore, outlier removal procedures, which enables the identification of spurious measurements within a set, have been developed for many different scenarios and applications. In this paper, we propose a statistically motivated outlier removal algorithm for time differences of arrival (TDOAs), or equivalently range differences (RD), acquired at sensor arrays. The method exploits the TDOA-space formalism and works by only knowing relative sensor positions. As the proposed method is completely independent from the application for which measurements are used, it can be reliably used to identify outliers within a set of TDOA/RD measurements in different fields (e.g. acoustic source localization, sensor synchronization, radar, remote sensing, etc.). The proposed outlier removal algorithm is validated by means of synthetic simulations and real experiments.

  6. Data-Driven Learning of Q-Matrix

    PubMed Central

    Liu, Jingchen; Xu, Gongjun; Ying, Zhiliang

    2013-01-01

    The recent surge of interests in cognitive assessment has led to developments of novel statistical models for diagnostic classification. Central to many such models is the well-known Q-matrix, which specifies the item–attribute relationships. This article proposes a data-driven approach to identification of the Q-matrix and estimation of related model parameters. A key ingredient is a flexible T-matrix that relates the Q-matrix to response patterns. The flexibility of the T-matrix allows the construction of a natural criterion function as well as a computationally amenable algorithm. Simulations results are presented to demonstrate usefulness and applicability of the proposed method. Extension to handling of the Q-matrix with partial information is presented. The proposed method also provides a platform on which important statistical issues, such as hypothesis testing and model selection, may be formally addressed. PMID:23926363

  7. A vessel segmentation method for multi-modality angiographic images based on multi-scale filtering and statistical models.

    PubMed

    Lu, Pei; Xia, Jun; Li, Zhicheng; Xiong, Jing; Yang, Jian; Zhou, Shoujun; Wang, Lei; Chen, Mingyang; Wang, Cheng

    2016-11-08

    Accurate segmentation of blood vessels plays an important role in the computer-aided diagnosis and interventional treatment of vascular diseases. The statistical method is an important component of effective vessel segmentation; however, several limitations discourage the segmentation effect, i.e., dependence of the image modality, uneven contrast media, bias field, and overlapping intensity distribution of the object and background. In addition, the mixture models of the statistical methods are constructed relaying on the characteristics of the image histograms. Thus, it is a challenging issue for the traditional methods to be available in vessel segmentation from multi-modality angiographic images. To overcome these limitations, a flexible segmentation method with a fixed mixture model has been proposed for various angiography modalities. Our method mainly consists of three parts. Firstly, multi-scale filtering algorithm was used on the original images to enhance vessels and suppress noises. As a result, the filtered data achieved a new statistical characteristic. Secondly, a mixture model formed by three probabilistic distributions (two Exponential distributions and one Gaussian distribution) was built to fit the histogram curve of the filtered data, where the expectation maximization (EM) algorithm was used for parameters estimation. Finally, three-dimensional (3D) Markov random field (MRF) were employed to improve the accuracy of pixel-wise classification and posterior probability estimation. To quantitatively evaluate the performance of the proposed method, two phantoms simulating blood vessels with different tubular structures and noises have been devised. Meanwhile, four clinical angiographic data sets from different human organs have been used to qualitatively validate the method. To further test the performance, comparison tests between the proposed method and the traditional ones have been conducted on two different brain magnetic resonance angiography (MRA) data sets. The results of the phantoms were satisfying, e.g., the noise was greatly suppressed, the percentages of the misclassified voxels, i.e., the segmentation error ratios, were no more than 0.3%, and the Dice similarity coefficients (DSCs) were above 94%. According to the opinions of clinical vascular specialists, the vessels in various data sets were extracted with high accuracy since complete vessel trees were extracted while lesser non-vessels and background were falsely classified as vessel. In the comparison experiments, the proposed method showed its superiority in accuracy and robustness for extracting vascular structures from multi-modality angiographic images with complicated background noises. The experimental results demonstrated that our proposed method was available for various angiographic data. The main reason was that the constructed mixture probability model could unitarily classify vessel object from the multi-scale filtered data of various angiography images. The advantages of the proposed method lie in the following aspects: firstly, it can extract the vessels with poor angiography quality, since the multi-scale filtering algorithm can improve the vessel intensity in the circumstance such as uneven contrast media and bias field; secondly, it performed well for extracting the vessels in multi-modality angiographic images despite various signal-noises; and thirdly, it was implemented with better accuracy, and robustness than the traditional methods. Generally, these traits declare that the proposed method would have significant clinical application.

  8. Optimal Weights Mixed Filter for removing mixture of Gaussian and impulse noises

    PubMed Central

    Grama, Ion; Liu, Quansheng

    2017-01-01

    In this paper we consider the problem of restoration of a image contaminated by a mixture of Gaussian and impulse noises. We propose a new statistic called ROADGI which improves the well-known Rank-Ordered Absolute Differences (ROAD) statistic for detecting points contaminated with the impulse noise in this context. Combining ROADGI statistic with the method of weights optimization we obtain a new algorithm called Optimal Weights Mixed Filter (OWMF) to deal with the mixed noise. Our simulation results show that the proposed filter is effective for mixed noises, as well as for single impulse noise and for single Gaussian noise. PMID:28692667

  9. Optimal Weights Mixed Filter for removing mixture of Gaussian and impulse noises.

    PubMed

    Jin, Qiyu; Grama, Ion; Liu, Quansheng

    2017-01-01

    In this paper we consider the problem of restoration of a image contaminated by a mixture of Gaussian and impulse noises. We propose a new statistic called ROADGI which improves the well-known Rank-Ordered Absolute Differences (ROAD) statistic for detecting points contaminated with the impulse noise in this context. Combining ROADGI statistic with the method of weights optimization we obtain a new algorithm called Optimal Weights Mixed Filter (OWMF) to deal with the mixed noise. Our simulation results show that the proposed filter is effective for mixed noises, as well as for single impulse noise and for single Gaussian noise.

  10. Statistics attack on `quantum private comparison with a malicious third party' and its improvement

    NASA Astrophysics Data System (ADS)

    Gu, Jun; Ho, Chih-Yung; Hwang, Tzonelih

    2018-02-01

    Recently, Sun et al. (Quantum Inf Process:14:2125-2133, 2015) proposed a quantum private comparison protocol allowing two participants to compare the equality of their secrets via a malicious third party (TP). They designed an interesting trap comparison method to prevent the TP from knowing the final comparison result. However, this study shows that the malicious TP can use the statistics attack to reveal the comparison result. A simple modification is hence proposed to solve this problem.

  11. Statistical Performances of Resistive Active Power Splitter

    NASA Astrophysics Data System (ADS)

    Lalléchère, Sébastien; Ravelo, Blaise; Thakur, Atul

    2016-03-01

    In this paper, the synthesis and sensitivity analysis of an active power splitter (PWS) is proposed. It is based on the active cell composed of a Field Effect Transistor in cascade with shunted resistor at the input and the output (resistive amplifier topology). The PWS uncertainty versus resistance tolerances is suggested by using stochastic method. Furthermore, with the proposed topology, we can control easily the device gain while varying a resistance. This provides useful tool to analyse the statistical sensitivity of the system in uncertain environment.

  12. Testing independence of bivariate interval-censored data using modified Kendall's tau statistic.

    PubMed

    Kim, Yuneung; Lim, Johan; Park, DoHwan

    2015-11-01

    In this paper, we study a nonparametric procedure to test independence of bivariate interval censored data; for both current status data (case 1 interval-censored data) and case 2 interval-censored data. To do it, we propose a score-based modification of the Kendall's tau statistic for bivariate interval-censored data. Our modification defines the Kendall's tau statistic with expected numbers of concordant and disconcordant pairs of data. The performance of the modified approach is illustrated by simulation studies and application to the AIDS study. We compare our method to alternative approaches such as the two-stage estimation method by Sun et al. (Scandinavian Journal of Statistics, 2006) and the multiple imputation method by Betensky and Finkelstein (Statistics in Medicine, 1999b). © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  13. Improved method for selection of the NOAEL.

    PubMed

    Calabrese, E J; Baldwin, L A

    1994-02-01

    The paper proposes that the NOAEL be defined as the highest dosage tested that is statistically significantly different from the control group while also being statistically significantly different from the LOAEL. This new definition requires that the NOAEL be defined from two points of reference rather than the current approach (i.e., single point of reference) in which the NOAEL represents only the highest dosage not statistically significantly different from the control group. This proposal is necessary in order to differentiate NOAELs which are statistically distinguishable from the LOAEL. Under the new regime only those satisfying both criteria would be designated a true NOAEL while those satisfying only one criteria (i.e., not statistically significant different from the control group) would be designated a "quasi" NOAEL and handled differently (i.e., via an uncertainty factor) for risk assessment purposes.

  14. Quantile Regression for Recurrent Gap Time Data

    PubMed Central

    Luo, Xianghua; Huang, Chiung-Yu; Wang, Lan

    2014-01-01

    Summary Evaluating covariate effects on gap times between successive recurrent events is of interest in many medical and public health studies. While most existing methods for recurrent gap time analysis focus on modeling the hazard function of gap times, a direct interpretation of the covariate effects on the gap times is not available through these methods. In this article, we consider quantile regression that can provide direct assessment of covariate effects on the quantiles of the gap time distribution. Following the spirit of the weighted risk-set method by Luo and Huang (2011, Statistics in Medicine 30, 301–311), we extend the martingale-based estimating equation method considered by Peng and Huang (2008, Journal of the American Statistical Association 103, 637–649) for univariate survival data to analyze recurrent gap time data. The proposed estimation procedure can be easily implemented in existing software for univariate censored quantile regression. Uniform consistency and weak convergence of the proposed estimators are established. Monte Carlo studies demonstrate the effectiveness of the proposed method. An application to data from the Danish Psychiatric Central Register is presented to illustrate the methods developed in this article. PMID:23489055

  15. Empirical Bayes scan statistics for detecting clusters of disease risk variants in genetic studies.

    PubMed

    McCallum, Kenneth J; Ionita-Laza, Iuliana

    2015-12-01

    Recent developments of high-throughput genomic technologies offer an unprecedented detailed view of the genetic variation in various human populations, and promise to lead to significant progress in understanding the genetic basis of complex diseases. Despite this tremendous advance in data generation, it remains very challenging to analyze and interpret these data due to their sparse and high-dimensional nature. Here, we propose novel applications and new developments of empirical Bayes scan statistics to identify genomic regions significantly enriched with disease risk variants. We show that the proposed empirical Bayes methodology can be substantially more powerful than existing scan statistics methods especially so in the presence of many non-disease risk variants, and in situations when there is a mixture of risk and protective variants. Furthermore, the empirical Bayes approach has greater flexibility to accommodate covariates such as functional prediction scores and additional biomarkers. As proof-of-concept we apply the proposed methods to a whole-exome sequencing study for autism spectrum disorders and identify several promising candidate genes. © 2015, The International Biometric Society.

  16. THE SCREENING AND RANKING ALGORITHM FOR CHANGE-POINTS DETECTION IN MULTIPLE SAMPLES

    PubMed Central

    Song, Chi; Min, Xiaoyi; Zhang, Heping

    2016-01-01

    The chromosome copy number variation (CNV) is the deviation of genomic regions from their normal copy number states, which may associate with many human diseases. Current genetic studies usually collect hundreds to thousands of samples to study the association between CNV and diseases. CNVs can be called by detecting the change-points in mean for sequences of array-based intensity measurements. Although multiple samples are of interest, the majority of the available CNV calling methods are single sample based. Only a few multiple sample methods have been proposed using scan statistics that are computationally intensive and designed toward either common or rare change-points detection. In this paper, we propose a novel multiple sample method by adaptively combining the scan statistic of the screening and ranking algorithm (SaRa), which is computationally efficient and is able to detect both common and rare change-points. We prove that asymptotically this method can find the true change-points with almost certainty and show in theory that multiple sample methods are superior to single sample methods when shared change-points are of interest. Additionally, we report extensive simulation studies to examine the performance of our proposed method. Finally, using our proposed method as well as two competing approaches, we attempt to detect CNVs in the data from the Primary Open-Angle Glaucoma Genes and Environment study, and conclude that our method is faster and requires less information while our ability to detect the CNVs is comparable or better. PMID:28090239

  17. A statistical evaluation of the effects of gender differences in assessment of acute inhalation toxicity

    PubMed Central

    Price, Charlotte; Stallard, Nigel; Creton, Stuart; Indans, Ian; Guest, Robert; Griffiths, David; Edwards, Philippa

    2010-01-01

    Acute inhalation toxicity of chemicals has conventionally been assessed by the median lethal concentration (LC50) test (organisation for economic co-operation and development (OECD) TG 403). Two new methods, the recently adopted acute toxic class method (ATC; OECD TG 436) and a proposed fixed concentration procedure (FCP), have recently been considered, but statistical evaluations of these methods did not investigate the influence of differential sensitivity between male and female rats on the outcomes. This paper presents an analysis of data from the assessment of acute inhalation toxicity for 56 substances. Statistically significant differences between the LC50 for males and females were found for 16 substances, with greater than 10-fold differences in the LC50 for two substances. The paper also reports a statistical evaluation of the three test methods in the presence of unanticipated gender differences. With TG 403, a gender difference leads to a slightly greater chance of under-classification. This is also the case for the ATC method, but more pronounced than for TG 403, with misclassification of nearly all substances from Globally Harmonised System (GHS) class 3 into class 4. As the FCP uses females only, if females are more sensitive, the classification is unchanged. If males are more sensitive, the procedure may lead to under-classification. Additional research on modification of the FCP is thus proposed. PMID:20488841

  18. A Modified Moore Approach to Teaching Mathematical Statistics: An Inquiry Based Learning Technique to Teaching Mathematical Statistics

    ERIC Educational Resources Information Center

    McLoughlin, M. Padraig M. M.

    2008-01-01

    The author of this paper submits the thesis that learning requires doing; only through inquiry is learning achieved, and hence this paper proposes a programme of use of a modified Moore method in a Probability and Mathematical Statistics (PAMS) course sequence to teach students PAMS. Furthermore, the author of this paper opines that set theory…

  19. Molecular activity prediction by means of supervised subspace projection based ensembles of classifiers.

    PubMed

    Cerruela García, G; García-Pedrajas, N; Luque Ruiz, I; Gómez-Nieto, M Á

    2018-03-01

    This paper proposes a method for molecular activity prediction in QSAR studies using ensembles of classifiers constructed by means of two supervised subspace projection methods, namely nonparametric discriminant analysis (NDA) and hybrid discriminant analysis (HDA). We studied the performance of the proposed ensembles compared to classical ensemble methods using four molecular datasets and eight different models for the representation of the molecular structure. Using several measures and statistical tests for classifier comparison, we observe that our proposal improves the classification results with respect to classical ensemble methods. Therefore, we show that ensembles constructed using supervised subspace projections offer an effective way of creating classifiers in cheminformatics.

  20. Statistical shape analysis using 3D Poisson equation--A quantitatively validated approach.

    PubMed

    Gao, Yi; Bouix, Sylvain

    2016-05-01

    Statistical shape analysis has been an important area of research with applications in biology, anatomy, neuroscience, agriculture, paleontology, etc. Unfortunately, the proposed methods are rarely quantitatively evaluated, and as shown in recent studies, when they are evaluated, significant discrepancies exist in their outputs. In this work, we concentrate on the problem of finding the consistent location of deformation between two population of shapes. We propose a new shape analysis algorithm along with a framework to perform a quantitative evaluation of its performance. Specifically, the algorithm constructs a Signed Poisson Map (SPoM) by solving two Poisson equations on the volumetric shapes of arbitrary topology, and statistical analysis is then carried out on the SPoMs. The method is quantitatively evaluated on synthetic shapes and applied on real shape data sets in brain structures. Copyright © 2016 Elsevier B.V. All rights reserved.

  1. Convolutional auto-encoder for image denoising of ultra-low-dose CT.

    PubMed

    Nishio, Mizuho; Nagashima, Chihiro; Hirabayashi, Saori; Ohnishi, Akinori; Sasaki, Kaori; Sagawa, Tomoyuki; Hamada, Masayuki; Yamashita, Tatsuo

    2017-08-01

    The purpose of this study was to validate a patch-based image denoising method for ultra-low-dose CT images. Neural network with convolutional auto-encoder and pairs of standard-dose CT and ultra-low-dose CT image patches were used for image denoising. The performance of the proposed method was measured by using a chest phantom. Standard-dose and ultra-low-dose CT images of the chest phantom were acquired. The tube currents for standard-dose and ultra-low-dose CT were 300 and 10 mA, respectively. Ultra-low-dose CT images were denoised with our proposed method using neural network, large-scale nonlocal mean, and block-matching and 3D filtering. Five radiologists and three technologists assessed the denoised ultra-low-dose CT images visually and recorded their subjective impressions of streak artifacts, noise other than streak artifacts, visualization of pulmonary vessels, and overall image quality. For the streak artifacts, noise other than streak artifacts, and visualization of pulmonary vessels, the results of our proposed method were statistically better than those of block-matching and 3D filtering (p-values < 0.05). On the other hand, the difference in the overall image quality between our proposed method and block-matching and 3D filtering was not statistically significant (p-value = 0.07272). The p-values obtained between our proposed method and large-scale nonlocal mean were all less than 0.05. Neural network with convolutional auto-encoder could be trained using pairs of standard-dose and ultra-low-dose CT image patches. According to the visual assessment by radiologists and technologists, the performance of our proposed method was superior to that of large-scale nonlocal mean and block-matching and 3D filtering.

  2. Generalized disequilibrium test for association in qualitative traits incorporating imprinting effects based on extended pedigrees.

    PubMed

    Li, Jian-Long; Wang, Peng; Fung, Wing Kam; Zhou, Ji-Yuan

    2017-10-16

    For dichotomous traits, the generalized disequilibrium test with the moment estimate of the variance (GDT-ME) is a powerful family-based association method. Genomic imprinting is an important epigenetic phenomenon and currently, there has been increasing interest of incorporating imprinting to improve the test power of association analysis. However, GDT-ME does not take imprinting effects into account, and it has not been investigated whether it can be used for association analysis when the effects indeed exist. In this article, based on a novel decomposition of the genotype score according to the paternal or maternal source of the allele, we propose the generalized disequilibrium test with imprinting (GDTI) for complete pedigrees without any missing genotypes. Then, we extend GDTI and GDT-ME to accommodate incomplete pedigrees with some pedigrees having missing genotypes, by using a Monte Carlo (MC) sampling and estimation scheme to infer missing genotypes given available genotypes in each pedigree, denoted by MCGDTI and MCGDT-ME, respectively. The proposed GDTI and MCGDTI methods evaluate the differences of the paternal as well as maternal allele scores for all discordant relative pairs in a pedigree, including beyond first-degree relative pairs. Advantages of the proposed GDTI and MCGDTI test statistics over existing methods are demonstrated by simulation studies under various simulation settings and by application to the rheumatoid arthritis dataset. Simulation results show that the proposed tests control the size well under the null hypothesis of no association, and outperform the existing methods under various imprinting effect models. The existing GDT-ME and the proposed MCGDT-ME can be used to test for association even when imprinting effects exist. For the application to the rheumatoid arthritis data, compared to the existing methods, MCGDTI identifies more loci statistically significantly associated with the disease. Under complete and incomplete imprinting effect models, our proposed GDTI and MCGDTI methods, by considering the information on imprinting effects and all discordant relative pairs within each pedigree, outperform all the existing test statistics and MCGDTI can recapture much of the missing information. Therefore, MCGDTI is recommended in practice.

  3. Statistical CT noise reduction with multiscale decomposition and penalized weighted least squares in the projection domain

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tang Shaojie; Tang Xiangyang; School of Automation, Xi'an University of Posts and Telecommunications, Xi'an, Shaanxi 710121

    2012-09-15

    Purposes: The suppression of noise in x-ray computed tomography (CT) imaging is of clinical relevance for diagnostic image quality and the potential for radiation dose saving. Toward this purpose, statistical noise reduction methods in either the image or projection domain have been proposed, which employ a multiscale decomposition to enhance the performance of noise suppression while maintaining image sharpness. Recognizing the advantages of noise suppression in the projection domain, the authors propose a projection domain multiscale penalized weighted least squares (PWLS) method, in which the angular sampling rate is explicitly taken into consideration to account for the possible variation ofmore » interview sampling rate in advanced clinical or preclinical applications. Methods: The projection domain multiscale PWLS method is derived by converting an isotropic diffusion partial differential equation in the image domain into the projection domain, wherein a multiscale decomposition is carried out. With adoption of the Markov random field or soft thresholding objective function, the projection domain multiscale PWLS method deals with noise at each scale. To compensate for the degradation in image sharpness caused by the projection domain multiscale PWLS method, an edge enhancement is carried out following the noise reduction. The performance of the proposed method is experimentally evaluated and verified using the projection data simulated by computer and acquired by a CT scanner. Results: The preliminary results show that the proposed projection domain multiscale PWLS method outperforms the projection domain single-scale PWLS method and the image domain multiscale anisotropic diffusion method in noise reduction. In addition, the proposed method can preserve image sharpness very well while the occurrence of 'salt-and-pepper' noise and mosaic artifacts can be avoided. Conclusions: Since the interview sampling rate is taken into account in the projection domain multiscale decomposition, the proposed method is anticipated to be useful in advanced clinical and preclinical applications where the interview sampling rate varies.« less

  4. Local multiplicity adjustment for the spatial scan statistic using the Gumbel distribution.

    PubMed

    Gangnon, Ronald E

    2012-03-01

    The spatial scan statistic is an important and widely used tool for cluster detection. It is based on the simultaneous evaluation of the statistical significance of the maximum likelihood ratio test statistic over a large collection of potential clusters. In most cluster detection problems, there is variation in the extent of local multiplicity across the study region. For example, using a fixed maximum geographic radius for clusters, urban areas typically have many overlapping potential clusters, whereas rural areas have relatively few. The spatial scan statistic does not account for local multiplicity variation. We describe a previously proposed local multiplicity adjustment based on a nested Bonferroni correction and propose a novel adjustment based on a Gumbel distribution approximation to the distribution of a local scan statistic. We compare the performance of all three statistics in terms of power and a novel unbiased cluster detection criterion. These methods are then applied to the well-known New York leukemia dataset and a Wisconsin breast cancer incidence dataset. © 2011, The International Biometric Society.

  5. Local multiplicity adjustment for the spatial scan statistic using the Gumbel distribution

    PubMed Central

    Gangnon, Ronald E.

    2011-01-01

    Summary The spatial scan statistic is an important and widely used tool for cluster detection. It is based on the simultaneous evaluation of the statistical significance of the maximum likelihood ratio test statistic over a large collection of potential clusters. In most cluster detection problems, there is variation in the extent of local multiplicity across the study region. For example, using a fixed maximum geographic radius for clusters, urban areas typically have many overlapping potential clusters, while rural areas have relatively few. The spatial scan statistic does not account for local multiplicity variation. We describe a previously proposed local multiplicity adjustment based on a nested Bonferroni correction and propose a novel adjustment based on a Gumbel distribution approximation to the distribution of a local scan statistic. We compare the performance of all three statistics in terms of power and a novel unbiased cluster detection criterion. These methods are then applied to the well-known New York leukemia dataset and a Wisconsin breast cancer incidence dataset. PMID:21762118

  6. Proposal for a recovery prediction method for patients affected by acute mediastinitis

    PubMed Central

    2012-01-01

    Background An attempt to find a prediction method of death risk in patients affected by acute mediastinitis. There is not such a tool described in available literature for that serious disease. Methods The study comprised 44 consecutive cases of acute mediastinitis. General anamnesis and biochemical data were included. Factor analysis was used to extract the risk characteristic for the patients. The most valuable results were obtained for 8 parameters which were selected for further statistical analysis (all collected during few hours after admission). Three factors reached Eigenvalue >1. Clinical explanations of these combined statistical factors are: Factor1 - proteinic status (serum total protein, albumin, and hemoglobin level), Factor2 - inflammatory status (white blood cells, CRP, procalcitonin), and Factor3 - general risk (age, number of coexisting diseases). Threshold values of prediction factors were estimated by means of statistical analysis (factor analysis, Statgraphics Centurion XVI). Results The final prediction result for the patients is constructed as simultaneous evaluation of all factor scores. High probability of death should be predicted if factor 1 value decreases with simultaneous increase of factors 2 and 3. The diagnostic power of the proposed method was revealed to be high [sensitivity =90%, specificity =64%], for Factor1 [SNC = 87%, SPC = 79%]; for Factor2 [SNC = 87%, SPC = 50%] and for Factor3 [SNC = 73%, SPC = 71%]. Conclusion The proposed prediction method seems a useful emergency signal during acute mediastinitis control in affected patients. PMID:22574625

  7. Incorporating Functional Annotations for Fine-Mapping Causal Variants in a Bayesian Framework Using Summary Statistics.

    PubMed

    Chen, Wenan; McDonnell, Shannon K; Thibodeau, Stephen N; Tillmans, Lori S; Schaid, Daniel J

    2016-11-01

    Functional annotations have been shown to improve both the discovery power and fine-mapping accuracy in genome-wide association studies. However, the optimal strategy to incorporate the large number of existing annotations is still not clear. In this study, we propose a Bayesian framework to incorporate functional annotations in a systematic manner. We compute the maximum a posteriori solution and use cross validation to find the optimal penalty parameters. By extending our previous fine-mapping method CAVIARBF into this framework, we require only summary statistics as input. We also derived an exact calculation of Bayes factors using summary statistics for quantitative traits, which is necessary when a large proportion of trait variance is explained by the variants of interest, such as in fine mapping expression quantitative trait loci (eQTL). We compared the proposed method with PAINTOR using different strategies to combine annotations. Simulation results show that the proposed method achieves the best accuracy in identifying causal variants among the different strategies and methods compared. We also find that for annotations with moderate effects from a large annotation pool, screening annotations individually and then combining the top annotations can produce overly optimistic results. We applied these methods on two real data sets: a meta-analysis result of lipid traits and a cis-eQTL study of normal prostate tissues. For the eQTL data, incorporating annotations significantly increased the number of potential causal variants with high probabilities. Copyright © 2016 by the Genetics Society of America.

  8. A space-time scan statistic for detecting emerging outbreaks.

    PubMed

    Tango, Toshiro; Takahashi, Kunihiko; Kohriyama, Kazuaki

    2011-03-01

    As a major analytical method for outbreak detection, Kulldorff's space-time scan statistic (2001, Journal of the Royal Statistical Society, Series A 164, 61-72) has been implemented in many syndromic surveillance systems. Since, however, it is based on circular windows in space, it has difficulty correctly detecting actual noncircular clusters. Takahashi et al. (2008, International Journal of Health Geographics 7, 14) proposed a flexible space-time scan statistic with the capability of detecting noncircular areas. It seems to us, however, that the detection of the most likely cluster defined in these space-time scan statistics is not the same as the detection of localized emerging disease outbreaks because the former compares the observed number of cases with the conditional expected number of cases. In this article, we propose a new space-time scan statistic which compares the observed number of cases with the unconditional expected number of cases, takes a time-to-time variation of Poisson mean into account, and implements an outbreak model to capture localized emerging disease outbreaks more timely and correctly. The proposed models are illustrated with data from weekly surveillance of the number of absentees in primary schools in Kitakyushu-shi, Japan, 2006. © 2010, The International Biometric Society.

  9. Time-variant random interval natural frequency analysis of structures

    NASA Astrophysics Data System (ADS)

    Wu, Binhua; Wu, Di; Gao, Wei; Song, Chongmin

    2018-02-01

    This paper presents a new robust method namely, unified interval Chebyshev-based random perturbation method, to tackle hybrid random interval structural natural frequency problem. In the proposed approach, random perturbation method is implemented to furnish the statistical features (i.e., mean and standard deviation) and Chebyshev surrogate model strategy is incorporated to formulate the statistical information of natural frequency with regards to the interval inputs. The comprehensive analysis framework combines the superiority of both methods in a way that computational cost is dramatically reduced. This presented method is thus capable of investigating the day-to-day based time-variant natural frequency of structures accurately and efficiently under concrete intrinsic creep effect with probabilistic and interval uncertain variables. The extreme bounds of the mean and standard deviation of natural frequency are captured through the embedded optimization strategy within the analysis procedure. Three particularly motivated numerical examples with progressive relationship in perspective of both structure type and uncertainty variables are demonstrated to justify the computational applicability, accuracy and efficiency of the proposed method.

  10. Statistical evaluation of forecasts

    NASA Astrophysics Data System (ADS)

    Mader, Malenka; Mader, Wolfgang; Gluckman, Bruce J.; Timmer, Jens; Schelter, Björn

    2014-08-01

    Reliable forecasts of extreme but rare events, such as earthquakes, financial crashes, and epileptic seizures, would render interventions and precautions possible. Therefore, forecasting methods have been developed which intend to raise an alarm if an extreme event is about to occur. In order to statistically validate the performance of a prediction system, it must be compared to the performance of a random predictor, which raises alarms independent of the events. Such a random predictor can be obtained by bootstrapping or analytically. We propose an analytic statistical framework which, in contrast to conventional methods, allows for validating independently the sensitivity and specificity of a forecasting method. Moreover, our method accounts for the periods during which an event has to remain absent or occur after a respective forecast.

  11. Non-linear scaling of a musculoskeletal model of the lower limb using statistical shape models.

    PubMed

    Nolte, Daniel; Tsang, Chui Kit; Zhang, Kai Yu; Ding, Ziyun; Kedgley, Angela E; Bull, Anthony M J

    2016-10-03

    Accurate muscle geometry for musculoskeletal models is important to enable accurate subject-specific simulations. Commonly, linear scaling is used to obtain individualised muscle geometry. More advanced methods include non-linear scaling using segmented bone surfaces and manual or semi-automatic digitisation of muscle paths from medical images. In this study, a new scaling method combining non-linear scaling with reconstructions of bone surfaces using statistical shape modelling is presented. Statistical Shape Models (SSMs) of femur and tibia/fibula were used to reconstruct bone surfaces of nine subjects. Reference models were created by morphing manually digitised muscle paths to mean shapes of the SSMs using non-linear transformations and inter-subject variability was calculated. Subject-specific models of muscle attachment and via points were created from three reference models. The accuracy was evaluated by calculating the differences between the scaled and manually digitised models. The points defining the muscle paths showed large inter-subject variability at the thigh and shank - up to 26mm; this was found to limit the accuracy of all studied scaling methods. Errors for the subject-specific muscle point reconstructions of the thigh could be decreased by 9% to 20% by using the non-linear scaling compared to a typical linear scaling method. We conclude that the proposed non-linear scaling method is more accurate than linear scaling methods. Thus, when combined with the ability to reconstruct bone surfaces from incomplete or scattered geometry data using statistical shape models our proposed method is an alternative to linear scaling methods. Copyright © 2016 The Author. Published by Elsevier Ltd.. All rights reserved.

  12. a Probability-Based Statistical Method to Extract Water Body of TM Images with Missing Information

    NASA Astrophysics Data System (ADS)

    Lian, Shizhong; Chen, Jiangping; Luo, Minghai

    2016-06-01

    Water information cannot be accurately extracted using TM images because true information is lost in some images because of blocking clouds and missing data stripes, thereby water information cannot be accurately extracted. Water is continuously distributed in natural conditions; thus, this paper proposed a new method of water body extraction based on probability statistics to improve the accuracy of water information extraction of TM images with missing information. Different disturbing information of clouds and missing data stripes are simulated. Water information is extracted using global histogram matching, local histogram matching, and the probability-based statistical method in the simulated images. Experiments show that smaller Areal Error and higher Boundary Recall can be obtained using this method compared with the conventional methods.

  13. Identifying Pleiotropic Genes in Genome-Wide Association Studies for Multivariate Phenotypes with Mixed Measurement Scales

    PubMed Central

    Williams, L. Keoki; Buu, Anne

    2017-01-01

    We propose a multivariate genome-wide association test for mixed continuous, binary, and ordinal phenotypes. A latent response model is used to estimate the correlation between phenotypes with different measurement scales so that the empirical distribution of the Fisher’s combination statistic under the null hypothesis is estimated efficiently. The simulation study shows that our proposed correlation estimation methods have high levels of accuracy. More importantly, our approach conservatively estimates the variance of the test statistic so that the type I error rate is controlled. The simulation also shows that the proposed test maintains the power at the level very close to that of the ideal analysis based on known latent phenotypes while controlling the type I error. In contrast, conventional approaches–dichotomizing all observed phenotypes or treating them as continuous variables–could either reduce the power or employ a linear regression model unfit for the data. Furthermore, the statistical analysis on the database of the Study of Addiction: Genetics and Environment (SAGE) demonstrates that conducting a multivariate test on multiple phenotypes can increase the power of identifying markers that may not be, otherwise, chosen using marginal tests. The proposed method also offers a new approach to analyzing the Fagerström Test for Nicotine Dependence as multivariate phenotypes in genome-wide association studies. PMID:28081206

  14. Superstatistics analysis of the ion current distribution function: Met3PbCl influence study.

    PubMed

    Miśkiewicz, Janusz; Trela, Zenon; Przestalski, Stanisław; Karcz, Waldemar

    2010-09-01

    A novel analysis of ion current time series is proposed. It is shown that higher (second, third and fourth) statistical moments of the ion current probability distribution function (PDF) can yield new information about ion channel properties. The method is illustrated on a two-state model where the PDF of the compound states are given by normal distributions. The proposed method was applied to the analysis of the SV cation channels of vacuolar membrane of Beta vulgaris and the influence of trimethyllead chloride (Met(3)PbCl) on the ion current probability distribution. Ion currents were measured by patch-clamp technique. It was shown that Met(3)PbCl influences the variance of the open-state ion current but does not alter the PDF of the closed-state ion current. Incorporation of higher statistical moments into the standard investigation of ion channel properties is proposed.

  15. `New insight into statistical hydrology' preface to the special issue

    NASA Astrophysics Data System (ADS)

    Kochanek, Krzysztof

    2018-04-01

    Statistical methods are still the basic tool for investigating random, extreme events occurring in hydrosphere. On 21-22 September 2017, in Warsaw (Poland) the international workshop of the Statistical Hydrology (StaHy) 2017 took place under the auspices of the International Association of Hydrological Sciences. The authors of the presentations proposed to publish their research results in the Special Issue of the Acta Geophysica-`New Insight into Statistical Hydrology'. Five papers were selected for publication, touching on the most crucial issues of statistical methodology in hydrology.

  16. A statistical assessment of differences and equivalences between genetically modified and reference plant varieties

    PubMed Central

    2011-01-01

    Background Safety assessment of genetically modified organisms is currently often performed by comparative evaluation. However, natural variation of plant characteristics between commercial varieties is usually not considered explicitly in the statistical computations underlying the assessment. Results Statistical methods are described for the assessment of the difference between a genetically modified (GM) plant variety and a conventional non-GM counterpart, and for the assessment of the equivalence between the GM variety and a group of reference plant varieties which have a history of safe use. It is proposed to present the results of both difference and equivalence testing for all relevant plant characteristics simultaneously in one or a few graphs, as an aid for further interpretation in safety assessment. A procedure is suggested to derive equivalence limits from the observed results for the reference plant varieties using a specific implementation of the linear mixed model. Three different equivalence tests are defined to classify any result in one of four equivalence classes. The performance of the proposed methods is investigated by a simulation study, and the methods are illustrated on compositional data from a field study on maize grain. Conclusions A clear distinction of practical relevance is shown between difference and equivalence testing. The proposed tests are shown to have appropriate performance characteristics by simulation, and the proposed simultaneous graphical representation of results was found to be helpful for the interpretation of results from a practical field trial data set. PMID:21324199

  17. Event time analysis of longitudinal neuroimage data.

    PubMed

    Sabuncu, Mert R; Bernal-Rusiel, Jorge L; Reuter, Martin; Greve, Douglas N; Fischl, Bruce

    2014-08-15

    This paper presents a method for the statistical analysis of the associations between longitudinal neuroimaging measurements, e.g., of cortical thickness, and the timing of a clinical event of interest, e.g., disease onset. The proposed approach consists of two steps, the first of which employs a linear mixed effects (LME) model to capture temporal variation in serial imaging data. The second step utilizes the extended Cox regression model to examine the relationship between time-dependent imaging measurements and the timing of the event of interest. We demonstrate the proposed method both for the univariate analysis of image-derived biomarkers, e.g., the volume of a structure of interest, and the exploratory mass-univariate analysis of measurements contained in maps, such as cortical thickness and gray matter density. The mass-univariate method employs a recently developed spatial extension of the LME model. We applied our method to analyze structural measurements computed using FreeSurfer, a widely used brain Magnetic Resonance Image (MRI) analysis software package. We provide a quantitative and objective empirical evaluation of the statistical performance of the proposed method on longitudinal data from subjects suffering from Mild Cognitive Impairment (MCI) at baseline. Copyright © 2014 Elsevier Inc. All rights reserved.

  18. A method for modeling co-occurrence propensity of clinical codes with application to ICD-10-PCS auto-coding.

    PubMed

    Subotin, Michael; Davis, Anthony R

    2016-09-01

    Natural language processing methods for medical auto-coding, or automatic generation of medical billing codes from electronic health records, generally assign each code independently of the others. They may thus assign codes for closely related procedures or diagnoses to the same document, even when they do not tend to occur together in practice, simply because the right choice can be difficult to infer from the clinical narrative. We propose a method that injects awareness of the propensities for code co-occurrence into this process. First, a model is trained to estimate the conditional probability that one code is assigned by a human coder, given than another code is known to have been assigned to the same document. Then, at runtime, an iterative algorithm is used to apply this model to the output of an existing statistical auto-coder to modify the confidence scores of the codes. We tested this method in combination with a primary auto-coder for International Statistical Classification of Diseases-10 procedure codes, achieving a 12% relative improvement in F-score over the primary auto-coder baseline. The proposed method can be used, with appropriate features, in combination with any auto-coder that generates codes with different levels of confidence. The promising results obtained for International Statistical Classification of Diseases-10 procedure codes suggest that the proposed method may have wider applications in auto-coding. © The Author 2016. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  19. Reversed inverse regression for the univariate linear calibration and its statistical properties derived using a new methodology

    NASA Astrophysics Data System (ADS)

    Kang, Pilsang; Koo, Changhoi; Roh, Hokyu

    2017-11-01

    Since simple linear regression theory was established at the beginning of the 1900s, it has been used in a variety of fields. Unfortunately, it cannot be used directly for calibration. In practical calibrations, the observed measurements (the inputs) are subject to errors, and hence they vary, thus violating the assumption that the inputs are fixed. Therefore, in the case of calibration, the regression line fitted using the method of least squares is not consistent with the statistical properties of simple linear regression as already established based on this assumption. To resolve this problem, "classical regression" and "inverse regression" have been proposed. However, they do not completely resolve the problem. As a fundamental solution, we introduce "reversed inverse regression" along with a new methodology for deriving its statistical properties. In this study, the statistical properties of this regression are derived using the "error propagation rule" and the "method of simultaneous error equations" and are compared with those of the existing regression approaches. The accuracy of the statistical properties thus derived is investigated in a simulation study. We conclude that the newly proposed regression and methodology constitute the complete regression approach for univariate linear calibrations.

  20. HYPOTHESIS SETTING AND ORDER STATISTIC FOR ROBUST GENOMIC META-ANALYSIS.

    PubMed

    Song, Chi; Tseng, George C

    2014-01-01

    Meta-analysis techniques have been widely developed and applied in genomic applications, especially for combining multiple transcriptomic studies. In this paper, we propose an order statistic of p-values ( r th ordered p-value, rOP) across combined studies as the test statistic. We illustrate different hypothesis settings that detect gene markers differentially expressed (DE) "in all studies", "in the majority of studies", or "in one or more studies", and specify rOP as a suitable method for detecting DE genes "in the majority of studies". We develop methods to estimate the parameter r in rOP for real applications. Statistical properties such as its asymptotic behavior and a one-sided testing correction for detecting markers of concordant expression changes are explored. Power calculation and simulation show better performance of rOP compared to classical Fisher's method, Stouffer's method, minimum p-value method and maximum p-value method under the focused hypothesis setting. Theoretically, rOP is found connected to the naïve vote counting method and can be viewed as a generalized form of vote counting with better statistical properties. The method is applied to three microarray meta-analysis examples including major depressive disorder, brain cancer and diabetes. The results demonstrate rOP as a more generalizable, robust and sensitive statistical framework to detect disease-related markers.

  1. A general statistical test for correlations in a finite-length time series.

    PubMed

    Hanson, Jeffery A; Yang, Haw

    2008-06-07

    The statistical properties of the autocorrelation function from a time series composed of independently and identically distributed stochastic variables has been studied. Analytical expressions for the autocorrelation function's variance have been derived. It has been found that two common ways of calculating the autocorrelation, moving-average and Fourier transform, exhibit different uncertainty characteristics. For periodic time series, the Fourier transform method is preferred because it gives smaller uncertainties that are uniform through all time lags. Based on these analytical results, a statistically robust method has been proposed to test the existence of correlations in a time series. The statistical test is verified by computer simulations and an application to single-molecule fluorescence spectroscopy is discussed.

  2. Simplified estimation of age-specific reference intervals for skewed data.

    PubMed

    Wright, E M; Royston, P

    1997-12-30

    Age-specific reference intervals are commonly used in medical screening and clinical practice, where interest lies in the detection of extreme values. Many different statistical approaches have been published on this topic. The advantages of a parametric method are that they necessarily produce smooth centile curves, the entire density is estimated and an explicit formula is available for the centiles. The method proposed here is a simplified version of a recent approach proposed by Royston and Wright. Basic transformations of the data and multiple regression techniques are combined to model the mean, standard deviation and skewness. Using these simple tools, which are implemented in almost all statistical computer packages, age-specific reference intervals may be obtained. The scope of the method is illustrated by fitting models to several real data sets and assessing each model using goodness-of-fit techniques.

  3. Three-dimensional holoscopic image coding scheme using high-efficiency video coding with kernel-based minimum mean-square-error estimation

    NASA Astrophysics Data System (ADS)

    Liu, Deyang; An, Ping; Ma, Ran; Yang, Chao; Shen, Liquan; Li, Kai

    2016-07-01

    Three-dimensional (3-D) holoscopic imaging, also known as integral imaging, light field imaging, or plenoptic imaging, can provide natural and fatigue-free 3-D visualization. However, a large amount of data is required to represent the 3-D holoscopic content. Therefore, efficient coding schemes for this particular type of image are needed. A 3-D holoscopic image coding scheme with kernel-based minimum mean square error (MMSE) estimation is proposed. In the proposed scheme, the coding block is predicted by an MMSE estimator under statistical modeling. In order to obtain the signal statistical behavior, kernel density estimation (KDE) is utilized to estimate the probability density function of the statistical modeling. As bandwidth estimation (BE) is a key issue in the KDE problem, we also propose a BE method based on kernel trick. The experimental results demonstrate that the proposed scheme can achieve a better rate-distortion performance and a better visual rendering quality.

  4. Statistical Methodology for the Analysis of Repeated Duration Data in Behavioral Studies

    ERIC Educational Resources Information Center

    Letué, Frédérique; Martinez, Marie-José; Samson, Adeline; Vilain, Anne; Vilain, Coriandre

    2018-01-01

    Purpose: Repeated duration data are frequently used in behavioral studies. Classical linear or log-linear mixed models are often inadequate to analyze such data, because they usually consist of nonnegative and skew-distributed variables. Therefore, we recommend use of a statistical methodology specific to duration data. Method: We propose a…

  5. Using Information Technology in Teaching of Business Statistics in Nigeria Business School

    ERIC Educational Resources Information Center

    Hamadu, Dallah; Adeleke, Ismaila; Ehie, Ike

    2011-01-01

    This paper discusses the use of Microsoft Excel software in the teaching of statistics in the Faculty of Business Administration at the University of Lagos, Nigeria. Problems associated with existing traditional methods are identified and a novel pedagogy using Excel is proposed. The advantages of using this software over other specialized…

  6. Identifying pleiotropic genes in genome-wide association studies from related subjects using the linear mixed model and Fisher combination function.

    PubMed

    Yang, James J; Williams, L Keoki; Buu, Anne

    2017-08-24

    A multivariate genome-wide association test is proposed for analyzing data on multivariate quantitative phenotypes collected from related subjects. The proposed method is a two-step approach. The first step models the association between the genotype and marginal phenotype using a linear mixed model. The second step uses the correlation between residuals of the linear mixed model to estimate the null distribution of the Fisher combination test statistic. The simulation results show that the proposed method controls the type I error rate and is more powerful than the marginal tests across different population structures (admixed or non-admixed) and relatedness (related or independent). The statistical analysis on the database of the Study of Addiction: Genetics and Environment (SAGE) demonstrates that applying the multivariate association test may facilitate identification of the pleiotropic genes contributing to the risk for alcohol dependence commonly expressed by four correlated phenotypes. This study proposes a multivariate method for identifying pleiotropic genes while adjusting for cryptic relatedness and population structure between subjects. The two-step approach is not only powerful but also computationally efficient even when the number of subjects and the number of phenotypes are both very large.

  7. Analytical investigation of different mathematical approaches utilizing manipulation of ratio spectra

    NASA Astrophysics Data System (ADS)

    Osman, Essam Eldin A.

    2018-01-01

    This work represents a comparative study of different approaches of manipulating ratio spectra, applied on a binary mixture of ciprofloxacin HCl and dexamethasone sodium phosphate co-formulated as ear drops. The proposed new spectrophotometric methods are: ratio difference spectrophotometric method (RDSM), amplitude center method (ACM), first derivative of the ratio spectra (1DD) and mean centering of ratio spectra (MCR). The proposed methods were checked using laboratory-prepared mixtures and were successfully applied for the analysis of pharmaceutical formulation containing the cited drugs. The proposed methods were validated according to the ICH guidelines. A comparative study was conducted between those methods regarding simplicity, limitations and sensitivity. The obtained results were statistically compared with those obtained from the reported HPLC method, showing no significant difference with respect to accuracy and precision.

  8. Assessing Discriminative Performance at External Validation of Clinical Prediction Models

    PubMed Central

    Nieboer, Daan; van der Ploeg, Tjeerd; Steyerberg, Ewout W.

    2016-01-01

    Introduction External validation studies are essential to study the generalizability of prediction models. Recently a permutation test, focusing on discrimination as quantified by the c-statistic, was proposed to judge whether a prediction model is transportable to a new setting. We aimed to evaluate this test and compare it to previously proposed procedures to judge any changes in c-statistic from development to external validation setting. Methods We compared the use of the permutation test to the use of benchmark values of the c-statistic following from a previously proposed framework to judge transportability of a prediction model. In a simulation study we developed a prediction model with logistic regression on a development set and validated them in the validation set. We concentrated on two scenarios: 1) the case-mix was more heterogeneous and predictor effects were weaker in the validation set compared to the development set, and 2) the case-mix was less heterogeneous in the validation set and predictor effects were identical in the validation and development set. Furthermore we illustrated the methods in a case study using 15 datasets of patients suffering from traumatic brain injury. Results The permutation test indicated that the validation and development set were homogenous in scenario 1 (in almost all simulated samples) and heterogeneous in scenario 2 (in 17%-39% of simulated samples). Previously proposed benchmark values of the c-statistic and the standard deviation of the linear predictors correctly pointed at the more heterogeneous case-mix in scenario 1 and the less heterogeneous case-mix in scenario 2. Conclusion The recently proposed permutation test may provide misleading results when externally validating prediction models in the presence of case-mix differences between the development and validation population. To correctly interpret the c-statistic found at external validation it is crucial to disentangle case-mix differences from incorrect regression coefficients. PMID:26881753

  9. Personalised news filtering and recommendation system using Chi-square statistics-based K-nearest neighbour (χ2SB-KNN) model

    NASA Astrophysics Data System (ADS)

    Adeniyi, D. A.; Wei, Z.; Yang, Y.

    2017-10-01

    Recommendation problem has been extensively studied by researchers in the field of data mining, database and information retrieval. This study presents the design and realisation of an automated, personalised news recommendations system based on Chi-square statistics-based K-nearest neighbour (χ2SB-KNN) model. The proposed χ2SB-KNN model has the potential to overcome computational complexity and information overloading problems, reduces runtime and speeds up execution process through the use of critical value of χ2 distribution. The proposed recommendation engine can alleviate scalability challenges through combined online pattern discovery and pattern matching for real-time recommendations. This work also showcases the development of a novel method of feature selection referred to as Data Discretisation-Based feature selection method. This is used for selecting the best features for the proposed χ2SB-KNN algorithm at the preprocessing stage of the classification procedures. The implementation of the proposed χ2SB-KNN model is achieved through the use of a developed in-house Java program on an experimental website called OUC newsreaders' website. Finally, we compared the performance of our system with two baseline methods which are traditional Euclidean distance K-nearest neighbour and Naive Bayesian techniques. The result shows a significant improvement of our method over the baseline methods studied.

  10. Sparse Additive Ordinary Differential Equations for Dynamic Gene Regulatory Network Modeling.

    PubMed

    Wu, Hulin; Lu, Tao; Xue, Hongqi; Liang, Hua

    2014-04-02

    The gene regulation network (GRN) is a high-dimensional complex system, which can be represented by various mathematical or statistical models. The ordinary differential equation (ODE) model is one of the popular dynamic GRN models. High-dimensional linear ODE models have been proposed to identify GRNs, but with a limitation of the linear regulation effect assumption. In this article, we propose a sparse additive ODE (SA-ODE) model, coupled with ODE estimation methods and adaptive group LASSO techniques, to model dynamic GRNs that could flexibly deal with nonlinear regulation effects. The asymptotic properties of the proposed method are established and simulation studies are performed to validate the proposed approach. An application example for identifying the nonlinear dynamic GRN of T-cell activation is used to illustrate the usefulness of the proposed method.

  11. Memory Effects and Nonequilibrium Correlations in the Dynamics of Open Quantum Systems

    NASA Astrophysics Data System (ADS)

    Morozov, V. G.

    2018-01-01

    We propose a systematic approach to the dynamics of open quantum systems in the framework of Zubarev's nonequilibrium statistical operator method. The approach is based on the relation between ensemble means of the Hubbard operators and the matrix elements of the reduced statistical operator of an open quantum system. This key relation allows deriving master equations for open systems following a scheme conceptually identical to the scheme used to derive kinetic equations for distribution functions. The advantage of the proposed formalism is that some relevant dynamical correlations between an open system and its environment can be taken into account. To illustrate the method, we derive a non-Markovian master equation containing the contribution of nonequilibrium correlations associated with energy conservation.

  12. Kappa statistic for the clustered dichotomous responses from physicians and patients

    PubMed Central

    Kang, Chaeryon; Qaqish, Bahjat; Monaco, Jane; Sheridan, Stacey L.; Cai, Jianwen

    2013-01-01

    The bootstrap method for estimating the standard error of the kappa statistic in the presence of clustered data is evaluated. Such data arise, for example, in assessing agreement between physicians and their patients regarding their understanding of the physician-patient interaction and discussions. We propose a computationally efficient procedure for generating correlated dichotomous responses for physicians and assigned patients for simulation studies. The simulation result demonstrates that the proposed bootstrap method produces better estimate of the standard error and better coverage performance compared to the asymptotic standard error estimate that ignores dependence among patients within physicians with at least a moderately large number of clusters. An example of an application to a coronary heart disease prevention study is presented. PMID:23533082

  13. Hypothesis testing for band size detection of high-dimensional banded precision matrices.

    PubMed

    An, Baiguo; Guo, Jianhua; Liu, Yufeng

    2014-06-01

    Many statistical analysis procedures require a good estimator for a high-dimensional covariance matrix or its inverse, the precision matrix. When the precision matrix is banded, the Cholesky-based method often yields a good estimator of the precision matrix. One important aspect of this method is determination of the band size of the precision matrix. In practice, crossvalidation is commonly used; however, we show that crossvalidation not only is computationally intensive but can be very unstable. In this paper, we propose a new hypothesis testing procedure to determine the band size in high dimensions. Our proposed test statistic is shown to be asymptotically normal under the null hypothesis, and its theoretical power is studied. Numerical examples demonstrate the effectiveness of our testing procedure.

  14. Small sample estimation of the reliability function for technical products

    NASA Astrophysics Data System (ADS)

    Lyamets, L. L.; Yakimenko, I. V.; Kanishchev, O. A.; Bliznyuk, O. A.

    2017-12-01

    It is demonstrated that, in the absence of big statistic samples obtained as a result of testing complex technical products for failure, statistic estimation of the reliability function of initial elements can be made by the moments method. A formal description of the moments method is given and its advantages in the analysis of small censored samples are discussed. A modified algorithm is proposed for the implementation of the moments method with the use of only the moments at which the failures of initial elements occur.

  15. Android malware detection based on evolutionary super-network

    NASA Astrophysics Data System (ADS)

    Yan, Haisheng; Peng, Lingling

    2018-04-01

    In the paper, an android malware detection method based on evolutionary super-network is proposed in order to improve the precision of android malware detection. Chi square statistics method is used for selecting characteristics on the basis of analyzing android authority. Boolean weighting is utilized for calculating characteristic weight. Processed characteristic vector is regarded as the system training set and test set; hyper edge alternative strategy is used for training super-network classification model, thereby classifying test set characteristic vectors, and it is compared with traditional classification algorithm. The results show that the detection method proposed in the paper is close to or better than traditional classification algorithm. The proposed method belongs to an effective Android malware detection means.

  16. Inference of median difference based on the Box-Cox model in randomized clinical trials.

    PubMed

    Maruo, K; Isogawa, N; Gosho, M

    2015-05-10

    In randomized clinical trials, many medical and biological measurements are not normally distributed and are often skewed. The Box-Cox transformation is a powerful procedure for comparing two treatment groups for skewed continuous variables in terms of a statistical test. However, it is difficult to directly estimate and interpret the location difference between the two groups on the original scale of the measurement. We propose a helpful method that infers the difference of the treatment effect on the original scale in a more easily interpretable form. We also provide statistical analysis packages that consistently include an estimate of the treatment effect, covariance adjustments, standard errors, and statistical hypothesis tests. The simulation study that focuses on randomized parallel group clinical trials with two treatment groups indicates that the performance of the proposed method is equivalent to or better than that of the existing non-parametric approaches in terms of the type-I error rate and power. We illustrate our method with cluster of differentiation 4 data in an acquired immune deficiency syndrome clinical trial. Copyright © 2015 John Wiley & Sons, Ltd.

  17. Fully automatic segmentation of the femur from 3D-CT images using primitive shape recognition and statistical shape models.

    PubMed

    Ben Younes, Lassad; Nakajima, Yoshikazu; Saito, Toki

    2014-03-01

    Femur segmentation is well established and widely used in computer-assisted orthopedic surgery. However, most of the robust segmentation methods such as statistical shape models (SSM) require human intervention to provide an initial position for the SSM. In this paper, we propose to overcome this problem and provide a fully automatic femur segmentation method for CT images based on primitive shape recognition and SSM. Femur segmentation in CT scans was performed using primitive shape recognition based on a robust algorithm such as the Hough transform and RANdom SAmple Consensus. The proposed method is divided into 3 steps: (1) detection of the femoral head as sphere and the femoral shaft as cylinder in the SSM and the CT images, (2) rigid registration between primitives of SSM and CT image to initialize the SSM into the CT image, and (3) fitting of the SSM to the CT image edge using an affine transformation followed by a nonlinear fitting. The automated method provided good results even with a high number of outliers. The difference of segmentation error between the proposed automatic initialization method and a manual initialization method is less than 1 mm. The proposed method detects primitive shape position to initialize the SSM into the target image. Based on primitive shapes, this method overcomes the problem of inter-patient variability. Moreover, the results demonstrate that our method of primitive shape recognition can be used for 3D SSM initialization to achieve fully automatic segmentation of the femur.

  18. Max-AUC Feature Selection in Computer-Aided Detection of Polyps in CT Colonography

    PubMed Central

    Xu, Jian-Wu; Suzuki, Kenji

    2014-01-01

    We propose a feature selection method based on a sequential forward floating selection (SFFS) procedure to improve the performance of a classifier in computerized detection of polyps in CT colonography (CTC). The feature selection method is coupled with a nonlinear support vector machine (SVM) classifier. Unlike the conventional linear method based on Wilks' lambda, the proposed method selected the most relevant features that would maximize the area under the receiver operating characteristic curve (AUC), which directly maximizes classification performance, evaluated based on AUC value, in the computer-aided detection (CADe) scheme. We presented two variants of the proposed method with different stopping criteria used in the SFFS procedure. The first variant searched all feature combinations allowed in the SFFS procedure and selected the subsets that maximize the AUC values. The second variant performed a statistical test at each step during the SFFS procedure, and it was terminated if the increase in the AUC value was not statistically significant. The advantage of the second variant is its lower computational cost. To test the performance of the proposed method, we compared it against the popular stepwise feature selection method based on Wilks' lambda for a colonic-polyp database (25 polyps and 2624 nonpolyps). We extracted 75 morphologic, gray-level-based, and texture features from the segmented lesion candidate regions. The two variants of the proposed feature selection method chose 29 and 7 features, respectively. Two SVM classifiers trained with these selected features yielded a 96% by-polyp sensitivity at false-positive (FP) rates of 4.1 and 6.5 per patient, respectively. Experiments showed a significant improvement in the performance of the classifier with the proposed feature selection method over that with the popular stepwise feature selection based on Wilks' lambda that yielded 18.0 FPs per patient at the same sensitivity level. PMID:24608058

  19. Max-AUC feature selection in computer-aided detection of polyps in CT colonography.

    PubMed

    Xu, Jian-Wu; Suzuki, Kenji

    2014-03-01

    We propose a feature selection method based on a sequential forward floating selection (SFFS) procedure to improve the performance of a classifier in computerized detection of polyps in CT colonography (CTC). The feature selection method is coupled with a nonlinear support vector machine (SVM) classifier. Unlike the conventional linear method based on Wilks' lambda, the proposed method selected the most relevant features that would maximize the area under the receiver operating characteristic curve (AUC), which directly maximizes classification performance, evaluated based on AUC value, in the computer-aided detection (CADe) scheme. We presented two variants of the proposed method with different stopping criteria used in the SFFS procedure. The first variant searched all feature combinations allowed in the SFFS procedure and selected the subsets that maximize the AUC values. The second variant performed a statistical test at each step during the SFFS procedure, and it was terminated if the increase in the AUC value was not statistically significant. The advantage of the second variant is its lower computational cost. To test the performance of the proposed method, we compared it against the popular stepwise feature selection method based on Wilks' lambda for a colonic-polyp database (25 polyps and 2624 nonpolyps). We extracted 75 morphologic, gray-level-based, and texture features from the segmented lesion candidate regions. The two variants of the proposed feature selection method chose 29 and 7 features, respectively. Two SVM classifiers trained with these selected features yielded a 96% by-polyp sensitivity at false-positive (FP) rates of 4.1 and 6.5 per patient, respectively. Experiments showed a significant improvement in the performance of the classifier with the proposed feature selection method over that with the popular stepwise feature selection based on Wilks' lambda that yielded 18.0 FPs per patient at the same sensitivity level.

  20. Rapid and Accurate Multiple Testing Correction and Power Estimation for Millions of Correlated Markers

    PubMed Central

    Han, Buhm; Kang, Hyun Min; Eskin, Eleazar

    2009-01-01

    With the development of high-throughput sequencing and genotyping technologies, the number of markers collected in genetic association studies is growing rapidly, increasing the importance of methods for correcting for multiple hypothesis testing. The permutation test is widely considered the gold standard for accurate multiple testing correction, but it is often computationally impractical for these large datasets. Recently, several studies proposed efficient alternative approaches to the permutation test based on the multivariate normal distribution (MVN). However, they cannot accurately correct for multiple testing in genome-wide association studies for two reasons. First, these methods require partitioning of the genome into many disjoint blocks and ignore all correlations between markers from different blocks. Second, the true null distribution of the test statistic often fails to follow the asymptotic distribution at the tails of the distribution. We propose an accurate and efficient method for multiple testing correction in genome-wide association studies—SLIDE. Our method accounts for all correlation within a sliding window and corrects for the departure of the true null distribution of the statistic from the asymptotic distribution. In simulations using the Wellcome Trust Case Control Consortium data, the error rate of SLIDE's corrected p-values is more than 20 times smaller than the error rate of the previous MVN-based methods' corrected p-values, while SLIDE is orders of magnitude faster than the permutation test and other competing methods. We also extend the MVN framework to the problem of estimating the statistical power of an association study with correlated markers and propose an efficient and accurate power estimation method SLIP. SLIP and SLIDE are available at http://slide.cs.ucla.edu. PMID:19381255

  1. Guided SAR image despeckling with probabilistic non local weights

    NASA Astrophysics Data System (ADS)

    Gokul, Jithin; Nair, Madhu S.; Rajan, Jeny

    2017-12-01

    SAR images are generally corrupted by granular disturbances called speckle, which makes visual analysis and detail extraction a difficult task. Non Local despeckling techniques with probabilistic similarity has been a recent trend in SAR despeckling. To achieve effective speckle suppression without compromising detail preservation, we propose an improvement for the existing Generalized Guided Filter with Bayesian Non-Local Means (GGF-BNLM) method. The proposed method (Guided SAR Image Despeckling with Probabilistic Non Local Weights) replaces parametric constants based on heuristics in GGF-BNLM method with dynamically derived values based on the image statistics for weight computation. Proposed changes make GGF-BNLM method adaptive and as a result, significant improvement is achieved in terms of performance. Experimental analysis on SAR images shows excellent speckle reduction without compromising feature preservation when compared to GGF-BNLM method. Results are also compared with other state-of-the-art and classic SAR depseckling techniques to demonstrate the effectiveness of the proposed method.

  2. 40 CFR Appendix A to Part 63 - Test Methods

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... components by a different analyst). 3.3Surrogate Reference Materials. The analyst may use surrogate compounds... the variance of the proposed method is significantly different from that of the validated method by... variables can be determined in eight experiments rather than 128 (W.J. Youden, Statistical Manual of the...

  3. Online Updating of Statistical Inference in the Big Data Setting.

    PubMed

    Schifano, Elizabeth D; Wu, Jing; Wang, Chun; Yan, Jun; Chen, Ming-Hui

    2016-01-01

    We present statistical methods for big data arising from online analytical processing, where large amounts of data arrive in streams and require fast analysis without storage/access to the historical data. In particular, we develop iterative estimating algorithms and statistical inferences for linear models and estimating equations that update as new data arrive. These algorithms are computationally efficient, minimally storage-intensive, and allow for possible rank deficiencies in the subset design matrices due to rare-event covariates. Within the linear model setting, the proposed online-updating framework leads to predictive residual tests that can be used to assess the goodness-of-fit of the hypothesized model. We also propose a new online-updating estimator under the estimating equation setting. Theoretical properties of the goodness-of-fit tests and proposed estimators are examined in detail. In simulation studies and real data applications, our estimator compares favorably with competing approaches under the estimating equation setting.

  4. Towards a new tool for the evaluation of the quality of ultrasound compressed images.

    PubMed

    Delgorge, Cécile; Rosenberger, Christophe; Poisson, Gérard; Vieyres, Pierre

    2006-11-01

    This paper presents a new tool for the evaluation of ultrasound image compression. The goal is to measure the image quality as easily as with a statistical criterion, and with the same reliability as the one provided by the medical assessment. An initial experiment is proposed to medical experts and represents our reference value for the comparison of evaluation criteria. Twenty-one statistical criteria are selected from the literature. A cumulative absolute similarity measure is defined as a distance between the criterion to evaluate and the reference value. A first fusion method based on a linear combination of criteria is proposed to improve the results obtained by each of them separately. The second proposed approach combines different statistical criteria and uses the medical assessment in a training phase with a support vector machine. Some experimental results are given and show the benefit of fusion.

  5. Quantum-Like Bayesian Networks for Modeling Decision Making

    PubMed Central

    Moreira, Catarina; Wichert, Andreas

    2016-01-01

    In this work, we explore an alternative quantum structure to perform quantum probabilistic inferences to accommodate the paradoxical findings of the Sure Thing Principle. We propose a Quantum-Like Bayesian Network, which consists in replacing classical probabilities by quantum probability amplitudes. However, since this approach suffers from the problem of exponential growth of quantum parameters, we also propose a similarity heuristic that automatically fits quantum parameters through vector similarities. This makes the proposed model general and predictive in contrast to the current state of the art models, which cannot be generalized for more complex decision scenarios and that only provide an explanatory nature for the observed paradoxes. In the end, the model that we propose consists in a nonparametric method for estimating inference effects from a statistical point of view. It is a statistical model that is simpler than the previous quantum dynamic and quantum-like models proposed in the literature. We tested the proposed network with several empirical data from the literature, mainly from the Prisoner's Dilemma game and the Two Stage Gambling game. The results obtained show that the proposed quantum Bayesian Network is a general method that can accommodate violations of the laws of classical probability theory and make accurate predictions regarding human decision-making in these scenarios. PMID:26858669

  6. A new method to address verification bias in studies of clinical screening tests: cervical cancer screening assays as an example.

    PubMed

    Xue, Xiaonan; Kim, Mimi Y; Castle, Philip E; Strickler, Howard D

    2014-03-01

    Studies to evaluate clinical screening tests often face the problem that the "gold standard" diagnostic approach is costly and/or invasive. It is therefore common to verify only a subset of negative screening tests using the gold standard method. However, undersampling the screen negatives can lead to substantial overestimation of the sensitivity and underestimation of the specificity of the diagnostic test. Our objective was to develop a simple and accurate statistical method to address this "verification bias." We developed a weighted generalized estimating equation approach to estimate, in a single model, the accuracy (eg, sensitivity/specificity) of multiple assays and simultaneously compare results between assays while addressing verification bias. This approach can be implemented using standard statistical software. Simulations were conducted to assess the proposed method. An example is provided using a cervical cancer screening trial that compared the accuracy of human papillomavirus and Pap tests, with histologic data as the gold standard. The proposed approach performed well in estimating and comparing the accuracy of multiple assays in the presence of verification bias. The proposed approach is an easy to apply and accurate method for addressing verification bias in studies of multiple screening methods. Copyright © 2014 Elsevier Inc. All rights reserved.

  7. An extended data mining method for identifying differentially expressed assay-specific signatures in functional genomic studies.

    PubMed

    Rollins, Derrick K; Teh, Ailing

    2010-12-17

    Microarray data sets provide relative expression levels for thousands of genes for a small number, in comparison, of different experimental conditions called assays. Data mining techniques are used to extract specific information of genes as they relate to the assays. The multivariate statistical technique of principal component analysis (PCA) has proven useful in providing effective data mining methods. This article extends the PCA approach of Rollins et al. to the development of ranking genes of microarray data sets that express most differently between two biologically different grouping of assays. This method is evaluated on real and simulated data and compared to a current approach on the basis of false discovery rate (FDR) and statistical power (SP) which is the ability to correctly identify important genes. This work developed and evaluated two new test statistics based on PCA and compared them to a popular method that is not PCA based. Both test statistics were found to be effective as evaluated in three case studies: (i) exposing E. coli cells to two different ethanol levels; (ii) application of myostatin to two groups of mice; and (iii) a simulated data study derived from the properties of (ii). The proposed method (PM) effectively identified critical genes in these studies based on comparison with the current method (CM). The simulation study supports higher identification accuracy for PM over CM for both proposed test statistics when the gene variance is constant and for one of the test statistics when the gene variance is non-constant. PM compares quite favorably to CM in terms of lower FDR and much higher SP. Thus, PM can be quite effective in producing accurate signatures from large microarray data sets for differential expression between assays groups identified in a preliminary step of the PCA procedure and is, therefore, recommended for use in these applications.

  8. Dynamic principle for ensemble control tools.

    PubMed

    Samoletov, A; Vasiev, B

    2017-11-28

    Dynamical equations describing physical systems in contact with a thermal bath are commonly extended by mathematical tools called "thermostats." These tools are designed for sampling ensembles in statistical mechanics. Here we propose a dynamic principle underlying a range of thermostats which is derived using fundamental laws of statistical physics and ensures invariance of the canonical measure. The principle covers both stochastic and deterministic thermostat schemes. Our method has a clear advantage over a range of proposed and widely used thermostat schemes that are based on formal mathematical reasoning. Following the derivation of the proposed principle, we show its generality and illustrate its applications including design of temperature control tools that differ from the Nosé-Hoover-Langevin scheme.

  9. Random phase encoding for optical security

    NASA Astrophysics Data System (ADS)

    Wang, RuiKang K.; Watson, Ian A.; Chatwin, Christopher R.

    1996-09-01

    A new optical encoding method for security applications is proposed. The encoded image (encrypted into the security products) is merely a random phase image statistically and randomly generated by a random number generator using a computer, which contains no information from the reference pattern (stored for verification) or the frequency plane filter (a phase-only function for decoding). The phase function in the frequency plane is obtained using a modified phase retrieval algorithm. The proposed method uses two phase-only functions (images) at both the input and frequency planes of the optical processor leading to maximum optical efficiency. Computer simulation shows that the proposed method is robust for optical security applications.

  10. Mean-Reverting Portfolio With Budget Constraint

    NASA Astrophysics Data System (ADS)

    Zhao, Ziping; Palomar, Daniel P.

    2018-05-01

    This paper considers the mean-reverting portfolio design problem arising from statistical arbitrage in the financial markets. We first propose a general problem formulation aimed at finding a portfolio of underlying component assets by optimizing a mean-reversion criterion characterizing the mean-reversion strength, taking into consideration the variance of the portfolio and an investment budget constraint. Then several specific problems are considered based on the general formulation, and efficient algorithms are proposed. Numerical results on both synthetic and market data show that our proposed mean-reverting portfolio design methods can generate consistent profits and outperform the traditional design methods and the benchmark methods in the literature.

  11. Reproducible detection of disease-associated markers from gene expression data.

    PubMed

    Omae, Katsuhiro; Komori, Osamu; Eguchi, Shinto

    2016-08-18

    Detection of disease-associated markers plays a crucial role in gene screening for biological studies. Two-sample test statistics, such as the t-statistic, are widely used to rank genes based on gene expression data. However, the resultant gene ranking is often not reproducible among different data sets. Such irreproducibility may be caused by disease heterogeneity. When we divided data into two subsets, we found that the signs of the two t-statistics were often reversed. Focusing on such instability, we proposed a sign-sum statistic that counts the signs of the t-statistics for all possible subsets. The proposed method excludes genes affected by heterogeneity, thereby improving the reproducibility of gene ranking. We compared the sign-sum statistic with the t-statistic by a theoretical evaluation of the upper confidence limit. Through simulations and applications to real data sets, we show that the sign-sum statistic exhibits superior performance. We derive the sign-sum statistic for getting a robust gene ranking. The sign-sum statistic gives more reproducible ranking than the t-statistic. Using simulated data sets we show that the sign-sum statistic excludes hetero-type genes well. Also for the real data sets, the sign-sum statistic performs well in a viewpoint of ranking reproducibility.

  12. CHOBS: Color Histogram of Block Statistics for Automatic Bleeding Detection in Wireless Capsule Endoscopy Video.

    PubMed

    Ghosh, Tonmoy; Fattah, Shaikh Anowarul; Wahid, Khan A

    2018-01-01

    Wireless capsule endoscopy (WCE) is the most advanced technology to visualize whole gastrointestinal (GI) tract in a non-invasive way. But the major disadvantage here, it takes long reviewing time, which is very laborious as continuous manual intervention is necessary. In order to reduce the burden of the clinician, in this paper, an automatic bleeding detection method for WCE video is proposed based on the color histogram of block statistics, namely CHOBS. A single pixel in WCE image may be distorted due to the capsule motion in the GI tract. Instead of considering individual pixel values, a block surrounding to that individual pixel is chosen for extracting local statistical features. By combining local block features of three different color planes of RGB color space, an index value is defined. A color histogram, which is extracted from those index values, provides distinguishable color texture feature. A feature reduction technique utilizing color histogram pattern and principal component analysis is proposed, which can drastically reduce the feature dimension. For bleeding zone detection, blocks are classified using extracted local features that do not incorporate any computational burden for feature extraction. From extensive experimentation on several WCE videos and 2300 images, which are collected from a publicly available database, a very satisfactory bleeding frame and zone detection performance is achieved in comparison to that obtained by some of the existing methods. In the case of bleeding frame detection, the accuracy, sensitivity, and specificity obtained from proposed method are 97.85%, 99.47%, and 99.15%, respectively, and in the case of bleeding zone detection, 95.75% of precision is achieved. The proposed method offers not only low feature dimension but also highly satisfactory bleeding detection performance, which even can effectively detect bleeding frame and zone in a continuous WCE video data.

  13. Statistical Analysis of the Uncertainty in Pre-Flight Aerodynamic Database of a Hypersonic Vehicle

    NASA Astrophysics Data System (ADS)

    Huh, Lynn

    The objective of the present research was to develop a new method to derive the aerodynamic coefficients and the associated uncertainties for flight vehicles via post- flight inertial navigation analysis using data from the inertial measurement unit. Statistical estimates of vehicle state and aerodynamic coefficients are derived using Monte Carlo simulation. Trajectory reconstruction using the inertial navigation system (INS) is a simple and well used method. However, deriving realistic uncertainties in the reconstructed state and any associated parameters is not so straight forward. Extended Kalman filters, batch minimum variance estimation and other approaches have been used. However, these methods generally depend on assumed physical models, assumed statistical distributions (usually Gaussian) or have convergence issues for non-linear problems. The approach here assumes no physical models, is applicable to any statistical distribution, and does not have any convergence issues. The new approach obtains the statistics directly from a sufficient number of Monte Carlo samples using only the generally well known gyro and accelerometer specifications and could be applied to the systems of non-linear form and non-Gaussian distribution. When redundant data are available, the set of Monte Carlo simulations are constrained to satisfy the redundant data within the uncertainties specified for the additional data. The proposed method was applied to validate the uncertainty in the pre-flight aerodynamic database of the X-43A Hyper-X research vehicle. In addition to gyro and acceleration data, the actual flight data include redundant measurements of position and velocity from the global positioning system (GPS). The criteria derived from the blend of the GPS and INS accuracy was used to select valid trajectories for statistical analysis. The aerodynamic coefficients were derived from the selected trajectories by either direct extraction method based on the equations in dynamics, or by the inquiry of the pre-flight aerodynamic database. After the application of the proposed method to the case of the X-43A Hyper-X research vehicle, it was found that 1) there were consistent differences in the aerodynamic coefficients from the pre-flight aerodynamic database and post-flight analysis, 2) the pre-flight estimation of the pitching moment coefficients was significantly different from the post-flight analysis, 3) the type of distribution of the states from the Monte Carlo simulation were affected by that of the perturbation parameters, 4) the uncertainties in the pre-flight model were overestimated, 5) the range where the aerodynamic coefficients from the pre-flight aerodynamic database and post-flight analysis are in closest agreement is between Mach *.* and *.* and more data points may be needed between Mach * and ** in the pre-flight aerodynamic database, 6) selection criterion for valid trajectories from the Monte Carlo simulations was mostly driven by the horizontal velocity error, 7) the selection criterion must be based on reasonable model to ensure the validity of the statistics from the proposed method, and 8) the results from the proposed method applied to the two different flights with the identical geometry and similar flight profile were consistent.

  14. Statistical Methods and Tools for Uxo Characterization (SERDP Final Technical Report)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pulsipher, Brent A.; Gilbert, Richard O.; Wilson, John E.

    2004-11-15

    The Strategic Environmental Research and Development Program (SERDP) issued a statement of need for FY01 titled Statistical Sampling for Unexploded Ordnance (UXO) Site Characterization that solicited proposals to develop statistically valid sampling protocols for cost-effective, practical, and reliable investigation of sites contaminated with UXO; protocols that could be validated through subsequent field demonstrations. The SERDP goal was the development of a sampling strategy for which a fraction of the site is initially surveyed by geophysical detectors to confidently identify clean areas and subsections (target areas, TAs) that had elevated densities of anomalous geophysical detector readings that could indicate the presencemore » of UXO. More detailed surveys could then be conducted to search the identified TAs for UXO. SERDP funded three projects: those proposed by the Pacific Northwest National Laboratory (PNNL) (SERDP Project No. UXO 1199), Sandia National Laboratory (SNL), and Oak Ridge National Laboratory (ORNL). The projects were closely coordinated to minimize duplication of effort and facilitate use of shared algorithms where feasible. This final report for PNNL Project 1199 describes the methods developed by PNNL to address SERDP's statement-of-need for the development of statistically-based geophysical survey methods for sites where 100% surveys are unattainable or cost prohibitive.« less

  15. Sample size and power considerations in network meta-analysis

    PubMed Central

    2012-01-01

    Background Network meta-analysis is becoming increasingly popular for establishing comparative effectiveness among multiple interventions for the same disease. Network meta-analysis inherits all methodological challenges of standard pairwise meta-analysis, but with increased complexity due to the multitude of intervention comparisons. One issue that is now widely recognized in pairwise meta-analysis is the issue of sample size and statistical power. This issue, however, has so far only received little attention in network meta-analysis. To date, no approaches have been proposed for evaluating the adequacy of the sample size, and thus power, in a treatment network. Findings In this article, we develop easy-to-use flexible methods for estimating the ‘effective sample size’ in indirect comparison meta-analysis and network meta-analysis. The effective sample size for a particular treatment comparison can be interpreted as the number of patients in a pairwise meta-analysis that would provide the same degree and strength of evidence as that which is provided in the indirect comparison or network meta-analysis. We further develop methods for retrospectively estimating the statistical power for each comparison in a network meta-analysis. We illustrate the performance of the proposed methods for estimating effective sample size and statistical power using data from a network meta-analysis on interventions for smoking cessation including over 100 trials. Conclusion The proposed methods are easy to use and will be of high value to regulatory agencies and decision makers who must assess the strength of the evidence supporting comparative effectiveness estimates. PMID:22992327

  16. Dynamic association rules for gene expression data analysis.

    PubMed

    Chen, Shu-Chuan; Tsai, Tsung-Hsien; Chung, Cheng-Han; Li, Wen-Hsiung

    2015-10-14

    The purpose of gene expression analysis is to look for the association between regulation of gene expression levels and phenotypic variations. This association based on gene expression profile has been used to determine whether the induction/repression of genes correspond to phenotypic variations including cell regulations, clinical diagnoses and drug development. Statistical analyses on microarray data have been developed to resolve gene selection issue. However, these methods do not inform us of causality between genes and phenotypes. In this paper, we propose the dynamic association rule algorithm (DAR algorithm) which helps ones to efficiently select a subset of significant genes for subsequent analysis. The DAR algorithm is based on association rules from market basket analysis in marketing. We first propose a statistical way, based on constructing a one-sided confidence interval and hypothesis testing, to determine if an association rule is meaningful. Based on the proposed statistical method, we then developed the DAR algorithm for gene expression data analysis. The method was applied to analyze four microarray datasets and one Next Generation Sequencing (NGS) dataset: the Mice Apo A1 dataset, the whole genome expression dataset of mouse embryonic stem cells, expression profiling of the bone marrow of Leukemia patients, Microarray Quality Control (MAQC) data set and the RNA-seq dataset of a mouse genomic imprinting study. A comparison of the proposed method with the t-test on the expression profiling of the bone marrow of Leukemia patients was conducted. We developed a statistical way, based on the concept of confidence interval, to determine the minimum support and minimum confidence for mining association relationships among items. With the minimum support and minimum confidence, one can find significant rules in one single step. The DAR algorithm was then developed for gene expression data analysis. Four gene expression datasets showed that the proposed DAR algorithm not only was able to identify a set of differentially expressed genes that largely agreed with that of other methods, but also provided an efficient and accurate way to find influential genes of a disease. In the paper, the well-established association rule mining technique from marketing has been successfully modified to determine the minimum support and minimum confidence based on the concept of confidence interval and hypothesis testing. It can be applied to gene expression data to mine significant association rules between gene regulation and phenotype. The proposed DAR algorithm provides an efficient way to find influential genes that underlie the phenotypic variance.

  17. A Complete Color Normalization Approach to Histopathology Images Using Color Cues Computed From Saturation-Weighted Statistics.

    PubMed

    Li, Xingyu; Plataniotis, Konstantinos N

    2015-07-01

    In digital histopathology, tasks of segmentation and disease diagnosis are achieved by quantitative analysis of image content. However, color variation in image samples makes it challenging to produce reliable results. This paper introduces a complete normalization scheme to address the problem of color variation in histopathology images jointly caused by inconsistent biopsy staining and nonstandard imaging condition. Method : Different from existing normalization methods that either address partial cause of color variation or lump them together, our method identifies causes of color variation based on a microscopic imaging model and addresses inconsistency in biopsy imaging and staining by an illuminant normalization module and a spectral normalization module, respectively. In evaluation, we use two public datasets that are representative of histopathology images commonly received in clinics to examine the proposed method from the aspects of robustness to system settings, performance consistency against achromatic pixels, and normalization effectiveness in terms of histological information preservation. As the saturation-weighted statistics proposed in this study generates stable and reliable color cues for stain normalization, our scheme is robust to system parameters and insensitive to image content and achromatic colors. Extensive experimentation suggests that our approach outperforms state-of-the-art normalization methods as the proposed method is the only approach that succeeds to preserve histological information after normalization. The proposed color normalization solution would be useful to mitigate effects of color variation in pathology images on subsequent quantitative analysis.

  18. Simultaneous determination of some cholesterol-lowering drugs in their binary mixture by novel spectrophotometric methods

    NASA Astrophysics Data System (ADS)

    Lotfy, Hayam Mahmoud; Hegazy, Maha Abdel Monem

    2013-09-01

    Four simple, specific, accurate and precise spectrophotometric methods manipulating ratio spectra were developed and validated for simultaneous determination of simvastatin (SM) and ezetimibe (EZ) namely; extended ratio subtraction (EXRSM), simultaneous ratio subtraction (SRSM), ratio difference (RDSM) and absorption factor (AFM). The proposed spectrophotometric procedures do not require any preliminary separation step. The accuracy, precision and linearity ranges of the proposed methods were determined, and the methods were validated and the specificity was assessed by analyzing synthetic mixtures containing the cited drugs. The four methods were applied for the determination of the cited drugs in tablets and the obtained results were statistically compared with each other and with those of a reported HPLC method. The comparison showed that there is no significant difference between the proposed methods and the reported method regarding both accuracy and precision.

  19. Scene-based nonuniformity correction using local constant statistics.

    PubMed

    Zhang, Chao; Zhao, Wenyi

    2008-06-01

    In scene-based nonuniformity correction, the statistical approach assumes all possible values of the true-scene pixel are seen at each pixel location. This global-constant-statistics assumption does not distinguish fixed pattern noise from spatial variations in the average image. This often causes the "ghosting" artifacts in the corrected images since the existing spatial variations are treated as noises. We introduce a new statistical method to reduce the ghosting artifacts. Our method proposes a local-constant statistics that assumes that the temporal signal distribution is not constant at each pixel but is locally true. This considers statistically a constant distribution in a local region around each pixel but uneven distribution in a larger scale. Under the assumption that the fixed pattern noise concentrates in a higher spatial-frequency domain than the distribution variation, we apply a wavelet method to the gain and offset image of the noise and separate out the pattern noise from the spatial variations in the temporal distribution of the scene. We compare the results to the global-constant-statistics method using a clean sequence with large artificial pattern noises. We also apply the method to a challenging CCD video sequence and a LWIR sequence to show how effective it is in reducing noise and the ghosting artifacts.

  20. Safe semi-supervised learning based on weighted likelihood.

    PubMed

    Kawakita, Masanori; Takeuchi, Jun'ichi

    2014-05-01

    We are interested in developing a safe semi-supervised learning that works in any situation. Semi-supervised learning postulates that n(') unlabeled data are available in addition to n labeled data. However, almost all of the previous semi-supervised methods require additional assumptions (not only unlabeled data) to make improvements on supervised learning. If such assumptions are not met, then the methods possibly perform worse than supervised learning. Sokolovska, Cappé, and Yvon (2008) proposed a semi-supervised method based on a weighted likelihood approach. They proved that this method asymptotically never performs worse than supervised learning (i.e., it is safe) without any assumption. Their method is attractive because it is easy to implement and is potentially general. Moreover, it is deeply related to a certain statistical paradox. However, the method of Sokolovska et al. (2008) assumes a very limited situation, i.e., classification, discrete covariates, n(')→∞ and a maximum likelihood estimator. In this paper, we extend their method by modifying the weight. We prove that our proposal is safe in a significantly wide range of situations as long as n≤n('). Further, we give a geometrical interpretation of the proof of safety through the relationship with the above-mentioned statistical paradox. Finally, we show that the above proposal is asymptotically safe even when n(')

  1. A new statistical approach to climate change detection and attribution

    NASA Astrophysics Data System (ADS)

    Ribes, Aurélien; Zwiers, Francis W.; Azaïs, Jean-Marc; Naveau, Philippe

    2017-01-01

    We propose here a new statistical approach to climate change detection and attribution that is based on additive decomposition and simple hypothesis testing. Most current statistical methods for detection and attribution rely on linear regression models where the observations are regressed onto expected response patterns to different external forcings. These methods do not use physical information provided by climate models regarding the expected response magnitudes to constrain the estimated responses to the forcings. Climate modelling uncertainty is difficult to take into account with regression based methods and is almost never treated explicitly. As an alternative to this approach, our statistical model is only based on the additivity assumption; the proposed method does not regress observations onto expected response patterns. We introduce estimation and testing procedures based on likelihood maximization, and show that climate modelling uncertainty can easily be accounted for. Some discussion is provided on how to practically estimate the climate modelling uncertainty based on an ensemble of opportunity. Our approach is based on the " models are statistically indistinguishable from the truth" paradigm, where the difference between any given model and the truth has the same distribution as the difference between any pair of models, but other choices might also be considered. The properties of this approach are illustrated and discussed based on synthetic data. Lastly, the method is applied to the linear trend in global mean temperature over the period 1951-2010. Consistent with the last IPCC assessment report, we find that most of the observed warming over this period (+0.65 K) is attributable to anthropogenic forcings (+0.67 ± 0.12 K, 90 % confidence range), with a very limited contribution from natural forcings (-0.01± 0.02 K).

  2. Quantifying (dis)agreement between direct detection experiments in a halo-independent way

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Feldstein, Brian; Kahlhoefer, Felix, E-mail: brian.feldstein@physics.ox.ac.uk, E-mail: felix.kahlhoefer@physics.ox.ac.uk

    We propose an improved method to study recent and near-future dark matter direct detection experiments with small numbers of observed events. Our method determines in a quantitative and halo-independent way whether the experiments point towards a consistent dark matter signal and identifies the best-fit dark matter parameters. To achieve true halo independence, we apply a recently developed method based on finding the velocity distribution that best describes a given set of data. For a quantitative global analysis we construct a likelihood function suitable for small numbers of events, which allows us to determine the best-fit particle physics properties of darkmore » matter considering all experiments simultaneously. Based on this likelihood function we propose a new test statistic that quantifies how well the proposed model fits the data and how large the tension between different direct detection experiments is. We perform Monte Carlo simulations in order to determine the probability distribution function of this test statistic and to calculate the p-value for both the dark matter hypothesis and the background-only hypothesis.« less

  3. Improved score statistics for meta-analysis in single-variant and gene-level association studies.

    PubMed

    Yang, Jingjing; Chen, Sai; Abecasis, Gonçalo

    2018-06-01

    Meta-analysis is now an essential tool for genetic association studies, allowing them to combine large studies and greatly accelerating the pace of genetic discovery. Although the standard meta-analysis methods perform equivalently as the more cumbersome joint analysis under ideal settings, they result in substantial power loss under unbalanced settings with various case-control ratios. Here, we investigate the power loss problem by the standard meta-analysis methods for unbalanced studies, and further propose novel meta-analysis methods performing equivalently to the joint analysis under both balanced and unbalanced settings. We derive improved meta-score-statistics that can accurately approximate the joint-score-statistics with combined individual-level data, for both linear and logistic regression models, with and without covariates. In addition, we propose a novel approach to adjust for population stratification by correcting for known population structures through minor allele frequencies. In the simulated gene-level association studies under unbalanced settings, our method recovered up to 85% power loss caused by the standard methods. We further showed the power gain of our methods in gene-level tests with 26 unbalanced studies of age-related macular degeneration . In addition, we took the meta-analysis of three unbalanced studies of type 2 diabetes as an example to discuss the challenges of meta-analyzing multi-ethnic samples. In summary, our improved meta-score-statistics with corrections for population stratification can be used to construct both single-variant and gene-level association studies, providing a useful framework for ensuring well-powered, convenient, cross-study analyses. © 2018 WILEY PERIODICALS, INC.

  4. Approximations to the distribution of a test statistic in covariance structure analysis: A comprehensive study.

    PubMed

    Wu, Hao

    2018-05-01

    In structural equation modelling (SEM), a robust adjustment to the test statistic or to its reference distribution is needed when its null distribution deviates from a χ 2 distribution, which usually arises when data do not follow a multivariate normal distribution. Unfortunately, existing studies on this issue typically focus on only a few methods and neglect the majority of alternative methods in statistics. Existing simulation studies typically consider only non-normal distributions of data that either satisfy asymptotic robustness or lead to an asymptotic scaled χ 2 distribution. In this work we conduct a comprehensive study that involves both typical methods in SEM and less well-known methods from the statistics literature. We also propose the use of several novel non-normal data distributions that are qualitatively different from the non-normal distributions widely used in existing studies. We found that several under-studied methods give the best performance under specific conditions, but the Satorra-Bentler method remains the most viable method for most situations. © 2017 The British Psychological Society.

  5. Improving UWB-Based Localization in IoT Scenarios with Statistical Models of Distance Error.

    PubMed

    Monica, Stefania; Ferrari, Gianluigi

    2018-05-17

    Interest in the Internet of Things (IoT) is rapidly increasing, as the number of connected devices is exponentially growing. One of the application scenarios envisaged for IoT technologies involves indoor localization and context awareness. In this paper, we focus on a localization approach that relies on a particular type of communication technology, namely Ultra Wide Band (UWB). UWB technology is an attractive choice for indoor localization, owing to its high accuracy. Since localization algorithms typically rely on estimated inter-node distances, the goal of this paper is to evaluate the improvement brought by a simple (linear) statistical model of the distance error. On the basis of an extensive experimental measurement campaign, we propose a general analytical framework, based on a Least Square (LS) method, to derive a novel statistical model for the range estimation error between a pair of UWB nodes. The proposed statistical model is then applied to improve the performance of a few illustrative localization algorithms in various realistic scenarios. The obtained experimental results show that the use of the proposed statistical model improves the accuracy of the considered localization algorithms with a reduction of the localization error up to 66%.

  6. Heteroscedastic Tests Statistics for One-Way Analysis of Variance: The Trimmed Means and Hall's Transformation Conjunction

    ERIC Educational Resources Information Center

    Luh, Wei-Ming; Guo, Jiin-Huarng

    2005-01-01

    To deal with nonnormal and heterogeneous data for the one-way fixed effect analysis of variance model, the authors adopted a trimmed means method in conjunction with Hall's invertible transformation into a heteroscedastic test statistic (Alexander-Govern test or Welch test). The results of simulation experiments showed that the proposed technique…

  7. Application of binomial and multinomial probability statistics to the sampling design process of a global grain tracing and recall system

    USDA-ARS?s Scientific Manuscript database

    Small, coded, pill-sized tracers embedded in grain are proposed as a method for grain traceability. A sampling process for a grain traceability system was designed and investigated by applying probability statistics using a science-based sampling approach to collect an adequate number of tracers fo...

  8. Robust functional statistics applied to Probability Density Function shape screening of sEMG data.

    PubMed

    Boudaoud, S; Rix, H; Al Harrach, M; Marin, F

    2014-01-01

    Recent studies pointed out possible shape modifications of the Probability Density Function (PDF) of surface electromyographical (sEMG) data according to several contexts like fatigue and muscle force increase. Following this idea, criteria have been proposed to monitor these shape modifications mainly using High Order Statistics (HOS) parameters like skewness and kurtosis. In experimental conditions, these parameters are confronted with small sample size in the estimation process. This small sample size induces errors in the estimated HOS parameters restraining real-time and precise sEMG PDF shape monitoring. Recently, a functional formalism, the Core Shape Model (CSM), has been used to analyse shape modifications of PDF curves. In this work, taking inspiration from CSM method, robust functional statistics are proposed to emulate both skewness and kurtosis behaviors. These functional statistics combine both kernel density estimation and PDF shape distances to evaluate shape modifications even in presence of small sample size. Then, the proposed statistics are tested, using Monte Carlo simulations, on both normal and Log-normal PDFs that mimic observed sEMG PDF shape behavior during muscle contraction. According to the obtained results, the functional statistics seem to be more robust than HOS parameters to small sample size effect and more accurate in sEMG PDF shape screening applications.

  9. The Relationship Between Procrastination, Learning Strategies and Statistics Anxiety Among Iranian College Students: A Canonical Correlation Analysis

    PubMed Central

    Vahedi, Shahrum; Farrokhi, Farahman; Gahramani, Farahnaz; Issazadegan, Ali

    2012-01-01

    Objective: Approximately 66-80%of graduate students experience statistics anxiety and some researchers propose that many students identify statistics courses as the most anxiety-inducing courses in their academic curriculums. As such, it is likely that statistics anxiety is, in part, responsible for many students delaying enrollment in these courses for as long as possible. This paper proposes a canonical model by treating academic procrastination (AP), learning strategies (LS) as predictor variables and statistics anxiety (SA) as explained variables. Methods: A questionnaire survey was used for data collection and 246-college female student participated in this study. To examine the mutually independent relations between procrastination, learning strategies and statistics anxiety variables, a canonical correlation analysis was computed. Results: Findings show that two canonical functions were statistically significant. The set of variables (metacognitive self-regulation, source management, preparing homework, preparing for test and preparing term papers) helped predict changes of statistics anxiety with respect to fearful behavior, Attitude towards math and class, Performance, but not Anxiety. Conclusion: These findings could be used in educational and psychological interventions in the context of statistics anxiety reduction. PMID:24644468

  10. Statistical methods for conducting agreement (comparison of clinical tests) and precision (repeatability or reproducibility) studies in optometry and ophthalmology.

    PubMed

    McAlinden, Colm; Khadka, Jyoti; Pesudovs, Konrad

    2011-07-01

    The ever-expanding choice of ocular metrology and imaging equipment has driven research into the validity of their measurements. Consequently, studies of the agreement between two instruments or clinical tests have proliferated in the ophthalmic literature. It is important that researchers apply the appropriate statistical tests in agreement studies. Correlation coefficients are hazardous and should be avoided. The 'limits of agreement' method originally proposed by Altman and Bland in 1983 is the statistical procedure of choice. Its step-by-step use and practical considerations in relation to optometry and ophthalmology are detailed in addition to sample size considerations and statistical approaches to precision (repeatability or reproducibility) estimates. Ophthalmic & Physiological Optics © 2011 The College of Optometrists.

  11. No-reference image quality assessment based on statistics of convolution feature maps

    NASA Astrophysics Data System (ADS)

    Lv, Xiaoxin; Qin, Min; Chen, Xiaohui; Wei, Guo

    2018-04-01

    We propose a Convolutional Feature Maps (CFM) driven approach to accurately predict image quality. Our motivation bases on the finding that the Nature Scene Statistic (NSS) features on convolution feature maps are significantly sensitive to distortion degree of an image. In our method, a Convolutional Neural Network (CNN) is trained to obtain kernels for generating CFM. We design a forward NSS layer which performs on CFM to better extract NSS features. The quality aware features derived from the output of NSS layer is effective to describe the distortion type and degree an image suffered. Finally, a Support Vector Regression (SVR) is employed in our No-Reference Image Quality Assessment (NR-IQA) model to predict a subjective quality score of a distorted image. Experiments conducted on two public databases demonstrate the promising performance of the proposed method is competitive to state of the art NR-IQA methods.

  12. Fuzzy interval Finite Element/Statistical Energy Analysis for mid-frequency analysis of built-up systems with mixed fuzzy and interval parameters

    NASA Astrophysics Data System (ADS)

    Yin, Hui; Yu, Dejie; Yin, Shengwen; Xia, Baizhan

    2016-10-01

    This paper introduces mixed fuzzy and interval parametric uncertainties into the FE components of the hybrid Finite Element/Statistical Energy Analysis (FE/SEA) model for mid-frequency analysis of built-up systems, thus an uncertain ensemble combining non-parametric with mixed fuzzy and interval parametric uncertainties comes into being. A fuzzy interval Finite Element/Statistical Energy Analysis (FIFE/SEA) framework is proposed to obtain the uncertain responses of built-up systems, which are described as intervals with fuzzy bounds, termed as fuzzy-bounded intervals (FBIs) in this paper. Based on the level-cut technique, a first-order fuzzy interval perturbation FE/SEA (FFIPFE/SEA) and a second-order fuzzy interval perturbation FE/SEA method (SFIPFE/SEA) are developed to handle the mixed parametric uncertainties efficiently. FFIPFE/SEA approximates the response functions by the first-order Taylor series, while SFIPFE/SEA improves the accuracy by considering the second-order items of Taylor series, in which all the mixed second-order items are neglected. To further improve the accuracy, a Chebyshev fuzzy interval method (CFIM) is proposed, in which the Chebyshev polynomials is used to approximate the response functions. The FBIs are eventually reconstructed by assembling the extrema solutions at all cut levels. Numerical results on two built-up systems verify the effectiveness of the proposed methods.

  13. Detection of crossover time scales in multifractal detrended fluctuation analysis

    NASA Astrophysics Data System (ADS)

    Ge, Erjia; Leung, Yee

    2013-04-01

    Fractal is employed in this paper as a scale-based method for the identification of the scaling behavior of time series. Many spatial and temporal processes exhibiting complex multi(mono)-scaling behaviors are fractals. One of the important concepts in fractals is crossover time scale(s) that separates distinct regimes having different fractal scaling behaviors. A common method is multifractal detrended fluctuation analysis (MF-DFA). The detection of crossover time scale(s) is, however, relatively subjective since it has been made without rigorous statistical procedures and has generally been determined by eye balling or subjective observation. Crossover time scales such determined may be spurious and problematic. It may not reflect the genuine underlying scaling behavior of a time series. The purpose of this paper is to propose a statistical procedure to model complex fractal scaling behaviors and reliably identify the crossover time scales under MF-DFA. The scaling-identification regression model, grounded on a solid statistical foundation, is first proposed to describe multi-scaling behaviors of fractals. Through the regression analysis and statistical inference, we can (1) identify the crossover time scales that cannot be detected by eye-balling observation, (2) determine the number and locations of the genuine crossover time scales, (3) give confidence intervals for the crossover time scales, and (4) establish the statistically significant regression model depicting the underlying scaling behavior of a time series. To substantive our argument, the regression model is applied to analyze the multi-scaling behaviors of avian-influenza outbreaks, water consumption, daily mean temperature, and rainfall of Hong Kong. Through the proposed model, we can have a deeper understanding of fractals in general and a statistical approach to identify multi-scaling behavior under MF-DFA in particular.

  14. A comparative study of smart spectrophotometric methods for simultaneous determination of sitagliptin phosphate and metformin hydrochloride in their binary mixture.

    PubMed

    Lotfy, Hayam M; Mohamed, Dalia; Mowaka, Shereen

    2015-01-01

    Simple, specific, accurate and precise spectrophotometric methods were developed and validated for the simultaneous determination of the oral antidiabetic drugs; sitagliptin phosphate (STG) and metformin hydrochloride (MET) in combined pharmaceutical formulations. Three methods were manipulating ratio spectra namely; ratio difference (RD), ratio subtraction (RS) and a novel approach of induced amplitude modulation (IAM) methods. The first two methods were used for determination of STG, while MET was directly determined by measuring its absorbance at λmax 232 nm. However, (IAM) was used for the simultaneous determination of both drugs. Moreover, another three methods were developed based on derivative spectroscopy followed by mathematical manipulation steps namely; amplitude factor (P-factor), amplitude subtraction (AS) and modified amplitude subtraction (MAS). In addition, in this work the novel sample enrichment technique named spectrum addition was adopted. The proposed spectrophotometric methods did not require any preliminary separation step. The accuracy, precision and linearity ranges of the proposed methods were determined. The selectivity of the developed methods was investigated by analyzing laboratory prepared mixtures of the drugs and their combined pharmaceutical formulations. Standard deviation values were less than 1.5 in the assay of raw materials and tablets. The obtained results were statistically compared to that of a reported spectrophotometric method. The statistical comparison showed that there was no significant difference between the proposed methods and the reported one regarding both accuracy and precision. Copyright © 2015 Elsevier B.V. All rights reserved.

  15. Indirect potentiometric titration of ascorbic acid in pharmaceutical preparations using copper based mercury film electrode.

    PubMed

    Abdul Kamal Nazer, Meeran Mohideen; Hameed, Abdul Rahman Shahul; Riyazuddin, Patel

    2004-01-01

    A simple and rapid potentiometric method for the estimation of ascorbic acid in pharmaceutical dosage forms has been developed. The method is based on treating ascorbic acid with iodine and titration of the iodide produced equivalent to ascorbic acid with silver nitrate using Copper Based Mercury Film Electrode (CBMFE) as an indicator electrode. Interference study was carried to check possible interference of usual excipients and other vitamins. The precision and accuracy of the method was assessed by the application of lack-of-fit test and other statistical methods. The results of the proposed method and British Pharmacopoeia method were compared using F and t-statistical tests of significance.

  16. Fragment size distribution statistics in dynamic fragmentation of laser shock-loaded tin

    NASA Astrophysics Data System (ADS)

    He, Weihua; Xin, Jianting; Zhao, Yongqiang; Chu, Genbai; Xi, Tao; Shui, Min; Lu, Feng; Gu, Yuqiu

    2017-06-01

    This work investigates the geometric statistics method to characterize the size distribution of tin fragments produced in the laser shock-loaded dynamic fragmentation process. In the shock experiments, the ejection of the tin sample with etched V-shape groove in the free surface are collected by the soft recovery technique. Subsequently, the produced fragments are automatically detected with the fine post-shot analysis techniques including the X-ray micro-tomography and the improved watershed method. To characterize the size distributions of the fragments, a theoretical random geometric statistics model based on Poisson mixtures is derived for dynamic heterogeneous fragmentation problem, which reveals linear combinational exponential distribution. The experimental data related to fragment size distributions of the laser shock-loaded tin sample are examined with the proposed theoretical model, and its fitting performance is compared with that of other state-of-the-art fragment size distribution models. The comparison results prove that our proposed model can provide far more reasonable fitting result for the laser shock-loaded tin.

  17. Sound source measurement by using a passive sound insulation and a statistical approach

    NASA Astrophysics Data System (ADS)

    Dragonetti, Raffaele; Di Filippo, Sabato; Mercogliano, Francesco; Romano, Rosario A.

    2015-10-01

    This paper describes a measurement technique developed by the authors that allows carrying out acoustic measurements inside noisy environments reducing background noise effects. The proposed method is based on the integration of a traditional passive noise insulation system with a statistical approach. The latter is applied to signals picked up by usual sensors (microphones and accelerometers) equipping the passive sound insulation system. The statistical approach allows improving of the sound insulation given only by the passive sound insulation system at low frequency. The developed measurement technique has been validated by means of numerical simulations and measurements carried out inside a real noisy environment. For the case-studies here reported, an average improvement of about 10 dB has been obtained in a frequency range up to about 250 Hz. Considerations on the lower sound pressure level that can be measured by applying the proposed method and the measurement error related to its application are reported as well.

  18. Systematic Error Modeling and Bias Estimation

    PubMed Central

    Zhang, Feihu; Knoll, Alois

    2016-01-01

    This paper analyzes the statistic properties of the systematic error in terms of range and bearing during the transformation process. Furthermore, we rely on a weighted nonlinear least square method to calculate the biases based on the proposed models. The results show the high performance of the proposed approach for error modeling and bias estimation. PMID:27213386

  19. Efficient Regressions via Optimally Combining Quantile Information*

    PubMed Central

    Zhao, Zhibiao; Xiao, Zhijie

    2014-01-01

    We develop a generally applicable framework for constructing efficient estimators of regression models via quantile regressions. The proposed method is based on optimally combining information over multiple quantiles and can be applied to a broad range of parametric and nonparametric settings. When combining information over a fixed number of quantiles, we derive an upper bound on the distance between the efficiency of the proposed estimator and the Fisher information. As the number of quantiles increases, this upper bound decreases and the asymptotic variance of the proposed estimator approaches the Cramér-Rao lower bound under appropriate conditions. In the case of non-regular statistical estimation, the proposed estimator leads to super-efficient estimation. We illustrate the proposed method for several widely used regression models. Both asymptotic theory and Monte Carlo experiments show the superior performance over existing methods. PMID:25484481

  20. New methods of testing nonlinear hypothesis using iterative NLLS estimator

    NASA Astrophysics Data System (ADS)

    Mahaboob, B.; Venkateswarlu, B.; Mokeshrayalu, G.; Balasiddamuni, P.

    2017-11-01

    This research paper discusses the method of testing nonlinear hypothesis using iterative Nonlinear Least Squares (NLLS) estimator. Takeshi Amemiya [1] explained this method. However in the present research paper, a modified Wald test statistic due to Engle, Robert [6] is proposed to test the nonlinear hypothesis using iterative NLLS estimator. An alternative method for testing nonlinear hypothesis using iterative NLLS estimator based on nonlinear hypothesis using iterative NLLS estimator based on nonlinear studentized residuals has been proposed. In this research article an innovative method of testing nonlinear hypothesis using iterative restricted NLLS estimator is derived. Pesaran and Deaton [10] explained the methods of testing nonlinear hypothesis. This paper uses asymptotic properties of nonlinear least squares estimator proposed by Jenrich [8]. The main purpose of this paper is to provide very innovative methods of testing nonlinear hypothesis using iterative NLLS estimator, iterative NLLS estimator based on nonlinear studentized residuals and iterative restricted NLLS estimator. Eakambaram et al. [12] discussed least absolute deviation estimations versus nonlinear regression model with heteroscedastic errors and also they studied the problem of heteroscedasticity with reference to nonlinear regression models with suitable illustration. William Grene [13] examined the interaction effect in nonlinear models disused by Ai and Norton [14] and suggested ways to examine the effects that do not involve statistical testing. Peter [15] provided guidelines for identifying composite hypothesis and addressing the probability of false rejection for multiple hypotheses.

  1. 3D Markov Process for Traffic Flow Prediction in Real-Time.

    PubMed

    Ko, Eunjeong; Ahn, Jinyoung; Kim, Eun Yi

    2016-01-25

    Recently, the correct estimation of traffic flow has begun to be considered an essential component in intelligent transportation systems. In this paper, a new statistical method to predict traffic flows using time series analyses and geometric correlations is proposed. The novelty of the proposed method is two-fold: (1) a 3D heat map is designed to describe the traffic conditions between roads, which can effectively represent the correlations between spatially- and temporally-adjacent traffic states; and (2) the relationship between the adjacent roads on the spatiotemporal domain is represented by cliques in MRF and the clique parameters are obtained by example-based learning. In order to assess the validity of the proposed method, it is tested using data from expressway traffic that are provided by the Korean Expressway Corporation, and the performance of the proposed method is compared with existing approaches. The results demonstrate that the proposed method can predict traffic conditions with an accuracy of 85%, and this accuracy can be improved further.

  2. 3D Markov Process for Traffic Flow Prediction in Real-Time

    PubMed Central

    Ko, Eunjeong; Ahn, Jinyoung; Kim, Eun Yi

    2016-01-01

    Recently, the correct estimation of traffic flow has begun to be considered an essential component in intelligent transportation systems. In this paper, a new statistical method to predict traffic flows using time series analyses and geometric correlations is proposed. The novelty of the proposed method is two-fold: (1) a 3D heat map is designed to describe the traffic conditions between roads, which can effectively represent the correlations between spatially- and temporally-adjacent traffic states; and (2) the relationship between the adjacent roads on the spatiotemporal domain is represented by cliques in MRF and the clique parameters are obtained by example-based learning. In order to assess the validity of the proposed method, it is tested using data from expressway traffic that are provided by the Korean Expressway Corporation, and the performance of the proposed method is compared with existing approaches. The results demonstrate that the proposed method can predict traffic conditions with an accuracy of 85%, and this accuracy can be improved further. PMID:26821025

  3. Statistical inference for the additive hazards model under outcome-dependent sampling.

    PubMed

    Yu, Jichang; Liu, Yanyan; Sandler, Dale P; Zhou, Haibo

    2015-09-01

    Cost-effective study design and proper inference procedures for data from such designs are always of particular interests to study investigators. In this article, we propose a biased sampling scheme, an outcome-dependent sampling (ODS) design for survival data with right censoring under the additive hazards model. We develop a weighted pseudo-score estimator for the regression parameters for the proposed design and derive the asymptotic properties of the proposed estimator. We also provide some suggestions for using the proposed method by evaluating the relative efficiency of the proposed method against simple random sampling design and derive the optimal allocation of the subsamples for the proposed design. Simulation studies show that the proposed ODS design is more powerful than other existing designs and the proposed estimator is more efficient than other estimators. We apply our method to analyze a cancer study conducted at NIEHS, the Cancer Incidence and Mortality of Uranium Miners Study, to study the risk of radon exposure to cancer.

  4. Statistical inference for the additive hazards model under outcome-dependent sampling

    PubMed Central

    Yu, Jichang; Liu, Yanyan; Sandler, Dale P.; Zhou, Haibo

    2015-01-01

    Cost-effective study design and proper inference procedures for data from such designs are always of particular interests to study investigators. In this article, we propose a biased sampling scheme, an outcome-dependent sampling (ODS) design for survival data with right censoring under the additive hazards model. We develop a weighted pseudo-score estimator for the regression parameters for the proposed design and derive the asymptotic properties of the proposed estimator. We also provide some suggestions for using the proposed method by evaluating the relative efficiency of the proposed method against simple random sampling design and derive the optimal allocation of the subsamples for the proposed design. Simulation studies show that the proposed ODS design is more powerful than other existing designs and the proposed estimator is more efficient than other estimators. We apply our method to analyze a cancer study conducted at NIEHS, the Cancer Incidence and Mortality of Uranium Miners Study, to study the risk of radon exposure to cancer. PMID:26379363

  5. Kappa statistic for clustered dichotomous responses from physicians and patients.

    PubMed

    Kang, Chaeryon; Qaqish, Bahjat; Monaco, Jane; Sheridan, Stacey L; Cai, Jianwen

    2013-09-20

    The bootstrap method for estimating the standard error of the kappa statistic in the presence of clustered data is evaluated. Such data arise, for example, in assessing agreement between physicians and their patients regarding their understanding of the physician-patient interaction and discussions. We propose a computationally efficient procedure for generating correlated dichotomous responses for physicians and assigned patients for simulation studies. The simulation result demonstrates that the proposed bootstrap method produces better estimate of the standard error and better coverage performance compared with the asymptotic standard error estimate that ignores dependence among patients within physicians with at least a moderately large number of clusters. We present an example of an application to a coronary heart disease prevention study. Copyright © 2013 John Wiley & Sons, Ltd.

  6. EEG Sleep Stages Classification Based on Time Domain Features and Structural Graph Similarity.

    PubMed

    Diykh, Mohammed; Li, Yan; Wen, Peng

    2016-11-01

    The electroencephalogram (EEG) signals are commonly used in diagnosing and treating sleep disorders. Many existing methods for sleep stages classification mainly depend on the analysis of EEG signals in time or frequency domain to obtain a high classification accuracy. In this paper, the statistical features in time domain, the structural graph similarity and the K-means (SGSKM) are combined to identify six sleep stages using single channel EEG signals. Firstly, each EEG segment is partitioned into sub-segments. The size of a sub-segment is determined empirically. Secondly, statistical features are extracted, sorted into different sets of features and forwarded to the SGSKM to classify EEG sleep stages. We have also investigated the relationships between sleep stages and the time domain features of the EEG data used in this paper. The experimental results show that the proposed method yields better classification results than other four existing methods and the support vector machine (SVM) classifier. A 95.93% average classification accuracy is achieved by using the proposed method.

  7. The expectancy-value muddle in the theory of planned behaviour - and some proposed solutions.

    PubMed

    French, David P; Hankins, Matthew

    2003-02-01

    The authors of the Theories of Reasoned Action and Planned Behaviour recommended a method for statistically analysing the relationships between beliefs and the Attitude, Subjective Norm, and Perceived Behavioural Control constructs. This method has been used in the overwhelming majority of studies using these theories. However, there is a growing awareness that this method yields statistically uninterpretable results (Evans, 1991). Despite this, the use of this method is continuing, as is uninformed interpretation of this problematic research literature. This is probably due to the lack of a simple account of where the problem lies, and the large number of alternatives available. This paper therefore summarizes the problem as simply as possible, gives consideration to the conclusions that can be validly drawn from studies that contain this problem, and critically reviews the many alternatives that have been proposed to address this problem. Different techniques are identified as being suitable, according to the purpose of the specific research project.

  8. Multivariate fault isolation of batch processes via variable selection in partial least squares discriminant analysis.

    PubMed

    Yan, Zhengbing; Kuang, Te-Hui; Yao, Yuan

    2017-09-01

    In recent years, multivariate statistical monitoring of batch processes has become a popular research topic, wherein multivariate fault isolation is an important step aiming at the identification of the faulty variables contributing most to the detected process abnormality. Although contribution plots have been commonly used in statistical fault isolation, such methods suffer from the smearing effect between correlated variables. In particular, in batch process monitoring, the high autocorrelations and cross-correlations that exist in variable trajectories make the smearing effect unavoidable. To address such a problem, a variable selection-based fault isolation method is proposed in this research, which transforms the fault isolation problem into a variable selection problem in partial least squares discriminant analysis and solves it by calculating a sparse partial least squares model. As different from the traditional methods, the proposed method emphasizes the relative importance of each process variable. Such information may help process engineers in conducting root-cause diagnosis. Copyright © 2017 ISA. Published by Elsevier Ltd. All rights reserved.

  9. Methods of Improving Speech Intelligibility for Listeners with Hearing Resolution Deficit

    PubMed Central

    2012-01-01

    Abstract Methods developed for real-time time scale modification (TSM) of speech signal are presented. They are based on the non-uniform, speech rate depended SOLA algorithm (Synchronous Overlap and Add). Influence of the proposed method on the intelligibility of speech was investigated for two separate groups of listeners, i.e. hearing impaired children and elderly listeners. It was shown that for the speech with average rate equal to or higher than 6.48 vowels/s, all of the proposed methods have statistically significant impact on the improvement of speech intelligibility for hearing impaired children with reduced hearing resolution and one of the proposed methods significantly improves comprehension of speech in the group of elderly listeners with reduced hearing resolution. Virtual slides http://www.diagnosticpathology.diagnomx.eu/vs/2065486371761991 PMID:23009662

  10. Meta-analysis of quantitative pleiotropic traits for next-generation sequencing with multivariate functional linear models

    PubMed Central

    Chiu, Chi-yang; Jung, Jeesun; Chen, Wei; Weeks, Daniel E; Ren, Haobo; Boehnke, Michael; Amos, Christopher I; Liu, Aiyi; Mills, James L; Ting Lee, Mei-ling; Xiong, Momiao; Fan, Ruzong

    2017-01-01

    To analyze next-generation sequencing data, multivariate functional linear models are developed for a meta-analysis of multiple studies to connect genetic variant data to multiple quantitative traits adjusting for covariates. The goal is to take the advantage of both meta-analysis and pleiotropic analysis in order to improve power and to carry out a unified association analysis of multiple studies and multiple traits of complex disorders. Three types of approximate F -distributions based on Pillai–Bartlett trace, Hotelling–Lawley trace, and Wilks's Lambda are introduced to test for association between multiple quantitative traits and multiple genetic variants. Simulation analysis is performed to evaluate false-positive rates and power of the proposed tests. The proposed methods are applied to analyze lipid traits in eight European cohorts. It is shown that it is more advantageous to perform multivariate analysis than univariate analysis in general, and it is more advantageous to perform meta-analysis of multiple studies instead of analyzing the individual studies separately. The proposed models require individual observations. The value of the current paper can be seen at least for two reasons: (a) the proposed methods can be applied to studies that have individual genotype data; (b) the proposed methods can be used as a criterion for future work that uses summary statistics to build test statistics to meta-analyze the data. PMID:28000696

  11. Meta-analysis of quantitative pleiotropic traits for next-generation sequencing with multivariate functional linear models.

    PubMed

    Chiu, Chi-Yang; Jung, Jeesun; Chen, Wei; Weeks, Daniel E; Ren, Haobo; Boehnke, Michael; Amos, Christopher I; Liu, Aiyi; Mills, James L; Ting Lee, Mei-Ling; Xiong, Momiao; Fan, Ruzong

    2017-02-01

    To analyze next-generation sequencing data, multivariate functional linear models are developed for a meta-analysis of multiple studies to connect genetic variant data to multiple quantitative traits adjusting for covariates. The goal is to take the advantage of both meta-analysis and pleiotropic analysis in order to improve power and to carry out a unified association analysis of multiple studies and multiple traits of complex disorders. Three types of approximate F -distributions based on Pillai-Bartlett trace, Hotelling-Lawley trace, and Wilks's Lambda are introduced to test for association between multiple quantitative traits and multiple genetic variants. Simulation analysis is performed to evaluate false-positive rates and power of the proposed tests. The proposed methods are applied to analyze lipid traits in eight European cohorts. It is shown that it is more advantageous to perform multivariate analysis than univariate analysis in general, and it is more advantageous to perform meta-analysis of multiple studies instead of analyzing the individual studies separately. The proposed models require individual observations. The value of the current paper can be seen at least for two reasons: (a) the proposed methods can be applied to studies that have individual genotype data; (b) the proposed methods can be used as a criterion for future work that uses summary statistics to build test statistics to meta-analyze the data.

  12. Contribution of artificial intelligence to the knowledge of prognostic factors in laryngeal carcinoma.

    PubMed

    Zapater, E; Moreno, S; Fortea, M A; Campos, A; Armengot, M; Basterra, J

    2000-11-01

    Many studies have investigated prognostic factors in laryngeal carcinoma, with sometimes conflicting results. Apart from the importance of environmental factors, the different statistical methods employed may have influenced such discrepancies. A program based on artificial intelligence techniques is designed to determine the prognostic factors in a series of 122 laryngeal carcinomas. The results obtained are compared with those derived from two classical statistical methods (Cox regression and mortality tables). Tumor location was found to be the most important prognostic factor by all methods. The proposed intelligent system is found to be a sound method capable of detecting exceptional cases.

  13. Air Quality Forecasting through Different Statistical and Artificial Intelligence Techniques

    NASA Astrophysics Data System (ADS)

    Mishra, D.; Goyal, P.

    2014-12-01

    Urban air pollution forecasting has emerged as an acute problem in recent years because there are sever environmental degradation due to increase in harmful air pollutants in the ambient atmosphere. In this study, there are different types of statistical as well as artificial intelligence techniques are used for forecasting and analysis of air pollution over Delhi urban area. These techniques are principle component analysis (PCA), multiple linear regression (MLR) and artificial neural network (ANN) and the forecasting are observed in good agreement with the observed concentrations through Central Pollution Control Board (CPCB) at different locations in Delhi. But such methods suffers from disadvantages like they provide limited accuracy as they are unable to predict the extreme points i.e. the pollution maximum and minimum cut-offs cannot be determined using such approach. Also, such methods are inefficient approach for better output forecasting. But with the advancement in technology and research, an alternative to the above traditional methods has been proposed i.e. the coupling of statistical techniques with artificial Intelligence (AI) can be used for forecasting purposes. The coupling of PCA, ANN and fuzzy logic is used for forecasting of air pollutant over Delhi urban area. The statistical measures e.g., correlation coefficient (R), normalized mean square error (NMSE), fractional bias (FB) and index of agreement (IOA) of the proposed model are observed in better agreement with the all other models. Hence, the coupling of statistical and artificial intelligence can be use for the forecasting of air pollutant over urban area.

  14. Allelic-based gene-gene interaction associated with quantitative traits.

    PubMed

    Jung, Jeesun; Sun, Bin; Kwon, Deukwoo; Koller, Daniel L; Foroud, Tatiana M

    2009-05-01

    Recent studies have shown that quantitative phenotypes may be influenced not only by multiple single nucleotide polymorphisms (SNPs) within a gene but also by the interaction between SNPs at unlinked genes. We propose a new statistical approach that can detect gene-gene interactions at the allelic level which contribute to the phenotypic variation in a quantitative trait. By testing for the association of allelic combinations at multiple unlinked loci with a quantitative trait, we can detect the SNP allelic interaction whether or not it can be detected as a main effect. Our proposed method assigns a score to unrelated subjects according to their allelic combination inferred from observed genotypes at two or more unlinked SNPs, and then tests for the association of the allelic score with a quantitative trait. To investigate the statistical properties of the proposed method, we performed a simulation study to estimate type I error rates and power and demonstrated that this allelic approach achieves greater power than the more commonly used genotypic approach to test for gene-gene interaction. As an example, the proposed method was applied to data obtained as part of a candidate gene study of sodium retention by the kidney. We found that this method detects an interaction between the calcium-sensing receptor gene (CaSR), the chloride channel gene (CLCNKB) and the Na, K, 2Cl cotransporter gene (CLC12A1) that contributes to variation in diastolic blood pressure.

  15. On using summary statistics from an external calibration sample to correct for covariate measurement error.

    PubMed

    Guo, Ying; Little, Roderick J; McConnell, Daniel S

    2012-01-01

    Covariate measurement error is common in epidemiologic studies. Current methods for correcting measurement error with information from external calibration samples are insufficient to provide valid adjusted inferences. We consider the problem of estimating the regression of an outcome Y on covariates X and Z, where Y and Z are observed, X is unobserved, but a variable W that measures X with error is observed. Information about measurement error is provided in an external calibration sample where data on X and W (but not Y and Z) are recorded. We describe a method that uses summary statistics from the calibration sample to create multiple imputations of the missing values of X in the regression sample, so that the regression coefficients of Y on X and Z and associated standard errors can be estimated using simple multiple imputation combining rules, yielding valid statistical inferences under the assumption of a multivariate normal distribution. The proposed method is shown by simulation to provide better inferences than existing methods, namely the naive method, classical calibration, and regression calibration, particularly for correction for bias and achieving nominal confidence levels. We also illustrate our method with an example using linear regression to examine the relation between serum reproductive hormone concentrations and bone mineral density loss in midlife women in the Michigan Bone Health and Metabolism Study. Existing methods fail to adjust appropriately for bias due to measurement error in the regression setting, particularly when measurement error is substantial. The proposed method corrects this deficiency.

  16. Validating Coherence Measurements Using Aligned and Unaligned Coherence Functions

    NASA Technical Reports Server (NTRS)

    Miles, Jeffrey Hilton

    2006-01-01

    This paper describes a novel approach based on the use of coherence functions and statistical theory for sensor validation in a harsh environment. By the use of aligned and unaligned coherence functions and statistical theory one can test for sensor degradation, total sensor failure or changes in the signal. This advanced diagnostic approach and the novel data processing methodology discussed provides a single number that conveys this information. This number as calculated with standard statistical procedures for comparing the means of two distributions is compared with results obtained using Yuen's robust statistical method to create confidence intervals. Examination of experimental data from Kulite pressure transducers mounted in a Pratt & Whitney PW4098 combustor using spectrum analysis methods on aligned and unaligned time histories has verified the effectiveness of the proposed method. All the procedures produce good results which demonstrates how robust the technique is.

  17. Proposal for a biometrics of the cortical surface: a statistical method for relative surface distance metrics

    NASA Astrophysics Data System (ADS)

    Bookstein, Fred L.

    1995-08-01

    Recent advances in computational geometry have greatly extended the range of neuroanatomical questions that can be approached by rigorous quantitative methods. One of the major current challenges in this area is to describe the variability of human cortical surface form and its implications for individual differences in neurophysiological functioning. Existing techniques for representation of stochastically invaginated surfaces do not conduce to the necessary parametric statistical summaries. In this paper, following a hint from David Van Essen and Heather Drury, I sketch a statistical method customized for the constraints of this complex data type. Cortical surface form is represented by its Riemannian metric tensor and averaged according to parameters of a smooth averaged surface. Sulci are represented by integral trajectories of the smaller principal strains of this metric, and their statistics follow the statistics of that relative metric. The diagrams visualizing this tensor analysis look like alligator leather but summarize all aspects of cortical surface form in between the principal sulci, the reliable ones; no flattening is required.

  18. Accelerating separable footprint (SF) forward and back projection on GPU

    NASA Astrophysics Data System (ADS)

    Xie, Xiaobin; McGaffin, Madison G.; Long, Yong; Fessler, Jeffrey A.; Wen, Minhua; Lin, James

    2017-03-01

    Statistical image reconstruction (SIR) methods for X-ray CT can improve image quality and reduce radiation dosages over conventional reconstruction methods, such as filtered back projection (FBP). However, SIR methods require much longer computation time. The separable footprint (SF) forward and back projection technique simplifies the calculation of intersecting volumes of image voxels and finite-size beams in a way that is both accurate and efficient for parallel implementation. We propose a new method to accelerate the SF forward and back projection on GPU with NVIDIA's CUDA environment. For the forward projection, we parallelize over all detector cells. For the back projection, we parallelize over all 3D image voxels. The simulation results show that the proposed method is faster than the acceleration method of the SF projectors proposed by Wu and Fessler.13 We further accelerate the proposed method using multiple GPUs. The results show that the computation time is reduced approximately proportional to the number of GPUs.

  19. Jointly learning word embeddings using a corpus and a knowledge base

    PubMed Central

    Bollegala, Danushka; Maehara, Takanori; Kawarabayashi, Ken-ichi

    2018-01-01

    Methods for representing the meaning of words in vector spaces purely using the information distributed in text corpora have proved to be very valuable in various text mining and natural language processing (NLP) tasks. However, these methods still disregard the valuable semantic relational structure between words in co-occurring contexts. These beneficial semantic relational structures are contained in manually-created knowledge bases (KBs) such as ontologies and semantic lexicons, where the meanings of words are represented by defining the various relationships that exist among those words. We combine the knowledge in both a corpus and a KB to learn better word embeddings. Specifically, we propose a joint word representation learning method that uses the knowledge in the KBs, and simultaneously predicts the co-occurrences of two words in a corpus context. In particular, we use the corpus to define our objective function subject to the relational constrains derived from the KB. We further utilise the corpus co-occurrence statistics to propose two novel approaches, Nearest Neighbour Expansion (NNE) and Hedged Nearest Neighbour Expansion (HNE), that dynamically expand the KB and therefore derive more constraints that guide the optimisation process. Our experimental results over a wide-range of benchmark tasks demonstrate that the proposed method statistically significantly improves the accuracy of the word embeddings learnt. It outperforms a corpus-only baseline and reports an improvement of a number of previously proposed methods that incorporate corpora and KBs in both semantic similarity prediction and word analogy detection tasks. PMID:29529052

  20. Generalized Factorial Moments

    NASA Astrophysics Data System (ADS)

    Bialas, A.

    2004-02-01

    It is shown that the method of eliminating the statistical fluctuations from event-by-event analysis proposed recently by Fu and Liu can be rewritten in a compact form involving the generalized factorial moments.

  1. Texture Classification by Texton: Statistical versus Binary

    PubMed Central

    Guo, Zhenhua; Zhang, Zhongcheng; Li, Xiu; Li, Qin; You, Jane

    2014-01-01

    Using statistical textons for texture classification has shown great success recently. The maximal response 8 (Statistical_MR8), image patch (Statistical_Joint) and locally invariant fractal (Statistical_Fractal) are typical statistical texton algorithms and state-of-the-art texture classification methods. However, there are two limitations when using these methods. First, it needs a training stage to build a texton library, thus the recognition accuracy will be highly depended on the training samples; second, during feature extraction, local feature is assigned to a texton by searching for the nearest texton in the whole library, which is time consuming when the library size is big and the dimension of feature is high. To address the above two issues, in this paper, three binary texton counterpart methods were proposed, Binary_MR8, Binary_Joint, and Binary_Fractal. These methods do not require any training step but encode local feature into binary representation directly. The experimental results on the CUReT, UIUC and KTH-TIPS databases show that binary texton could get sound results with fast feature extraction, especially when the image size is not big and the quality of image is not poor. PMID:24520346

  2. Development and validation of multivariate calibration methods for simultaneous estimation of Paracetamol, Enalapril maleate and hydrochlorothiazide in pharmaceutical dosage form

    NASA Astrophysics Data System (ADS)

    Singh, Veena D.; Daharwal, Sanjay J.

    2017-01-01

    Three multivariate calibration spectrophotometric methods were developed for simultaneous estimation of Paracetamol (PARA), Enalapril maleate (ENM) and Hydrochlorothiazide (HCTZ) in tablet dosage form; namely multi-linear regression calibration (MLRC), trilinear regression calibration method (TLRC) and classical least square (CLS) method. The selectivity of the proposed methods were studied by analyzing the laboratory prepared ternary mixture and successfully applied in their combined dosage form. The proposed methods were validated as per ICH guidelines and good accuracy; precision and specificity were confirmed within the concentration range of 5-35 μg mL- 1, 5-40 μg mL- 1 and 5-40 μg mL- 1of PARA, HCTZ and ENM, respectively. The results were statistically compared with reported HPLC method. Thus, the proposed methods can be effectively useful for the routine quality control analysis of these drugs in commercial tablet dosage form.

  3. Outcome-Dependent Sampling Design and Inference for Cox's Proportional Hazards Model.

    PubMed

    Yu, Jichang; Liu, Yanyan; Cai, Jianwen; Sandler, Dale P; Zhou, Haibo

    2016-11-01

    We propose a cost-effective outcome-dependent sampling design for the failure time data and develop an efficient inference procedure for data collected with this design. To account for the biased sampling scheme, we derive estimators from a weighted partial likelihood estimating equation. The proposed estimators for regression parameters are shown to be consistent and asymptotically normally distributed. A criteria that can be used to optimally implement the ODS design in practice is proposed and studied. The small sample performance of the proposed method is evaluated by simulation studies. The proposed design and inference procedure is shown to be statistically more powerful than existing alternative designs with the same sample sizes. We illustrate the proposed method with an existing real data from the Cancer Incidence and Mortality of Uranium Miners Study.

  4. Robust kernel representation with statistical local features for face recognition.

    PubMed

    Yang, Meng; Zhang, Lei; Shiu, Simon Chi-Keung; Zhang, David

    2013-06-01

    Factors such as misalignment, pose variation, and occlusion make robust face recognition a difficult problem. It is known that statistical features such as local binary pattern are effective for local feature extraction, whereas the recently proposed sparse or collaborative representation-based classification has shown interesting results in robust face recognition. In this paper, we propose a novel robust kernel representation model with statistical local features (SLF) for robust face recognition. Initially, multipartition max pooling is used to enhance the invariance of SLF to image registration error. Then, a kernel-based representation model is proposed to fully exploit the discrimination information embedded in the SLF, and robust regression is adopted to effectively handle the occlusion in face images. Extensive experiments are conducted on benchmark face databases, including extended Yale B, AR (A. Martinez and R. Benavente), multiple pose, illumination, and expression (multi-PIE), facial recognition technology (FERET), face recognition grand challenge (FRGC), and labeled faces in the wild (LFW), which have different variations of lighting, expression, pose, and occlusions, demonstrating the promising performance of the proposed method.

  5. An Aggregated Method for Determining Railway Defects and Obstacle Parameters

    NASA Astrophysics Data System (ADS)

    Loktev, Daniil; Loktev, Alexey; Stepanov, Roman; Pevzner, Viktor; Alenov, Kanat

    2018-03-01

    The method of combining algorithms of image blur analysis and stereo vision to determine the distance to objects (including external defects of railway tracks) and the speed of moving objects-obstacles is proposed. To estimate the deviation of the distance depending on the blur a statistical approach, logarithmic, exponential and linear standard functions are used. The statistical approach includes a method of estimating least squares and the method of least modules. The accuracy of determining the distance to the object, its speed and direction of movement is obtained. The paper develops a method of determining distances to objects by analyzing a series of images and assessment of depth using defocusing using its aggregation with stereoscopic vision. This method is based on a physical effect of dependence on the determined distance to the object on the obtained image from the focal length or aperture of the lens. In the calculation of the blur spot diameter it is assumed that blur occurs at the point equally in all directions. According to the proposed approach, it is possible to determine the distance to the studied object and its blur by analyzing a series of images obtained using the video detector with different settings. The article proposes and scientifically substantiates new and improved existing methods for detecting the parameters of static and moving objects of control, and also compares the results of the use of various methods and the results of experiments. It is shown that the aggregate method gives the best approximation to the real distances.

  6. A new computerized diagnostic algorithm for quantitative evaluation of binocular misalignment in patients with strabismus

    NASA Astrophysics Data System (ADS)

    Nam, Kyoung Won; Kim, In Young; Kang, Ho Chul; Yang, Hee Kyung; Yoon, Chang Ki; Hwang, Jeong Min; Kim, Young Jae; Kim, Tae Yun; Kim, Kwang Gi

    2012-10-01

    Accurate measurement of binocular misalignment between both eyes is important for proper preoperative management, surgical planning, and postoperative evaluation of patients with strabismus. In this study, we proposed a new computerized diagnostic algorithm that can calculate the angle of binocular eye misalignment photographically by using a dedicated three-dimensional eye model mimicking the structure of the natural human eye. To evaluate the performance of the proposed algorithm, eight healthy volunteers and eight individuals with strabismus were recruited in this study, the horizontal deviation angle, vertical deviation angle, and angle of eye misalignment were calculated and the angular differences between the healthy and the strabismus groups were evaluated using the nonparametric Mann-Whitney test and the Pearson correlation test. The experimental results demonstrated a statistically significant difference between the healthy and strabismus groups (p = 0.015 < 0.05), but no statistically significant difference between the proposed method and the Krimsky test (p = 0.912 > 0.05). The measurements of the two methods were highly correlated (r = 0.969, p < 0.05). From the experimental results, we believe that the proposed diagnostic method has the potential to be a diagnostic tool that measures the physical disorder of the human eye to diagnose non-invasively the severity of strabismus.

  7. Order-restricted inference for means with missing values.

    PubMed

    Wang, Heng; Zhong, Ping-Shou

    2017-09-01

    Missing values appear very often in many applications, but the problem of missing values has not received much attention in testing order-restricted alternatives. Under the missing at random (MAR) assumption, we impute the missing values nonparametrically using kernel regression. For data with imputation, the classical likelihood ratio test designed for testing the order-restricted means is no longer applicable since the likelihood does not exist. This article proposes a novel method for constructing test statistics for assessing means with an increasing order or a decreasing order based on jackknife empirical likelihood (JEL) ratio. It is shown that the JEL ratio statistic evaluated under the null hypothesis converges to a chi-bar-square distribution, whose weights depend on missing probabilities and nonparametric imputation. Simulation study shows that the proposed test performs well under various missing scenarios and is robust for normally and nonnormally distributed data. The proposed method is applied to an Alzheimer's disease neuroimaging initiative data set for finding a biomarker for the diagnosis of the Alzheimer's disease. © 2017, The International Biometric Society.

  8. Estimating the Proportion of True Null Hypotheses Using the Pattern of Observed p-values

    PubMed Central

    Tong, Tiejun; Feng, Zeny; Hilton, Julia S.; Zhao, Hongyu

    2013-01-01

    Estimating the proportion of true null hypotheses, π0, has attracted much attention in the recent statistical literature. Besides its apparent relevance for a set of specific scientific hypotheses, an accurate estimate of this parameter is key for many multiple testing procedures. Most existing methods for estimating π0 in the literature are motivated from the independence assumption of test statistics, which is often not true in reality. Simulations indicate that most existing estimators in the presence of the dependence among test statistics can be poor, mainly due to the increase of variation in these estimators. In this paper, we propose several data-driven methods for estimating π0 by incorporating the distribution pattern of the observed p-values as a practical approach to address potential dependence among test statistics. Specifically, we use a linear fit to give a data-driven estimate for the proportion of true-null p-values in (λ, 1] over the whole range [0, 1] instead of using the expected proportion at 1 − λ. We find that the proposed estimators may substantially decrease the variance of the estimated true null proportion and thus improve the overall performance. PMID:24078762

  9. No-reference image quality assessment based on natural scene statistics and gradient magnitude similarity

    NASA Astrophysics Data System (ADS)

    Jia, Huizhen; Sun, Quansen; Ji, Zexuan; Wang, Tonghan; Chen, Qiang

    2014-11-01

    The goal of no-reference/blind image quality assessment (NR-IQA) is to devise a perceptual model that can accurately predict the quality of a distorted image as human opinions, in which feature extraction is an important issue. However, the features used in the state-of-the-art "general purpose" NR-IQA algorithms are usually natural scene statistics (NSS) based or are perceptually relevant; therefore, the performance of these models is limited. To further improve the performance of NR-IQA, we propose a general purpose NR-IQA algorithm which combines NSS-based features with perceptually relevant features. The new method extracts features in both the spatial and gradient domains. In the spatial domain, we extract the point-wise statistics for single pixel values which are characterized by a generalized Gaussian distribution model to form the underlying features. In the gradient domain, statistical features based on neighboring gradient magnitude similarity are extracted. Then a mapping is learned to predict quality scores using a support vector regression. The experimental results on the benchmark image databases demonstrate that the proposed algorithm correlates highly with human judgments of quality and leads to significant performance improvements over state-of-the-art methods.

  10. Estimating the Proportion of True Null Hypotheses Using the Pattern of Observed p-values.

    PubMed

    Tong, Tiejun; Feng, Zeny; Hilton, Julia S; Zhao, Hongyu

    2013-01-01

    Estimating the proportion of true null hypotheses, π 0 , has attracted much attention in the recent statistical literature. Besides its apparent relevance for a set of specific scientific hypotheses, an accurate estimate of this parameter is key for many multiple testing procedures. Most existing methods for estimating π 0 in the literature are motivated from the independence assumption of test statistics, which is often not true in reality. Simulations indicate that most existing estimators in the presence of the dependence among test statistics can be poor, mainly due to the increase of variation in these estimators. In this paper, we propose several data-driven methods for estimating π 0 by incorporating the distribution pattern of the observed p -values as a practical approach to address potential dependence among test statistics. Specifically, we use a linear fit to give a data-driven estimate for the proportion of true-null p -values in (λ, 1] over the whole range [0, 1] instead of using the expected proportion at 1 - λ. We find that the proposed estimators may substantially decrease the variance of the estimated true null proportion and thus improve the overall performance.

  11. Generalized Full-Information Item Bifactor Analysis

    ERIC Educational Resources Information Center

    Cai, Li; Yang, Ji Seung; Hansen, Mark

    2011-01-01

    Full-information item bifactor analysis is an important statistical method in psychological and educational measurement. Current methods are limited to single-group analysis and inflexible in the types of item response models supported. We propose a flexible multiple-group item bifactor analysis framework that supports a variety of…

  12. Exact test-based approach for equivalence test with parameter margin.

    PubMed

    Cassie Dong, Xiaoyu; Bian, Yuanyuan; Tsong, Yi; Wang, Tianhua

    2017-01-01

    The equivalence test has a wide range of applications in pharmaceutical statistics which we need to test for the similarity between two groups. In recent years, the equivalence test has been used in assessing the analytical similarity between a proposed biosimilar product and a reference product. More specifically, the mean values of the two products for a given quality attribute are compared against an equivalence margin in the form of ±f × σ R , where ± f × σ R is a function of the reference variability. In practice, this margin is unknown and is estimated from the sample as ±f × S R . If we use this estimated margin with the classic t-test statistic on the equivalence test for the means, both Type I and Type II error rates may inflate. To resolve this issue, we develop an exact-based test method and compare this method with other proposed methods, such as the Wald test, the constrained Wald test, and the Generalized Pivotal Quantity (GPQ) in terms of Type I error rate and power. Application of those methods on data analysis is also provided in this paper. This work focuses on the development and discussion of the general statistical methodology and is not limited to the application of analytical similarity.

  13. Analyzing Kernel Matrices for the Identification of Differentially Expressed Genes

    PubMed Central

    Xia, Xiao-Lei; Xing, Huanlai; Liu, Xueqin

    2013-01-01

    One of the most important applications of microarray data is the class prediction of biological samples. For this purpose, statistical tests have often been applied to identify the differentially expressed genes (DEGs), followed by the employment of the state-of-the-art learning machines including the Support Vector Machines (SVM) in particular. The SVM is a typical sample-based classifier whose performance comes down to how discriminant samples are. However, DEGs identified by statistical tests are not guaranteed to result in a training dataset composed of discriminant samples. To tackle this problem, a novel gene ranking method namely the Kernel Matrix Gene Selection (KMGS) is proposed. The rationale of the method, which roots in the fundamental ideas of the SVM algorithm, is described. The notion of ''the separability of a sample'' which is estimated by performing -like statistics on each column of the kernel matrix, is first introduced. The separability of a classification problem is then measured, from which the significance of a specific gene is deduced. Also described is a method of Kernel Matrix Sequential Forward Selection (KMSFS) which shares the KMGS method's essential ideas but proceeds in a greedy manner. On three public microarray datasets, our proposed algorithms achieved noticeably competitive performance in terms of the B.632+ error rate. PMID:24349110

  14. Fast mean and variance computation of the diffuse sound transmission through finite-sized thick and layered wall and floor systems

    NASA Astrophysics Data System (ADS)

    Decraene, Carolina; Dijckmans, Arne; Reynders, Edwin P. B.

    2018-05-01

    A method is developed for computing the mean and variance of the diffuse field sound transmission loss of finite-sized layered wall and floor systems that consist of solid, fluid and/or poroelastic layers. This is achieved by coupling a transfer matrix model of the wall or floor to statistical energy analysis subsystem models of the adjacent room volumes. The modal behavior of the wall is approximately accounted for by projecting the wall displacement onto a set of sinusoidal lateral basis functions. This hybrid modal transfer matrix-statistical energy analysis method is validated on multiple wall systems: a thin steel plate, a polymethyl methacrylate panel, a thick brick wall, a sandwich panel, a double-leaf wall with poro-elastic material in the cavity, and a double glazing. The predictions are compared with experimental data and with results obtained using alternative prediction methods such as the transfer matrix method with spatial windowing, the hybrid wave based-transfer matrix method, and the hybrid finite element-statistical energy analysis method. These comparisons confirm the prediction accuracy of the proposed method and the computational efficiency against the conventional hybrid finite element-statistical energy analysis method.

  15. Comparing two correlated C indices with right-censored survival outcome: a one-shot nonparametric approach

    PubMed Central

    Kang, Le; Chen, Weijie; Petrick, Nicholas A.; Gallas, Brandon D.

    2014-01-01

    The area under the receiver operating characteristic (ROC) curve (AUC) is often used as a summary index of the diagnostic ability in evaluating biomarkers when the clinical outcome (truth) is binary. When the clinical outcome is right-censored survival time, the C index, motivated as an extension of AUC, has been proposed by Harrell as a measure of concordance between a predictive biomarker and the right-censored survival outcome. In this work, we investigate methods for statistical comparison of two diagnostic or predictive systems, of which they could either be two biomarkers or two fixed algorithms, in terms of their C indices. We adopt a U-statistics based C estimator that is asymptotically normal and develop a nonparametric analytical approach to estimate the variance of the C estimator and the covariance of two C estimators. A z-score test is then constructed to compare the two C indices. We validate our one-shot nonparametric method via simulation studies in terms of the type I error rate and power. We also compare our one-shot method with resampling methods including the jackknife and the bootstrap. Simulation results show that the proposed one-shot method provides almost unbiased variance estimations and has satisfactory type I error control and power. Finally, we illustrate the use of the proposed method with an example from the Framingham Heart Study. PMID:25399736

  16. A novel measure and significance testing in data analysis of cell image segmentation.

    PubMed

    Wu, Jin Chu; Halter, Michael; Kacker, Raghu N; Elliott, John T; Plant, Anne L

    2017-03-14

    Cell image segmentation (CIS) is an essential part of quantitative imaging of biological cells. Designing a performance measure and conducting significance testing are critical for evaluating and comparing the CIS algorithms for image-based cell assays in cytometry. Many measures and methods have been proposed and implemented to evaluate segmentation methods. However, computing the standard errors (SE) of the measures and their correlation coefficient is not described, and thus the statistical significance of performance differences between CIS algorithms cannot be assessed. We propose the total error rate (TER), a novel performance measure for segmenting all cells in the supervised evaluation. The TER statistically aggregates all misclassification error rates (MER) by taking cell sizes as weights. The MERs are for segmenting each single cell in the population. The TER is fully supported by the pairwise comparisons of MERs using 106 manually segmented ground-truth cells with different sizes and seven CIS algorithms taken from ImageJ. Further, the SE and 95% confidence interval (CI) of TER are computed based on the SE of MER that is calculated using the bootstrap method. An algorithm for computing the correlation coefficient of TERs between two CIS algorithms is also provided. Hence, the 95% CI error bars can be used to classify CIS algorithms. The SEs of TERs and their correlation coefficient can be employed to conduct the hypothesis testing, while the CIs overlap, to determine the statistical significance of the performance differences between CIS algorithms. A novel measure TER of CIS is proposed. The TER's SEs and correlation coefficient are computed. Thereafter, CIS algorithms can be evaluated and compared statistically by conducting the significance testing.

  17. Approximate Model Checking of PCTL Involving Unbounded Path Properties

    NASA Astrophysics Data System (ADS)

    Basu, Samik; Ghosh, Arka P.; He, Ru

    We study the problem of applying statistical methods for approximate model checking of probabilistic systems against properties encoded as PCTL formulas. Such approximate methods have been proposed primarily to deal with state-space explosion that makes the exact model checking by numerical methods practically infeasible for large systems. However, the existing statistical methods either consider a restricted subset of PCTL, specifically, the subset that can only express bounded until properties; or rely on user-specified finite bound on the sample path length. We propose a new method that does not have such restrictions and can be effectively used to reason about unbounded until properties. We approximate probabilistic characteristics of an unbounded until property by that of a bounded until property for a suitably chosen value of the bound. In essence, our method is a two-phase process: (a) the first phase is concerned with identifying the bound k 0; (b) the second phase computes the probability of satisfying the k 0-bounded until property as an estimate for the probability of satisfying the corresponding unbounded until property. In both phases, it is sufficient to verify bounded until properties which can be effectively done using existing statistical techniques. We prove the correctness of our technique and present its prototype implementations. We empirically show the practical applicability of our method by considering different case studies including a simple infinite-state model, and large finite-state models such as IPv4 zeroconf protocol and dining philosopher protocol modeled as Discrete Time Markov chains.

  18. Mammographic enhancement with combining local statistical measures and sliding band filter for improved mass segmentation in mammograms

    NASA Astrophysics Data System (ADS)

    Kim, Dae Hoe; Choi, Jae Young; Choi, Seon Hyeong; Ro, Yong Man

    2012-03-01

    In this study, a novel mammogram enhancement solution is proposed, aiming to improve the quality of subsequent mass segmentation in mammograms. It has been widely accepted that characteristics of masses are usually hyper-dense or uniform density with respect to its background. Also, their core parts are likely to have high-intensity values while the values of intensity tend to be decreased as the distance to core parts increases. Based on the aforementioned observations, we develop a new and effective mammogram enhancement method by combining local statistical measurements and Sliding Band Filtering (SBF). By effectively combining local statistical measurements and SBF, we are able to improve the contrast of the bright and smooth regions (which represent potential mass regions), as well as, at the same time, the regions where their surrounding gradients are converging to the centers of regions of interest. In this study, 89 mammograms were collected from the public MAIS database (DB) to demonstrate the effectiveness of the proposed enhancement solution in terms of improving mass segmentation. As for a segmentation method, widely used contour-based segmentation approach was employed. The contour-based method in conjunction with the proposed enhancement solution achieved overall detection accuracy of 92.4% with a total of 85 correct cases. On the other hand, without using our enhancement solution, overall detection accuracy of the contour-based method was only 78.3%. In addition, experimental results demonstrated the feasibility of our enhancement solution for the purpose of improving detection accuracy on mammograms containing dense parenchymal patterns.

  19. Scan statistics with local vote for target detection in distributed system

    NASA Astrophysics Data System (ADS)

    Luo, Junhai; Wu, Qi

    2017-12-01

    Target detection has occupied a pivotal position in distributed system. Scan statistics, as one of the most efficient detection methods, has been applied to a variety of anomaly detection problems and significantly improves the probability of detection. However, scan statistics cannot achieve the expected performance when the noise intensity is strong, or the signal emitted by the target is weak. The local vote algorithm can also achieve higher target detection rate. After the local vote, the counting rule is always adopted for decision fusion. The counting rule does not use the information about the contiguity of sensors but takes all sensors' data into consideration, which makes the result undesirable. In this paper, we propose a scan statistics with local vote (SSLV) method. This method combines scan statistics with local vote decision. Before scan statistics, each sensor executes local vote decision according to the data of its neighbors and its own. By combining the advantages of both, our method can obtain higher detection rate in low signal-to-noise ratio environment than the scan statistics. After the local vote decision, the distribution of sensors which have detected the target becomes more intensive. To make full use of local vote decision, we introduce a variable-step-parameter for the SSLV. It significantly shortens the scan period especially when the target is absent. Analysis and simulations are presented to demonstrate the performance of our method.

  20. Functional Logistic Regression Approach to Detecting Gene by Longitudinal Environmental Exposure Interaction in a Case-Control Study

    PubMed Central

    Wei, Peng; Tang, Hongwei; Li, Donghui

    2014-01-01

    Most complex human diseases are likely the consequence of the joint actions of genetic and environmental factors. Identification of gene-environment (GxE) interactions not only contributes to a better understanding of the disease mechanisms, but also improves disease risk prediction and targeted intervention. In contrast to the large number of genetic susceptibility loci discovered by genome-wide association studies, there have been very few successes in identifying GxE interactions which may be partly due to limited statistical power and inaccurately measured exposures. While existing statistical methods only consider interactions between genes and static environmental exposures, many environmental/lifestyle factors, such as air pollution and diet, change over time, and cannot be accurately captured at one measurement time point or by simply categorizing into static exposure categories. There is a dearth of statistical methods for detecting gene by time-varying environmental exposure interactions. Here we propose a powerful functional logistic regression (FLR) approach to model the time-varying effect of longitudinal environmental exposure and its interaction with genetic factors on disease risk. Capitalizing on the powerful functional data analysis framework, our proposed FLR model is capable of accommodating longitudinal exposures measured at irregular time points and contaminated by measurement errors, commonly encountered in observational studies. We use extensive simulations to show that the proposed method can control the Type I error and is more powerful than alternative ad hoc methods. We demonstrate the utility of this new method using data from a case-control study of pancreatic cancer to identify the windows of vulnerability of lifetime body mass index on the risk of pancreatic cancer as well as genes which may modify this association. PMID:25219575

  1. Functional logistic regression approach to detecting gene by longitudinal environmental exposure interaction in a case-control study.

    PubMed

    Wei, Peng; Tang, Hongwei; Li, Donghui

    2014-11-01

    Most complex human diseases are likely the consequence of the joint actions of genetic and environmental factors. Identification of gene-environment (G × E) interactions not only contributes to a better understanding of the disease mechanisms, but also improves disease risk prediction and targeted intervention. In contrast to the large number of genetic susceptibility loci discovered by genome-wide association studies, there have been very few successes in identifying G × E interactions, which may be partly due to limited statistical power and inaccurately measured exposures. Although existing statistical methods only consider interactions between genes and static environmental exposures, many environmental/lifestyle factors, such as air pollution and diet, change over time, and cannot be accurately captured at one measurement time point or by simply categorizing into static exposure categories. There is a dearth of statistical methods for detecting gene by time-varying environmental exposure interactions. Here, we propose a powerful functional logistic regression (FLR) approach to model the time-varying effect of longitudinal environmental exposure and its interaction with genetic factors on disease risk. Capitalizing on the powerful functional data analysis framework, our proposed FLR model is capable of accommodating longitudinal exposures measured at irregular time points and contaminated by measurement errors, commonly encountered in observational studies. We use extensive simulations to show that the proposed method can control the Type I error and is more powerful than alternative ad hoc methods. We demonstrate the utility of this new method using data from a case-control study of pancreatic cancer to identify the windows of vulnerability of lifetime body mass index on the risk of pancreatic cancer as well as genes that may modify this association. © 2014 Wiley Periodicals, Inc.

  2. Active learning methods for interactive image retrieval.

    PubMed

    Gosselin, Philippe Henri; Cord, Matthieu

    2008-07-01

    Active learning methods have been considered with increased interest in the statistical learning community. Initially developed within a classification framework, a lot of extensions are now being proposed to handle multimedia applications. This paper provides algorithms within a statistical framework to extend active learning for online content-based image retrieval (CBIR). The classification framework is presented with experiments to compare several powerful classification techniques in this information retrieval context. Focusing on interactive methods, active learning strategy is then described. The limitations of this approach for CBIR are emphasized before presenting our new active selection process RETIN. First, as any active method is sensitive to the boundary estimation between classes, the RETIN strategy carries out a boundary correction to make the retrieval process more robust. Second, the criterion of generalization error to optimize the active learning selection is modified to better represent the CBIR objective of database ranking. Third, a batch processing of images is proposed. Our strategy leads to a fast and efficient active learning scheme to retrieve sets of online images (query concept). Experiments on large databases show that the RETIN method performs well in comparison to several other active strategies.

  3. Prostate segmentation in MRI using a convolutional neural network architecture and training strategy based on statistical shape models.

    PubMed

    Karimi, Davood; Samei, Golnoosh; Kesch, Claudia; Nir, Guy; Salcudean, Septimiu E

    2018-05-15

    Most of the existing convolutional neural network (CNN)-based medical image segmentation methods are based on methods that have originally been developed for segmentation of natural images. Therefore, they largely ignore the differences between the two domains, such as the smaller degree of variability in the shape and appearance of the target volume and the smaller amounts of training data in medical applications. We propose a CNN-based method for prostate segmentation in MRI that employs statistical shape models to address these issues. Our CNN predicts the location of the prostate center and the parameters of the shape model, which determine the position of prostate surface keypoints. To train such a large model for segmentation of 3D images using small data (1) we adopt a stage-wise training strategy by first training the network to predict the prostate center and subsequently adding modules for predicting the parameters of the shape model and prostate rotation, (2) we propose a data augmentation method whereby the training images and their prostate surface keypoints are deformed according to the displacements computed based on the shape model, and (3) we employ various regularization techniques. Our proposed method achieves a Dice score of 0.88, which is obtained by using both elastic-net and spectral dropout for regularization. Compared with a standard CNN-based method, our method shows significantly better segmentation performance on the prostate base and apex. Our experiments also show that data augmentation using the shape model significantly improves the segmentation results. Prior knowledge about the shape of the target organ can improve the performance of CNN-based segmentation methods, especially where image features are not sufficient for a precise segmentation. Statistical shape models can also be employed to synthesize additional training data that can ease the training of large CNNs.

  4. A nonlinear quality-related fault detection approach based on modified kernel partial least squares.

    PubMed

    Jiao, Jianfang; Zhao, Ning; Wang, Guang; Yin, Shen

    2017-01-01

    In this paper, a new nonlinear quality-related fault detection method is proposed based on kernel partial least squares (KPLS) model. To deal with the nonlinear characteristics among process variables, the proposed method maps these original variables into feature space in which the linear relationship between kernel matrix and output matrix is realized by means of KPLS. Then the kernel matrix is decomposed into two orthogonal parts by singular value decomposition (SVD) and the statistics for each part are determined appropriately for the purpose of quality-related fault detection. Compared with relevant existing nonlinear approaches, the proposed method has the advantages of simple diagnosis logic and stable performance. A widely used literature example and an industrial process are used for the performance evaluation for the proposed method. Copyright © 2016 ISA. Published by Elsevier Ltd. All rights reserved.

  5. Resampling-based Methods in Single and Multiple Testing for Equality of Covariance/Correlation Matrices

    PubMed Central

    Yang, Yang; DeGruttola, Victor

    2016-01-01

    Traditional resampling-based tests for homogeneity in covariance matrices across multiple groups resample residuals, that is, data centered by group means. These residuals do not share the same second moments when the null hypothesis is false, which makes them difficult to use in the setting of multiple testing. An alternative approach is to resample standardized residuals, data centered by group sample means and standardized by group sample covariance matrices. This approach, however, has been observed to inflate type I error when sample size is small or data are generated from heavy-tailed distributions. We propose to improve this approach by using robust estimation for the first and second moments. We discuss two statistics: the Bartlett statistic and a statistic based on eigen-decomposition of sample covariance matrices. Both statistics can be expressed in terms of standardized errors under the null hypothesis. These methods are extended to test homogeneity in correlation matrices. Using simulation studies, we demonstrate that the robust resampling approach provides comparable or superior performance, relative to traditional approaches, for single testing and reasonable performance for multiple testing. The proposed methods are applied to data collected in an HIV vaccine trial to investigate possible determinants, including vaccine status, vaccine-induced immune response level and viral genotype, of unusual correlation pattern between HIV viral load and CD4 count in newly infected patients. PMID:22740584

  6. Resampling-based methods in single and multiple testing for equality of covariance/correlation matrices.

    PubMed

    Yang, Yang; DeGruttola, Victor

    2012-06-22

    Traditional resampling-based tests for homogeneity in covariance matrices across multiple groups resample residuals, that is, data centered by group means. These residuals do not share the same second moments when the null hypothesis is false, which makes them difficult to use in the setting of multiple testing. An alternative approach is to resample standardized residuals, data centered by group sample means and standardized by group sample covariance matrices. This approach, however, has been observed to inflate type I error when sample size is small or data are generated from heavy-tailed distributions. We propose to improve this approach by using robust estimation for the first and second moments. We discuss two statistics: the Bartlett statistic and a statistic based on eigen-decomposition of sample covariance matrices. Both statistics can be expressed in terms of standardized errors under the null hypothesis. These methods are extended to test homogeneity in correlation matrices. Using simulation studies, we demonstrate that the robust resampling approach provides comparable or superior performance, relative to traditional approaches, for single testing and reasonable performance for multiple testing. The proposed methods are applied to data collected in an HIV vaccine trial to investigate possible determinants, including vaccine status, vaccine-induced immune response level and viral genotype, of unusual correlation pattern between HIV viral load and CD4 count in newly infected patients.

  7. Valid Statistical Analysis for Logistic Regression with Multiple Sources

    NASA Astrophysics Data System (ADS)

    Fienberg, Stephen E.; Nardi, Yuval; Slavković, Aleksandra B.

    Considerable effort has gone into understanding issues of privacy protection of individual information in single databases, and various solutions have been proposed depending on the nature of the data, the ways in which the database will be used and the precise nature of the privacy protection being offered. Once data are merged across sources, however, the nature of the problem becomes far more complex and a number of privacy issues arise for the linked individual files that go well beyond those that are considered with regard to the data within individual sources. In the paper, we propose an approach that gives full statistical analysis on the combined database without actually combining it. We focus mainly on logistic regression, but the method and tools described may be applied essentially to other statistical models as well.

  8. Statistical appearance models based on probabilistic correspondences.

    PubMed

    Krüger, Julia; Ehrhardt, Jan; Handels, Heinz

    2017-04-01

    Model-based image analysis is indispensable in medical image processing. One key aspect of building statistical shape and appearance models is the determination of one-to-one correspondences in the training data set. At the same time, the identification of these correspondences is the most challenging part of such methods. In our earlier work, we developed an alternative method using correspondence probabilities instead of exact one-to-one correspondences for a statistical shape model (Hufnagel et al., 2008). In this work, a new approach for statistical appearance models without one-to-one correspondences is proposed. A sparse image representation is used to build a model that combines point position and appearance information at the same time. Probabilistic correspondences between the derived multi-dimensional feature vectors are used to omit the need for extensive preprocessing of finding landmarks and correspondences as well as to reduce the dependence of the generated model on the landmark positions. Model generation and model fitting can now be expressed by optimizing a single global criterion derived from a maximum a-posteriori (MAP) approach with respect to model parameters that directly affect both shape and appearance of the considered objects inside the images. The proposed approach describes statistical appearance modeling in a concise and flexible mathematical framework. Besides eliminating the demand for costly correspondence determination, the method allows for additional constraints as topological regularity in the modeling process. In the evaluation the model was applied for segmentation and landmark identification in hand X-ray images. The results demonstrate the feasibility of the model to detect hand contours as well as the positions of the joints between finger bones for unseen test images. Further, we evaluated the model on brain data of stroke patients to show the ability of the proposed model to handle partially corrupted data and to demonstrate a possible employment of the correspondence probabilities to indicate these corrupted/pathological areas. Copyright © 2017 Elsevier B.V. All rights reserved.

  9. Machine Learning Methods for Production Cases Analysis

    NASA Astrophysics Data System (ADS)

    Mokrova, Nataliya V.; Mokrov, Alexander M.; Safonova, Alexandra V.; Vishnyakov, Igor V.

    2018-03-01

    Approach to analysis of events occurring during the production process were proposed. Described machine learning system is able to solve classification tasks related to production control and hazard identification at an early stage. Descriptors of the internal production network data were used for training and testing of applied models. k-Nearest Neighbors and Random forest methods were used to illustrate and analyze proposed solution. The quality of the developed classifiers was estimated using standard statistical metrics, such as precision, recall and accuracy.

  10. An efficient direct solver for rarefied gas flows with arbitrary statistics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Diaz, Manuel A., E-mail: f99543083@ntu.edu.tw; Yang, Jaw-Yen, E-mail: yangjy@iam.ntu.edu.tw; Center of Advanced Study in Theoretical Science, National Taiwan University, Taipei 10167, Taiwan

    2016-01-15

    A new numerical methodology associated with a unified treatment is presented to solve the Boltzmann–BGK equation of gas dynamics for the classical and quantum gases described by the Bose–Einstein and Fermi–Dirac statistics. Utilizing a class of globally-stiffly-accurate implicit–explicit Runge–Kutta scheme for the temporal evolution, associated with the discrete ordinate method for the quadratures in the momentum space and the weighted essentially non-oscillatory method for the spatial discretization, the proposed scheme is asymptotic-preserving and imposes no non-linear solver or requires the knowledge of fugacity and temperature to capture the flow structures in the hydrodynamic (Euler) limit. The proposed treatment overcomes themore » limitations found in the work by Yang and Muljadi (2011) [33] due to the non-linear nature of quantum relations, and can be applied in studying the dynamics of a gas with internal degrees of freedom with correct values of the ratio of specific heat for the flow regimes for all Knudsen numbers and energy wave lengths. The present methodology is numerically validated with the unified treatment by the one-dimensional shock tube problem and the two-dimensional Riemann problems for gases of arbitrary statistics. Descriptions of ideal quantum gases including rotational degrees of freedom have been successfully achieved under the proposed methodology.« less

  11. Effect Size as the Essential Statistic in Developing Methods for mTBI Diagnosis.

    PubMed

    Gibson, Douglas Brandt

    2015-01-01

    The descriptive statistic known as "effect size" measures the distinguishability of two sets of data. Distingishability is at the core of diagnosis. This article is intended to point out the importance of effect size in the development of effective diagnostics for mild traumatic brain injury and to point out the applicability of the effect size statistic in comparing diagnostic efficiency across the main proposed TBI diagnostic methods: psychological, physiological, biochemical, and radiologic. Comparing diagnostic approaches is difficult because different researcher in different fields have different approaches to measuring efficacy. Converting diverse measures to effect sizes, as is done in meta-analysis, is a relatively easy way to make studies comparable.

  12. Tests of Independence for Ordinal Data Using Bootstrap.

    ERIC Educational Resources Information Center

    Chan, Wai; Yung, Yiu-Fai; Bentler, Peter M.; Tang, Man-Lai

    1998-01-01

    Two bootstrap tests are proposed to test the independence hypothesis in a two-way cross table. Monte Carlo studies are used to compare the traditional asymptotic test with these bootstrap methods, and the bootstrap methods are found superior in two ways: control of Type I error and statistical power. (SLD)

  13. Controlling the joint local false discovery rate is more powerful than meta-analysis methods in joint analysis of summary statistics from multiple genome-wide association studies.

    PubMed

    Jiang, Wei; Yu, Weichuan

    2017-02-15

    In genome-wide association studies (GWASs) of common diseases/traits, we often analyze multiple GWASs with the same phenotype together to discover associated genetic variants with higher power. Since it is difficult to access data with detailed individual measurements, summary-statistics-based meta-analysis methods have become popular to jointly analyze datasets from multiple GWASs. In this paper, we propose a novel summary-statistics-based joint analysis method based on controlling the joint local false discovery rate (Jlfdr). We prove that our method is the most powerful summary-statistics-based joint analysis method when controlling the false discovery rate at a certain level. In particular, the Jlfdr-based method achieves higher power than commonly used meta-analysis methods when analyzing heterogeneous datasets from multiple GWASs. Simulation experiments demonstrate the superior power of our method over meta-analysis methods. Also, our method discovers more associations than meta-analysis methods from empirical datasets of four phenotypes. The R-package is available at: http://bioinformatics.ust.hk/Jlfdr.html . eeyu@ust.hk. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com

  14. Statistical methods for convergence detection of multi-objective evolutionary algorithms.

    PubMed

    Trautmann, H; Wagner, T; Naujoks, B; Preuss, M; Mehnen, J

    2009-01-01

    In this paper, two approaches for estimating the generation in which a multi-objective evolutionary algorithm (MOEA) shows statistically significant signs of convergence are introduced. A set-based perspective is taken where convergence is measured by performance indicators. The proposed techniques fulfill the requirements of proper statistical assessment on the one hand and efficient optimisation for real-world problems on the other hand. The first approach accounts for the stochastic nature of the MOEA by repeating the optimisation runs for increasing generation numbers and analysing the performance indicators using statistical tools. This technique results in a very robust offline procedure. Moreover, an online convergence detection method is introduced as well. This method automatically stops the MOEA when either the variance of the performance indicators falls below a specified threshold or a stagnation of their overall trend is detected. Both methods are analysed and compared for two MOEA and on different classes of benchmark functions. It is shown that the methods successfully operate on all stated problems needing less function evaluations while preserving good approximation quality at the same time.

  15. Characterization of coronary plaque regions in intravascular ultrasound images using a hybrid ensemble classifier.

    PubMed

    Hwang, Yoo Na; Lee, Ju Hwan; Kim, Ga Young; Shin, Eun Seok; Kim, Sung Min

    2018-01-01

    The purpose of this study was to propose a hybrid ensemble classifier to characterize coronary plaque regions in intravascular ultrasound (IVUS) images. Pixels were allocated to one of four tissues (fibrous tissue (FT), fibro-fatty tissue (FFT), necrotic core (NC), and dense calcium (DC)) through processes of border segmentation, feature extraction, feature selection, and classification. Grayscale IVUS images and their corresponding virtual histology images were acquired from 11 patients with known or suspected coronary artery disease using 20 MHz catheter. A total of 102 hybrid textural features including first order statistics (FOS), gray level co-occurrence matrix (GLCM), extended gray level run-length matrix (GLRLM), Laws, local binary pattern (LBP), intensity, and discrete wavelet features (DWF) were extracted from IVUS images. To select optimal feature sets, genetic algorithm was implemented. A hybrid ensemble classifier based on histogram and texture information was then used for plaque characterization in this study. The optimal feature set was used as input of this ensemble classifier. After tissue characterization, parameters including sensitivity, specificity, and accuracy were calculated to validate the proposed approach. A ten-fold cross validation approach was used to determine the statistical significance of the proposed method. Our experimental results showed that the proposed method had reliable performance for tissue characterization in IVUS images. The hybrid ensemble classification method outperformed other existing methods by achieving characterization accuracy of 81% for FFT and 75% for NC. In addition, this study showed that Laws features (SSV and SAV) were key indicators for coronary tissue characterization. The proposed method had high clinical applicability for image-based tissue characterization. Copyright © 2017 Elsevier B.V. All rights reserved.

  16. Gene-Based Association Analysis for Censored Traits Via Fixed Effect Functional Regressions.

    PubMed

    Fan, Ruzong; Wang, Yifan; Yan, Qi; Ding, Ying; Weeks, Daniel E; Lu, Zhaohui; Ren, Haobo; Cook, Richard J; Xiong, Momiao; Swaroop, Anand; Chew, Emily Y; Chen, Wei

    2016-02-01

    Genetic studies of survival outcomes have been proposed and conducted recently, but statistical methods for identifying genetic variants that affect disease progression are rarely developed. Motivated by our ongoing real studies, here we develop Cox proportional hazard models using functional regression (FR) to perform gene-based association analysis of survival traits while adjusting for covariates. The proposed Cox models are fixed effect models where the genetic effects of multiple genetic variants are assumed to be fixed. We introduce likelihood ratio test (LRT) statistics to test for associations between the survival traits and multiple genetic variants in a genetic region. Extensive simulation studies demonstrate that the proposed Cox RF LRT statistics have well-controlled type I error rates. To evaluate power, we compare the Cox FR LRT with the previously developed burden test (BT) in a Cox model and sequence kernel association test (SKAT), which is based on mixed effect Cox models. The Cox FR LRT statistics have higher power than or similar power as Cox SKAT LRT except when 50%/50% causal variants had negative/positive effects and all causal variants are rare. In addition, the Cox FR LRT statistics have higher power than Cox BT LRT. The models and related test statistics can be useful in the whole genome and whole exome association studies. An age-related macular degeneration dataset was analyzed as an example. © 2016 WILEY PERIODICALS, INC.

  17. Gene-based Association Analysis for Censored Traits Via Fixed Effect Functional Regressions

    PubMed Central

    Fan, Ruzong; Wang, Yifan; Yan, Qi; Ding, Ying; Weeks, Daniel E.; Lu, Zhaohui; Ren, Haobo; Cook, Richard J; Xiong, Momiao; Swaroop, Anand; Chew, Emily Y.; Chen, Wei

    2015-01-01

    Summary Genetic studies of survival outcomes have been proposed and conducted recently, but statistical methods for identifying genetic variants that affect disease progression are rarely developed. Motivated by our ongoing real studies, we develop here Cox proportional hazard models using functional regression (FR) to perform gene-based association analysis of survival traits while adjusting for covariates. The proposed Cox models are fixed effect models where the genetic effects of multiple genetic variants are assumed to be fixed. We introduce likelihood ratio test (LRT) statistics to test for associations between the survival traits and multiple genetic variants in a genetic region. Extensive simulation studies demonstrate that the proposed Cox RF LRT statistics have well-controlled type I error rates. To evaluate power, we compare the Cox FR LRT with the previously developed burden test (BT) in a Cox model and sequence kernel association test (SKAT) which is based on mixed effect Cox models. The Cox FR LRT statistics have higher power than or similar power as Cox SKAT LRT except when 50%/50% causal variants had negative/positive effects and all causal variants are rare. In addition, the Cox FR LRT statistics have higher power than Cox BT LRT. The models and related test statistics can be useful in the whole genome and whole exome association studies. An age-related macular degeneration dataset was analyzed as an example. PMID:26782979

  18. Quantifying, displaying and accounting for heterogeneity in the meta-analysis of RCTs using standard and generalised Q statistics

    PubMed Central

    2011-01-01

    Background Clinical researchers have often preferred to use a fixed effects model for the primary interpretation of a meta-analysis. Heterogeneity is usually assessed via the well known Q and I2 statistics, along with the random effects estimate they imply. In recent years, alternative methods for quantifying heterogeneity have been proposed, that are based on a 'generalised' Q statistic. Methods We review 18 IPD meta-analyses of RCTs into treatments for cancer, in order to quantify the amount of heterogeneity present and also to discuss practical methods for explaining heterogeneity. Results Differing results were obtained when the standard Q and I2 statistics were used to test for the presence of heterogeneity. The two meta-analyses with the largest amount of heterogeneity were investigated further, and on inspection the straightforward application of a random effects model was not deemed appropriate. Compared to the standard Q statistic, the generalised Q statistic provided a more accurate platform for estimating the amount of heterogeneity in the 18 meta-analyses. Conclusions Explaining heterogeneity via the pre-specification of trial subgroups, graphical diagnostic tools and sensitivity analyses produced a more desirable outcome than an automatic application of the random effects model. Generalised Q statistic methods for quantifying and adjusting for heterogeneity should be incorporated as standard into statistical software. Software is provided to help achieve this aim. PMID:21473747

  19. A novel approach to assess the treatment response using Gaussian random field in PET

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wang, Mengdie; Guo, Ning; Hu, Guangshu

    2016-02-15

    Purpose: The assessment of early therapeutic response to anticancer therapy is vital for treatment planning and patient management in clinic. With the development of personal treatment plan, the early treatment response, especially before any anatomically apparent changes after treatment, becomes urgent need in clinic. Positron emission tomography (PET) imaging serves an important role in clinical oncology for tumor detection, staging, and therapy response assessment. Many studies on therapy response involve interpretation of differences between two PET images, usually in terms of standardized uptake values (SUVs). However, the quantitative accuracy of this measurement is limited. This work proposes a statistically robustmore » approach for therapy response assessment based on Gaussian random field (GRF) to provide a statistically more meaningful scale to evaluate therapy effects. Methods: The authors propose a new criterion for therapeutic assessment by incorporating image noise into traditional SUV method. An analytical method based on the approximate expressions of the Fisher information matrix was applied to model the variance of individual pixels in reconstructed images. A zero mean unit variance GRF under the null hypothesis (no response to therapy) was obtained by normalizing each pixel of the post-therapy image with the mean and standard deviation of the pretherapy image. The performance of the proposed method was evaluated by Monte Carlo simulation, where XCAT phantoms (128{sup 2} pixels) with lesions of various diameters (2–6 mm), multiple tumor-to-background contrasts (3–10), and different changes in intensity (6.25%–30%) were used. The receiver operating characteristic curves and the corresponding areas under the curve were computed for both the proposed method and the traditional methods whose figure of merit is the percentage change of SUVs. The formula for the false positive rate (FPR) estimation was developed for the proposed therapy response assessment utilizing local average method based on random field. The accuracy of the estimation was validated in terms of Euler distance and correlation coefficient. Results: It is shown that the performance of therapy response assessment is significantly improved by the introduction of variance with a higher area under the curve (97.3%) than SUVmean (91.4%) and SUVmax (82.0%). In addition, the FPR estimation serves as a good prediction for the specificity of the proposed method, consistent with simulation outcome with ∼1 correlation coefficient. Conclusions: In this work, the authors developed a method to evaluate therapy response from PET images, which were modeled as Gaussian random field. The digital phantom simulations demonstrated that the proposed method achieved a large reduction in statistical variability through incorporating knowledge of the variance of the original Gaussian random field. The proposed method has the potential to enable prediction of early treatment response and shows promise for application to clinical practice. In future work, the authors will report on the robustness of the estimation theory for application to clinical practice of therapy response evaluation, which pertains to binary discrimination tasks at a fixed location in the image such as detection of small and weak lesion.« less

  20. Information loss method to measure node similarity in networks

    NASA Astrophysics Data System (ADS)

    Li, Yongli; Luo, Peng; Wu, Chong

    2014-09-01

    Similarity measurement for the network node has been paid increasing attention in the field of statistical physics. In this paper, we propose an entropy-based information loss method to measure the node similarity. The whole model is established based on this idea that less information loss is caused by seeing two more similar nodes as the same. The proposed new method has relatively low algorithm complexity, making it less time-consuming and more efficient to deal with the large scale real-world network. In order to clarify its availability and accuracy, this new approach was compared with some other selected approaches on two artificial examples and synthetic networks. Furthermore, the proposed method is also successfully applied to predict the network evolution and predict the unknown nodes' attributions in the two application examples.

  1. Best Merge Region Growing with Integrated Probabilistic Classification for Hyperspectral Imagery

    NASA Technical Reports Server (NTRS)

    Tarabalka, Yuliya; Tilton, James C.

    2011-01-01

    A new method for spectral-spatial classification of hyperspectral images is proposed. The method is based on the integration of probabilistic classification within the hierarchical best merge region growing algorithm. For this purpose, preliminary probabilistic support vector machines classification is performed. Then, hierarchical step-wise optimization algorithm is applied, by iteratively merging regions with the smallest Dissimilarity Criterion (DC). The main novelty of this method consists in defining a DC between regions as a function of region statistical and geometrical features along with classification probabilities. Experimental results are presented on a 200-band AVIRIS image of the Northwestern Indiana s vegetation area and compared with those obtained by recently proposed spectral-spatial classification techniques. The proposed method improves classification accuracies when compared to other classification approaches.

  2. Infrared face recognition based on LBP histogram and KW feature selection

    NASA Astrophysics Data System (ADS)

    Xie, Zhihua

    2014-07-01

    The conventional LBP-based feature as represented by the local binary pattern (LBP) histogram still has room for performance improvements. This paper focuses on the dimension reduction of LBP micro-patterns and proposes an improved infrared face recognition method based on LBP histogram representation. To extract the local robust features in infrared face images, LBP is chosen to get the composition of micro-patterns of sub-blocks. Based on statistical test theory, Kruskal-Wallis (KW) feature selection method is proposed to get the LBP patterns which are suitable for infrared face recognition. The experimental results show combination of LBP and KW features selection improves the performance of infrared face recognition, the proposed method outperforms the traditional methods based on LBP histogram, discrete cosine transform(DCT) or principal component analysis(PCA).

  3. Blood detection in wireless capsule endoscope images based on salient superpixels.

    PubMed

    Iakovidis, Dimitris K; Chatzis, Dimitris; Chrysanthopoulos, Panos; Koulaouzidis, Anastasios

    2015-08-01

    Wireless capsule endoscopy (WCE) enables screening of the gastrointestinal (GI) tract with a miniature, optical endoscope packed within a small swallowable capsule, wirelessly transmitting color images. In this paper we propose a novel method for automatic blood detection in contemporary WCE images. Blood is an alarming indication for the presence of pathologies requiring further treatment. The proposed method is based on a new definition of superpixel saliency. The saliency of superpixels is assessed upon their color, enabling the identification of image regions that are likely to contain blood. The blood patterns are recognized by their color features using a supervised learning machine. Experiments performed on a public dataset using automatically selected first-order statistical features from various color components indicate that the proposed method outperforms state-of-the-art methods.

  4. Meta-analysis of correlated traits via summary statistics from GWASs with an application in hypertension.

    PubMed

    Zhu, Xiaofeng; Feng, Tao; Tayo, Bamidele O; Liang, Jingjing; Young, J Hunter; Franceschini, Nora; Smith, Jennifer A; Yanek, Lisa R; Sun, Yan V; Edwards, Todd L; Chen, Wei; Nalls, Mike; Fox, Ervin; Sale, Michele; Bottinger, Erwin; Rotimi, Charles; Liu, Yongmei; McKnight, Barbara; Liu, Kiang; Arnett, Donna K; Chakravati, Aravinda; Cooper, Richard S; Redline, Susan

    2015-01-08

    Genome-wide association studies (GWASs) have identified many genetic variants underlying complex traits. Many detected genetic loci harbor variants that associate with multiple-even distinct-traits. Most current analysis approaches focus on single traits, even though the final results from multiple traits are evaluated together. Such approaches miss the opportunity to systemically integrate the phenome-wide data available for genetic association analysis. In this study, we propose a general approach that can integrate association evidence from summary statistics of multiple traits, either correlated, independent, continuous, or binary traits, which might come from the same or different studies. We allow for trait heterogeneity effects. Population structure and cryptic relatedness can also be controlled. Our simulations suggest that the proposed method has improved statistical power over single-trait analysis in most of the cases we studied. We applied our method to the Continental Origins and Genetic Epidemiology Network (COGENT) African ancestry samples for three blood pressure traits and identified four loci (CHIC2, HOXA-EVX1, IGFBP1/IGFBP3, and CDH17; p < 5.0 × 10(-8)) associated with hypertension-related traits that were missed by a single-trait analysis in the original report. Six additional loci with suggestive association evidence (p < 5.0 × 10(-7)) were also observed, including CACNA1D and WNT3. Our study strongly suggests that analyzing multiple phenotypes can improve statistical power and that such analysis can be executed with the summary statistics from GWASs. Our method also provides a way to study a cross phenotype (CP) association by using summary statistics from GWASs of multiple phenotypes. Copyright © 2015 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.

  5. Robust tissue-air volume segmentation of MR images based on the statistics of phase and magnitude: Its applications in the display of susceptibility-weighted imaging of the brain.

    PubMed

    Du, Yiping P; Jin, Zhaoyang

    2009-10-01

    To develop a robust algorithm for tissue-air segmentation in magnetic resonance imaging (MRI) using the statistics of phase and magnitude of the images. A multivariate measure based on the statistics of phase and magnitude was constructed for tissue-air volume segmentation. The standard deviation of first-order phase difference and the standard deviation of magnitude were calculated in a 3 x 3 x 3 kernel in the image domain. To improve differentiation accuracy, the uniformity of phase distribution in the kernel was also calculated and linear background phase introduced by field inhomogeneity was corrected. The effectiveness of the proposed volume segmentation technique was compared to a conventional approach that uses the magnitude data alone. The proposed algorithm was shown to be more effective and robust in volume segmentation in both synthetic phantom and susceptibility-weighted images of human brain. Using our proposed volume segmentation method, veins in the peripheral regions of the brain were well depicted in the minimum-intensity projection of the susceptibility-weighted images. Using the additional statistics of phase, tissue-air volume segmentation can be substantially improved compared to that using the statistics of magnitude data alone. (c) 2009 Wiley-Liss, Inc.

  6. Computing physical properties with quantum Monte Carlo methods with statistical fluctuations independent of system size.

    PubMed

    Assaraf, Roland

    2014-12-01

    We show that the recently proposed correlated sampling without reweighting procedure extends the locality (asymptotic independence of the system size) of a physical property to the statistical fluctuations of its estimator. This makes the approach potentially vastly more efficient for computing space-localized properties in large systems compared with standard correlated methods. A proof is given for a large collection of noninteracting fragments. Calculations on hydrogen chains suggest that this behavior holds not only for systems displaying short-range correlations, but also for systems with long-range correlations.

  7. LP-search and its use in analysis of the accuracy of control systems with acoustical models

    NASA Technical Reports Server (NTRS)

    Sergeyev, V. I.; Sobol, I. M.; Statnikov, R. B.; Statnikov, I. N.

    1973-01-01

    The LP-search is proposed as an analog of the Monte Carlo method for finding values in nonlinear statistical systems. It is concluded that: To attain the required accuracy in solution to the problem of control for a statistical system in the LP-search, a considerably smaller number of tests is required than in the Monte Carlo method. The LP-search allows the possibility of multiple repetitions of tests under identical conditions and observability of the output variables of the system.

  8. 3D Representative Volume Element Reconstruction of Fiber Composites via Orientation Tensor and Substructure Features

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Li, Yi; Chen, Wei; Xu, Hongyi

    To provide a seamless integration of manufacturing processing simulation and fiber microstructure modeling, two new stochastic 3D microstructure reconstruction methods are proposed for two types of random fiber composites: random short fiber composites, and Sheet Molding Compounds (SMC) chopped fiber composites. A Random Sequential Adsorption (RSA) algorithm is first developed to embed statistical orientation information into 3D RVE reconstruction of random short fiber composites. For the SMC composites, an optimized Voronoi diagram based approach is developed for capturing the substructure features of SMC chopped fiber composites. The proposed methods are distinguished from other reconstruction works by providing a way ofmore » integrating statistical information (fiber orientation tensor) obtained from material processing simulation, as well as capturing the multiscale substructures of the SMC composites.« less

  9. Reproducible segmentation of white matter hyperintensities using a new statistical definition.

    PubMed

    Damangir, Soheil; Westman, Eric; Simmons, Andrew; Vrenken, Hugo; Wahlund, Lars-Olof; Spulber, Gabriela

    2017-06-01

    We present a method based on a proposed statistical definition of white matter hyperintensities (WMH), which can work with any combination of conventional magnetic resonance (MR) sequences without depending on manually delineated samples. T1-weighted, T2-weighted, FLAIR, and PD sequences acquired at 1.5 Tesla from 119 subjects from the Kings Health Partners-Dementia Case Register (healthy controls, mild cognitive impairment, Alzheimer's disease) were used. The segmentation was performed using a proposed definition for WMH based on the one-tailed Kolmogorov-Smirnov test. The presented method was verified, given all possible combinations of input sequences, against manual segmentations and a high similarity (Dice 0.85-0.91) was observed. Comparing segmentations with different input sequences to one another also yielded a high similarity (Dice 0.83-0.94) that exceeded intra-rater similarity (Dice 0.75-0.91). We compared the results with those of other available methods and showed that the segmentation based on the proposed definition has better accuracy and reproducibility in the test dataset used. Overall, the presented definition is shown to produce accurate results with higher reproducibility than manual delineation. This approach can be an alternative to other manual or automatic methods not only because of its accuracy, but also due to its good reproducibility.

  10. A hybrid fault diagnosis approach based on mixed-domain state features for rotating machinery.

    PubMed

    Xue, Xiaoming; Zhou, Jianzhong

    2017-01-01

    To make further improvement in the diagnosis accuracy and efficiency, a mixed-domain state features data based hybrid fault diagnosis approach, which systematically blends both the statistical analysis approach and the artificial intelligence technology, is proposed in this work for rolling element bearings. For simplifying the fault diagnosis problems, the execution of the proposed method is divided into three steps, i.e., fault preliminary detection, fault type recognition and fault degree identification. In the first step, a preliminary judgment about the health status of the equipment can be evaluated by the statistical analysis method based on the permutation entropy theory. If fault exists, the following two processes based on the artificial intelligence approach are performed to further recognize the fault type and then identify the fault degree. For the two subsequent steps, mixed-domain state features containing time-domain, frequency-domain and multi-scale features are extracted to represent the fault peculiarity under different working conditions. As a powerful time-frequency analysis method, the fast EEMD method was employed to obtain multi-scale features. Furthermore, due to the information redundancy and the submergence of original feature space, a novel manifold learning method (modified LGPCA) is introduced to realize the low-dimensional representations for high-dimensional feature space. Finally, two cases with 12 working conditions respectively have been employed to evaluate the performance of the proposed method, where vibration signals were measured from an experimental bench of rolling element bearing. The analysis results showed the effectiveness and the superiority of the proposed method of which the diagnosis thought is more suitable for practical application. Copyright © 2016 ISA. Published by Elsevier Ltd. All rights reserved.

  11. AISLE: an automatic volumetric segmentation method for the study of lung allometry.

    PubMed

    Ren, Hongliang; Kazanzides, Peter

    2011-01-01

    We developed a fully automatic segmentation method for volumetric CT (computer tomography) datasets to support construction of a statistical atlas for the study of allometric laws of the lung. The proposed segmentation method, AISLE (Automated ITK-Snap based on Level-set), is based on the level-set implementation from an existing semi-automatic segmentation program, ITK-Snap. AISLE can segment the lung field without human interaction and provide intermediate graphical results as desired. The preliminary experimental results show that the proposed method can achieve accurate segmentation, in terms of volumetric overlap metric, by comparing with the ground-truth segmentation performed by a radiologist.

  12. Adjusting the Adjusted X[superscript 2]/df Ratio Statistic for Dichotomous Item Response Theory Analyses: Does the Model Fit?

    ERIC Educational Resources Information Center

    Tay, Louis; Drasgow, Fritz

    2012-01-01

    Two Monte Carlo simulation studies investigated the effectiveness of the mean adjusted X[superscript 2]/df statistic proposed by Drasgow and colleagues and, because of problems with the method, a new approach for assessing the goodness of fit of an item response theory model was developed. It has been previously recommended that mean adjusted…

  13. Dynamic re-weighted total variation technique and statistic Iterative reconstruction method for x-ray CT metal artifact reduction

    NASA Astrophysics Data System (ADS)

    Peng, Chengtao; Qiu, Bensheng; Zhang, Cheng; Ma, Changyu; Yuan, Gang; Li, Ming

    2017-07-01

    Over the years, the X-ray computed tomography (CT) has been successfully used in clinical diagnosis. However, when the body of the patient to be examined contains metal objects, the image reconstructed would be polluted by severe metal artifacts, which affect the doctor's diagnosis of disease. In this work, we proposed a dynamic re-weighted total variation (DRWTV) technique combined with the statistic iterative reconstruction (SIR) method to reduce the artifacts. The DRWTV method is based on the total variation (TV) and re-weighted total variation (RWTV) techniques, but it provides a sparser representation than TV and protects the tissue details better than RWTV. Besides, the DRWTV can suppress the artifacts and noise, and the SIR convergence speed is also accelerated. The performance of the algorithm is tested on both simulated phantom dataset and clinical dataset, which are the teeth phantom with two metal implants and the skull with three metal implants, respectively. The proposed algorithm (SIR-DRWTV) is compared with two traditional iterative algorithms, which are SIR and SIR constrained by RWTV regulation (SIR-RWTV). The results show that the proposed algorithm has the best performance in reducing metal artifacts and protecting tissue details.

  14. Parameter identification for structural dynamics based on interval analysis algorithm

    NASA Astrophysics Data System (ADS)

    Yang, Chen; Lu, Zixing; Yang, Zhenyu; Liang, Ke

    2018-04-01

    A parameter identification method using interval analysis algorithm for structural dynamics is presented in this paper. The proposed uncertain identification method is investigated by using central difference method and ARMA system. With the help of the fixed memory least square method and matrix inverse lemma, a set-membership identification technology is applied to obtain the best estimation of the identified parameters in a tight and accurate region. To overcome the lack of insufficient statistical description of the uncertain parameters, this paper treats uncertainties as non-probabilistic intervals. As long as we know the bounds of uncertainties, this algorithm can obtain not only the center estimations of parameters, but also the bounds of errors. To improve the efficiency of the proposed method, a time-saving algorithm is presented by recursive formula. At last, to verify the accuracy of the proposed method, two numerical examples are applied and evaluated by three identification criteria respectively.

  15. Spectrofluorimetric determination of 3-methylflavone-8-carboxylic acid, the main active metabolite of flavoxate hydrochloride in human urine.

    PubMed

    Zaazaa, Hala E; Mohamed, Afaf O; Hawwam, Maha A; Abdelkawy, Mohamed

    2015-01-05

    A simple, sensitive and selective spectrofluorimetric method has been developed for the determination of 3-methylflavone-8-carboxylic acid as the main active metabolite of flavoxate hydrochloride in human urine. The proposed method was based on the measurement of the native fluorescence of the metabolite in methanol at an emission wavelength 390 nm, upon excitation at 338 nm. Moreover, the urinary excretion pattern has been calculated using the proposed method. Taking the advantage that 3-methylflavone-8-carboxylic acid is also the alkaline degradate, the proposed method was applied to in vitro determination of flavoxate hydrochloride in tablets dosage form via the measurement of its corresponding degradate. The method was validated in accordance with the ICH requirements and statistically compared to the official method with no significant difference in performance. Copyright © 2014 Elsevier B.V. All rights reserved.

  16. Climate Change Impacts at Department of Defense Installations

    DTIC Science & Technology

    2017-06-16

    locations. The ease of use of this method and its flexibility have led to a wide variety of applications for assessing impacts of climate change 4...versions of these statistical methods to provide the basis for regional climate assessments for various states, regions, and government agencies...averaging (REA) method proposed by Giorgi and Mearns (2002). This method assigns reliability classifications for the multi-model ensemble simulation by

  17. Outcome-Dependent Sampling Design and Inference for Cox’s Proportional Hazards Model

    PubMed Central

    Yu, Jichang; Liu, Yanyan; Cai, Jianwen; Sandler, Dale P.; Zhou, Haibo

    2016-01-01

    We propose a cost-effective outcome-dependent sampling design for the failure time data and develop an efficient inference procedure for data collected with this design. To account for the biased sampling scheme, we derive estimators from a weighted partial likelihood estimating equation. The proposed estimators for regression parameters are shown to be consistent and asymptotically normally distributed. A criteria that can be used to optimally implement the ODS design in practice is proposed and studied. The small sample performance of the proposed method is evaluated by simulation studies. The proposed design and inference procedure is shown to be statistically more powerful than existing alternative designs with the same sample sizes. We illustrate the proposed method with an existing real data from the Cancer Incidence and Mortality of Uranium Miners Study. PMID:28090134

  18. Improved Adaptive LSB Steganography Based on Chaos and Genetic Algorithm

    NASA Astrophysics Data System (ADS)

    Yu, Lifang; Zhao, Yao; Ni, Rongrong; Li, Ting

    2010-12-01

    We propose a novel steganographic method in JPEG images with high performance. Firstly, we propose improved adaptive LSB steganography, which can achieve high capacity while preserving the first-order statistics. Secondly, in order to minimize visual degradation of the stego image, we shuffle bits-order of the message based on chaos whose parameters are selected by the genetic algorithm. Shuffling message's bits-order provides us with a new way to improve the performance of steganography. Experimental results show that our method outperforms classical steganographic methods in image quality, while preserving characteristics of histogram and providing high capacity.

  19. 3D automatic anatomy recognition based on iterative graph-cut-ASM

    NASA Astrophysics Data System (ADS)

    Chen, Xinjian; Udupa, Jayaram K.; Bagci, Ulas; Alavi, Abass; Torigian, Drew A.

    2010-02-01

    We call the computerized assistive process of recognizing, delineating, and quantifying organs and tissue regions in medical imaging, occurring automatically during clinical image interpretation, automatic anatomy recognition (AAR). The AAR system we are developing includes five main parts: model building, object recognition, object delineation, pathology detection, and organ system quantification. In this paper, we focus on the delineation part. For the modeling part, we employ the active shape model (ASM) strategy. For recognition and delineation, we integrate several hybrid strategies of combining purely image based methods with ASM. In this paper, an iterative Graph-Cut ASM (IGCASM) method is proposed for object delineation. An algorithm called GC-ASM was presented at this symposium last year for object delineation in 2D images which attempted to combine synergistically ASM and GC. Here, we extend this method to 3D medical image delineation. The IGCASM method effectively combines the rich statistical shape information embodied in ASM with the globally optimal delineation capability of the GC method. We propose a new GC cost function, which effectively integrates the specific image information with the ASM shape model information. The proposed methods are tested on a clinical abdominal CT data set. The preliminary results show that: (a) it is feasible to explicitly bring prior 3D statistical shape information into the GC framework; (b) the 3D IGCASM delineation method improves on ASM and GC and can provide practical operational time on clinical images.

  20. Vibration-based structural health monitoring using adaptive statistical method under varying environmental condition

    NASA Astrophysics Data System (ADS)

    Jin, Seung-Seop; Jung, Hyung-Jo

    2014-03-01

    It is well known that the dynamic properties of a structure such as natural frequencies depend not only on damage but also on environmental condition (e.g., temperature). The variation in dynamic characteristics of a structure due to environmental condition may mask damage of the structure. Without taking the change of environmental condition into account, false-positive or false-negative damage diagnosis may occur so that structural health monitoring becomes unreliable. In order to address this problem, an approach to construct a regression model based on structural responses considering environmental factors has been usually used by many researchers. The key to success of this approach is the formulation between the input and output variables of the regression model to take into account the environmental variations. However, it is quite challenging to determine proper environmental variables and measurement locations in advance for fully representing the relationship between the structural responses and the environmental variations. One alternative (i.e., novelty detection) is to remove the variations caused by environmental factors from the structural responses by using multivariate statistical analysis (e.g., principal component analysis (PCA), factor analysis, etc.). The success of this method is deeply depending on the accuracy of the description of normal condition. Generally, there is no prior information on normal condition during data acquisition, so that the normal condition is determined by subjective perspective with human-intervention. The proposed method is a novel adaptive multivariate statistical analysis for monitoring of structural damage detection under environmental change. One advantage of this method is the ability of a generative learning to capture the intrinsic characteristics of the normal condition. The proposed method is tested on numerically simulated data for a range of noise in measurement under environmental variation. A comparative study with conventional methods (i.e., fixed reference scheme) demonstrates the superior performance of the proposed method for structural damage detection.

  1. Statistical Model Applied to NetFlow for Network Intrusion Detection

    NASA Astrophysics Data System (ADS)

    Proto, André; Alexandre, Leandro A.; Batista, Maira L.; Oliveira, Isabela L.; Cansian, Adriano M.

    The computers and network services became presence guaranteed in several places. These characteristics resulted in the growth of illicit events and therefore the computers and networks security has become an essential point in any computing environment. Many methodologies were created to identify these events; however, with increasing of users and services on the Internet, many difficulties are found in trying to monitor a large network environment. This paper proposes a methodology for events detection in large-scale networks. The proposal approaches the anomaly detection using the NetFlow protocol, statistical methods and monitoring the environment in a best time for the application.

  2. A generalization of random matrix theory and its application to statistical physics.

    PubMed

    Wang, Duan; Zhang, Xin; Horvatic, Davor; Podobnik, Boris; Eugene Stanley, H

    2017-02-01

    To study the statistical structure of crosscorrelations in empirical data, we generalize random matrix theory and propose a new method of cross-correlation analysis, known as autoregressive random matrix theory (ARRMT). ARRMT takes into account the influence of auto-correlations in the study of cross-correlations in multiple time series. We first analytically and numerically determine how auto-correlations affect the eigenvalue distribution of the correlation matrix. Then we introduce ARRMT with a detailed procedure of how to implement the method. Finally, we illustrate the method using two examples taken from inflation rates for air pressure data for 95 US cities.

  3. Method and system of Jones-matrix mapping of blood plasma films with "fuzzy" analysis in differentiation of breast pathology changes

    NASA Astrophysics Data System (ADS)

    Zabolotna, Natalia I.; Radchenko, Kostiantyn O.; Karas, Oleksandr V.

    2018-01-01

    A fibroadenoma diagnosing of breast using statistical analysis (determination and analysis of statistical moments of the 1st-4th order) of the obtained polarization images of Jones matrix imaginary elements of the optically thin (attenuation coefficient τ <= 0,1 ) blood plasma films with further intellectual differentiation based on the method of "fuzzy" logic and discriminant analysis were proposed. The accuracy of the intellectual differentiation of blood plasma samples to the "norm" and "fibroadenoma" of breast was 82.7% by the method of linear discriminant analysis, and by the "fuzzy" logic method is 95.3%. The obtained results allow to confirm the potentially high level of reliability of the method of differentiation by "fuzzy" analysis.

  4. Simulation-based estimation of mean and standard deviation for meta-analysis via Approximate Bayesian Computation (ABC).

    PubMed

    Kwon, Deukwoo; Reis, Isildinha M

    2015-08-12

    When conducting a meta-analysis of a continuous outcome, estimated means and standard deviations from the selected studies are required in order to obtain an overall estimate of the mean effect and its confidence interval. If these quantities are not directly reported in the publications, they must be estimated from other reported summary statistics, such as the median, the minimum, the maximum, and quartiles. We propose a simulation-based estimation approach using the Approximate Bayesian Computation (ABC) technique for estimating mean and standard deviation based on various sets of summary statistics found in published studies. We conduct a simulation study to compare the proposed ABC method with the existing methods of Hozo et al. (2005), Bland (2015), and Wan et al. (2014). In the estimation of the standard deviation, our ABC method performs better than the other methods when data are generated from skewed or heavy-tailed distributions. The corresponding average relative error (ARE) approaches zero as sample size increases. In data generated from the normal distribution, our ABC performs well. However, the Wan et al. method is best for estimating standard deviation under normal distribution. In the estimation of the mean, our ABC method is best regardless of assumed distribution. ABC is a flexible method for estimating the study-specific mean and standard deviation for meta-analysis, especially with underlying skewed or heavy-tailed distributions. The ABC method can be applied using other reported summary statistics such as the posterior mean and 95 % credible interval when Bayesian analysis has been employed.

  5. Numerical Differentiation Methods for Computing Error Covariance Matrices in Item Response Theory Modeling: An Evaluation and a New Proposal

    ERIC Educational Resources Information Center

    Tian, Wei; Cai, Li; Thissen, David; Xin, Tao

    2013-01-01

    In item response theory (IRT) modeling, the item parameter error covariance matrix plays a critical role in statistical inference procedures. When item parameters are estimated using the EM algorithm, the parameter error covariance matrix is not an automatic by-product of item calibration. Cai proposed the use of Supplemented EM algorithm for…

  6. Formalized Conflicts Detection Based on the Analysis of Multiple Emails: An Approach Combining Statistics and Ontologies

    NASA Astrophysics Data System (ADS)

    Zakaria, Chahnez; Curé, Olivier; Salzano, Gabriella; Smaïli, Kamel

    In Computer Supported Cooperative Work (CSCW), it is crucial for project leaders to detect conflicting situations as early as possible. Generally, this task is performed manually by studying a set of documents exchanged between team members. In this paper, we propose a full-fledged automatic solution that identifies documents, subjects and actors involved in relational conflicts. Our approach detects conflicts in emails, probably the most popular type of documents in CSCW, but the methods used can handle other text-based documents. These methods rely on the combination of statistical and ontological operations. The proposed solution is decomposed in several steps: (i) we enrich a simple negative emotion ontology with terms occuring in the corpus of emails, (ii) we categorize each conflicting email according to the concepts of this ontology and (iii) we identify emails, subjects and team members involved in conflicting emails using possibilistic description logic and a set of proposed measures. Each of these steps are evaluated and validated on concrete examples. Moreover, this approach's framework is generic and can be easily adapted to domains other than conflicts, e.g. security issues, and extended with operations making use of our proposed set of measures.

  7. Physics-Based Image Segmentation Using First Order Statistical Properties and Genetic Algorithm for Inductive Thermography Imaging.

    PubMed

    Gao, Bin; Li, Xiaoqing; Woo, Wai Lok; Tian, Gui Yun

    2018-05-01

    Thermographic inspection has been widely applied to non-destructive testing and evaluation with the capabilities of rapid, contactless, and large surface area detection. Image segmentation is considered essential for identifying and sizing defects. To attain a high-level performance, specific physics-based models that describe defects generation and enable the precise extraction of target region are of crucial importance. In this paper, an effective genetic first-order statistical image segmentation algorithm is proposed for quantitative crack detection. The proposed method automatically extracts valuable spatial-temporal patterns from unsupervised feature extraction algorithm and avoids a range of issues associated with human intervention in laborious manual selection of specific thermal video frames for processing. An internal genetic functionality is built into the proposed algorithm to automatically control the segmentation threshold to render enhanced accuracy in sizing the cracks. Eddy current pulsed thermography will be implemented as a platform to demonstrate surface crack detection. Experimental tests and comparisons have been conducted to verify the efficacy of the proposed method. In addition, a global quantitative assessment index F-score has been adopted to objectively evaluate the performance of different segmentation algorithms.

  8. Efficient strategy for detecting gene × gene joint action and its application in schizophrenia.

    PubMed

    Won, Sungho; Kwon, Min-Seok; Mattheisen, Manuel; Park, Suyeon; Park, Changsoon; Kihara, Daisuke; Cichon, Sven; Ophoff, Roel; Nöthen, Markus M; Rietschel, Marcella; Baur, Max; Uitterlinden, Andre G; Hofmann, A; Lange, Christoph

    2014-01-01

    We propose a new approach to detect gene × gene joint action in genome-wide association studies (GWASs) for case-control designs. This approach offers an exhaustive search for all two-way joint action (including, as a special case, single gene action) that is computationally feasible at the genome-wide level and has reasonable statistical power under most genetic models. We found that the presence of any gene × gene joint action may imply differences in three types of genetic components: the minor allele frequencies and the amounts of Hardy-Weinberg disequilibrium may differ between cases and controls, and between the two genetic loci the degree of linkage disequilibrium may differ between cases and controls. Using Fisher's method, it is possible to combine the different sources of genetic information in an overall test for detecting gene × gene joint action. The proposed statistical analysis is efficient and its simplicity makes it applicable to GWASs. In the current study, we applied the proposed approach to a GWAS on schizophrenia and found several potential gene × gene interactions. Our application illustrates the practical advantage of the proposed method. © 2013 WILEY PERIODICALS, INC.

  9. Nonparametric rank regression for analyzing water quality concentration data with multiple detection limits.

    PubMed

    Fu, Liya; Wang, You-Gan

    2011-02-15

    Environmental data usually include measurements, such as water quality data, which fall below detection limits, because of limitations of the instruments or of certain analytical methods used. The fact that some responses are not detected needs to be properly taken into account in statistical analysis of such data. However, it is well-known that it is challenging to analyze a data set with detection limits, and we often have to rely on the traditional parametric methods or simple imputation methods. Distributional assumptions can lead to biased inference and justification of distributions is often not possible when the data are correlated and there is a large proportion of data below detection limits. The extent of bias is usually unknown. To draw valid conclusions and hence provide useful advice for environmental management authorities, it is essential to develop and apply an appropriate statistical methodology. This paper proposes rank-based procedures for analyzing non-normally distributed data collected at different sites over a period of time in the presence of multiple detection limits. To take account of temporal correlations within each site, we propose an optimal linear combination of estimating functions and apply the induced smoothing method to reduce the computational burden. Finally, we apply the proposed method to the water quality data collected at Susquehanna River Basin in United States of America, which clearly demonstrates the advantages of the rank regression models.

  10. Nuclear Power Plant Thermocouple Sensor-Fault Detection and Classification Using Deep Learning and Generalized Likelihood Ratio Test

    NASA Astrophysics Data System (ADS)

    Mandal, Shyamapada; Santhi, B.; Sridhar, S.; Vinolia, K.; Swaminathan, P.

    2017-06-01

    In this paper, an online fault detection and classification method is proposed for thermocouples used in nuclear power plants. In the proposed method, the fault data are detected by the classification method, which classifies the fault data from the normal data. Deep belief network (DBN), a technique for deep learning, is applied to classify the fault data. The DBN has a multilayer feature extraction scheme, which is highly sensitive to a small variation of data. Since the classification method is unable to detect the faulty sensor; therefore, a technique is proposed to identify the faulty sensor from the fault data. Finally, the composite statistical hypothesis test, namely generalized likelihood ratio test, is applied to compute the fault pattern of the faulty sensor signal based on the magnitude of the fault. The performance of the proposed method is validated by field data obtained from thermocouple sensors of the fast breeder test reactor.

  11. A 3D model retrieval approach based on Bayesian networks lightfield descriptor

    NASA Astrophysics Data System (ADS)

    Xiao, Qinhan; Li, Yanjun

    2009-12-01

    A new 3D model retrieval methodology is proposed by exploiting a novel Bayesian networks lightfield descriptor (BNLD). There are two key novelties in our approach: (1) a BN-based method for building lightfield descriptor; and (2) a 3D model retrieval scheme based on the proposed BNLD. To overcome the disadvantages of the existing 3D model retrieval methods, we explore BN for building a new lightfield descriptor. Firstly, 3D model is put into lightfield, about 300 binary-views can be obtained along a sphere, then Fourier descriptors and Zernike moments descriptors can be calculated out from binaryviews. Then shape feature sequence would be learned into a BN model based on BN learning algorithm; Secondly, we propose a new 3D model retrieval method by calculating Kullback-Leibler Divergence (KLD) between BNLDs. Beneficial from the statistical learning, our BNLD is noise robustness as compared to the existing methods. The comparison between our method and the lightfield descriptor-based approach is conducted to demonstrate the effectiveness of our proposed methodology.

  12. Power Enhancement in High Dimensional Cross-Sectional Tests

    PubMed Central

    Fan, Jianqing; Liao, Yuan; Yao, Jiawei

    2016-01-01

    We propose a novel technique to boost the power of testing a high-dimensional vector H : θ = 0 against sparse alternatives where the null hypothesis is violated only by a couple of components. Existing tests based on quadratic forms such as the Wald statistic often suffer from low powers due to the accumulation of errors in estimating high-dimensional parameters. More powerful tests for sparse alternatives such as thresholding and extreme-value tests, on the other hand, require either stringent conditions or bootstrap to derive the null distribution and often suffer from size distortions due to the slow convergence. Based on a screening technique, we introduce a “power enhancement component”, which is zero under the null hypothesis with high probability, but diverges quickly under sparse alternatives. The proposed test statistic combines the power enhancement component with an asymptotically pivotal statistic, and strengthens the power under sparse alternatives. The null distribution does not require stringent regularity conditions, and is completely determined by that of the pivotal statistic. As specific applications, the proposed methods are applied to testing the factor pricing models and validating the cross-sectional independence in panel data models. PMID:26778846

  13. a Statistical Texture Feature for Building Collapse Information Extraction of SAR Image

    NASA Astrophysics Data System (ADS)

    Li, L.; Yang, H.; Chen, Q.; Liu, X.

    2018-04-01

    Synthetic Aperture Radar (SAR) has become one of the most important ways to extract post-disaster collapsed building information, due to its extreme versatility and almost all-weather, day-and-night working capability, etc. In view of the fact that the inherent statistical distribution of speckle in SAR images is not used to extract collapsed building information, this paper proposed a novel texture feature of statistical models of SAR images to extract the collapsed buildings. In the proposed feature, the texture parameter of G0 distribution from SAR images is used to reflect the uniformity of the target to extract the collapsed building. This feature not only considers the statistical distribution of SAR images, providing more accurate description of the object texture, but also is applied to extract collapsed building information of single-, dual- or full-polarization SAR data. The RADARSAT-2 data of Yushu earthquake which acquired on April 21, 2010 is used to present and analyze the performance of the proposed method. In addition, the applicability of this feature to SAR data with different polarizations is also analysed, which provides decision support for the data selection of collapsed building information extraction.

  14. Structural damage detection based on stochastic subspace identification and statistical pattern recognition: II. Experimental validation under varying temperature

    NASA Astrophysics Data System (ADS)

    Lin, Y. Q.; Ren, W. X.; Fang, S. E.

    2011-11-01

    Although most vibration-based damage detection methods can acquire satisfactory verification on analytical or numerical structures, most of them may encounter problems when applied to real-world structures under varying environments. The damage detection methods that directly extract damage features from the periodically sampled dynamic time history response measurements are desirable but relevant research and field application verification are still lacking. In this second part of a two-part paper, the robustness and performance of the statistics-based damage index using the forward innovation model by stochastic subspace identification of a vibrating structure proposed in the first part have been investigated against two prestressed reinforced concrete (RC) beams tested in the laboratory and a full-scale RC arch bridge tested in the field under varying environments. Experimental verification is focused on temperature effects. It is demonstrated that the proposed statistics-based damage index is insensitive to temperature variations but sensitive to the structural deterioration or state alteration. This makes it possible to detect the structural damage for the real-scale structures experiencing ambient excitations and varying environmental conditions.

  15. A Low-Cost Method for Multiple Disease Prediction.

    PubMed

    Bayati, Mohsen; Bhaskar, Sonia; Montanari, Andrea

    Recently, in response to the rising costs of healthcare services, employers that are financially responsible for the healthcare costs of their workforce have been investing in health improvement programs for their employees. A main objective of these so called "wellness programs" is to reduce the incidence of chronic illnesses such as cardiovascular disease, cancer, diabetes, and obesity, with the goal of reducing future medical costs. The majority of these wellness programs include an annual screening to detect individuals with the highest risk of developing chronic disease. Once these individuals are identified, the company can invest in interventions to reduce the risk of those individuals. However, capturing many biomarkers per employee creates a costly screening procedure. We propose a statistical data-driven method to address this challenge by minimizing the number of biomarkers in the screening procedure while maximizing the predictive power over a broad spectrum of diseases. Our solution uses multi-task learning and group dimensionality reduction from machine learning and statistics. We provide empirical validation of the proposed solution using data from two different electronic medical records systems, with comparisons to a statistical benchmark.

  16. Probabilistic-driven oriented Speckle reducing anisotropic diffusion with application to cardiac ultrasonic images.

    PubMed

    Vegas-Sanchez-Ferrero, G; Aja-Fernandez, S; Martin-Fernandez, M; Frangi, A F; Palencia, C

    2010-01-01

    A novel anisotropic diffusion filter is proposed in this work with application to cardiac ultrasonic images. It includes probabilistic models which describe the probability density function (PDF) of tissues and adapts the diffusion tensor to the image iteratively. For this purpose, a preliminary study is performed in order to select the probability models that best fit the stastitical behavior of each tissue class in cardiac ultrasonic images. Then, the parameters of the diffusion tensor are defined taking into account the statistical properties of the image at each voxel. When the structure tensor of the probability of belonging to each tissue is included in the diffusion tensor definition, a better boundaries estimates can be obtained instead of calculating directly the boundaries from the image. This is the main contribution of this work. Additionally, the proposed method follows the statistical properties of the image in each iteration. This is considered as a second contribution since state-of-the-art methods suppose that noise or statistical properties of the image do not change during the filter process.

  17. Bayesian Sensitivity Analysis of Statistical Models with Missing Data

    PubMed Central

    ZHU, HONGTU; IBRAHIM, JOSEPH G.; TANG, NIANSHENG

    2013-01-01

    Methods for handling missing data depend strongly on the mechanism that generated the missing values, such as missing completely at random (MCAR) or missing at random (MAR), as well as other distributional and modeling assumptions at various stages. It is well known that the resulting estimates and tests may be sensitive to these assumptions as well as to outlying observations. In this paper, we introduce various perturbations to modeling assumptions and individual observations, and then develop a formal sensitivity analysis to assess these perturbations in the Bayesian analysis of statistical models with missing data. We develop a geometric framework, called the Bayesian perturbation manifold, to characterize the intrinsic structure of these perturbations. We propose several intrinsic influence measures to perform sensitivity analysis and quantify the effect of various perturbations to statistical models. We use the proposed sensitivity analysis procedure to systematically investigate the tenability of the non-ignorable missing at random (NMAR) assumption. Simulation studies are conducted to evaluate our methods, and a dataset is analyzed to illustrate the use of our diagnostic measures. PMID:24753718

  18. Ladar imaging detection of salient map based on PWVD and Rényi entropy

    NASA Astrophysics Data System (ADS)

    Xu, Yuannan; Zhao, Yuan; Deng, Rong; Dong, Yanbing

    2013-10-01

    Spatial-frequency information of a given image can be extracted by associating the grey-level spatial data with one of the well-known spatial/spatial-frequency distributions. The Wigner-Ville distribution (WVD) has a good characteristic that the images can be represented in spatial/spatial-frequency domains. For intensity and range images of ladar, through the pseudo Wigner-Ville distribution (PWVD) using one or two dimension window, the statistical property of Rényi entropy is studied. We also analyzed the change of Rényi entropy's statistical property in the ladar intensity and range images when the man-made objects appear. From this foundation, a novel method for generating saliency map based on PWVD and Rényi entropy is proposed. After that, target detection is completed when the saliency map is segmented using a simple and convenient threshold method. For the ladar intensity and range images, experimental results show the proposed method can effectively detect the military vehicles from complex earth background with low false alarm.

  19. Mimic expert judgement through automated procedure for selecting rainfall events responsible for shallow landslide: A statistical approach to validation

    NASA Astrophysics Data System (ADS)

    Giovanna, Vessia; Luca, Pisano; Carmela, Vennari; Mauro, Rossi; Mario, Parise

    2016-01-01

    This paper proposes an automated method for the selection of rainfall data (duration, D, and cumulated, E), responsible for shallow landslide initiation. The method mimics an expert person identifying D and E from rainfall records through a manual procedure whose rules are applied according to her/his judgement. The comparison between the two methods is based on 300 D-E pairs drawn from temporal rainfall data series recorded in a 30 days time-lag before the landslide occurrence. Statistical tests, employed on D and E samples considered both paired and independent values to verify whether they belong to the same population, show that the automated procedure is able to replicate the expert pairs drawn by the expert judgment. Furthermore, a criterion based on cumulated distribution functions (CDFs) is proposed to select the most related D-E pairs to the expert one among the 6 drawn from the coded procedure for tracing the empirical rainfall threshold line.

  20. Statistical analysis on the signals monitoring multiphase flow patterns in pipeline-riser system

    NASA Astrophysics Data System (ADS)

    Ye, Jing; Guo, Liejin

    2013-07-01

    The signals monitoring petroleum transmission pipeline in offshore oil industry usually contain abundant information about the multiphase flow on flow assurance which includes the avoidance of most undesirable flow pattern. Therefore, extracting reliable features form these signals to analyze is an alternative way to examine the potential risks to oil platform. This paper is focused on characterizing multiphase flow patterns in pipeline-riser system that is often appeared in offshore oil industry and finding an objective criterion to describe the transition of flow patterns. Statistical analysis on pressure signal at the riser top is proposed, instead of normal prediction method based on inlet and outlet flow conditions which could not be easily determined during most situations. Besides, machine learning method (least square supported vector machine) is also performed to classify automatically the different flow patterns. The experiment results from a small-scale loop show that the proposed method is effective for analyzing the multiphase flow pattern.

  1. A fractional factorial probabilistic collocation method for uncertainty propagation of hydrologic model parameters in a reduced dimensional space

    NASA Astrophysics Data System (ADS)

    Wang, S.; Huang, G. H.; Huang, W.; Fan, Y. R.; Li, Z.

    2015-10-01

    In this study, a fractional factorial probabilistic collocation method is proposed to reveal statistical significance of hydrologic model parameters and their multi-level interactions affecting model outputs, facilitating uncertainty propagation in a reduced dimensional space. The proposed methodology is applied to the Xiangxi River watershed in China to demonstrate its validity and applicability, as well as its capability of revealing complex and dynamic parameter interactions. A set of reduced polynomial chaos expansions (PCEs) only with statistically significant terms can be obtained based on the results of factorial analysis of variance (ANOVA), achieving a reduction of uncertainty in hydrologic predictions. The predictive performance of reduced PCEs is verified by comparing against standard PCEs and the Monte Carlo with Latin hypercube sampling (MC-LHS) method in terms of reliability, sharpness, and Nash-Sutcliffe efficiency (NSE). Results reveal that the reduced PCEs are able to capture hydrologic behaviors of the Xiangxi River watershed, and they are efficient functional representations for propagating uncertainties in hydrologic predictions.

  2. Statistics of Smoothed Cosmic Fields in Perturbation Theory. I. Formulation and Useful Formulae in Second-Order Perturbation Theory

    NASA Astrophysics Data System (ADS)

    Matsubara, Takahiko

    2003-02-01

    We formulate a general method for perturbative evaluations of statistics of smoothed cosmic fields and provide useful formulae for application of the perturbation theory to various statistics. This formalism is an extensive generalization of the method used by Matsubara, who derived a weakly nonlinear formula of the genus statistic in a three-dimensional density field. After describing the general method, we apply the formalism to a series of statistics, including genus statistics, level-crossing statistics, Minkowski functionals, and a density extrema statistic, regardless of the dimensions in which each statistic is defined. The relation between the Minkowski functionals and other geometrical statistics is clarified. These statistics can be applied to several cosmic fields, including three-dimensional density field, three-dimensional velocity field, two-dimensional projected density field, and so forth. The results are detailed for second-order theory of the formalism. The effect of the bias is discussed. The statistics of smoothed cosmic fields as functions of rescaled threshold by volume fraction are discussed in the framework of second-order perturbation theory. In CDM-like models, their functional deviations from linear predictions plotted against the rescaled threshold are generally much smaller than that plotted against the direct threshold. There is still a slight meatball shift against rescaled threshold, which is characterized by asymmetry in depths of troughs in the genus curve. A theory-motivated asymmetry factor in the genus curve is proposed.

  3. CHOBS: Color Histogram of Block Statistics for Automatic Bleeding Detection in Wireless Capsule Endoscopy Video

    PubMed Central

    Ghosh, Tonmoy; Wahid, Khan A.

    2018-01-01

    Wireless capsule endoscopy (WCE) is the most advanced technology to visualize whole gastrointestinal (GI) tract in a non-invasive way. But the major disadvantage here, it takes long reviewing time, which is very laborious as continuous manual intervention is necessary. In order to reduce the burden of the clinician, in this paper, an automatic bleeding detection method for WCE video is proposed based on the color histogram of block statistics, namely CHOBS. A single pixel in WCE image may be distorted due to the capsule motion in the GI tract. Instead of considering individual pixel values, a block surrounding to that individual pixel is chosen for extracting local statistical features. By combining local block features of three different color planes of RGB color space, an index value is defined. A color histogram, which is extracted from those index values, provides distinguishable color texture feature. A feature reduction technique utilizing color histogram pattern and principal component analysis is proposed, which can drastically reduce the feature dimension. For bleeding zone detection, blocks are classified using extracted local features that do not incorporate any computational burden for feature extraction. From extensive experimentation on several WCE videos and 2300 images, which are collected from a publicly available database, a very satisfactory bleeding frame and zone detection performance is achieved in comparison to that obtained by some of the existing methods. In the case of bleeding frame detection, the accuracy, sensitivity, and specificity obtained from proposed method are 97.85%, 99.47%, and 99.15%, respectively, and in the case of bleeding zone detection, 95.75% of precision is achieved. The proposed method offers not only low feature dimension but also highly satisfactory bleeding detection performance, which even can effectively detect bleeding frame and zone in a continuous WCE video data. PMID:29468094

  4. Visualizing Similarity of Appearance by Arrangement of Cards

    PubMed Central

    Nakatsuji, Nao; Ihara, Hisayasu; Seno, Takeharu; Ito, Hiroshi

    2016-01-01

    This study proposes a novel method to extract the configuration of the psychological space by directly measuring subjects' similarity rating without computational work. Although multidimensional scaling (MDS) is well-known as a conventional method for extracting the psychological space, the method requires many pairwise evaluations. The times taken for evaluations increase in proportion to the square of the number of objects in MDS. The proposed method asks subjects to arrange cards on a poster sheet according to the degree of similarity of the objects. To compare the performance of the proposed method with the conventional one, we developed similarity maps of typefaces through the proposed method and through non-metric MDS. We calculated the trace correlation coefficient among all combinations of the configuration for both methods to evaluate the degree of similarity in the obtained configurations. The threshold value of trace correlation coefficient for statistically discriminating similar configuration was decided based on random data. The ratio of the trace correlation coefficient exceeding the threshold value was 62.0% so that the configurations of the typefaces obtained by the proposed method closely resembled those obtained by non-metric MDS. The required duration for the proposed method was approximately one third of the non-metric MDS's duration. In addition, all distances between objects in all the data for both methods were calculated. The frequency for the short distance in the proposed method was lower than that of the non-metric MDS so that a relatively small difference was likely to be emphasized among objects in the configuration by the proposed method. The card arrangement method we here propose, thus serves as a easier and time-saving tool to obtain psychological structures in the fields related to similarity of appearance. PMID:27242611

  5. Statistical reconstruction for cosmic ray muon tomography.

    PubMed

    Schultz, Larry J; Blanpied, Gary S; Borozdin, Konstantin N; Fraser, Andrew M; Hengartner, Nicolas W; Klimenko, Alexei V; Morris, Christopher L; Orum, Chris; Sossong, Michael J

    2007-08-01

    Highly penetrating cosmic ray muons constantly shower the earth at a rate of about 1 muon per cm2 per minute. We have developed a technique which exploits the multiple Coulomb scattering of these particles to perform nondestructive inspection without the use of artificial radiation. In prior work [1]-[3], we have described heuristic methods for processing muon data to create reconstructed images. In this paper, we present a maximum likelihood/expectation maximization tomographic reconstruction algorithm designed for the technique. This algorithm borrows much from techniques used in medical imaging, particularly emission tomography, but the statistics of muon scattering dictates differences. We describe the statistical model for multiple scattering, derive the reconstruction algorithm, and present simulated examples. We also propose methods to improve the robustness of the algorithm to experimental errors and events departing from the statistical model.

  6. Dealing with the Conflicting Results of Psycholinguistic Experiments: How to Resolve Them with the Help of Statistical Meta-analysis.

    PubMed

    Rákosi, Csilla

    2018-01-22

    This paper proposes the use of the tools of statistical meta-analysis as a method of conflict resolution with respect to experiments in cognitive linguistics. With the help of statistical meta-analysis, the effect size of similar experiments can be compared, a well-founded and robust synthesis of the experimental data can be achieved, and possible causes of any divergence(s) in the outcomes can be revealed. This application of statistical meta-analysis offers a novel method of how diverging evidence can be dealt with. The workability of this idea is exemplified by a case study dealing with a series of experiments conducted as non-exact replications of Thibodeau and Boroditsky (PLoS ONE 6(2):e16782, 2011. https://doi.org/10.1371/journal.pone.0016782 ).

  7. Quantifying Safety Margin Using the Risk-Informed Safety Margin Characterization (RISMC)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Grabaskas, David; Bucknor, Matthew; Brunett, Acacia

    2015-04-26

    The Risk-Informed Safety Margin Characterization (RISMC), developed by Idaho National Laboratory as part of the Light-Water Reactor Sustainability Project, utilizes a probabilistic safety margin comparison between a load and capacity distribution, rather than a deterministic comparison between two values, as is usually done in best-estimate plus uncertainty analyses. The goal is to determine the failure probability, or in other words, the probability of the system load equaling or exceeding the system capacity. While this method has been used in pilot studies, there has been little work conducted investigating the statistical significance of the resulting failure probability. In particular, it ismore » difficult to determine how many simulations are necessary to properly characterize the failure probability. This work uses classical (frequentist) statistics and confidence intervals to examine the impact in statistical accuracy when the number of simulations is varied. Two methods are proposed to establish confidence intervals related to the failure probability established using a RISMC analysis. The confidence interval provides information about the statistical accuracy of the method utilized to explore the uncertainty space, and offers a quantitative method to gauge the increase in statistical accuracy due to performing additional simulations.« less

  8. Assessing Statistical Competencies in Clinical and Translational Science Education: One Size Does Not Fit All

    PubMed Central

    Lindsell, Christopher J.; Welty, Leah J.; Mazumdar, Madhu; Thurston, Sally W.; Rahbar, Mohammad H.; Carter, Rickey E.; Pollock, Bradley H.; Cucchiara, Andrew J.; Kopras, Elizabeth J.; Jovanovic, Borko D.; Enders, Felicity T.

    2014-01-01

    Abstract Introduction Statistics is an essential training component for a career in clinical and translational science (CTS). Given the increasing complexity of statistics, learners may have difficulty selecting appropriate courses. Our question was: what depth of statistical knowledge do different CTS learners require? Methods For three types of CTS learners (principal investigator, co‐investigator, informed reader of the literature), each with different backgrounds in research (no previous research experience, reader of the research literature, previous research experience), 18 experts in biostatistics, epidemiology, and research design proposed levels for 21 statistical competencies. Results Statistical competencies were categorized as fundamental, intermediate, or specialized. CTS learners who intend to become independent principal investigators require more specialized training, while those intending to become informed consumers of the medical literature require more fundamental education. For most competencies, less training was proposed for those with more research background. Discussion When selecting statistical coursework, the learner's research background and career goal should guide the decision. Some statistical competencies are considered to be more important than others. Baseline knowledge assessments may help learners identify appropriate coursework. Conclusion Rather than one size fits all, tailoring education to baseline knowledge, learner background, and future goals increases learning potential while minimizing classroom time. PMID:25212569

  9. A comparative study between three stability indicating spectrophotometric methods for the determination of diatrizoate sodium in presence of its cytotoxic degradation product based on two-wavelength selection

    NASA Astrophysics Data System (ADS)

    Riad, Safaa M.; El-Rahman, Mohamed K. Abd; Fawaz, Esraa M.; Shehata, Mostafa A.

    2015-06-01

    Three sensitive, selective, and precise stability indicating spectrophotometric methods for the determination of the X-ray contrast agent, diatrizoate sodium (DTA) in the presence of its acidic degradation product (highly cytotoxic 3,5-diamino metabolite) and in pharmaceutical formulation, were developed and validated. The first method is ratio difference, the second one is the bivariate method, and the third one is the dual wavelength method. The calibration curves for the three proposed methods are linear over a concentration range of 2-24 μg/mL. The selectivity of the proposed methods was tested using laboratory prepared mixtures. The proposed methods have been successfully applied to the analysis of DTA in pharmaceutical dosage forms without interference from other dosage form additives. The results were statistically compared with the official US pharmacopeial method. No significant difference for either accuracy or precision was observed.

  10. A comparative study between three stability indicating spectrophotometric methods for the determination of diatrizoate sodium in presence of its cytotoxic degradation product based on two-wavelength selection.

    PubMed

    Riad, Safaa M; El-Rahman, Mohamed K Abd; Fawaz, Esraa M; Shehata, Mostafa A

    2015-06-15

    Three sensitive, selective, and precise stability indicating spectrophotometric methods for the determination of the X-ray contrast agent, diatrizoate sodium (DTA) in the presence of its acidic degradation product (highly cytotoxic 3,5-diamino metabolite) and in pharmaceutical formulation, were developed and validated. The first method is ratio difference, the second one is the bivariate method, and the third one is the dual wavelength method. The calibration curves for the three proposed methods are linear over a concentration range of 2-24 μg/mL. The selectivity of the proposed methods was tested using laboratory prepared mixtures. The proposed methods have been successfully applied to the analysis of DTA in pharmaceutical dosage forms without interference from other dosage form additives. The results were statistically compared with the official US pharmacopeial method. No significant difference for either accuracy or precision was observed. Copyright © 2015 Elsevier B.V. All rights reserved.

  11. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhang, Fangyan; Zhang, Song; Chung Wong, Pak

    Effectively visualizing large graphs and capturing the statistical properties are two challenging tasks. To aid in these two tasks, many sampling approaches for graph simplification have been proposed, falling into three categories: node sampling, edge sampling, and traversal-based sampling. It is still unknown which approach is the best. We evaluate commonly used graph sampling methods through a combined visual and statistical comparison of graphs sampled at various rates. We conduct our evaluation on three graph models: random graphs, small-world graphs, and scale-free graphs. Initial results indicate that the effectiveness of a sampling method is dependent on the graph model, themore » size of the graph, and the desired statistical property. This benchmark study can be used as a guideline in choosing the appropriate method for a particular graph sampling task, and the results presented can be incorporated into graph visualization and analysis tools.« less

  12. Analysis of high-throughput biological data using their rank values.

    PubMed

    Dembélé, Doulaye

    2018-01-01

    High-throughput biological technologies are routinely used to generate gene expression profiling or cytogenetics data. To achieve high performance, methods available in the literature become more specialized and often require high computational resources. Here, we propose a new versatile method based on the data-ordering rank values. We use linear algebra, the Perron-Frobenius theorem and also extend a method presented earlier for searching differentially expressed genes for the detection of recurrent copy number aberration. A result derived from the proposed method is a one-sample Student's t-test based on rank values. The proposed method is to our knowledge the only that applies to gene expression profiling and to cytogenetics data sets. This new method is fast, deterministic, and requires a low computational load. Probabilities are associated with genes to allow a statistically significant subset selection in the data set. Stability scores are also introduced as quality parameters. The performance and comparative analyses were carried out using real data sets. The proposed method can be accessed through an R package available from the CRAN (Comprehensive R Archive Network) website: https://cran.r-project.org/web/packages/fcros .

  13. Estimation of integral curves from high angular resolution diffusion imaging (HARDI) data.

    PubMed

    Carmichael, Owen; Sakhanenko, Lyudmila

    2015-05-15

    We develop statistical methodology for a popular brain imaging technique HARDI based on the high order tensor model by Özarslan and Mareci [10]. We investigate how uncertainty in the imaging procedure propagates through all levels of the model: signals, tensor fields, vector fields, and fibers. We construct asymptotically normal estimators of the integral curves or fibers which allow us to trace the fibers together with confidence ellipsoids. The procedure is computationally intense as it blends linear algebra concepts from high order tensors with asymptotical statistical analysis. The theoretical results are illustrated on simulated and real datasets. This work generalizes the statistical methodology proposed for low angular resolution diffusion tensor imaging by Carmichael and Sakhanenko [3], to several fibers per voxel. It is also a pioneering statistical work on tractography from HARDI data. It avoids all the typical limitations of the deterministic tractography methods and it delivers the same information as probabilistic tractography methods. Our method is computationally cheap and it provides well-founded mathematical and statistical framework where diverse functionals on fibers, directions and tensors can be studied in a systematic and rigorous way.

  14. Estimation of integral curves from high angular resolution diffusion imaging (HARDI) data

    PubMed Central

    Carmichael, Owen; Sakhanenko, Lyudmila

    2015-01-01

    We develop statistical methodology for a popular brain imaging technique HARDI based on the high order tensor model by Özarslan and Mareci [10]. We investigate how uncertainty in the imaging procedure propagates through all levels of the model: signals, tensor fields, vector fields, and fibers. We construct asymptotically normal estimators of the integral curves or fibers which allow us to trace the fibers together with confidence ellipsoids. The procedure is computationally intense as it blends linear algebra concepts from high order tensors with asymptotical statistical analysis. The theoretical results are illustrated on simulated and real datasets. This work generalizes the statistical methodology proposed for low angular resolution diffusion tensor imaging by Carmichael and Sakhanenko [3], to several fibers per voxel. It is also a pioneering statistical work on tractography from HARDI data. It avoids all the typical limitations of the deterministic tractography methods and it delivers the same information as probabilistic tractography methods. Our method is computationally cheap and it provides well-founded mathematical and statistical framework where diverse functionals on fibers, directions and tensors can be studied in a systematic and rigorous way. PMID:25937674

  15. A method for classification of multisource data using interval-valued probabilities and its application to HIRIS data

    NASA Technical Reports Server (NTRS)

    Kim, H.; Swain, P. H.

    1991-01-01

    A method of classifying multisource data in remote sensing is presented. The proposed method considers each data source as an information source providing a body of evidence, represents statistical evidence by interval-valued probabilities, and uses Dempster's rule to integrate information based on multiple data source. The method is applied to the problems of ground-cover classification of multispectral data combined with digital terrain data such as elevation, slope, and aspect. Then this method is applied to simulated 201-band High Resolution Imaging Spectrometer (HIRIS) data by dividing the dimensionally huge data source into smaller and more manageable pieces based on the global statistical correlation information. It produces higher classification accuracy than the Maximum Likelihood (ML) classification method when the Hughes phenomenon is apparent.

  16. Built-up Areas Extraction in High Resolution SAR Imagery based on the method of Multiple Feature Weighted Fusion

    NASA Astrophysics Data System (ADS)

    Liu, X.; Zhang, J. X.; Zhao, Z.; Ma, A. D.

    2015-06-01

    Synthetic aperture radar in the application of remote sensing technology is becoming more and more widely because of its all-time and all-weather operation, feature extraction research in high resolution SAR image has become a hot topic of concern. In particular, with the continuous improvement of airborne SAR image resolution, image texture information become more abundant. It's of great significance to classification and extraction. In this paper, a novel method for built-up areas extraction using both statistical and structural features is proposed according to the built-up texture features. First of all, statistical texture features and structural features are respectively extracted by classical method of gray level co-occurrence matrix and method of variogram function, and the direction information is considered in this process. Next, feature weights are calculated innovatively according to the Bhattacharyya distance. Then, all features are weighted fusion. At last, the fused image is classified with K-means classification method and the built-up areas are extracted after post classification process. The proposed method has been tested by domestic airborne P band polarization SAR images, at the same time, two groups of experiments based on the method of statistical texture and the method of structural texture were carried out respectively. On the basis of qualitative analysis, quantitative analysis based on the built-up area selected artificially is enforced, in the relatively simple experimentation area, detection rate is more than 90%, in the relatively complex experimentation area, detection rate is also higher than the other two methods. In the study-area, the results show that this method can effectively and accurately extract built-up areas in high resolution airborne SAR imagery.

  17. Adaptive Encoding for Numerical Data Compression.

    ERIC Educational Resources Information Center

    Yokoo, Hidetoshi

    1994-01-01

    Discusses the adaptive compression of computer files of numerical data whose statistical properties are not given in advance. A new lossless coding method for this purpose, which utilizes Adelson-Velskii and Landis (AVL) trees, is proposed. The method is effective to any word length. Its application to the lossless compression of gray-scale images…

  18. Crossing statistic: reconstructing the expansion history of the universe

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shafieloo, Arman, E-mail: arman@ewha.ac.kr

    2012-08-01

    We present that by combining Crossing Statistic [1,2] and Smoothing method [3-5] one can reconstruct the expansion history of the universe with a very high precision without considering any prior on the cosmological quantities such as the equation of state of dark energy. We show that the presented method performs very well in reconstruction of the expansion history of the universe independent of the underlying models and it works well even for non-trivial dark energy models with fast or slow changes in the equation of state of dark energy. Accuracy of the reconstructed quantities along with independence of the methodmore » to any prior or assumption gives the proposed method advantages to the other non-parametric methods proposed before in the literature. Applying on the Union 2.1 supernovae combined with WiggleZ BAO data we present the reconstructed results and test the consistency of the two data sets in a model independent manner. Results show that latest available supernovae and BAO data are in good agreement with each other and spatially flat ΛCDM model is in concordance with the current data.« less

  19. PCA leverage: outlier detection for high-dimensional functional magnetic resonance imaging data.

    PubMed

    Mejia, Amanda F; Nebel, Mary Beth; Eloyan, Ani; Caffo, Brian; Lindquist, Martin A

    2017-07-01

    Outlier detection for high-dimensional (HD) data is a popular topic in modern statistical research. However, one source of HD data that has received relatively little attention is functional magnetic resonance images (fMRI), which consists of hundreds of thousands of measurements sampled at hundreds of time points. At a time when the availability of fMRI data is rapidly growing-primarily through large, publicly available grassroots datasets-automated quality control and outlier detection methods are greatly needed. We propose principal components analysis (PCA) leverage and demonstrate how it can be used to identify outlying time points in an fMRI run. Furthermore, PCA leverage is a measure of the influence of each observation on the estimation of principal components, which are often of interest in fMRI data. We also propose an alternative measure, PCA robust distance, which is less sensitive to outliers and has controllable statistical properties. The proposed methods are validated through simulation studies and are shown to be highly accurate. We also conduct a reliability study using resting-state fMRI data from the Autism Brain Imaging Data Exchange and find that removal of outliers using the proposed methods results in more reliable estimation of subject-level resting-state networks using independent components analysis. © The Author 2017. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  20. Statistical Validation of Surrogate Endpoints: Another Look at the Prentice Criterion and Other Criteria.

    PubMed

    Saraf, Sanatan; Mathew, Thomas; Roy, Anindya

    2015-01-01

    For the statistical validation of surrogate endpoints, an alternative formulation is proposed for testing Prentice's fourth criterion, under a bivariate normal model. In such a setup, the criterion involves inference concerning an appropriate regression parameter, and the criterion holds if the regression parameter is zero. Testing such a null hypothesis has been criticized in the literature since it can only be used to reject a poor surrogate, and not to validate a good surrogate. In order to circumvent this, an equivalence hypothesis is formulated for the regression parameter, namely the hypothesis that the parameter is equivalent to zero. Such an equivalence hypothesis is formulated as an alternative hypothesis, so that the surrogate endpoint is statistically validated when the null hypothesis is rejected. Confidence intervals for the regression parameter and tests for the equivalence hypothesis are proposed using bootstrap methods and small sample asymptotics, and their performances are numerically evaluated and recommendations are made. The choice of the equivalence margin is a regulatory issue that needs to be addressed. The proposed equivalence testing formulation is also adopted for other parameters that have been proposed in the literature on surrogate endpoint validation, namely, the relative effect and proportion explained.

  1. Toward Global Comparability of Sexual Orientation Data in Official Statistics: A Conceptual Framework of Sexual Orientation for Health Data Collection in New Zealand's Official Statistics System

    PubMed Central

    Gray, Alistair; Veale, Jaimie F.; Binson, Diane; Sell, Randell L.

    2013-01-01

    Objective. Effectively addressing health disparities experienced by sexual minority populations requires high-quality official data on sexual orientation. We developed a conceptual framework of sexual orientation to improve the quality of sexual orientation data in New Zealand's Official Statistics System. Methods. We reviewed conceptual and methodological literature, culminating in a draft framework. To improve the framework, we held focus groups and key-informant interviews with sexual minority stakeholders and producers and consumers of official statistics. An advisory board of experts provided additional guidance. Results. The framework proposes working definitions of the sexual orientation topic and measurement concepts, describes dimensions of the measurement concepts, discusses variables framing the measurement concepts, and outlines conceptual grey areas. Conclusion. The framework proposes standard definitions and concepts for the collection of official sexual orientation data in New Zealand. It presents a model for producers of official statistics in other countries, who wish to improve the quality of health data on their citizens. PMID:23840231

  2. Applications of modern statistical methods to analysis of data in physical science

    NASA Astrophysics Data System (ADS)

    Wicker, James Eric

    Modern methods of statistical and computational analysis offer solutions to dilemmas confronting researchers in physical science. Although the ideas behind modern statistical and computational analysis methods were originally introduced in the 1970's, most scientists still rely on methods written during the early era of computing. These researchers, who analyze increasingly voluminous and multivariate data sets, need modern analysis methods to extract the best results from their studies. The first section of this work showcases applications of modern linear regression. Since the 1960's, many researchers in spectroscopy have used classical stepwise regression techniques to derive molecular constants. However, problems with thresholds of entry and exit for model variables plagues this analysis method. Other criticisms of this kind of stepwise procedure include its inefficient searching method, the order in which variables enter or leave the model and problems with overfitting data. We implement an information scoring technique that overcomes the assumptions inherent in the stepwise regression process to calculate molecular model parameters. We believe that this kind of information based model evaluation can be applied to more general analysis situations in physical science. The second section proposes new methods of multivariate cluster analysis. The K-means algorithm and the EM algorithm, introduced in the 1960's and 1970's respectively, formed the basis of multivariate cluster analysis methodology for many years. However, several shortcomings of these methods include strong dependence on initial seed values and inaccurate results when the data seriously depart from hypersphericity. We propose new cluster analysis methods based on genetic algorithms that overcomes the strong dependence on initial seed values. In addition, we propose a generalization of the Genetic K-means algorithm which can accurately identify clusters with complex hyperellipsoidal covariance structures. We then use this new algorithm in a genetic algorithm based Expectation-Maximization process that can accurately calculate parameters describing complex clusters in a mixture model routine. Using the accuracy of this GEM algorithm, we assign information scores to cluster calculations in order to best identify the number of mixture components in a multivariate data set. We will showcase how these algorithms can be used to process multivariate data from astronomical observations.

  3. Bayesian approach to inverse statistical mechanics.

    PubMed

    Habeck, Michael

    2014-05-01

    Inverse statistical mechanics aims to determine particle interactions from ensemble properties. This article looks at this inverse problem from a Bayesian perspective and discusses several statistical estimators to solve it. In addition, a sequential Monte Carlo algorithm is proposed that draws the interaction parameters from their posterior probability distribution. The posterior probability involves an intractable partition function that is estimated along with the interactions. The method is illustrated for inverse problems of varying complexity, including the estimation of a temperature, the inverse Ising problem, maximum entropy fitting, and the reconstruction of molecular interaction potentials.

  4. Bayesian approach to inverse statistical mechanics

    NASA Astrophysics Data System (ADS)

    Habeck, Michael

    2014-05-01

    Inverse statistical mechanics aims to determine particle interactions from ensemble properties. This article looks at this inverse problem from a Bayesian perspective and discusses several statistical estimators to solve it. In addition, a sequential Monte Carlo algorithm is proposed that draws the interaction parameters from their posterior probability distribution. The posterior probability involves an intractable partition function that is estimated along with the interactions. The method is illustrated for inverse problems of varying complexity, including the estimation of a temperature, the inverse Ising problem, maximum entropy fitting, and the reconstruction of molecular interaction potentials.

  5. Probability genotype imputation method and integrated weighted lasso for QTL identification.

    PubMed

    Demetrashvili, Nino; Van den Heuvel, Edwin R; Wit, Ernst C

    2013-12-30

    Many QTL studies have two common features: (1) often there is missing marker information, (2) among many markers involved in the biological process only a few are causal. In statistics, the second issue falls under the headings "sparsity" and "causal inference". The goal of this work is to develop a two-step statistical methodology for QTL mapping for markers with binary genotypes. The first step introduces a novel imputation method for missing genotypes. Outcomes of the proposed imputation method are probabilities which serve as weights to the second step, namely in weighted lasso. The sparse phenotype inference is employed to select a set of predictive markers for the trait of interest. Simulation studies validate the proposed methodology under a wide range of realistic settings. Furthermore, the methodology outperforms alternative imputation and variable selection methods in such studies. The methodology was applied to an Arabidopsis experiment, containing 69 markers for 165 recombinant inbred lines of a F8 generation. The results confirm previously identified regions, however several new markers are also found. On the basis of the inferred ROC behavior these markers show good potential for being real, especially for the germination trait Gmax. Our imputation method shows higher accuracy in terms of sensitivity and specificity compared to alternative imputation method. Also, the proposed weighted lasso outperforms commonly practiced multiple regression as well as the traditional lasso and adaptive lasso with three weighting schemes. This means that under realistic missing data settings this methodology can be used for QTL identification.

  6. Comparing two correlated C indices with right-censored survival outcome: a one-shot nonparametric approach.

    PubMed

    Kang, Le; Chen, Weijie; Petrick, Nicholas A; Gallas, Brandon D

    2015-02-20

    The area under the receiver operating characteristic curve is often used as a summary index of the diagnostic ability in evaluating biomarkers when the clinical outcome (truth) is binary. When the clinical outcome is right-censored survival time, the C index, motivated as an extension of area under the receiver operating characteristic curve, has been proposed by Harrell as a measure of concordance between a predictive biomarker and the right-censored survival outcome. In this work, we investigate methods for statistical comparison of two diagnostic or predictive systems, of which they could either be two biomarkers or two fixed algorithms, in terms of their C indices. We adopt a U-statistics-based C estimator that is asymptotically normal and develop a nonparametric analytical approach to estimate the variance of the C estimator and the covariance of two C estimators. A z-score test is then constructed to compare the two C indices. We validate our one-shot nonparametric method via simulation studies in terms of the type I error rate and power. We also compare our one-shot method with resampling methods including the jackknife and the bootstrap. Simulation results show that the proposed one-shot method provides almost unbiased variance estimations and has satisfactory type I error control and power. Finally, we illustrate the use of the proposed method with an example from the Framingham Heart Study. Copyright © 2014 John Wiley & Sons, Ltd.

  7. Incorporating conditional random fields and active learning to improve sentiment identification.

    PubMed

    Zhang, Kunpeng; Xie, Yusheng; Yang, Yi; Sun, Aaron; Liu, Hengchang; Choudhary, Alok

    2014-10-01

    Many machine learning, statistical, and computational linguistic methods have been developed to identify sentiment of sentences in documents, yielding promising results. However, most of state-of-the-art methods focus on individual sentences and ignore the impact of context on the meaning of a sentence. In this paper, we propose a method based on conditional random fields to incorporate sentence structure and context information in addition to syntactic information for improving sentiment identification. We also investigate how human interaction affects the accuracy of sentiment labeling using limited training data. We propose and evaluate two different active learning strategies for labeling sentiment data. Our experiments with the proposed approach demonstrate a 5%-15% improvement in accuracy on Amazon customer reviews compared to existing supervised learning and rule-based methods. Copyright © 2014 Elsevier Ltd. All rights reserved.

  8. Weakly supervised image semantic segmentation based on clustering superpixels

    NASA Astrophysics Data System (ADS)

    Yan, Xiong; Liu, Xiaohua

    2018-04-01

    In this paper, we propose an image semantic segmentation model which is trained from image-level labeled images. The proposed model starts with superpixel segmenting, and features of the superpixels are extracted by trained CNN. We introduce a superpixel-based graph followed by applying the graph partition method to group correlated superpixels into clusters. For the acquisition of inter-label correlations between the image-level labels in dataset, we not only utilize label co-occurrence statistics but also exploit visual contextual cues simultaneously. At last, we formulate the task of mapping appropriate image-level labels to the detected clusters as a problem of convex minimization. Experimental results on MSRC-21 dataset and LableMe dataset show that the proposed method has a better performance than most of the weakly supervised methods and is even comparable to fully supervised methods.

  9. A Weibull characterization for tensile fracture of multicomponent brittle fibers

    NASA Technical Reports Server (NTRS)

    Barrows, R. G.

    1977-01-01

    A statistical characterization for multicomponent brittle fibers in presented. The method, which is an extension of usual Weibull distribution procedures, statistically considers the components making up a fiber (e.g., substrate, sheath, and surface) as separate entities and taken together as in a fiber. Tensile data for silicon carbide fiber and for an experimental carbon-boron alloy fiber are evaluated in terms of the proposed multicomponent Weibull characterization.

  10. Modern methods for the quality management of high-rate melt solidification

    NASA Astrophysics Data System (ADS)

    Vasiliev, V. A.; Odinokov, S. A.; Serov, M. M.

    2016-12-01

    The quality management of high-rate melt solidification needs combined solution obtained by methods and approaches adapted to a certain situation. Technological audit is recommended to estimate the possibilities of the process. Statistical methods are proposed with the choice of key parameters. Numerical methods, which can be used to perform simulation under multifactor technological conditions, and an increase in the quality of decisions are of particular importance.

  11. A heuristic statistical stopping rule for iterative reconstruction in emission tomography.

    PubMed

    Ben Bouallègue, F; Crouzet, J F; Mariano-Goulart, D

    2013-01-01

    We propose a statistical stopping criterion for iterative reconstruction in emission tomography based on a heuristic statistical description of the reconstruction process. The method was assessed for MLEM reconstruction. Based on Monte-Carlo numerical simulations and using a perfectly modeled system matrix, our method was compared with classical iterative reconstruction followed by low-pass filtering in terms of Euclidian distance to the exact object, noise, and resolution. The stopping criterion was then evaluated with realistic PET data of a Hoffman brain phantom produced using the GATE platform for different count levels. The numerical experiments showed that compared with the classical method, our technique yielded significant improvement of the noise-resolution tradeoff for a wide range of counting statistics compatible with routine clinical settings. When working with realistic data, the stopping rule allowed a qualitatively and quantitatively efficient determination of the optimal image. Our method appears to give a reliable estimation of the optimal stopping point for iterative reconstruction. It should thus be of practical interest as it produces images with similar or better quality than classical post-filtered iterative reconstruction with a mastered computation time.

  12. A bootstrapping method for development of Treebank

    NASA Astrophysics Data System (ADS)

    Zarei, F.; Basirat, A.; Faili, H.; Mirain, M.

    2017-01-01

    Using statistical approaches beside the traditional methods of natural language processing could significantly improve both the quality and performance of several natural language processing (NLP) tasks. The effective usage of these approaches is subject to the availability of the informative, accurate and detailed corpora on which the learners are trained. This article introduces a bootstrapping method for developing annotated corpora based on a complex and rich linguistically motivated elementary structure called supertag. To this end, a hybrid method for supertagging is proposed that combines both of the generative and discriminative methods of supertagging. The method was applied on a subset of Wall Street Journal (WSJ) in order to annotate its sentences with a set of linguistically motivated elementary structures of the English XTAG grammar that is using a lexicalised tree-adjoining grammar formalism. The empirical results confirm that the bootstrapping method provides a satisfactory way for annotating the English sentences with the mentioned structures. The experiments show that the method could automatically annotate about 20% of WSJ with the accuracy of F-measure about 80% of which is particularly 12% higher than the F-measure of the XTAG Treebank automatically generated from the approach proposed by Basirat and Faili [(2013). Bridge the gap between statistical and hand-crafted grammars. Computer Speech and Language, 27, 1085-1104].

  13. The review and results of different methods for facial recognition

    NASA Astrophysics Data System (ADS)

    Le, Yifan

    2017-09-01

    In recent years, facial recognition draws much attention due to its wide potential applications. As a unique technology in Biometric Identification, facial recognition represents a significant improvement since it could be operated without cooperation of people under detection. Hence, facial recognition will be taken into defense system, medical detection, human behavior understanding, etc. Several theories and methods have been established to make progress in facial recognition: (1) A novel two-stage facial landmark localization method is proposed which has more accurate facial localization effect under specific database; (2) A statistical face frontalization method is proposed which outperforms state-of-the-art methods for face landmark localization; (3) It proposes a general facial landmark detection algorithm to handle images with severe occlusion and images with large head poses; (4) There are three methods proposed on Face Alignment including shape augmented regression method, pose-indexed based multi-view method and a learning based method via regressing local binary features. The aim of this paper is to analyze previous work of different aspects in facial recognition, focusing on concrete method and performance under various databases. In addition, some improvement measures and suggestions in potential applications will be put forward.

  14. Horsetail matching: a flexible approach to optimization under uncertainty

    NASA Astrophysics Data System (ADS)

    Cook, L. W.; Jarrett, J. P.

    2018-04-01

    It is important to design engineering systems to be robust with respect to uncertainties in the design process. Often, this is done by considering statistical moments, but over-reliance on statistical moments when formulating a robust optimization can produce designs that are stochastically dominated by other feasible designs. This article instead proposes a formulation for optimization under uncertainty that minimizes the difference between a design's cumulative distribution function and a target. A standard target is proposed that produces stochastically non-dominated designs, but the formulation also offers enough flexibility to recover existing approaches for robust optimization. A numerical implementation is developed that employs kernels to give a differentiable objective function. The method is applied to algebraic test problems and a robust transonic airfoil design problem where it is compared to multi-objective, weighted-sum and density matching approaches to robust optimization; several advantages over these existing methods are demonstrated.

  15. Preliminary comparative assessment of PM10 hourly measurement results from new monitoring stations type using stochastic and exploratory methodology and models

    NASA Astrophysics Data System (ADS)

    Czechowski, Piotr Oskar; Owczarek, Tomasz; Badyda, Artur; Majewski, Grzegorz; Rogulski, Mariusz; Ogrodnik, Paweł

    2018-01-01

    The paper presents selected preliminary stage key issues proposed extended equivalence measurement results assessment for new portable devices - the comparability PM10 concentration results hourly series with reference station measurement results with statistical methods. In article presented new portable meters technical aspects. The emphasis was placed on the comparability the results using the stochastic and exploratory methods methodology concept. The concept is based on notice that results series simple comparability in the time domain is insufficient. The comparison of regularity should be done in three complementary fields of statistical modeling: time, frequency and space. The proposal is based on model's results of five annual series measurement results new mobile devices and WIOS (Provincial Environmental Protection Inspectorate) reference station located in Nowy Sacz city. The obtained results indicate both the comparison methodology completeness and the high correspondence obtained new measurements results devices with reference.

  16. Probabilistic registration of an unbiased statistical shape model to ultrasound images of the spine

    NASA Astrophysics Data System (ADS)

    Rasoulian, Abtin; Rohling, Robert N.; Abolmaesumi, Purang

    2012-02-01

    The placement of an epidural needle is among the most difficult regional anesthetic techniques. Ultrasound has been proposed to improve success of placement. However, it has not become the standard-of-care because of limitations in the depictions and interpretation of the key anatomical features. We propose to augment the ultrasound images with a registered statistical shape model of the spine to aid interpretation. The model is created with a novel deformable group-wise registration method which utilizes a probabilistic approach to register groups of point sets. The method is compared to a volume-based model building technique and it demonstrates better generalization and compactness. We instantiate and register the shape model to a spine surface probability map extracted from the ultrasound images. Validation is performed on human subjects. The achieved registration accuracy (2-4 mm) is sufficient to guide the choice of puncture site and trajectory of an epidural needle.

  17. Methods and means of Fourier-Stokes polarimetry and the spatial-frequency filtering of phase anisotropy manifestations in endometriosis diagnostics

    NASA Astrophysics Data System (ADS)

    Ushenko, A. G.; Dubolazov, O. V.; Ushenko, Vladimir A.; Ushenko, Yu. A.; Sakhnovskiy, M. Yu.; Prydiy, O. G.; Lakusta, I. I.; Novakovskaya, O. Yu.; Melenko, S. R.

    2016-12-01

    This research presents investigation results of diagnostic efficiency of a new azimuthally stable Mueller-matrix method of laser autofluorescence coordinate distributions analysis of dried polycrystalline films of uterine cavity peritoneal fluid. A new model of generalized optical anisotropy of biological tissues protein networks is proposed in order to define the processes of laser autofluorescence. The influence of complex mechanisms of both phase anisotropy (linear birefringence and optical activity) and linear (circular) dichroism is taken into account. The interconnections between the azimuthally stable Mueller-matrix elements characterizing laser autofluorescence and different mechanisms of optical anisotropy are determined. The statistic analysis of coordinate distributions of such Mueller-matrix rotation invariants is proposed. Thereupon the quantitative criteria (statistic moments of the 1st to the 4th order) of differentiation of dried polycrystalline films of peritoneal fluid - group 1 (healthy donors) and group 2 (uterus endometriosis patients) are estimated.

  18. Epidermis area detection for immunofluorescence microscopy

    NASA Astrophysics Data System (ADS)

    Dovganich, Andrey; Krylov, Andrey; Nasonov, Andrey; Makhneva, Natalia

    2018-04-01

    We propose a novel image segmentation method for immunofluorescence microscopy images of skin tissue for the diagnosis of various skin diseases. The segmentation is based on machine learning algorithms. The feature vector is filled by three groups of features: statistical features, Laws' texture energy measures and local binary patterns. The images are preprocessed for better learning. Different machine learning algorithms have been used and the best results have been obtained with random forest algorithm. We use the proposed method to detect the epidermis region as a part of pemphigus diagnosis system.

  19. Spatial Statistics for Segmenting Histological Structures in H&E Stained Tissue Images.

    PubMed

    Nguyen, Luong; Tosun, Akif Burak; Fine, Jeffrey L; Lee, Adrian V; Taylor, D Lansing; Chennubhotla, S Chakra

    2017-07-01

    Segmenting a broad class of histological structures in transmitted light and/or fluorescence-based images is a prerequisite for determining the pathological basis of cancer, elucidating spatial interactions between histological structures in tumor microenvironments (e.g., tumor infiltrating lymphocytes), facilitating precision medicine studies with deep molecular profiling, and providing an exploratory tool for pathologists. This paper focuses on segmenting histological structures in hematoxylin- and eosin-stained images of breast tissues, e.g., invasive carcinoma, carcinoma in situ, atypical and normal ducts, adipose tissue, and lymphocytes. We propose two graph-theoretic segmentation methods based on local spatial color and nuclei neighborhood statistics. For benchmarking, we curated a data set of 232 high-power field breast tissue images together with expertly annotated ground truth. To accurately model the preference for histological structures (ducts, vessels, tumor nets, adipose, etc.) over the remaining connective tissue and non-tissue areas in ground truth annotations, we propose a new region-based score for evaluating segmentation algorithms. We demonstrate the improvement of our proposed methods over the state-of-the-art algorithms in both region- and boundary-based performance measures.

  20. Load Model Verification, Validation and Calibration Framework by Statistical Analysis on Field Data

    NASA Astrophysics Data System (ADS)

    Jiao, Xiangqing; Liao, Yuan; Nguyen, Thai

    2017-11-01

    Accurate load models are critical for power system analysis and operation. A large amount of research work has been done on load modeling. Most of the existing research focuses on developing load models, while little has been done on developing formal load model verification and validation (V&V) methodologies or procedures. Most of the existing load model validation is based on qualitative rather than quantitative analysis. In addition, not all aspects of model V&V problem have been addressed by the existing approaches. To complement the existing methods, this paper proposes a novel load model verification and validation framework that can systematically and more comprehensively examine load model's effectiveness and accuracy. Statistical analysis, instead of visual check, quantifies the load model's accuracy, and provides a confidence level of the developed load model for model users. The analysis results can also be used to calibrate load models. The proposed framework can be used as a guidance to systematically examine load models for utility engineers and researchers. The proposed method is demonstrated through analysis of field measurements collected from a utility system.

  1. FIT: statistical modeling tool for transcriptome dynamics under fluctuating field conditions

    PubMed Central

    Iwayama, Koji; Aisaka, Yuri; Kutsuna, Natsumaro

    2017-01-01

    Abstract Motivation: Considerable attention has been given to the quantification of environmental effects on organisms. In natural conditions, environmental factors are continuously changing in a complex manner. To reveal the effects of such environmental variations on organisms, transcriptome data in field environments have been collected and analyzed. Nagano et al. proposed a model that describes the relationship between transcriptomic variation and environmental conditions and demonstrated the capability to predict transcriptome variation in rice plants. However, the computational cost of parameter optimization has prevented its wide application. Results: We propose a new statistical model and efficient parameter optimization based on the previous study. We developed and released FIT, an R package that offers functions for parameter optimization and transcriptome prediction. The proposed method achieves comparable or better prediction performance within a shorter computational time than the previous method. The package will facilitate the study of the environmental effects on transcriptomic variation in field conditions. Availability and Implementation: Freely available from CRAN (https://cran.r-project.org/web/packages/FIT/). Contact: anagano@agr.ryukoku.ac.jp Supplementary information: Supplementary data are available at Bioinformatics online PMID:28158396

  2. Fault Diagnosis for Rotating Machinery Using Vibration Measurement Deep Statistical Feature Learning.

    PubMed

    Li, Chuan; Sánchez, René-Vinicio; Zurita, Grover; Cerrada, Mariela; Cabrera, Diego

    2016-06-17

    Fault diagnosis is important for the maintenance of rotating machinery. The detection of faults and fault patterns is a challenging part of machinery fault diagnosis. To tackle this problem, a model for deep statistical feature learning from vibration measurements of rotating machinery is presented in this paper. Vibration sensor signals collected from rotating mechanical systems are represented in the time, frequency, and time-frequency domains, each of which is then used to produce a statistical feature set. For learning statistical features, real-value Gaussian-Bernoulli restricted Boltzmann machines (GRBMs) are stacked to develop a Gaussian-Bernoulli deep Boltzmann machine (GDBM). The suggested approach is applied as a deep statistical feature learning tool for both gearbox and bearing systems. The fault classification performances in experiments using this approach are 95.17% for the gearbox, and 91.75% for the bearing system. The proposed approach is compared to such standard methods as a support vector machine, GRBM and a combination model. In experiments, the best fault classification rate was detected using the proposed model. The results show that deep learning with statistical feature extraction has an essential improvement potential for diagnosing rotating machinery faults.

  3. Fault Diagnosis for Rotating Machinery Using Vibration Measurement Deep Statistical Feature Learning

    PubMed Central

    Li, Chuan; Sánchez, René-Vinicio; Zurita, Grover; Cerrada, Mariela; Cabrera, Diego

    2016-01-01

    Fault diagnosis is important for the maintenance of rotating machinery. The detection of faults and fault patterns is a challenging part of machinery fault diagnosis. To tackle this problem, a model for deep statistical feature learning from vibration measurements of rotating machinery is presented in this paper. Vibration sensor signals collected from rotating mechanical systems are represented in the time, frequency, and time-frequency domains, each of which is then used to produce a statistical feature set. For learning statistical features, real-value Gaussian-Bernoulli restricted Boltzmann machines (GRBMs) are stacked to develop a Gaussian-Bernoulli deep Boltzmann machine (GDBM). The suggested approach is applied as a deep statistical feature learning tool for both gearbox and bearing systems. The fault classification performances in experiments using this approach are 95.17% for the gearbox, and 91.75% for the bearing system. The proposed approach is compared to such standard methods as a support vector machine, GRBM and a combination model. In experiments, the best fault classification rate was detected using the proposed model. The results show that deep learning with statistical feature extraction has an essential improvement potential for diagnosing rotating machinery faults. PMID:27322273

  4. A powerful score-based test statistic for detecting gene-gene co-association.

    PubMed

    Xu, Jing; Yuan, Zhongshang; Ji, Jiadong; Zhang, Xiaoshuai; Li, Hongkai; Wu, Xuesen; Xue, Fuzhong; Liu, Yanxun

    2016-01-29

    The genetic variants identified by Genome-wide association study (GWAS) can only account for a small proportion of the total heritability for complex disease. The existence of gene-gene joint effects which contains the main effects and their co-association is one of the possible explanations for the "missing heritability" problems. Gene-gene co-association refers to the extent to which the joint effects of two genes differ from the main effects, not only due to the traditional interaction under nearly independent condition but the correlation between genes. Generally, genes tend to work collaboratively within specific pathway or network contributing to the disease and the specific disease-associated locus will often be highly correlated (e.g. single nucleotide polymorphisms (SNPs) in linkage disequilibrium). Therefore, we proposed a novel score-based statistic (SBS) as a gene-based method for detecting gene-gene co-association. Various simulations illustrate that, under different sample sizes, marginal effects of causal SNPs and co-association levels, the proposed SBS has the better performance than other existed methods including single SNP-based and principle component analysis (PCA)-based logistic regression model, the statistics based on canonical correlations (CCU), kernel canonical correlation analysis (KCCU), partial least squares path modeling (PLSPM) and delta-square (δ (2)) statistic. The real data analysis of rheumatoid arthritis (RA) further confirmed its advantages in practice. SBS is a powerful and efficient gene-based method for detecting gene-gene co-association.

  5. Bayesian inference for joint modelling of longitudinal continuous, binary and ordinal events.

    PubMed

    Li, Qiuju; Pan, Jianxin; Belcher, John

    2016-12-01

    In medical studies, repeated measurements of continuous, binary and ordinal outcomes are routinely collected from the same patient. Instead of modelling each outcome separately, in this study we propose to jointly model the trivariate longitudinal responses, so as to take account of the inherent association between the different outcomes and thus improve statistical inferences. This work is motivated by a large cohort study in the North West of England, involving trivariate responses from each patient: Body Mass Index, Depression (Yes/No) ascertained with cut-off score not less than 8 at the Hospital Anxiety and Depression Scale, and Pain Interference generated from the Medical Outcomes Study 36-item short-form health survey with values returned on an ordinal scale 1-5. There are some well-established methods for combined continuous and binary, or even continuous and ordinal responses, but little work was done on the joint analysis of continuous, binary and ordinal responses. We propose conditional joint random-effects models, which take into account the inherent association between the continuous, binary and ordinal outcomes. Bayesian analysis methods are used to make statistical inferences. Simulation studies show that, by jointly modelling the trivariate outcomes, standard deviations of the estimates of parameters in the models are smaller and much more stable, leading to more efficient parameter estimates and reliable statistical inferences. In the real data analysis, the proposed joint analysis yields a much smaller deviance information criterion value than the separate analysis, and shows other good statistical properties too. © The Author(s) 2014.

  6. Local Linear Regression for Data with AR Errors.

    PubMed

    Li, Runze; Li, Yan

    2009-07-01

    In many statistical applications, data are collected over time, and they are likely correlated. In this paper, we investigate how to incorporate the correlation information into the local linear regression. Under the assumption that the error process is an auto-regressive process, a new estimation procedure is proposed for the nonparametric regression by using local linear regression method and the profile least squares techniques. We further propose the SCAD penalized profile least squares method to determine the order of auto-regressive process. Extensive Monte Carlo simulation studies are conducted to examine the finite sample performance of the proposed procedure, and to compare the performance of the proposed procedures with the existing one. From our empirical studies, the newly proposed procedures can dramatically improve the accuracy of naive local linear regression with working-independent error structure. We illustrate the proposed methodology by an analysis of real data set.

  7. A novel hybrid meta-heuristic technique applied to the well-known benchmark optimization problems

    NASA Astrophysics Data System (ADS)

    Abtahi, Amir-Reza; Bijari, Afsane

    2017-03-01

    In this paper, a hybrid meta-heuristic algorithm, based on imperialistic competition algorithm (ICA), harmony search (HS), and simulated annealing (SA) is presented. The body of the proposed hybrid algorithm is based on ICA. The proposed hybrid algorithm inherits the advantages of the process of harmony creation in HS algorithm to improve the exploitation phase of the ICA algorithm. In addition, the proposed hybrid algorithm uses SA to make a balance between exploration and exploitation phases. The proposed hybrid algorithm is compared with several meta-heuristic methods, including genetic algorithm (GA), HS, and ICA on several well-known benchmark instances. The comprehensive experiments and statistical analysis on standard benchmark functions certify the superiority of the proposed method over the other algorithms. The efficacy of the proposed hybrid algorithm is promising and can be used in several real-life engineering and management problems.

  8. Modeling of carbon dioxide condensation in the high pressure flows using the statistical BGK approach

    NASA Astrophysics Data System (ADS)

    Kumar, Rakesh; Li, Zheng; Levin, Deborah A.

    2011-05-01

    In this work, we propose a new heat accommodation model to simulate freely expanding homogeneous condensation flows of gaseous carbon dioxide using a new approach, the statistical Bhatnagar-Gross-Krook method. The motivation for the present work comes from the earlier work of Li et al. [J. Phys. Chem. 114, 5276 (2010)] in which condensation models were proposed and used in the direct simulation Monte Carlo method to simulate the flow of carbon dioxide from supersonic expansions of small nozzles into near-vacuum conditions. Simulations conducted for stagnation pressures of one and three bar were compared with the measurements of gas and cluster number densities, cluster size, and carbon dioxide rotational temperature obtained by Ramos et al. [Phys. Rev. A 72, 3204 (2005)]. Due to the high computational cost of direct simulation Monte Carlo method, comparison between simulations and data could only be performed for these stagnation pressures, with good agreement obtained beyond the condensation onset point, in the farfield. As the stagnation pressure increases, the degree of condensation also increases; therefore, to improve the modeling of condensation onset, one must be able to simulate higher stagnation pressures. In simulations of an expanding flow of argon through a nozzle, Kumar et al. [AIAA J. 48, 1531 (2010)] found that the statistical Bhatnagar-Gross-Krook method provides the same accuracy as direct simulation Monte Carlo method, but, at one half of the computational cost. In this work, the statistical Bhatnagar-Gross-Krook method was modified to account for internal degrees of freedom for multi-species polyatomic gases. With the computational approach in hand, we developed and tested a new heat accommodation model for a polyatomic system to properly account for the heat release of condensation. We then developed condensation models in the framework of the statistical Bhatnagar-Gross-Krook method. Simulations were found to agree well with the experiment for all stagnation pressure cases (1-5 bar), validating the accuracy of the Bhatnagar-Gross-Krook based condensation model in capturing the physics of condensation.

  9. Infrared maritime target detection using the high order statistic filtering in fractional Fourier domain

    NASA Astrophysics Data System (ADS)

    Zhou, Anran; Xie, Weixin; Pei, Jihong

    2018-06-01

    Accurate detection of maritime targets in infrared imagery under various sea clutter conditions is always a challenging task. The fractional Fourier transform (FRFT) is the extension of the Fourier transform in the fractional order, and has richer spatial-frequency information. By combining it with the high order statistic filtering, a new ship detection method is proposed. First, the proper range of angle parameter is determined to make it easier for the ship components and background to be separated. Second, a new high order statistic curve (HOSC) at each fractional frequency point is designed. It is proved that maximal peak interval in HOSC reflects the target information, while the points outside the interval reflect the background. And the value of HOSC relative to the ship is much bigger than that to the sea clutter. Then, search the curve's maximal target peak interval and extract the interval by bandpass filtering in fractional Fourier domain. The value outside the peak interval of HOSC decreases rapidly to 0, so the background is effectively suppressed. Finally, the detection result is obtained by the double threshold segmenting and the target region selection method. The results show the proposed method is excellent for maritime targets detection with high clutters.

  10. Statistical monitoring of the hand, foot and mouth disease in China.

    PubMed

    Zhang, Jingnan; Kang, Yicheng; Yang, Yang; Qiu, Peihua

    2015-09-01

    In a period starting around 2007, the Hand, Foot, and Mouth Disease (HFMD) became wide-spreading in China, and the Chinese public health was seriously threatened. To prevent the outbreak of infectious diseases like HFMD, effective disease surveillance systems would be especially helpful to give signals of disease outbreaks as early as possible. Statistical process control (SPC) charts provide a major statistical tool in industrial quality control for detecting product defectives in a timely manner. In recent years, SPC charts have been used for disease surveillance. However, disease surveillance data often have much more complicated structures, compared to the data collected from industrial production lines. Major challenges, including lack of in-control data, complex seasonal effects, and spatio-temporal correlations, make the surveillance data difficult to handle. In this article, we propose a three-step procedure for analyzing disease surveillance data, and our procedure is demonstrated using the HFMD data collected during 2008-2009 in China. Our method uses nonparametric longitudinal data and time series analysis methods to eliminate the possible impact of seasonality and temporal correlation before the disease incidence data are sequentially monitored by a SPC chart. At both national and provincial levels, our proposed method can effectively detect the increasing trend of disease incidence rate before the disease becomes wide-spreading. © 2015, The International Biometric Society.

  11. Statistical mechanical estimation of the free energy of formation of E. coli biomass for use with macroscopic bioreactor balances.

    PubMed

    Grosz, R; Stephanopoulos, G

    1983-09-01

    The need for the determination of the free energy of formation of biomass in bioreactor second law balances is well established. A statistical mechanical method for the calculation of the free energy of formation of E. coli biomass is introduced. In this method, biomass is modelled to consist of a system of biopolymer networks. The partition function of this system is proposed to consist of acoustic and optical modes of vibration. Acoustic modes are described by Tarasov's model, the parameters of which are evaluated with the aid of low-temperature calorimetric data for the crystalline protein bovine chymotrypsinogen A. The optical modes are described by considering the low-temperature thermodynamic properties of biological monomer crystals such as amino acid crystals. Upper and lower bounds are placed on the entropy to establish the maximum error associated with the statistical method. The upper bound is determined by endowing the monomers in biomass with ideal gas properties. The lower bound is obtained by limiting the monomers to complete immobility. On this basis, the free energy of formation is fixed to within 10%. Proposals are made with regard to experimental verification of the calculated value and extension of the calculation to other types of biomass.

  12. Kriging with Unknown Variance Components for Regional Ionospheric Reconstruction.

    PubMed

    Huang, Ling; Zhang, Hongping; Xu, Peiliang; Geng, Jianghui; Wang, Cheng; Liu, Jingnan

    2017-02-27

    Ionospheric delay effect is a critical issue that limits the accuracy of precise Global Navigation Satellite System (GNSS) positioning and navigation for single-frequency users, especially in mid- and low-latitude regions where variations in the ionosphere are larger. Kriging spatial interpolation techniques have been recently introduced to model the spatial correlation and variability of ionosphere, which intrinsically assume that the ionosphere field is stochastically stationary but does not take the random observational errors into account. In this paper, by treating the spatial statistical information on ionosphere as prior knowledge and based on Total Electron Content (TEC) semivariogram analysis, we use Kriging techniques to spatially interpolate TEC values. By assuming that the stochastic models of both the ionospheric signals and measurement errors are only known up to some unknown factors, we propose a new Kriging spatial interpolation method with unknown variance components for both the signals of ionosphere and TEC measurements. Variance component estimation has been integrated with Kriging to reconstruct regional ionospheric delays. The method has been applied to data from the Crustal Movement Observation Network of China (CMONOC) and compared with the ordinary Kriging and polynomial interpolations with spherical cap harmonic functions, polynomial functions and low-degree spherical harmonic functions. The statistics of results indicate that the daily ionospheric variations during the experimental period characterized by the proposed approach have good agreement with the other methods, ranging from 10 to 80 TEC Unit (TECU, 1 TECU = 1 × 10 16 electrons/m²) with an overall mean of 28.2 TECU. The proposed method can produce more appropriate estimations whose general TEC level is as smooth as the ordinary Kriging but with a smaller standard deviation around 3 TECU than others. The residual results show that the interpolation precision of the new proposed method is better than the ordinary Kriging and polynomial interpolation by about 1.2 TECU and 0.7 TECU, respectively. The root mean squared error of the proposed new Kriging with variance components is within 1.5 TECU and is smaller than those from other methods under comparison by about 1 TECU. When compared with ionospheric grid points, the mean squared error of the proposed method is within 6 TECU and smaller than Kriging, indicating that the proposed method can produce more accurate ionospheric delays and better estimation accuracy over China regional area.

  13. Kriging with Unknown Variance Components for Regional Ionospheric Reconstruction

    PubMed Central

    Huang, Ling; Zhang, Hongping; Xu, Peiliang; Geng, Jianghui; Wang, Cheng; Liu, Jingnan

    2017-01-01

    Ionospheric delay effect is a critical issue that limits the accuracy of precise Global Navigation Satellite System (GNSS) positioning and navigation for single-frequency users, especially in mid- and low-latitude regions where variations in the ionosphere are larger. Kriging spatial interpolation techniques have been recently introduced to model the spatial correlation and variability of ionosphere, which intrinsically assume that the ionosphere field is stochastically stationary but does not take the random observational errors into account. In this paper, by treating the spatial statistical information on ionosphere as prior knowledge and based on Total Electron Content (TEC) semivariogram analysis, we use Kriging techniques to spatially interpolate TEC values. By assuming that the stochastic models of both the ionospheric signals and measurement errors are only known up to some unknown factors, we propose a new Kriging spatial interpolation method with unknown variance components for both the signals of ionosphere and TEC measurements. Variance component estimation has been integrated with Kriging to reconstruct regional ionospheric delays. The method has been applied to data from the Crustal Movement Observation Network of China (CMONOC) and compared with the ordinary Kriging and polynomial interpolations with spherical cap harmonic functions, polynomial functions and low-degree spherical harmonic functions. The statistics of results indicate that the daily ionospheric variations during the experimental period characterized by the proposed approach have good agreement with the other methods, ranging from 10 to 80 TEC Unit (TECU, 1 TECU = 1 × 1016 electrons/m2) with an overall mean of 28.2 TECU. The proposed method can produce more appropriate estimations whose general TEC level is as smooth as the ordinary Kriging but with a smaller standard deviation around 3 TECU than others. The residual results show that the interpolation precision of the new proposed method is better than the ordinary Kriging and polynomial interpolation by about 1.2 TECU and 0.7 TECU, respectively. The root mean squared error of the proposed new Kriging with variance components is within 1.5 TECU and is smaller than those from other methods under comparison by about 1 TECU. When compared with ionospheric grid points, the mean squared error of the proposed method is within 6 TECU and smaller than Kriging, indicating that the proposed method can produce more accurate ionospheric delays and better estimation accuracy over China regional area. PMID:28264424

  14. Pilot points method for conditioning multiple-point statistical facies simulation on flow data

    NASA Astrophysics Data System (ADS)

    Ma, Wei; Jafarpour, Behnam

    2018-05-01

    We propose a new pilot points method for conditioning discrete multiple-point statistical (MPS) facies simulation on dynamic flow data. While conditioning MPS simulation on static hard data is straightforward, their calibration against nonlinear flow data is nontrivial. The proposed method generates conditional models from a conceptual model of geologic connectivity, known as a training image (TI), by strategically placing and estimating pilot points. To place pilot points, a score map is generated based on three sources of information: (i) the uncertainty in facies distribution, (ii) the model response sensitivity information, and (iii) the observed flow data. Once the pilot points are placed, the facies values at these points are inferred from production data and then are used, along with available hard data at well locations, to simulate a new set of conditional facies realizations. While facies estimation at the pilot points can be performed using different inversion algorithms, in this study the ensemble smoother (ES) is adopted to update permeability maps from production data, which are then used to statistically infer facies types at the pilot point locations. The developed method combines the information in the flow data and the TI by using the former to infer facies values at selected locations away from the wells and the latter to ensure consistent facies structure and connectivity where away from measurement locations. Several numerical experiments are used to evaluate the performance of the developed method and to discuss its important properties.

  15. Spatiotemporal Interpolation for Environmental Modelling

    PubMed Central

    Susanto, Ferry; de Souza, Paulo; He, Jing

    2016-01-01

    A variation of the reduction-based approach to spatiotemporal interpolation (STI), in which time is treated independently from the spatial dimensions, is proposed in this paper. We reviewed and compared three widely-used spatial interpolation techniques: ordinary kriging, inverse distance weighting and the triangular irregular network. We also proposed a new distribution-based distance weighting (DDW) spatial interpolation method. In this study, we utilised one year of Tasmania’s South Esk Hydrology model developed by CSIRO. Root mean squared error statistical methods were performed for performance evaluations. Our results show that the proposed reduction approach is superior to the extension approach to STI. However, the proposed DDW provides little benefit compared to the conventional inverse distance weighting (IDW) method. We suggest that the improved IDW technique, with the reduction approach used for the temporal dimension, is the optimal combination for large-scale spatiotemporal interpolation within environmental modelling applications. PMID:27509497

  16. REANALYSIS OF F-STATISTIC GRAVITATIONAL-WAVE SEARCHES WITH THE HIGHER CRITICISM STATISTIC

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bennett, M. F.; Melatos, A.; Delaigle, A.

    2013-04-01

    We propose a new method of gravitational-wave detection using a modified form of higher criticism, a statistical technique introduced by Donoho and Jin. Higher criticism is designed to detect a group of sparse, weak sources, none of which are strong enough to be reliably estimated or detected individually. We apply higher criticism as a second-pass method to synthetic F-statistic and C-statistic data for a monochromatic periodic source in a binary system and quantify the improvement relative to the first-pass methods. We find that higher criticism on C-statistic data is more sensitive by {approx}6% than the C-statistic alone under optimal conditionsmore » (i.e., binary orbit known exactly) and the relative advantage increases as the error in the orbital parameters increases. Higher criticism is robust even when the source is not monochromatic (e.g., phase-wandering in an accreting system). Applying higher criticism to a phase-wandering source over multiple time intervals gives a {approx}> 30% increase in detectability with few assumptions about the frequency evolution. By contrast, in all-sky searches for unknown periodic sources, which are dominated by the brightest source, second-pass higher criticism does not provide any benefits over a first-pass search.« less

  17. A robust bayesian estimate of the concordance correlation coefficient.

    PubMed

    Feng, Dai; Baumgartner, Richard; Svetnik, Vladimir

    2015-01-01

    A need for assessment of agreement arises in many situations including statistical biomarker qualification or assay or method validation. Concordance correlation coefficient (CCC) is one of the most popular scaled indices reported in evaluation of agreement. Robust methods for CCC estimation currently present an important statistical challenge. Here, we propose a novel Bayesian method of robust estimation of CCC based on multivariate Student's t-distribution and compare it with its alternatives. Furthermore, we extend the method to practically relevant settings, enabling incorporation of confounding covariates and replications. The superiority of the new approach is demonstrated using simulation as well as real datasets from biomarker application in electroencephalography (EEG). This biomarker is relevant in neuroscience for development of treatments for insomnia.

  18. A study of speech emotion recognition based on hybrid algorithm

    NASA Astrophysics Data System (ADS)

    Zhu, Ju-xia; Zhang, Chao; Lv, Zhao; Rao, Yao-quan; Wu, Xiao-pei

    2011-10-01

    To effectively improve the recognition accuracy of the speech emotion recognition system, a hybrid algorithm which combines Continuous Hidden Markov Model (CHMM), All-Class-in-One Neural Network (ACON) and Support Vector Machine (SVM) is proposed. In SVM and ACON methods, some global statistics are used as emotional features, while in CHMM method, instantaneous features are employed. The recognition rate by the proposed method is 92.25%, with the rejection rate to be 0.78%. Furthermore, it obtains the relative increasing of 8.53%, 4.69% and 0.78% compared with ACON, CHMM and SVM methods respectively. The experiment result confirms the efficiency of distinguishing anger, happiness, neutral and sadness emotional states.

  19. Modeling Sensor Reliability in Fault Diagnosis Based on Evidence Theory

    PubMed Central

    Yuan, Kaijuan; Xiao, Fuyuan; Fei, Liguo; Kang, Bingyi; Deng, Yong

    2016-01-01

    Sensor data fusion plays an important role in fault diagnosis. Dempster–Shafer (D-R) evidence theory is widely used in fault diagnosis, since it is efficient to combine evidence from different sensors. However, under the situation where the evidence highly conflicts, it may obtain a counterintuitive result. To address the issue, a new method is proposed in this paper. Not only the statistic sensor reliability, but also the dynamic sensor reliability are taken into consideration. The evidence distance function and the belief entropy are combined to obtain the dynamic reliability of each sensor report. A weighted averaging method is adopted to modify the conflict evidence by assigning different weights to evidence according to sensor reliability. The proposed method has better performance in conflict management and fault diagnosis due to the fact that the information volume of each sensor report is taken into consideration. An application in fault diagnosis based on sensor fusion is illustrated to show the efficiency of the proposed method. The results show that the proposed method improves the accuracy of fault diagnosis from 81.19% to 89.48% compared to the existing methods. PMID:26797611

  20. Investigating Measurement Invariance in Computer-Based Personality Testing: The Impact of Using Anchor Items on Effect Size Indices

    ERIC Educational Resources Information Center

    Egberink, Iris J. L.; Meijer, Rob R.; Tendeiro, Jorge N.

    2015-01-01

    A popular method to assess measurement invariance of a particular item is based on likelihood ratio tests with all other items as anchor items. The results of this method are often only reported in terms of statistical significance, and researchers proposed different methods to empirically select anchor items. It is unclear, however, how many…

  1. Predicting individual brain functional connectivity using a Bayesian hierarchical model.

    PubMed

    Dai, Tian; Guo, Ying

    2017-02-15

    Network-oriented analysis of functional magnetic resonance imaging (fMRI), especially resting-state fMRI, has revealed important association between abnormal connectivity and brain disorders such as schizophrenia, major depression and Alzheimer's disease. Imaging-based brain connectivity measures have become a useful tool for investigating the pathophysiology, progression and treatment response of psychiatric disorders and neurodegenerative diseases. Recent studies have started to explore the possibility of using functional neuroimaging to help predict disease progression and guide treatment selection for individual patients. These studies provide the impetus to develop statistical methodology that would help provide predictive information on disease progression-related or treatment-related changes in neural connectivity. To this end, we propose a prediction method based on Bayesian hierarchical model that uses individual's baseline fMRI scans, coupled with relevant subject characteristics, to predict the individual's future functional connectivity. A key advantage of the proposed method is that it can improve the accuracy of individualized prediction of connectivity by combining information from both group-level connectivity patterns that are common to subjects with similar characteristics as well as individual-level connectivity features that are particular to the specific subject. Furthermore, our method also offers statistical inference tools such as predictive intervals that help quantify the uncertainty or variability of the predicted outcomes. The proposed prediction method could be a useful approach to predict the changes in individual patient's brain connectivity with the progression of a disease. It can also be used to predict a patient's post-treatment brain connectivity after a specified treatment regimen. Another utility of the proposed method is that it can be applied to test-retest imaging data to develop a more reliable estimator for individual functional connectivity. We show there exists a nice connection between our proposed estimator and a recently developed shrinkage estimator of connectivity measures in the neuroimaging community. We develop an expectation-maximization (EM) algorithm for estimation of the proposed Bayesian hierarchical model. Simulations studies are performed to evaluate the accuracy of our proposed prediction methods. We illustrate the application of the methods with two data examples: the longitudinal resting-state fMRI from ADNI2 study and the test-retest fMRI data from Kirby21 study. In both the simulation studies and the fMRI data applications, we demonstrate that the proposed methods provide more accurate prediction and more reliable estimation of individual functional connectivity as compared with alternative methods. Copyright © 2017 Elsevier Inc. All rights reserved.

  2. Using knowledge for indexing health web resources in a quality-controlled gateway.

    PubMed

    Joubert, Michel; Darmoni, Stefan J; Avillach, Paul; Dahamna, Badisse; Fieschi, Marius

    2008-01-01

    The aim of this study is to provide to indexers MeSH terms to be considered as major ones in a list of terms automatically extracted from a document. We propose a method combining symbolic knowledge - the UMLS Metathesaurus and Semantic Network - and statistical knowledge drawn from co-occurrences of terms in the CISMeF database (a French-language quality-controlled health gateway) using data mining measures. The method was tested on CISMeF corpus of 293 resources. There was a proportion of 0.37+/-0.26 major terms in the processed records. The method produced lists of terms with a proportion of terms initially pointed out as major of 0.54+/-0.31. The method we propose reduces the number of terms, which seem not useful for content description of resources, such as "check tags", but retains the most descriptive ones. Discarding these terms is accounted for by: 1) the removal by using semantic knowledge of associations of concepts bearing no real medical significance, 2) the removal by using statistical knowledge of nonstatistically significant associations of terms. This method can assist effectively indexers in their daily work and will be soon applied in the CISMeF system.

  3. Low-dose X-ray computed tomography image reconstruction with a combined low-mAs and sparse-view protocol.

    PubMed

    Gao, Yang; Bian, Zhaoying; Huang, Jing; Zhang, Yunwan; Niu, Shanzhou; Feng, Qianjin; Chen, Wufan; Liang, Zhengrong; Ma, Jianhua

    2014-06-16

    To realize low-dose imaging in X-ray computed tomography (CT) examination, lowering milliampere-seconds (low-mAs) or reducing the required number of projection views (sparse-view) per rotation around the body has been widely studied as an easy and effective approach. In this study, we are focusing on low-dose CT image reconstruction from the sinograms acquired with a combined low-mAs and sparse-view protocol and propose a two-step image reconstruction strategy. Specifically, to suppress significant statistical noise in the noisy and insufficient sinograms, an adaptive sinogram restoration (ASR) method is first proposed with consideration of the statistical property of sinogram data, and then to further acquire a high-quality image, a total variation based projection onto convex sets (TV-POCS) method is adopted with a slight modification. For simplicity, the present reconstruction strategy was termed as "ASR-TV-POCS." To evaluate the present ASR-TV-POCS method, both qualitative and quantitative studies were performed on a physical phantom. Experimental results have demonstrated that the present ASR-TV-POCS method can achieve promising gains over other existing methods in terms of the noise reduction, contrast-to-noise ratio, and edge detail preservation.

  4. iTTVis: Interactive Visualization of Table Tennis Data.

    PubMed

    Wu, Yingcai; Lan, Ji; Shu, Xinhuan; Ji, Chenyang; Zhao, Kejian; Wang, Jiachen; Zhang, Hui

    2018-01-01

    The rapid development of information technology paved the way for the recording of fine-grained data, such as stroke techniques and stroke placements, during a table tennis match. This data recording creates opportunities to analyze and evaluate matches from new perspectives. Nevertheless, the increasingly complex data poses a significant challenge to make sense of and gain insights into. Analysts usually employ tedious and cumbersome methods which are limited to watching videos and reading statistical tables. However, existing sports visualization methods cannot be applied to visualizing table tennis competitions due to different competition rules and particular data attributes. In this work, we collaborate with data analysts to understand and characterize the sophisticated domain problem of analysis of table tennis data. We propose iTTVis, a novel interactive table tennis visualization system, which to our knowledge, is the first visual analysis system for analyzing and exploring table tennis data. iTTVis provides a holistic visualization of an entire match from three main perspectives, namely, time-oriented, statistical, and tactical analyses. The proposed system with several well-coordinated views not only supports correlation identification through statistics and pattern detection of tactics with a score timeline but also allows cross analysis to gain insights. Data analysts have obtained several new insights by using iTTVis. The effectiveness and usability of the proposed system are demonstrated with four case studies.

  5. A segmentation editing framework based on shape change statistics

    NASA Astrophysics Data System (ADS)

    Mostapha, Mahmoud; Vicory, Jared; Styner, Martin; Pizer, Stephen

    2017-02-01

    Segmentation is a key task in medical image analysis because its accuracy significantly affects successive steps. Automatic segmentation methods often produce inadequate segmentations, which require the user to manually edit the produced segmentation slice by slice. Because editing is time-consuming, an editing tool that enables the user to produce accurate segmentations by only drawing a sparse set of contours would be needed. This paper describes such a framework as applied to a single object. Constrained by the additional information enabled by the manually segmented contours, the proposed framework utilizes object shape statistics to transform the failed automatic segmentation to a more accurate version. Instead of modeling the object shape, the proposed framework utilizes shape change statistics that were generated to capture the object deformation from the failed automatic segmentation to its corresponding correct segmentation. An optimization procedure was used to minimize an energy function that consists of two terms, an external contour match term and an internal shape change regularity term. The high accuracy of the proposed segmentation editing approach was confirmed by testing it on a simulated data set based on 10 in-vivo infant magnetic resonance brain data sets using four similarity metrics. Segmentation results indicated that our method can provide efficient and adequately accurate segmentations (Dice segmentation accuracy increase of 10%), with very sparse contours (only 10%), which is promising in greatly decreasing the work expected from the user.

  6. A Statistical Method for Synthesizing Mediation Analyses Using the Product of Coefficient Approach Across Multiple Trials

    PubMed Central

    Huang, Shi; MacKinnon, David P.; Perrino, Tatiana; Gallo, Carlos; Cruden, Gracelyn; Brown, C Hendricks

    2016-01-01

    Mediation analysis often requires larger sample sizes than main effect analysis to achieve the same statistical power. Combining results across similar trials may be the only practical option for increasing statistical power for mediation analysis in some situations. In this paper, we propose a method to estimate: 1) marginal means for mediation path a, the relation of the independent variable to the mediator; 2) marginal means for path b, the relation of the mediator to the outcome, across multiple trials; and 3) the between-trial level variance-covariance matrix based on a bivariate normal distribution. We present the statistical theory and an R computer program to combine regression coefficients from multiple trials to estimate a combined mediated effect and confidence interval under a random effects model. Values of coefficients a and b, along with their standard errors from each trial are the input for the method. This marginal likelihood based approach with Monte Carlo confidence intervals provides more accurate inference than the standard meta-analytic approach. We discuss computational issues, apply the method to two real-data examples and make recommendations for the use of the method in different settings. PMID:28239330

  7. Scene-based nonuniformity correction with reduced ghosting using a gated LMS algorithm.

    PubMed

    Hardie, Russell C; Baxley, Frank; Brys, Brandon; Hytla, Patrick

    2009-08-17

    In this paper, we present a scene-based nouniformity correction (NUC) method using a modified adaptive least mean square (LMS) algorithm with a novel gating operation on the updates. The gating is designed to significantly reduce ghosting artifacts produced by many scene-based NUC algorithms by halting updates when temporal variation is lacking. We define the algorithm and present a number of experimental results to demonstrate the efficacy of the proposed method in comparison to several previously published methods including other LMS and constant statistics based methods. The experimental results include simulated imagery and a real infrared image sequence. We show that the proposed method significantly reduces ghosting artifacts, but has a slightly longer convergence time. (c) 2009 Optical Society of America

  8. Spatial cluster detection for repeatedly measured outcomes while accounting for residential history.

    PubMed

    Cook, Andrea J; Gold, Diane R; Li, Yi

    2009-10-01

    Spatial cluster detection has become an important methodology in quantifying the effect of hazardous exposures. Previous methods have focused on cross-sectional outcomes that are binary or continuous. There are virtually no spatial cluster detection methods proposed for longitudinal outcomes. This paper proposes a new spatial cluster detection method for repeated outcomes using cumulative geographic residuals. A major advantage of this method is its ability to readily incorporate information on study participants relocation, which most cluster detection statistics cannot. Application of these methods will be illustrated by the Home Allergens and Asthma prospective cohort study analyzing the relationship between environmental exposures and repeated measured outcome, occurrence of wheeze in the last 6 months, while taking into account mobile locations.

  9. Analysis and modeling of wafer-level process variability in 28 nm FD-SOI using split C-V measurements

    NASA Astrophysics Data System (ADS)

    Pradeep, Krishna; Poiroux, Thierry; Scheer, Patrick; Juge, André; Gouget, Gilles; Ghibaudo, Gérard

    2018-07-01

    This work details the analysis of wafer level global process variability in 28 nm FD-SOI using split C-V measurements. The proposed approach initially evaluates the native on wafer process variability using efficient extraction methods on split C-V measurements. The on-wafer threshold voltage (VT) variability is first studied and modeled using a simple analytical model. Then, a statistical model based on the Leti-UTSOI compact model is proposed to describe the total C-V variability in different bias conditions. This statistical model is finally used to study the contribution of each process parameter to the total C-V variability.

  10. Non-parametric correlative uncertainty quantification and sensitivity analysis: Application to a Langmuir bimolecular adsorption model

    NASA Astrophysics Data System (ADS)

    Feng, Jinchao; Lansford, Joshua; Mironenko, Alexander; Pourkargar, Davood Babaei; Vlachos, Dionisios G.; Katsoulakis, Markos A.

    2018-03-01

    We propose non-parametric methods for both local and global sensitivity analysis of chemical reaction models with correlated parameter dependencies. The developed mathematical and statistical tools are applied to a benchmark Langmuir competitive adsorption model on a close packed platinum surface, whose parameters, estimated from quantum-scale computations, are correlated and are limited in size (small data). The proposed mathematical methodology employs gradient-based methods to compute sensitivity indices. We observe that ranking influential parameters depends critically on whether or not correlations between parameters are taken into account. The impact of uncertainty in the correlation and the necessity of the proposed non-parametric perspective are demonstrated.

  11. Surface microbial contamination in hospitals: A pilot study on methods of sampling and the use of proposed microbiologic standards.

    PubMed

    Claro, Tânia; O'Reilly, Marese; Daniels, Stephen; Humphreys, Hilary

    2015-09-01

    Contamination of hospital surfaces by bacteria is increasingly recognized. We assessed commonly touched surfaces using contact plates and Petrifilms (3M, St. Paul, MN) and compared the results against proposed microbiology standards. Toilet door handles were the most heavily contaminated (7.97 ± 0.68 colony forming units [CFU]/cm(2)) and exceeded proposed standards on 74% of occasions. Petrifilms detected statistically higher CFU from bedside lockers. Further research is required on the use of standards and methods of sampling. Copyright © 2015 Association for Professionals in Infection Control and Epidemiology, Inc. Published by Elsevier Inc. All rights reserved.

  12. A Wavelet-based Fast Discrimination of Transformer Magnetizing Inrush Current

    NASA Astrophysics Data System (ADS)

    Kitayama, Masashi

    Recently customers who need electricity of higher quality have been installing co-generation facilities. They can avoid voltage sags and other distribution system related disturbances by supplying electricity to important load from their generators. For another example, FRIENDS, highly reliable distribution system using semiconductor switches or storage devices based on power electronics technology, is proposed. These examples illustrates that the request for high reliability in distribution system is increasing. In order to realize these systems, fast relaying algorithms are indispensable. The author proposes a new method of detecting magnetizing inrush current using discrete wavelet transform (DWT). DWT provides the function of detecting discontinuity of current waveform. Inrush current occurs when transformer core becomes saturated. The proposed method detects spikes of DWT components derived from the discontinuity of the current waveform at both the beginning and the end of inrush current. Wavelet thresholding, one of the wavelet-based statistical modeling, was applied to detect the DWT component spikes. The proposed method is verified using experimental data using single-phase transformer and the proposed method is proved to be effective.

  13. A Nonlinear Framework of Delayed Particle Smoothing Method for Vehicle Localization under Non-Gaussian Environment.

    PubMed

    Xiao, Zhu; Havyarimana, Vincent; Li, Tong; Wang, Dong

    2016-05-13

    In this paper, a novel nonlinear framework of smoothing method, non-Gaussian delayed particle smoother (nGDPS), is proposed, which enables vehicle state estimation (VSE) with high accuracy taking into account the non-Gaussianity of the measurement and process noises. Within the proposed method, the multivariate Student's t-distribution is adopted in order to compute the probability distribution function (PDF) related to the process and measurement noises, which are assumed to be non-Gaussian distributed. A computation approach based on Ensemble Kalman Filter (EnKF) is designed to cope with the mean and the covariance matrix of the proposal non-Gaussian distribution. A delayed Gibbs sampling algorithm, which incorporates smoothing of the sampled trajectories over a fixed-delay, is proposed to deal with the sample degeneracy of particles. The performance is investigated based on the real-world data, which is collected by low-cost on-board vehicle sensors. The comparison study based on the real-world experiments and the statistical analysis demonstrates that the proposed nGDPS has significant improvement on the vehicle state accuracy and outperforms the existing filtering and smoothing methods.

  14. TRAPR: R Package for Statistical Analysis and Visualization of RNA-Seq Data.

    PubMed

    Lim, Jae Hyun; Lee, Soo Youn; Kim, Ju Han

    2017-03-01

    High-throughput transcriptome sequencing, also known as RNA sequencing (RNA-Seq), is a standard technology for measuring gene expression with unprecedented accuracy. Numerous bioconductor packages have been developed for the statistical analysis of RNA-Seq data. However, these tools focus on specific aspects of the data analysis pipeline, and are difficult to appropriately integrate with one another due to their disparate data structures and processing methods. They also lack visualization methods to confirm the integrity of the data and the process. In this paper, we propose an R-based RNA-Seq analysis pipeline called TRAPR, an integrated tool that facilitates the statistical analysis and visualization of RNA-Seq expression data. TRAPR provides various functions for data management, the filtering of low-quality data, normalization, transformation, statistical analysis, data visualization, and result visualization that allow researchers to build customized analysis pipelines.

  15. Applying a statistical PTB detection procedure to complement the gold standard.

    PubMed

    Noor, Norliza Mohd; Yunus, Ashari; Bakar, S A R Abu; Hussin, Amran; Rijal, Omar Mohd

    2011-04-01

    This paper investigates a novel statistical discrimination procedure to detect PTB when the gold standard requirement is taken into consideration. Archived data were used to establish two groups of patients which are the control and test group. The control group was used to develop the statistical discrimination procedure using four vectors of wavelet coefficients as feature vectors for the detection of pulmonary tuberculosis (PTB), lung cancer (LC), and normal lung (NL). This discrimination procedure was investigated using the test group where the number of sputum positive and sputum negative cases that were correctly classified as PTB cases were noted. The proposed statistical discrimination method is able to detect PTB patients and LC with high true positive fraction. The method is also able to detect PTB patients that are sputum negative and therefore may be used as a complement to the gold standard. Copyright © 2010 Elsevier Ltd. All rights reserved.

  16. Different Manhattan project: automatic statistical model generation

    NASA Astrophysics Data System (ADS)

    Yap, Chee Keng; Biermann, Henning; Hertzmann, Aaron; Li, Chen; Meyer, Jon; Pao, Hsing-Kuo; Paxia, Salvatore

    2002-03-01

    We address the automatic generation of large geometric models. This is important in visualization for several reasons. First, many applications need access to large but interesting data models. Second, we often need such data sets with particular characteristics (e.g., urban models, park and recreation landscape). Thus we need the ability to generate models with different parameters. We propose a new approach for generating such models. It is based on a top-down propagation of statistical parameters. We illustrate the method in the generation of a statistical model of Manhattan. But the method is generally applicable in the generation of models of large geographical regions. Our work is related to the literature on generating complex natural scenes (smoke, forests, etc) based on procedural descriptions. The difference in our approach stems from three characteristics: modeling with statistical parameters, integration of ground truth (actual map data), and a library-based approach for texture mapping.

  17. Spike sorting using locality preserving projection with gap statistics and landmark-based spectral clustering.

    PubMed

    Nguyen, Thanh; Khosravi, Abbas; Creighton, Douglas; Nahavandi, Saeid

    2014-12-30

    Understanding neural functions requires knowledge from analysing electrophysiological data. The process of assigning spikes of a multichannel signal into clusters, called spike sorting, is one of the important problems in such analysis. There have been various automated spike sorting techniques with both advantages and disadvantages regarding accuracy and computational costs. Therefore, developing spike sorting methods that are highly accurate and computationally inexpensive is always a challenge in the biomedical engineering practice. An automatic unsupervised spike sorting method is proposed in this paper. The method uses features extracted by the locality preserving projection (LPP) algorithm. These features afterwards serve as inputs for the landmark-based spectral clustering (LSC) method. Gap statistics (GS) is employed to evaluate the number of clusters before the LSC can be performed. The proposed LPP-LSC is highly accurate and computationally inexpensive spike sorting approach. LPP spike features are very discriminative; thereby boost the performance of clustering methods. Furthermore, the LSC method exhibits its efficiency when integrated with the cluster evaluator GS. The proposed method's accuracy is approximately 13% superior to that of the benchmark combination between wavelet transformation and superparamagnetic clustering (WT-SPC). Additionally, LPP-LSC computing time is six times less than that of the WT-SPC. LPP-LSC obviously demonstrates a win-win spike sorting solution meeting both accuracy and computational cost criteria. LPP and LSC are linear algorithms that help reduce computational burden and thus their combination can be applied into real-time spike analysis. Copyright © 2014 Elsevier B.V. All rights reserved.

  18. A novel approach for choosing summary statistics in approximate Bayesian computation.

    PubMed

    Aeschbacher, Simon; Beaumont, Mark A; Futschik, Andreas

    2012-11-01

    The choice of summary statistics is a crucial step in approximate Bayesian computation (ABC). Since statistics are often not sufficient, this choice involves a trade-off between loss of information and reduction of dimensionality. The latter may increase the efficiency of ABC. Here, we propose an approach for choosing summary statistics based on boosting, a technique from the machine-learning literature. We consider different types of boosting and compare them to partial least-squares regression as an alternative. To mitigate the lack of sufficiency, we also propose an approach for choosing summary statistics locally, in the putative neighborhood of the true parameter value. We study a demographic model motivated by the reintroduction of Alpine ibex (Capra ibex) into the Swiss Alps. The parameters of interest are the mean and standard deviation across microsatellites of the scaled ancestral mutation rate (θ(anc) = 4N(e)u) and the proportion of males obtaining access to matings per breeding season (ω). By simulation, we assess the properties of the posterior distribution obtained with the various methods. According to our criteria, ABC with summary statistics chosen locally via boosting with the L(2)-loss performs best. Applying that method to the ibex data, we estimate θ(anc)≈ 1.288 and find that most of the variation across loci of the ancestral mutation rate u is between 7.7 × 10(-4) and 3.5 × 10(-3) per locus per generation. The proportion of males with access to matings is estimated as ω≈ 0.21, which is in good agreement with recent independent estimates.

  19. A Novel Approach for Choosing Summary Statistics in Approximate Bayesian Computation

    PubMed Central

    Aeschbacher, Simon; Beaumont, Mark A.; Futschik, Andreas

    2012-01-01

    The choice of summary statistics is a crucial step in approximate Bayesian computation (ABC). Since statistics are often not sufficient, this choice involves a trade-off between loss of information and reduction of dimensionality. The latter may increase the efficiency of ABC. Here, we propose an approach for choosing summary statistics based on boosting, a technique from the machine-learning literature. We consider different types of boosting and compare them to partial least-squares regression as an alternative. To mitigate the lack of sufficiency, we also propose an approach for choosing summary statistics locally, in the putative neighborhood of the true parameter value. We study a demographic model motivated by the reintroduction of Alpine ibex (Capra ibex) into the Swiss Alps. The parameters of interest are the mean and standard deviation across microsatellites of the scaled ancestral mutation rate (θanc = 4Neu) and the proportion of males obtaining access to matings per breeding season (ω). By simulation, we assess the properties of the posterior distribution obtained with the various methods. According to our criteria, ABC with summary statistics chosen locally via boosting with the L2-loss performs best. Applying that method to the ibex data, we estimate θ^anc≈1.288 and find that most of the variation across loci of the ancestral mutation rate u is between 7.7 × 10−4 and 3.5 × 10−3 per locus per generation. The proportion of males with access to matings is estimated as ω^≈0.21, which is in good agreement with recent independent estimates. PMID:22960215

  20. Robust identification of transcriptional regulatory networks using a Gibbs sampler on outlier sum statistic.

    PubMed

    Gu, Jinghua; Xuan, Jianhua; Riggins, Rebecca B; Chen, Li; Wang, Yue; Clarke, Robert

    2012-08-01

    Identification of transcriptional regulatory networks (TRNs) is of significant importance in computational biology for cancer research, providing a critical building block to unravel disease pathways. However, existing methods for TRN identification suffer from the inclusion of excessive 'noise' in microarray data and false-positives in binding data, especially when applied to human tumor-derived cell line studies. More robust methods that can counteract the imperfection of data sources are therefore needed for reliable identification of TRNs in this context. In this article, we propose to establish a link between the quality of one target gene to represent its regulator and the uncertainty of its expression to represent other target genes. Specifically, an outlier sum statistic was used to measure the aggregated evidence for regulation events between target genes and their corresponding transcription factors. A Gibbs sampling method was then developed to estimate the marginal distribution of the outlier sum statistic, hence, to uncover underlying regulatory relationships. To evaluate the effectiveness of our proposed method, we compared its performance with that of an existing sampling-based method using both simulation data and yeast cell cycle data. The experimental results show that our method consistently outperforms the competing method in different settings of signal-to-noise ratio and network topology, indicating its robustness for biological applications. Finally, we applied our method to breast cancer cell line data and demonstrated its ability to extract biologically meaningful regulatory modules related to estrogen signaling and action in breast cancer. The Gibbs sampler MATLAB package is freely available at http://www.cbil.ece.vt.edu/software.htm. xuan@vt.edu Supplementary data are available at Bioinformatics online.

  1. Robust identification of transcriptional regulatory networks using a Gibbs sampler on outlier sum statistic

    PubMed Central

    Gu, Jinghua; Xuan, Jianhua; Riggins, Rebecca B.; Chen, Li; Wang, Yue; Clarke, Robert

    2012-01-01

    Motivation: Identification of transcriptional regulatory networks (TRNs) is of significant importance in computational biology for cancer research, providing a critical building block to unravel disease pathways. However, existing methods for TRN identification suffer from the inclusion of excessive ‘noise’ in microarray data and false-positives in binding data, especially when applied to human tumor-derived cell line studies. More robust methods that can counteract the imperfection of data sources are therefore needed for reliable identification of TRNs in this context. Results: In this article, we propose to establish a link between the quality of one target gene to represent its regulator and the uncertainty of its expression to represent other target genes. Specifically, an outlier sum statistic was used to measure the aggregated evidence for regulation events between target genes and their corresponding transcription factors. A Gibbs sampling method was then developed to estimate the marginal distribution of the outlier sum statistic, hence, to uncover underlying regulatory relationships. To evaluate the effectiveness of our proposed method, we compared its performance with that of an existing sampling-based method using both simulation data and yeast cell cycle data. The experimental results show that our method consistently outperforms the competing method in different settings of signal-to-noise ratio and network topology, indicating its robustness for biological applications. Finally, we applied our method to breast cancer cell line data and demonstrated its ability to extract biologically meaningful regulatory modules related to estrogen signaling and action in breast cancer. Availability and implementation: The Gibbs sampler MATLAB package is freely available at http://www.cbil.ece.vt.edu/software.htm. Contact: xuan@vt.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:22595208

  2. Perturbation Selection and Local Influence Analysis for Nonlinear Structural Equation Model

    ERIC Educational Resources Information Center

    Chen, Fei; Zhu, Hong-Tu; Lee, Sik-Yum

    2009-01-01

    Local influence analysis is an important statistical method for studying the sensitivity of a proposed model to model inputs. One of its important issues is related to the appropriate choice of a perturbation vector. In this paper, we develop a general method to select an appropriate perturbation vector and a second-order local influence measure…

  3. Decomposing biodiversity data using the Latent Dirichlet Allocation model, a probabilistic multivariate statistical method

    Treesearch

    Denis Valle; Benjamin Baiser; Christopher W. Woodall; Robin Chazdon; Jerome Chave

    2014-01-01

    We propose a novel multivariate method to analyse biodiversity data based on the Latent Dirichlet Allocation (LDA) model. LDA, a probabilistic model, reduces assemblages to sets of distinct component communities. It produces easily interpretable results, can represent abrupt and gradual changes in composition, accommodates missing data and allows for coherent estimates...

  4. Performance of a proposed determinative method for p-TSA in rainbow trout fillet tissue and bridging the proposed method with a method for total chloramine-T residues in rainbow trout fillet tissue

    USGS Publications Warehouse

    Meinertz, J.R.; Stehly, G.R.; Gingerich, W.H.; Greseth, Shari L.

    2001-01-01

    Chloramine-T is an effective drug for controlling fish mortality caused by bacterial gill disease. As part of the data required for approval of chloramine-T use in aquaculture, depletion of the chloramine-T marker residue (para-toluenesulfonamide; p-TSA) from edible fillet tissue of fish must be characterized. Declaration of p-TSA as the marker residue for chloramine-T in rainbow trout was based on total residue depletion studies using a method that used time consuming and cumbersome techniques. A simple and robust method recently developed is being proposed as a determinative method for p-TSA in fish fillet tissue. The proposed determinative method was evaluated by comparing accuracy and precision data with U.S. Food and Drug Administration criteria and by bridging the method to the former method for chloramine-T residues. The method accuracy and precision fulfilled the criteria for determinative methods; accuracy was 92.6, 93.4, and 94.6% with samples fortified at 0.5X, 1X, and 2X the expected 1000 ng/g tolerance limit for p-TSA, respectively. Method precision with tissue containing incurred p-TSA at a nominal concentration of 1000 ng/g ranged from 0.80 to 8.4%. The proposed determinative method was successfully bridged with the former method. The concentrations of p-TSA developed with the proposed method were not statistically different at p < 0.05 from p-TSA concentrations developed with the former method.

  5. Sparse approximation of currents for statistics on curves and surfaces.

    PubMed

    Durrleman, Stanley; Pennec, Xavier; Trouvé, Alain; Ayache, Nicholas

    2008-01-01

    Computing, processing, visualizing statistics on shapes like curves or surfaces is a real challenge with many applications ranging from medical image analysis to computational geometry. Modelling such geometrical primitives with currents avoids feature-based approach as well as point-correspondence method. This framework has been proved to be powerful to register brain surfaces or to measure geometrical invariants. However, if the state-of-the-art methods perform efficiently pairwise registrations, new numerical schemes are required to process groupwise statistics due to an increasing complexity when the size of the database is growing. Statistics such as mean and principal modes of a set of shapes often have a heavy and highly redundant representation. We propose therefore to find an adapted basis on which mean and principal modes have a sparse decomposition. Besides the computational improvement, this sparse representation offers a way to visualize and interpret statistics on currents. Experiments show the relevance of the approach on 34 sets of 70 sulcal lines and on 50 sets of 10 meshes of deep brain structures.

  6. Significance levels for studies with correlated test statistics.

    PubMed

    Shi, Jianxin; Levinson, Douglas F; Whittemore, Alice S

    2008-07-01

    When testing large numbers of null hypotheses, one needs to assess the evidence against the global null hypothesis that none of the hypotheses is false. Such evidence typically is based on the test statistic of the largest magnitude, whose statistical significance is evaluated by permuting the sample units to simulate its null distribution. Efron (2007) has noted that correlation among the test statistics can induce substantial interstudy variation in the shapes of their histograms, which may cause misleading tail counts. Here, we show that permutation-based estimates of the overall significance level also can be misleading when the test statistics are correlated. We propose that such estimates be conditioned on a simple measure of the spread of the observed histogram, and we provide a method for obtaining conditional significance levels. We justify this conditioning using the conditionality principle described by Cox and Hinkley (1974). Application of the method to gene expression data illustrates the circumstances when conditional significance levels are needed.

  7. Volatility measurement with directional change in Chinese stock market: Statistical property and investment strategy

    NASA Astrophysics Data System (ADS)

    Ma, Junjun; Xiong, Xiong; He, Feng; Zhang, Wei

    2017-04-01

    The stock price fluctuation is studied in this paper with intrinsic time perspective. The event, directional change (DC) or overshoot, are considered as time scale of price time series. With this directional change law, its corresponding statistical properties and parameter estimation is tested in Chinese stock market. Furthermore, a directional change trading strategy is proposed for invest in the market portfolio in Chinese stock market, and both in-sample and out-of-sample performance are compared among the different method of model parameter estimation. We conclude that DC method can capture important fluctuations in Chinese stock market and gain profit due to the statistical property that average upturn overshoot size is bigger than average downturn directional change size. The optimal parameter of DC method is not fixed and we obtained 1.8% annual excess return with this DC-based trading strategy.

  8. Optimal allocation of testing resources for statistical simulations

    NASA Astrophysics Data System (ADS)

    Quintana, Carolina; Millwater, Harry R.; Singh, Gulshan; Golden, Patrick

    2015-07-01

    Statistical estimates from simulation involve uncertainty caused by the variability in the input random variables due to limited data. Allocating resources to obtain more experimental data of the input variables to better characterize their probability distributions can reduce the variance of statistical estimates. The methodology proposed determines the optimal number of additional experiments required to minimize the variance of the output moments given single or multiple constraints. The method uses multivariate t-distribution and Wishart distribution to generate realizations of the population mean and covariance of the input variables, respectively, given an amount of available data. This method handles independent and correlated random variables. A particle swarm method is used for the optimization. The optimal number of additional experiments per variable depends on the number and variance of the initial data, the influence of the variable in the output function and the cost of each additional experiment. The methodology is demonstrated using a fretting fatigue example.

  9. Improved dynamical scaling analysis using the kernel method for nonequilibrium relaxation.

    PubMed

    Echinaka, Yuki; Ozeki, Yukiyasu

    2016-10-01

    The dynamical scaling analysis for the Kosterlitz-Thouless transition in the nonequilibrium relaxation method is improved by the use of Bayesian statistics and the kernel method. This allows data to be fitted to a scaling function without using any parametric model function, which makes the results more reliable and reproducible and enables automatic and faster parameter estimation. Applying this method, the bootstrap method is introduced and a numerical discrimination for the transition type is proposed.

  10. Cost-Effectiveness Analysis: a proposal of new reporting standards in statistical analysis

    PubMed Central

    Bang, Heejung; Zhao, Hongwei

    2014-01-01

    Cost-effectiveness analysis (CEA) is a method for evaluating the outcomes and costs of competing strategies designed to improve health, and has been applied to a variety of different scientific fields. Yet, there are inherent complexities in cost estimation and CEA from statistical perspectives (e.g., skewness, bi-dimensionality, and censoring). The incremental cost-effectiveness ratio that represents the additional cost per one unit of outcome gained by a new strategy has served as the most widely accepted methodology in the CEA. In this article, we call for expanded perspectives and reporting standards reflecting a more comprehensive analysis that can elucidate different aspects of available data. Specifically, we propose that mean and median-based incremental cost-effectiveness ratios and average cost-effectiveness ratios be reported together, along with relevant summary and inferential statistics as complementary measures for informed decision making. PMID:24605979

  11. Pointwise probability reinforcements for robust statistical inference.

    PubMed

    Frénay, Benoît; Verleysen, Michel

    2014-02-01

    Statistical inference using machine learning techniques may be difficult with small datasets because of abnormally frequent data (AFDs). AFDs are observations that are much more frequent in the training sample that they should be, with respect to their theoretical probability, and include e.g. outliers. Estimates of parameters tend to be biased towards models which support such data. This paper proposes to introduce pointwise probability reinforcements (PPRs): the probability of each observation is reinforced by a PPR and a regularisation allows controlling the amount of reinforcement which compensates for AFDs. The proposed solution is very generic, since it can be used to robustify any statistical inference method which can be formulated as a likelihood maximisation. Experiments show that PPRs can be easily used to tackle regression, classification and projection: models are freed from the influence of outliers. Moreover, outliers can be filtered manually since an abnormality degree is obtained for each observation. Copyright © 2013 Elsevier Ltd. All rights reserved.

  12. A PLSPM-Based Test Statistic for Detecting Gene-Gene Co-Association in Genome-Wide Association Study with Case-Control Design

    PubMed Central

    Zhang, Xiaoshuai; Yang, Xiaowei; Yuan, Zhongshang; Liu, Yanxun; Li, Fangyu; Peng, Bin; Zhu, Dianwen; Zhao, Jinghua; Xue, Fuzhong

    2013-01-01

    For genome-wide association data analysis, two genes in any pathway, two SNPs in the two linked gene regions respectively or in the two linked exons respectively within one gene are often correlated with each other. We therefore proposed the concept of gene-gene co-association, which refers to the effects not only due to the traditional interaction under nearly independent condition but the correlation between two genes. Furthermore, we constructed a novel statistic for detecting gene-gene co-association based on Partial Least Squares Path Modeling (PLSPM). Through simulation, the relationship between traditional interaction and co-association was highlighted under three different types of co-association. Both simulation and real data analysis demonstrated that the proposed PLSPM-based statistic has better performance than single SNP-based logistic model, PCA-based logistic model, and other gene-based methods. PMID:23620809

  13. A PLSPM-based test statistic for detecting gene-gene co-association in genome-wide association study with case-control design.

    PubMed

    Zhang, Xiaoshuai; Yang, Xiaowei; Yuan, Zhongshang; Liu, Yanxun; Li, Fangyu; Peng, Bin; Zhu, Dianwen; Zhao, Jinghua; Xue, Fuzhong

    2013-01-01

    For genome-wide association data analysis, two genes in any pathway, two SNPs in the two linked gene regions respectively or in the two linked exons respectively within one gene are often correlated with each other. We therefore proposed the concept of gene-gene co-association, which refers to the effects not only due to the traditional interaction under nearly independent condition but the correlation between two genes. Furthermore, we constructed a novel statistic for detecting gene-gene co-association based on Partial Least Squares Path Modeling (PLSPM). Through simulation, the relationship between traditional interaction and co-association was highlighted under three different types of co-association. Both simulation and real data analysis demonstrated that the proposed PLSPM-based statistic has better performance than single SNP-based logistic model, PCA-based logistic model, and other gene-based methods.

  14. Proposed hybrid-classifier ensemble algorithm to map snow cover area

    NASA Astrophysics Data System (ADS)

    Nijhawan, Rahul; Raman, Balasubramanian; Das, Josodhir

    2018-01-01

    Metaclassification ensemble approach is known to improve the prediction performance of snow-covered area. The methodology adopted in this case is based on neural network along with four state-of-art machine learning algorithms: support vector machine, artificial neural networks, spectral angle mapper, K-mean clustering, and a snow index: normalized difference snow index. An AdaBoost ensemble algorithm related to decision tree for snow-cover mapping is also proposed. According to available literature, these methods have been rarely used for snow-cover mapping. Employing the above techniques, a study was conducted for Raktavarn and Chaturangi Bamak glaciers, Uttarakhand, Himalaya using multispectral Landsat 7 ETM+ (enhanced thematic mapper) image. The study also compares the results with those obtained from statistical combination methods (majority rule and belief functions) and accuracies of individual classifiers. Accuracy assessment is performed by computing the quantity and allocation disagreement, analyzing statistic measures (accuracy, precision, specificity, AUC, and sensitivity) and receiver operating characteristic curves. A total of 225 combinations of parameters for individual classifiers were trained and tested on the dataset and results were compared with the proposed approach. It was observed that the proposed methodology produced the highest classification accuracy (95.21%), close to (94.01%) that was produced by the proposed AdaBoost ensemble algorithm. From the sets of observations, it was concluded that the ensemble of classifiers produced better results compared to individual classifiers.

  15. High-order fuzzy time-series based on multi-period adaptation model for forecasting stock markets

    NASA Astrophysics Data System (ADS)

    Chen, Tai-Liang; Cheng, Ching-Hsue; Teoh, Hia-Jong

    2008-02-01

    Stock investors usually make their short-term investment decisions according to recent stock information such as the late market news, technical analysis reports, and price fluctuations. To reflect these short-term factors which impact stock price, this paper proposes a comprehensive fuzzy time-series, which factors linear relationships between recent periods of stock prices and fuzzy logical relationships (nonlinear relationships) mined from time-series into forecasting processes. In empirical analysis, the TAIEX (Taiwan Stock Exchange Capitalization Weighted Stock Index) and HSI (Heng Seng Index) are employed as experimental datasets, and four recent fuzzy time-series models, Chen’s (1996), Yu’s (2005), Cheng’s (2006) and Chen’s (2007), are used as comparison models. Besides, to compare with conventional statistic method, the method of least squares is utilized to estimate the auto-regressive models of the testing periods within the databases. From analysis results, the performance comparisons indicate that the multi-period adaptation model, proposed in this paper, can effectively improve the forecasting performance of conventional fuzzy time-series models which only factor fuzzy logical relationships in forecasting processes. From the empirical study, the traditional statistic method and the proposed model both reveal that stock price patterns in the Taiwan stock and Hong Kong stock markets are short-term.

  16. Accelerating simulation for the multiple-point statistics algorithm using vector quantization

    NASA Astrophysics Data System (ADS)

    Zuo, Chen; Pan, Zhibin; Liang, Hao

    2018-03-01

    Multiple-point statistics (MPS) is a prominent algorithm to simulate categorical variables based on a sequential simulation procedure. Assuming training images (TIs) as prior conceptual models, MPS extracts patterns from TIs using a template and records their occurrences in a database. However, complex patterns increase the size of the database and require considerable time to retrieve the desired elements. In order to speed up simulation and improve simulation quality over state-of-the-art MPS methods, we propose an accelerating simulation for MPS using vector quantization (VQ), called VQ-MPS. First, a variable representation is presented to make categorical variables applicable for vector quantization. Second, we adopt a tree-structured VQ to compress the database so that stationary simulations are realized. Finally, a transformed template and classified VQ are used to address nonstationarity. A two-dimensional (2D) stationary channelized reservoir image is used to validate the proposed VQ-MPS. In comparison with several existing MPS programs, our method exhibits significantly better performance in terms of computational time, pattern reproductions, and spatial uncertainty. Further demonstrations consist of a 2D four facies simulation, two 2D nonstationary channel simulations, and a three-dimensional (3D) rock simulation. The results reveal that our proposed method is also capable of solving multifacies, nonstationarity, and 3D simulations based on 2D TIs.

  17. Analyzing depression tendency of web posts using an event-driven depression tendency warning model.

    PubMed

    Tung, Chiaming; Lu, Wenhsiang

    2016-01-01

    The Internet has become a platform to express individual moods/feelings of daily life, where authors share their thoughts in web blogs, micro-blogs, forums, bulletin board systems or other media. In this work, we investigate text-mining technology to analyze and predict the depression tendency of web posts. In this paper, we defined depression factors, which include negative events, negative emotions, symptoms, and negative thoughts from web posts. We proposed an enhanced event extraction (E3) method to automatically extract negative event terms. In addition, we also proposed an event-driven depression tendency warning (EDDTW) model to predict the depression tendency of web bloggers or post authors by analyzing their posted articles. We compare the performance among the proposed EDDTW model, negative emotion evaluation (NEE) model, and the diagnostic and statistical manual of mental disorders-based depression tendency evaluation method. The EDDTW model obtains the best recall rate and F-measure at 0.668 and 0.624, respectively, while the diagnostic and statistical manual of mental disorders-based method achieves the best precision rate of 0.666. The main reason is that our enhanced event extraction method can increase recall rate by enlarging the negative event lexicon at the expense of precision. Our EDDTW model can also be used to track the change or trend of depression tendency for each post author. The depression tendency trend can help doctors to diagnose and even track depression of web post authors more efficiently. This paper presents an E3 method to automatically extract negative event terms in web posts. We also proposed a new EDDTW model to predict the depression tendency of web posts and possibly help bloggers or post authors to early detect major depressive disorder. Copyright © 2015 Elsevier B.V. All rights reserved.

  18. Feature Screening for Ultrahigh Dimensional Categorical Data with Applications.

    PubMed

    Huang, Danyang; Li, Runze; Wang, Hansheng

    2014-01-01

    Ultrahigh dimensional data with both categorical responses and categorical covariates are frequently encountered in the analysis of big data, for which feature screening has become an indispensable statistical tool. We propose a Pearson chi-square based feature screening procedure for categorical response with ultrahigh dimensional categorical covariates. The proposed procedure can be directly applied for detection of important interaction effects. We further show that the proposed procedure possesses screening consistency property in the terminology of Fan and Lv (2008). We investigate the finite sample performance of the proposed procedure by Monte Carlo simulation studies, and illustrate the proposed method by two empirical datasets.

  19. A Bayesian sequential design with adaptive randomization for 2-sided hypothesis test.

    PubMed

    Yu, Qingzhao; Zhu, Lin; Zhu, Han

    2017-11-01

    Bayesian sequential and adaptive randomization designs are gaining popularity in clinical trials thanks to their potentials to reduce the number of required participants and save resources. We propose a Bayesian sequential design with adaptive randomization rates so as to more efficiently attribute newly recruited patients to different treatment arms. In this paper, we consider 2-arm clinical trials. Patients are allocated to the 2 arms with a randomization rate to achieve minimum variance for the test statistic. Algorithms are presented to calculate the optimal randomization rate, critical values, and power for the proposed design. Sensitivity analysis is implemented to check the influence on design by changing the prior distributions. Simulation studies are applied to compare the proposed method and traditional methods in terms of power and actual sample sizes. Simulations show that, when total sample size is fixed, the proposed design can obtain greater power and/or cost smaller actual sample size than the traditional Bayesian sequential design. Finally, we apply the proposed method to a real data set and compare the results with the Bayesian sequential design without adaptive randomization in terms of sample sizes. The proposed method can further reduce required sample size. Copyright © 2017 John Wiley & Sons, Ltd.

  20. A Method to Analyze How Various Parts of Clouds Influence Each Other's Brightness

    NASA Technical Reports Server (NTRS)

    Varnai, Tamas; Marshak, Alexander; Lau, William (Technical Monitor)

    2001-01-01

    This paper proposes a method for obtaining new information on 3D radiative effects that arise from horizontal radiative interactions in heterogeneous clouds. Unlike current radiative transfer models, it can not only calculate how 3D effects change radiative quantities at any given point, but can also determine which areas contribute to these 3D effects, to what degree, and through what mechanisms. After describing the proposed method, the paper illustrates its new capabilities both for detailed case studies and for the statistical processing of large datasets. Because the proposed method makes it possible, for the first time, to link a particular change in cloud properties to the resulting 3D effect, in future studies it can be used to develop new radiative transfer parameterizations that would consider 3D effects in practical applications currently limited to 1D theory-such as remote sensing of cloud properties and dynamical cloud modeling.

  1. Hydrometeor classification through statistical clustering of polarimetric radar measurements: a semi-supervised approach

    NASA Astrophysics Data System (ADS)

    Besic, Nikola; Ventura, Jordi Figueras i.; Grazioli, Jacopo; Gabella, Marco; Germann, Urs; Berne, Alexis

    2016-09-01

    Polarimetric radar-based hydrometeor classification is the procedure of identifying different types of hydrometeors by exploiting polarimetric radar observations. The main drawback of the existing supervised classification methods, mostly based on fuzzy logic, is a significant dependency on a presumed electromagnetic behaviour of different hydrometeor types. Namely, the results of the classification largely rely upon the quality of scattering simulations. When it comes to the unsupervised approach, it lacks the constraints related to the hydrometeor microphysics. The idea of the proposed method is to compensate for these drawbacks by combining the two approaches in a way that microphysical hypotheses can, to a degree, adjust the content of the classes obtained statistically from the observations. This is done by means of an iterative approach, performed offline, which, in a statistical framework, examines clustered representative polarimetric observations by comparing them to the presumed polarimetric properties of each hydrometeor class. Aside from comparing, a routine alters the content of clusters by encouraging further statistical clustering in case of non-identification. By merging all identified clusters, the multi-dimensional polarimetric signatures of various hydrometeor types are obtained for each of the studied representative datasets, i.e. for each radar system of interest. These are depicted by sets of centroids which are then employed in operational labelling of different hydrometeors. The method has been applied on three C-band datasets, each acquired by different operational radar from the MeteoSwiss Rad4Alp network, as well as on two X-band datasets acquired by two research mobile radars. The results are discussed through a comparative analysis which includes a corresponding supervised and unsupervised approach, emphasising the operational potential of the proposed method.

  2. Object-based change detection method using refined Markov random field

    NASA Astrophysics Data System (ADS)

    Peng, Daifeng; Zhang, Yongjun

    2017-01-01

    In order to fully consider the local spatial constraints between neighboring objects in object-based change detection (OBCD), an OBCD approach is presented by introducing a refined Markov random field (MRF). First, two periods of images are stacked and segmented to produce image objects. Second, object spectral and textual histogram features are extracted and G-statistic is implemented to measure the distance among different histogram distributions. Meanwhile, object heterogeneity is calculated by combining spectral and textual histogram distance using adaptive weight. Third, an expectation-maximization algorithm is applied for determining the change category of each object and the initial change map is then generated. Finally, a refined change map is produced by employing the proposed refined object-based MRF method. Three experiments were conducted and compared with some state-of-the-art unsupervised OBCD methods to evaluate the effectiveness of the proposed method. Experimental results demonstrate that the proposed method obtains the highest accuracy among the methods used in this paper, which confirms its validness and effectiveness in OBCD.

  3. An asymptotic theory for cross-correlation between auto-correlated sequences and its application on neuroimaging data.

    PubMed

    Zhou, Yunyi; Tao, Chenyang; Lu, Wenlian; Feng, Jianfeng

    2018-04-20

    Functional connectivity is among the most important tools to study brain. The correlation coefficient, between time series of different brain areas, is the most popular method to quantify functional connectivity. Correlation coefficient in practical use assumes the data to be temporally independent. However, the time series data of brain can manifest significant temporal auto-correlation. A widely applicable method is proposed for correcting temporal auto-correlation. We considered two types of time series models: (1) auto-regressive-moving-average model, (2) nonlinear dynamical system model with noisy fluctuations, and derived their respective asymptotic distributions of correlation coefficient. These two types of models are most commonly used in neuroscience studies. We show the respective asymptotic distributions share a unified expression. We have verified the validity of our method, and shown our method exhibited sufficient statistical power for detecting true correlation on numerical experiments. Employing our method on real dataset yields more robust functional network and higher classification accuracy than conventional methods. Our method robustly controls the type I error while maintaining sufficient statistical power for detecting true correlation in numerical experiments, where existing methods measuring association (linear and nonlinear) fail. In this work, we proposed a widely applicable approach for correcting the effect of temporal auto-correlation on functional connectivity. Empirical results favor the use of our method in functional network analysis. Copyright © 2018. Published by Elsevier B.V.

  4. TU-AB-202-11: Tumor Segmentation by Fusion of Multi-Tracer PET Images Using Copula Based Statistical Methods

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lapuyade-Lahorgue, J; Ruan, S; Li, H

    Purpose: Multi-tracer PET imaging is getting more attention in radiotherapy by providing additional tumor volume information such as glucose and oxygenation. However, automatic PET-based tumor segmentation is still a very challenging problem. We propose a statistical fusion approach to joint segment the sub-area of tumors from the two tracers FDG and FMISO PET images. Methods: Non-standardized Gamma distributions are convenient to model intensity distributions in PET. As a serious correlation exists in multi-tracer PET images, we proposed a new fusion method based on copula which is capable to represent dependency between different tracers. The Hidden Markov Field (HMF) model ismore » used to represent spatial relationship between PET image voxels and statistical dynamics of intensities for each modality. Real PET images of five patients with FDG and FMISO are used to evaluate quantitatively and qualitatively our method. A comparison between individual and multi-tracer segmentations was conducted to show advantages of the proposed fusion method. Results: The segmentation results show that fusion with Gaussian copula can receive high Dice coefficient of 0.84 compared to that of 0.54 and 0.3 of monomodal segmentation results based on individual segmentation of FDG and FMISO PET images. In addition, high correlation coefficients (0.75 to 0.91) for the Gaussian copula for all five testing patients indicates the dependency between tumor regions in the multi-tracer PET images. Conclusion: This study shows that using multi-tracer PET imaging can efficiently improve the segmentation of tumor region where hypoxia and glucidic consumption are present at the same time. Introduction of copulas for modeling the dependency between two tracers can simultaneously take into account information from both tracers and deal with two pathological phenomena. Future work will be to consider other families of copula such as spherical and archimedian copulas, and to eliminate partial volume effect by considering dependency between neighboring voxels.« less

  5. Microscopic saw mark analysis: an empirical approach.

    PubMed

    Love, Jennifer C; Derrick, Sharon M; Wiersema, Jason M; Peters, Charles

    2015-01-01

    Microscopic saw mark analysis is a well published and generally accepted qualitative analytical method. However, little research has focused on identifying and mitigating potential sources of error associated with the method. The presented study proposes the use of classification trees and random forest classifiers as an optimal, statistically sound approach to mitigate the potential for error of variability and outcome error in microscopic saw mark analysis. The statistical model was applied to 58 experimental saw marks created with four types of saws. The saw marks were made in fresh human femurs obtained through anatomical gift and were analyzed using a Keyence digital microscope. The statistical approach weighed the variables based on discriminatory value and produced decision trees with an associated outcome error rate of 8.62-17.82%. © 2014 American Academy of Forensic Sciences.

  6. Applications of quantum entropy to statistics

    NASA Astrophysics Data System (ADS)

    Silver, R. N.; Martz, H. F.

    This paper develops two generalizations of the maximum entropy (ME) principle. First, Shannon classical entropy is replaced by von Neumann quantum entropy to yield a broader class of information divergences (or penalty functions) for statistics applications. Negative relative quantum entropy enforces convexity, positivity, non-local extensivity and prior correlations such as smoothness. This enables the extension of ME methods from their traditional domain of ill-posed in-verse problems to new applications such as non-parametric density estimation. Second, given a choice of information divergence, a combination of ME and Bayes rule is used to assign both prior and posterior probabilities. Hyperparameters are interpreted as Lagrange multipliers enforcing constraints. Conservation principles are proposed to act statistical regularization and other hyperparameters, such as conservation of information and smoothness. ME provides an alternative to hierarchical Bayes methods.

  7. A practical approach for the scale-up of roller compaction process.

    PubMed

    Shi, Weixian; Sprockel, Omar L

    2016-09-01

    An alternative approach for the scale-up of ribbon formation during roller compaction was investigated, which required only one batch at the commercial scale to set the operational conditions. The scale-up of ribbon formation was based on a probability method. It was sufficient in describing the mechanism of ribbon formation at both scales. In this method, a statistical relationship between roller compaction parameters and ribbon attributes (thickness and density) was first defined with DoE using a pilot Alexanderwerk WP120 roller compactor. While the milling speed was included in the design, it has no practical effect on granule properties within the study range despite its statistical significance. The statistical relationship was then adapted to a commercial Alexanderwerk WP200 roller compactor with one experimental run. The experimental run served as a calibration of the statistical model parameters. The proposed transfer method was then confirmed by conducting a mapping study on the Alexanderwerk WP200 using a factorial DoE, which showed a match between the predictions and the verification experiments. The study demonstrates the applicability of the roller compaction transfer method using the statistical model from the development scale calibrated with one experiment point at the commercial scale. Copyright © 2016 Elsevier B.V. All rights reserved.

  8. Scene-based nonuniformity correction for airborne point target detection systems.

    PubMed

    Zhou, Dabiao; Wang, Dejiang; Huo, Lijun; Liu, Rang; Jia, Ping

    2017-06-26

    Images acquired by airborne infrared search and track (IRST) systems are often characterized by nonuniform noise. In this paper, a scene-based nonuniformity correction method for infrared focal-plane arrays (FPAs) is proposed based on the constant statistics of the received radiation ratios of adjacent pixels. The gain of each pixel is computed recursively based on the ratios between adjacent pixels, which are estimated through a median operation. Then, an elaborate mathematical model describing the error propagation, derived from random noise and the recursive calculation procedure, is established. The proposed method maintains the characteristics of traditional methods in calibrating the whole electro-optics chain, in compensating for temporal drifts, and in not preserving the radiometric accuracy of the system. Moreover, the proposed method is robust since the frame number is the only variant, and is suitable for real-time applications owing to its low computational complexity and simplicity of implementation. The experimental results, on different scenes from a proof-of-concept point target detection system with a long-wave Sofradir FPA, demonstrate the compelling performance of the proposed method.

  9. Link Prediction in Evolving Networks Based on Popularity of Nodes.

    PubMed

    Wang, Tong; He, Xing-Sheng; Zhou, Ming-Yang; Fu, Zhong-Qian

    2017-08-02

    Link prediction aims to uncover the underlying relationship behind networks, which could be utilized to predict missing edges or identify the spurious edges. The key issue of link prediction is to estimate the likelihood of potential links in networks. Most classical static-structure based methods ignore the temporal aspects of networks, limited by the time-varying features, such approaches perform poorly in evolving networks. In this paper, we propose a hypothesis that the ability of each node to attract links depends not only on its structural importance, but also on its current popularity (activeness), since active nodes have much more probability to attract future links. Then a novel approach named popularity based structural perturbation method (PBSPM) and its fast algorithm are proposed to characterize the likelihood of an edge from both existing connectivity structure and current popularity of its two endpoints. Experiments on six evolving networks show that the proposed methods outperform state-of-the-art methods in accuracy and robustness. Besides, visual results and statistical analysis reveal that the proposed methods are inclined to predict future edges between active nodes, rather than edges between inactive nodes.

  10. A novel method of estimation of lipophilicity using distance-based topological indices: dominating role of equalized electronegativity.

    PubMed

    Agrawal, Vijay K; Gupta, Madhu; Singh, Jyoti; Khadikar, Padmakar V

    2005-03-15

    Attempt is made to propose yet another method of estimating lipophilicity of a heterogeneous set of 223 compounds. The method is based on the use of equalized electronegativity along with topological indices. It was observed that excellent results are obtained in multiparametric regression upon introduction of indicator parameters. The results are discussed critically on the basis various statistical parameters.

  11. A methodology to design heuristics for model selection based on the characteristics of data: Application to investigate when the Negative Binomial Lindley (NB-L) is preferred over the Negative Binomial (NB).

    PubMed

    Shirazi, Mohammadali; Dhavala, Soma Sekhar; Lord, Dominique; Geedipally, Srinivas Reddy

    2017-10-01

    Safety analysts usually use post-modeling methods, such as the Goodness-of-Fit statistics or the Likelihood Ratio Test, to decide between two or more competitive distributions or models. Such metrics require all competitive distributions to be fitted to the data before any comparisons can be accomplished. Given the continuous growth in introducing new statistical distributions, choosing the best one using such post-modeling methods is not a trivial task, in addition to all theoretical or numerical issues the analyst may face during the analysis. Furthermore, and most importantly, these measures or tests do not provide any intuitions into why a specific distribution (or model) is preferred over another (Goodness-of-Logic). This paper ponders into these issues by proposing a methodology to design heuristics for Model Selection based on the characteristics of data, in terms of descriptive summary statistics, before fitting the models. The proposed methodology employs two analytic tools: (1) Monte-Carlo Simulations and (2) Machine Learning Classifiers, to design easy heuristics to predict the label of the 'most-likely-true' distribution for analyzing data. The proposed methodology was applied to investigate when the recently introduced Negative Binomial Lindley (NB-L) distribution is preferred over the Negative Binomial (NB) distribution. Heuristics were designed to select the 'most-likely-true' distribution between these two distributions, given a set of prescribed summary statistics of data. The proposed heuristics were successfully compared against classical tests for several real or observed datasets. Not only they are easy to use and do not need any post-modeling inputs, but also, using these heuristics, the analyst can attain useful information about why the NB-L is preferred over the NB - or vice versa- when modeling data. Copyright © 2017 Elsevier Ltd. All rights reserved.

  12. Efficient discovery of risk patterns in medical data.

    PubMed

    Li, Jiuyong; Fu, Ada Wai-chee; Fahey, Paul

    2009-01-01

    This paper studies a problem of efficiently discovering risk patterns in medical data. Risk patterns are defined by a statistical metric, relative risk, which has been widely used in epidemiological research. To avoid fruitless search in the complete exploration of risk patterns, we define optimal risk pattern set to exclude superfluous patterns, i.e. complicated patterns with lower relative risk than their corresponding simpler form patterns. We prove that mining optimal risk pattern sets conforms an anti-monotone property that supports an efficient mining algorithm. We propose an efficient algorithm for mining optimal risk pattern sets based on this property. We also propose a hierarchical structure to present discovered patterns for the easy perusal by domain experts. The proposed approach is compared with two well-known rule discovery methods, decision tree and association rule mining approaches on benchmark data sets and applied to a real world application. The proposed method discovers more and better quality risk patterns than a decision tree approach. The decision tree method is not designed for such applications and is inadequate for pattern exploring. The proposed method does not discover a large number of uninteresting superfluous patterns as an association mining approach does. The proposed method is more efficient than an association rule mining method. A real world case study shows that the method reveals some interesting risk patterns to medical practitioners. The proposed method is an efficient approach to explore risk patterns. It quickly identifies cohorts of patients that are vulnerable to a risk outcome from a large data set. The proposed method is useful for exploratory study on large medical data to generate and refine hypotheses. The method is also useful for designing medical surveillance systems.

  13. Measurement of turbulent spatial structure and kinetic energy spectrum by exact temporal-to-spatial mapping

    NASA Astrophysics Data System (ADS)

    Buchhave, Preben; Velte, Clara M.

    2017-08-01

    We present a method for converting a time record of turbulent velocity measured at a point in a flow to a spatial velocity record consisting of consecutive convection elements. The spatial record allows computation of dynamic statistical moments such as turbulent kinetic wavenumber spectra and spatial structure functions in a way that completely bypasses the need for Taylor's hypothesis. The spatial statistics agree with the classical counterparts, such as the total kinetic energy spectrum, at least for spatial extents up to the Taylor microscale. The requirements for applying the method are access to the instantaneous velocity magnitude, in addition to the desired flow quantity, and a high temporal resolution in comparison to the relevant time scales of the flow. We map, without distortion and bias, notoriously difficult developing turbulent high intensity flows using three main aspects that distinguish these measurements from previous work in the field: (1) The measurements are conducted using laser Doppler anemometry and are therefore not contaminated by directional ambiguity (in contrast to, e.g., frequently employed hot-wire anemometers); (2) the measurement data are extracted using a correctly and transparently functioning processor and are analysed using methods derived from first principles to provide unbiased estimates of the velocity statistics; (3) the exact mapping proposed herein has been applied to the high turbulence intensity flows investigated to avoid the significant distortions caused by Taylor's hypothesis. The method is first confirmed to produce the correct statistics using computer simulations and later applied to measurements in some of the most difficult regions of a round turbulent jet—the non-equilibrium developing region and the outermost parts of the developed jet. The proposed mapping is successfully validated using corresponding directly measured spatial statistics in the fully developed jet, even in the difficult outer regions of the jet where the average convection velocity is negligible and turbulence intensities increase dramatically. The measurements in the developing region reveal interesting features of an incomplete Richardson-Kolmogorov cascade under development.

  14. A more powerful test based on ratio distribution for retention noninferiority hypothesis.

    PubMed

    Deng, Ling; Chen, Gang

    2013-03-11

    Rothmann et al. ( 2003 ) proposed a method for the statistical inference of fraction retention noninferiority (NI) hypothesis. A fraction retention hypothesis is defined as a ratio of the new treatment effect verse the control effect in the context of a time to event endpoint. One of the major concerns using this method in the design of an NI trial is that with a limited sample size, the power of the study is usually very low. This makes an NI trial not applicable particularly when using time to event endpoint. To improve power, Wang et al. ( 2006 ) proposed a ratio test based on asymptotic normality theory. Under a strong assumption (equal variance of the NI test statistic under null and alternative hypotheses), the sample size using Wang's test was much smaller than that using Rothmann's test. However, in practice, the assumption of equal variance is generally questionable for an NI trial design. This assumption is removed in the ratio test proposed in this article, which is derived directly from a Cauchy-like ratio distribution. In addition, using this method, the fundamental assumption used in Rothmann's test, that the observed control effect is always positive, that is, the observed hazard ratio for placebo over the control is greater than 1, is no longer necessary. Without assuming equal variance under null and alternative hypotheses, the sample size required for an NI trial can be significantly reduced if using the proposed ratio test for a fraction retention NI hypothesis.

  15. Statistical physics inspired energy-efficient coded-modulation for optical communications.

    PubMed

    Djordjevic, Ivan B; Xu, Lei; Wang, Ting

    2012-04-15

    Because Shannon's entropy can be obtained by Stirling's approximation of thermodynamics entropy, the statistical physics energy minimization methods are directly applicable to the signal constellation design. We demonstrate that statistical physics inspired energy-efficient (EE) signal constellation designs, in combination with large-girth low-density parity-check (LDPC) codes, significantly outperform conventional LDPC-coded polarization-division multiplexed quadrature amplitude modulation schemes. We also describe an EE signal constellation design algorithm. Finally, we propose the discrete-time implementation of D-dimensional transceiver and corresponding EE polarization-division multiplexed system. © 2012 Optical Society of America

  16. [The problem of small "n" and big "P" in neuropsycho-pharmacology, or how to keep the rate of false discoveries under control].

    PubMed

    Petschner, Péter; Bagdy, György; Tóthfalusi, Laszló

    2015-03-01

    One of the characteristics of many methods used in neuropsychopharmacology is that a large number of parameters (P) are measured in relatively few subjects (n). Functional magnetic resonance imaging, electroencephalography (EEG) and genomic studies are typical examples. For example one microarray chip can contain thousands of probes. Therefore, in studies using microarray chips, P may be several thousand-fold larger than n. Statistical analysis of such studies is a challenging task and they are refereed to in the statistical literature such as the small "n" big "P" problem. The problem has many facets including the controversies associated with multiple hypothesis testing. A typical scenario in this context is, when two or more groups are compared by the individual attributes. If the increased classification error due to the multiple testing is neglected, then several highly significant differences will be discovered. But in reality, some of these significant differences are coincidental, not reproducible findings. Several methods were proposed to solve this problem. In this review we discuss two of the proposed solutions, algorithms to compare sets and statistical hypothesis tests controlling the false discovery rate.

  17. The Research of Feature Extraction Method of Liver Pathological Image Based on Multispatial Mapping and Statistical Properties

    PubMed Central

    Liu, Huiling; Xia, Bingbing; Yi, Dehui

    2016-01-01

    We propose a new feature extraction method of liver pathological image based on multispatial mapping and statistical properties. For liver pathological images of Hematein Eosin staining, the image of R and B channels can reflect the sensitivity of liver pathological images better, while the entropy space and Local Binary Pattern (LBP) space can reflect the texture features of the image better. To obtain the more comprehensive information, we map liver pathological images to the entropy space, LBP space, R space, and B space. The traditional Higher Order Local Autocorrelation Coefficients (HLAC) cannot reflect the overall information of the image, so we propose an average correction HLAC feature. We calculate the statistical properties and the average gray value of pathological images and then update the current pixel value as the absolute value of the difference between the current pixel gray value and the average gray value, which can be more sensitive to the gray value changes of pathological images. Lastly the HLAC template is used to calculate the features of the updated image. The experiment results show that the improved features of the multispatial mapping have the better classification performance for the liver cancer. PMID:27022407

  18. On the Emergent Constraints of Climate Sensitivity [On proposed emergent constraints of climate sensitivity

    DOE PAGES

    Qu, Xin; Hall, Alex; DeAngelis, Anthony M.; ...

    2018-01-11

    Differences among climate models in equilibrium climate sensitivity (ECS; the equilibrium surface temperature response to a doubling of atmospheric CO2) remain a significant barrier to the accurate assessment of societally important impacts of climate change. Relationships between ECS and observable metrics of the current climate in model ensembles, so-called emergent constraints, have been used to constrain ECS. Here a statistical method (including a backward selection process) is employed to achieve a better statistical understanding of the connections between four recently proposed emergent constraint metrics and individual feedbacks influencing ECS. The relationship between each metric and ECS is largely attributable tomore » a statistical connection with shortwave low cloud feedback, the leading cause of intermodel ECS spread. This result bolsters confidence in some of the metrics, which had assumed such a connection in the first place. Additional analysis is conducted with a few thousand artificial metrics that are randomly generated but are well correlated with ECS. The relationships between the contrived metrics and ECS can also be linked statistically to shortwave cloud feedback. Thus, any proposed or forthcoming ECS constraint based on the current generation of climate models should be viewed as a potential constraint on shortwave cloud feedback, and physical links with that feedback should be investigated to verify that the constraint is real. Additionally, any proposed ECS constraint should not be taken at face value since other factors influencing ECS besides shortwave cloud feedback could be systematically biased in the models.« less

  19. On the Emergent Constraints of Climate Sensitivity [On proposed emergent constraints of climate sensitivity

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Qu, Xin; Hall, Alex; DeAngelis, Anthony M.

    Differences among climate models in equilibrium climate sensitivity (ECS; the equilibrium surface temperature response to a doubling of atmospheric CO2) remain a significant barrier to the accurate assessment of societally important impacts of climate change. Relationships between ECS and observable metrics of the current climate in model ensembles, so-called emergent constraints, have been used to constrain ECS. Here a statistical method (including a backward selection process) is employed to achieve a better statistical understanding of the connections between four recently proposed emergent constraint metrics and individual feedbacks influencing ECS. The relationship between each metric and ECS is largely attributable tomore » a statistical connection with shortwave low cloud feedback, the leading cause of intermodel ECS spread. This result bolsters confidence in some of the metrics, which had assumed such a connection in the first place. Additional analysis is conducted with a few thousand artificial metrics that are randomly generated but are well correlated with ECS. The relationships between the contrived metrics and ECS can also be linked statistically to shortwave cloud feedback. Thus, any proposed or forthcoming ECS constraint based on the current generation of climate models should be viewed as a potential constraint on shortwave cloud feedback, and physical links with that feedback should be investigated to verify that the constraint is real. Additionally, any proposed ECS constraint should not be taken at face value since other factors influencing ECS besides shortwave cloud feedback could be systematically biased in the models.« less

  20. Spatial scan statistics for detection of multiple clusters with arbitrary shapes.

    PubMed

    Lin, Pei-Sheng; Kung, Yi-Hung; Clayton, Murray

    2016-12-01

    In applying scan statistics for public health research, it would be valuable to develop a detection method for multiple clusters that accommodates spatial correlation and covariate effects in an integrated model. In this article, we connect the concepts of the likelihood ratio (LR) scan statistic and the quasi-likelihood (QL) scan statistic to provide a series of detection procedures sufficiently flexible to apply to clusters of arbitrary shape. First, we use an independent scan model for detection of clusters and then a variogram tool to examine the existence of spatial correlation and regional variation based on residuals of the independent scan model. When the estimate of regional variation is significantly different from zero, a mixed QL estimating equation is developed to estimate coefficients of geographic clusters and covariates. We use the Benjamini-Hochberg procedure (1995) to find a threshold for p-values to address the multiple testing problem. A quasi-deviance criterion is used to regroup the estimated clusters to find geographic clusters with arbitrary shapes. We conduct simulations to compare the performance of the proposed method with other scan statistics. For illustration, the method is applied to enterovirus data from Taiwan. © 2016, The International Biometric Society.

Top