Sample records for statistical analysis based

  1. Time Series Analysis Based on Running Mann Whitney Z Statistics

    USDA-ARS's Scientific Manuscript database

    A sensitive and objective time series analysis method based on the calculation of Mann-Whitney U statistics is described. This method samples data rankings over moving time windows, converts those samples to Mann-Whitney U statistics, and then normalizes the U statistics to Z statistics using Monte-...
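
    The record is truncated, but the moving-window idea is easy to illustrate. The Python sketch below is illustrative only: it compares each moving window against the remainder of the series (an assumed comparison scheme) and normalizes U with the standard large-sample approximation rather than the Monte Carlo normalization the abstract mentions.

    ```python
    # Illustrative sketch only (assumed comparison scheme: each moving window vs.
    # the remainder of the series); the paper's Monte Carlo normalization is
    # replaced here by the standard large-sample normal approximation of U.
    import numpy as np
    from scipy.stats import mannwhitneyu

    def running_mann_whitney_z(series, window):
        series = np.asarray(series, dtype=float)
        z_scores = []
        for start in range(len(series) - window + 1):
            sample = series[start:start + window]
            rest = np.concatenate([series[:start], series[start + window:]])
            u, _ = mannwhitneyu(sample, rest, alternative="two-sided")
            n1, n2 = len(sample), len(rest)
            mu = n1 * n2 / 2.0                                   # E[U] under H0
            sigma = np.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)      # no-ties variance
            z_scores.append((u - mu) / sigma)
        return np.array(z_scores)

    # Example: a level shift halfway through a noisy series shows up as large |Z|
    rng = np.random.default_rng(0)
    y = np.concatenate([rng.normal(0, 1, 60), rng.normal(2, 1, 60)])
    print(running_mann_whitney_z(y, window=12).round(2))
    ```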

  2. Pathway analysis with next-generation sequencing data.

    PubMed

    Zhao, Jinying; Zhu, Yun; Boerwinkle, Eric; Xiong, Momiao

    2015-04-01

    Although pathway analysis methods have been developed and successfully applied to association studies of common variants, the statistical methods for pathway-based association analysis of rare variants have not been well developed. Many investigators observed highly inflated false-positive rates and low power in pathway-based tests of association of rare variants. The inflated false-positive rates and low true-positive rates of the current methods are mainly due to their lack of ability to account for gametic phase disequilibrium. To overcome these serious limitations, we develop a novel statistic that is based on the smoothed functional principal component analysis (SFPCA) for pathway association tests with next-generation sequencing data. The developed statistic has the ability to capture position-level variant information and account for gametic phase disequilibrium. Through intensive simulations, we demonstrate that the SFPCA-based statistic for testing pathway association with either rare or common or both rare and common variants has the correct type 1 error rates. The power of the SFPCA-based statistic and of 22 additional existing statistics is also evaluated. We found that the SFPCA-based statistic has a much higher power than other existing statistics in all the scenarios considered. To further evaluate its performance, the SFPCA-based statistic is applied to pathway analysis of exome sequencing data in the early-onset myocardial infarction (EOMI) project. We identify three pathways significantly associated with EOMI after the Bonferroni correction. In addition, our preliminary results show that the SFPCA-based statistic yields much smaller P-values for identifying pathway associations than other existing methods.

  3. 75 FR 72611 - Assessments, Large Bank Pricing

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-11-24

    ... the worst risk ranking and are included in the statistical analysis. Appendix 1 to the NPR describes the statistical analysis in detail. The percentage approximated by factors is based on the statistical model for that particular year. Actual weights assigned to each scorecard measure are largely based...

  4. General Framework for Meta-analysis of Rare Variants in Sequencing Association Studies

    PubMed Central

    Lee, Seunggeun; Teslovich, Tanya M.; Boehnke, Michael; Lin, Xihong

    2013-01-01

    We propose a general statistical framework for meta-analysis of gene- or region-based multimarker rare variant association tests in sequencing association studies. In genome-wide association studies, single-marker meta-analysis has been widely used to increase statistical power by combining results via regression coefficients and standard errors from different studies. In analysis of rare variants in sequencing studies, region-based multimarker tests are often used to increase power. We propose meta-analysis methods for commonly used gene- or region-based rare variants tests, such as burden tests and variance component tests. Because estimation of regression coefficients of individual rare variants is often unstable or not feasible, the proposed method avoids this difficulty by calculating score statistics instead that only require fitting the null model for each study and then aggregating these score statistics across studies. Our proposed meta-analysis rare variant association tests are conducted based on study-specific summary statistics, specifically score statistics for each variant and between-variant covariance-type (linkage disequilibrium) relationship statistics for each gene or region. The proposed methods are able to incorporate different levels of heterogeneity of genetic effects across studies and are applicable to meta-analysis of multiple ancestry groups. We show that the proposed methods are essentially as powerful as joint analysis by directly pooling individual level genotype data. We conduct extensive simulations to evaluate the performance of our methods by varying levels of heterogeneity across studies, and we apply the proposed methods to meta-analysis of rare variant effects in a multicohort study of the genetics of blood lipid levels. PMID:23768515
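
    The score-statistic aggregation described in this abstract can be illustrated with a toy burden-type meta-analysis. The sketch below uses invented numbers and a homogeneous-effect form of the statistic; it is not the authors' implementation.

    ```python
    # Toy illustration (invented numbers, homogeneous-effect case): a burden-type
    # meta-analysis statistic built from per-study score vectors U_k and
    # covariance (LD) matrices V_k,
    #   Q = (w' sum_k U_k)^2 / (w' (sum_k V_k) w)  ~  chi-square(1) under H0.
    import numpy as np
    from scipy.stats import chi2

    def meta_burden(U_list, V_list, weights):
        w = np.asarray(weights, dtype=float)
        U = np.sum(U_list, axis=0)          # aggregated per-variant score vector
        V = np.sum(V_list, axis=0)          # aggregated covariance matrix
        q = float(w @ U) ** 2 / float(w @ V @ w)
        return q, chi2.sf(q, df=1)

    # Two hypothetical studies, three rare variants, flat weights
    U1, U2 = np.array([1.2, 0.8, 0.5]), np.array([0.9, 1.1, 0.2])
    V1 = np.array([[2.0, 0.3, 0.1], [0.3, 1.5, 0.2], [0.1, 0.2, 1.0]])
    V2 = np.array([[1.8, 0.2, 0.1], [0.2, 1.4, 0.1], [0.1, 0.1, 0.9]])
    print(meta_burden([U1, U2], [V1, V2], weights=[1.0, 1.0, 1.0]))
    ```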

  5. General specifications for the development of a USL NASA PC R and D statistical analysis support package

    NASA Technical Reports Server (NTRS)

    Dominick, Wayne D. (Editor); Bassari, Jinous; Triantafyllopoulos, Spiros

    1984-01-01

    The University of Southwestern Louisiana (USL) NASA PC R and D statistical analysis support package is designed to be a three-level package to allow statistical analysis for a variety of applications within the USL Data Base Management System (DBMS) contract work. The design addresses usage of the statistical facilities as a library package, as an interactive statistical analysis system, and as a batch processing package.

  6. Controlling the joint local false discovery rate is more powerful than meta-analysis methods in joint analysis of summary statistics from multiple genome-wide association studies.

    PubMed

    Jiang, Wei; Yu, Weichuan

    2017-02-15

    In genome-wide association studies (GWASs) of common diseases/traits, we often analyze multiple GWASs with the same phenotype together to discover associated genetic variants with higher power. Since it is difficult to access data with detailed individual measurements, summary-statistics-based meta-analysis methods have become popular to jointly analyze datasets from multiple GWASs. In this paper, we propose a novel summary-statistics-based joint analysis method based on controlling the joint local false discovery rate (Jlfdr). We prove that our method is the most powerful summary-statistics-based joint analysis method when controlling the false discovery rate at a certain level. In particular, the Jlfdr-based method achieves higher power than commonly used meta-analysis methods when analyzing heterogeneous datasets from multiple GWASs. Simulation experiments demonstrate the superior power of our method over meta-analysis methods. Also, our method discovers more associations than meta-analysis methods from empirical datasets of four phenotypes. The R-package is available at: http://bioinformatics.ust.hk/Jlfdr.html. Supplementary data are available at Bioinformatics online.

  7. Dealing with missing standard deviation and mean values in meta-analysis of continuous outcomes: a systematic review.

    PubMed

    Weir, Christopher J; Butcher, Isabella; Assi, Valentina; Lewis, Stephanie C; Murray, Gordon D; Langhorne, Peter; Brady, Marian C

    2018-03-07

    Rigorous, informative meta-analyses rely on availability of appropriate summary statistics or individual participant data. For continuous outcomes, especially those with naturally skewed distributions, summary information on the mean or variability often goes unreported. While full reporting of original trial data is the ideal, we sought to identify methods for handling unreported mean or variability summary statistics in meta-analysis. We undertook two systematic literature reviews to identify methodological approaches used to deal with missing mean or variability summary statistics. Five electronic databases were searched, in addition to the Cochrane Colloquium abstract books and the Cochrane Statistics Methods Group mailing list archive. We also conducted cited reference searching and emailed topic experts to identify recent methodological developments. Details recorded included the description of the method, the information required to implement the method, any underlying assumptions and whether the method could be readily applied in standard statistical software. We provided a summary description of the methods identified, illustrating selected methods in example meta-analysis scenarios. For missing standard deviations (SDs), following screening of 503 articles, fifteen methods were identified in addition to those reported in a previous review. These included Bayesian hierarchical modelling at the meta-analysis level; summary statistic level imputation based on observed SD values from other trials in the meta-analysis; a practical approximation based on the range; and algebraic estimation of the SD based on other summary statistics. Following screening of 1124 articles for methods estimating the mean, one approximate Bayesian computation approach and three papers based on alternative summary statistics were identified. Illustrative meta-analyses showed that when replacing a missing SD the approximation using the range minimised loss of precision and generally performed better than omitting trials. When estimating missing means, a formula using the median, lower quartile and upper quartile performed best in preserving the precision of the meta-analysis findings, although in some scenarios, omitting trials gave superior results. Methods based on summary statistics (minimum, maximum, lower quartile, upper quartile, median) reported in the literature facilitate more comprehensive inclusion of randomised controlled trials with missing mean or variability summary statistics within meta-analyses.
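
    Two of the simple approximations surveyed in this review are easy to state concretely. The sketch below shows the range-based SD approximation and a quartile-based mean approximation; both are rough imputations rather than exact recoveries, and the specific formulas are common conventions rather than necessarily the review's preferred choices.

    ```python
    # Minimal sketch (not the authors' code): SD from the reported range, and
    # mean from the median and quartiles via the widely used (q1 + median + q3)/3
    # rule. Both are rough imputations of missing summary statistics.
    def sd_from_range(minimum, maximum, divisor=4.0):
        """Approximate SD as range/4 (range/6 is sometimes preferred for large n)."""
        return (maximum - minimum) / divisor

    def mean_from_quartiles(q1, median, q3):
        """Approximate the mean from the median and the lower/upper quartiles."""
        return (q1 + median + q3) / 3.0

    print(sd_from_range(2.0, 14.0))                       # 3.0
    print(round(mean_from_quartiles(4.0, 6.0, 9.0), 2))   # 6.33
    ```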

  8. [Design and implementation of online statistical analysis function in information system of air pollution and health impact monitoring].

    PubMed

    Lü, Yiran; Hao, Shuxin; Zhang, Guoqing; Liu, Jie; Liu, Yue; Xu, Dongqun

    2018-01-01

    To implement an online statistical analysis function in the information system for air pollution and health impact monitoring, and to obtain data analysis results in real time. Descriptive statistics, time-series analysis and multivariate regression analysis were implemented online using SQL and visual tools on top of the database software. The system generates basic statistical tables and summary tables of air pollution exposure and health impact data online, generates trend charts for each data module online with interactive connections to the database, and generates export sheets that can be loaded directly into R, SAS and SPSS. The information system for air pollution and health impact monitoring thus implements the statistical analysis function online and can provide real-time analysis results to its users.

  9. [Adequate application of quantitative and qualitative statistic analytic methods in acupuncture clinical trials].

    PubMed

    Tan, Ming T; Liu, Jian-ping; Lao, Lixing

    2012-08-01

    Recently, proper use of the statistical methods in traditional Chinese medicine (TCM) randomized controlled trials (RCTs) has received increased attention. Statistical inference based on hypothesis testing is the foundation of clinical trials and evidence-based medicine. In this article, the authors described the methodological differences between literature published in Chinese and Western journals in the design and analysis of acupuncture RCTs and the application of basic statistical principles. In China, qualitative analysis method has been widely used in acupuncture and TCM clinical trials, while the between-group quantitative analysis methods on clinical symptom scores are commonly used in the West. The evidence for and against these analytical differences were discussed based on the data of RCTs assessing acupuncture for pain relief. The authors concluded that although both methods have their unique advantages, quantitative analysis should be used as the primary analysis while qualitative analysis can be a secondary criterion for analysis. The purpose of this paper is to inspire further discussion of such special issues in clinical research design and thus contribute to the increased scientific rigor of TCM research.

  10. Statistical tools for transgene copy number estimation based on real-time PCR.

    PubMed

    Yuan, Joshua S; Burris, Jason; Stewart, Nathan R; Mentewab, Ayalew; Stewart, C Neal

    2007-11-01

    As compared with traditional transgene copy number detection technologies such as Southern blot analysis, real-time PCR provides a fast, inexpensive and high-throughput alternative. However, the real-time PCR based transgene copy number estimation tends to be ambiguous and subjective, stemming from the lack of proper statistical analysis and data quality control to render a reliable estimation of copy number with a prediction value. Despite recent progress in statistical analysis of real-time PCR, few publications have integrated these advancements in real-time PCR based transgene copy number determination. Three experimental designs and four data quality control integrated statistical models are presented. For the first method, external calibration curves are established for the transgene based on serially-diluted templates. The Ct numbers from a control transgenic event and a putative transgenic event are compared to derive the transgene copy number or zygosity estimation. Simple linear regression and two group T-test procedures were combined to model the data from this design. For the second experimental design, standard curves were generated for both an internal reference gene and the transgene, and the copy number of transgene was compared with that of internal reference gene. Multiple regression models and ANOVA models can be employed to analyze the data and perform quality control for this approach. In the third experimental design, transgene copy number is compared with reference gene without a standard curve, but rather, is based directly on fluorescence data. Two different multiple regression models were proposed to analyze the data based on two different approaches of amplification efficiency integration. Our results highlight the importance of proper statistical treatment and quality control integration in real-time PCR-based transgene copy number determination. These statistical methods allow the real-time PCR-based transgene copy number estimation to be more reliable and precise with a proper statistical estimation. Proper confidence intervals are necessary for unambiguous prediction of transgene copy number. The four different statistical methods are compared for their advantages and disadvantages. Moreover, the statistical methods can also be applied for other real-time PCR-based quantification assays including transfection efficiency analysis and pathogen quantification.
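
    The first experimental design (external standard curve, control event versus putative event) can be sketched as follows with hypothetical Ct values; the quality-control models and confidence intervals the paper emphasizes are omitted.

    ```python
    # Hypothetical data and a simplified version of the first design described
    # above (external standard curve, single-copy control vs. putative event).
    import numpy as np

    log10_template = np.array([0.0, 1.0, 2.0, 3.0, 4.0])   # serial dilutions
    ct = np.array([33.1, 29.8, 26.4, 23.1, 19.8])          # measured Ct values

    slope, intercept = np.polyfit(log10_template, ct, 1)   # standard curve fit
    efficiency = 10.0 ** (-1.0 / slope) - 1.0               # ~1.0 means 100% efficient

    ct_control, ct_event = 26.5, 25.5                       # control vs. unknown event
    relative_copies = (1.0 + efficiency) ** (ct_control - ct_event)
    print(f"slope={slope:.2f}, efficiency={efficiency:.2f}, "
          f"estimated relative copy number={relative_copies:.1f}")
    ```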

  11. Ground-Based Navigation and Dispersion Analysis for the Orion Exploration Mission 1

    NASA Technical Reports Server (NTRS)

    D'Souza, Christopher; Holt, Greg; Zanetti, Renato; Wood, Brandon

    2016-01-01

    This paper presents the Orion Exploration Mission 1 Linear Covariance Analysis for the DRO mission using ground-based navigation. The Delta V statistics for each maneuver are presented. In particular, the statistics of the lunar encounters and the Entry Interface are presented.

  12. SOCR: Statistics Online Computational Resource

    PubMed Central

    Dinov, Ivo D.

    2011-01-01

    The need for hands-on computer laboratory experience in undergraduate and graduate statistics education has been firmly established in the past decade. As a result, a number of attempts have been undertaken to develop novel approaches for problem-driven statistical thinking, data analysis and result interpretation. In this paper we describe an integrated educational web-based framework for: interactive distribution modeling, virtual online probability experimentation, statistical data analysis, visualization and integration. Following years of experience in statistical teaching at all college levels using established licensed statistical software packages, like STATA, S-PLUS, R, SPSS, SAS, Systat, etc., we have attempted to engineer a new statistics education environment, the Statistics Online Computational Resource (SOCR). This resource performs many of the standard types of statistical analysis, much like other classical tools. In addition, it is designed in a plug-in object-oriented architecture and is completely platform independent, web-based, interactive, extensible and secure. Over the past 4 years we have tested, fine-tuned and reanalyzed the SOCR framework in many of our undergraduate and graduate probability and statistics courses and have evidence that SOCR resources build students' intuition and enhance their learning. PMID:21451741

  13. Statistical parsimony networks and species assemblages in Cephalotrichid nemerteans (nemertea).

    PubMed

    Chen, Haixia; Strand, Malin; Norenburg, Jon L; Sun, Shichun; Kajihara, Hiroshi; Chernyshev, Alexey V; Maslakova, Svetlana A; Sundberg, Per

    2010-09-21

    It has been suggested that statistical parsimony network analysis could be used to get an indication of species represented in a set of nucleotide data, and the approach has been used to discuss species boundaries in some taxa. Based on 635 base pairs of the mitochondrial protein-coding gene cytochrome c oxidase I (COI), we analyzed 152 nemertean specimens using statistical parsimony network analysis with the connection probability set to 95%. The analysis revealed 15 distinct networks together with seven singletons. Statistical parsimony yielded three networks supporting the species status of Cephalothrix rufifrons, C. major and C. spiralis as they currently have been delineated by morphological characters and geographical location. Many other networks contained haplotypes from nearby geographical locations. Cladistic structure by maximum likelihood analysis overall supported the network analysis, but indicated a false positive result where subnetworks should have been connected into one network/species. This probably is caused by undersampling of the intraspecific haplotype diversity. Statistical parsimony network analysis provides a rapid and useful tool for detecting possible undescribed/cryptic species among cephalotrichid nemerteans based on COI gene. It should be combined with phylogenetic analysis to get indications of false positive results, i.e., subnetworks that would have been connected with more extensive haplotype sampling.

  14. A simulations approach for meta-analysis of genetic association studies based on additive genetic model.

    PubMed

    John, Majnu; Lencz, Todd; Malhotra, Anil K; Correll, Christoph U; Zhang, Jian-Ping

    2018-06-01

    Meta-analysis of genetic association studies is being increasingly used to assess phenotypic differences between genotype groups. When the underlying genetic model is assumed to be dominant or recessive, assessing the phenotype differences based on summary statistics, reported for individual studies in a meta-analysis, is a valid strategy. However, when the genetic model is additive, a similar strategy based on summary statistics will lead to biased results. This fact about the additive model is one of the things that we establish in this paper, using simulations. The main goal of this paper is to present an alternate strategy for the additive model based on simulating data for the individual studies. We show that the alternate strategy is far superior to the strategy based on summary statistics.
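
    The simulation strategy can be sketched roughly as below. The details (per-genotype summary statistics, inverse-variance pooling) are assumptions for illustration, not the authors' algorithm.

    ```python
    # Assumed details, not the authors' algorithm: regenerate individual-level
    # data per study from reported per-genotype means/SDs/counts, fit an
    # additive-coding regression in each study, and pool slopes by inverse variance.
    import numpy as np

    def simulate_and_fit(means, sds, ns, rng):
        """means/sds/ns are per genotype group (0, 1, 2 copies of the effect allele)."""
        geno = np.repeat([0, 1, 2], ns).astype(float)
        pheno = np.concatenate([rng.normal(m, s, n) for m, s, n in zip(means, sds, ns)])
        x = geno - geno.mean()
        beta = np.sum(x * (pheno - pheno.mean())) / np.sum(x ** 2)
        resid = pheno - pheno.mean() - beta * x
        se = np.sqrt(np.sum(resid ** 2) / (len(pheno) - 2) / np.sum(x ** 2))
        return beta, se

    rng = np.random.default_rng(1)
    studies = [([0.0, 0.3, 0.6], [1.0, 1.0, 1.0], [200, 120, 30]),
               ([0.1, 0.4, 0.7], [1.1, 1.0, 1.2], [150, 90, 20])]
    betas, ses = zip(*(simulate_and_fit(m, s, n, rng) for m, s, n in studies))
    w = 1.0 / np.square(ses)
    print("pooled additive effect ~", round(float(np.sum(w * np.array(betas)) / np.sum(w)), 3))
    ```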

  15. Active Structural Acoustic Control as an Approach to Acoustic Optimization of Lightweight Structures

    DTIC Science & Technology

    2001-06-01

    appropriate approach based on Statistical Energy Analysis (SEA) would facilitate investigations of the structural behavior at a high modal density. On the way...higher frequency investigations an approach based on the Statistical Energy Analysis (SEA) is recommended to describe the structural dynamic behavior

  16. Comparisons of non-Gaussian statistical models in DNA methylation analysis.

    PubMed

    Ma, Zhanyu; Teschendorff, Andrew E; Yu, Hong; Taghia, Jalil; Guo, Jun

    2014-06-16

    As a key regulatory mechanism of gene expression, DNA methylation patterns are widely altered in many complex genetic diseases, including cancer. DNA methylation is naturally quantified by bounded support data; therefore, it is non-Gaussian distributed. In order to capture such properties, we introduce some non-Gaussian statistical models to perform dimension reduction on DNA methylation data. Afterwards, non-Gaussian statistical model-based unsupervised clustering strategies are applied to cluster the data. Comparisons and analysis of different dimension reduction strategies and unsupervised clustering methods are presented. Experimental results show that the non-Gaussian statistical model-based methods are superior to the conventional Gaussian distribution-based method. They are meaningful tools for DNA methylation analysis. Moreover, among several non-Gaussian methods, the one that captures the bounded nature of DNA methylation data reveals the best clustering performance.
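
    A toy example of why bounded-support models suit methylation beta-values: a beta distribution (one possible non-Gaussian choice, not necessarily the specific models used in the paper) fits bounded data much better than a Gaussian.

    ```python
    # Toy illustration of a bounded-support fit vs. a Gaussian fit on synthetic
    # methylation-like values in (0, 1); the beta distribution is an assumption.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    x = rng.beta(a=2.0, b=8.0, size=500)      # synthetic methylation values in (0, 1)

    a, b, loc, scale = stats.beta.fit(x, floc=0, fscale=1)
    ll_beta = np.sum(stats.beta.logpdf(x, a, b, loc=loc, scale=scale))
    ll_norm = np.sum(stats.norm.logpdf(x, loc=x.mean(), scale=x.std()))
    print(f"log-likelihood: beta fit {ll_beta:.1f} vs. Gaussian fit {ll_norm:.1f}")
    ```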

  17. Comparisons of Non-Gaussian Statistical Models in DNA Methylation Analysis

    PubMed Central

    Ma, Zhanyu; Teschendorff, Andrew E.; Yu, Hong; Taghia, Jalil; Guo, Jun

    2014-01-01

    As a key regulatory mechanism of gene expression, DNA methylation patterns are widely altered in many complex genetic diseases, including cancer. DNA methylation is naturally quantified by bounded support data; therefore, it is non-Gaussian distributed. In order to capture such properties, we introduce some non-Gaussian statistical models to perform dimension reduction on DNA methylation data. Afterwards, non-Gaussian statistical model-based unsupervised clustering strategies are applied to cluster the data. Comparisons and analysis of different dimension reduction strategies and unsupervised clustering methods are presented. Experimental results show that the non-Gaussian statistical model-based methods are superior to the conventional Gaussian distribution-based method. They are meaningful tools for DNA methylation analysis. Moreover, among several non-Gaussian methods, the one that captures the bounded nature of DNA methylation data reveals the best clustering performance. PMID:24937687

  18. A Novel Genome-Information Content-Based Statistic for Genome-Wide Association Analysis Designed for Next-Generation Sequencing Data

    PubMed Central

    Luo, Li; Zhu, Yun

    2012-01-01

    Abstract The genome-wide association studies (GWAS) designed for next-generation sequencing data involve testing association of genomic variants, including common, low frequency, and rare variants. The current strategies for association studies are well developed for identifying association of common variants with the common diseases, but may be ill-suited when large amounts of allelic heterogeneity are present in sequence data. Recently, group tests that analyze their collective frequency differences between cases and controls shift the current variant-by-variant analysis paradigm for GWAS of common variants to the collective test of multiple variants in the association analysis of rare variants. However, group tests ignore differences in genetic effects among SNPs at different genomic locations. As an alternative to group tests, we developed a novel genome-information content-based statistics for testing association of the entire allele frequency spectrum of genomic variation with the diseases. To evaluate the performance of the proposed statistics, we use large-scale simulations based on whole genome low coverage pilot data in the 1000 Genomes Project to calculate the type 1 error rates and power of seven alternative statistics: a genome-information content-based statistic, the generalized T2, collapsing method, multivariate and collapsing (CMC) method, individual χ2 test, weighted-sum statistic, and variable threshold statistic. Finally, we apply the seven statistics to published resequencing dataset from ANGPTL3, ANGPTL4, ANGPTL5, and ANGPTL6 genes in the Dallas Heart Study. We report that the genome-information content-based statistic has significantly improved type 1 error rates and higher power than the other six statistics in both simulated and empirical datasets. PMID:22651812

  19. A novel genome-information content-based statistic for genome-wide association analysis designed for next-generation sequencing data.

    PubMed

    Luo, Li; Zhu, Yun; Xiong, Momiao

    2012-06-01

    The genome-wide association studies (GWAS) designed for next-generation sequencing data involve testing association of genomic variants, including common, low frequency, and rare variants. The current strategies for association studies are well developed for identifying association of common variants with the common diseases, but may be ill-suited when large amounts of allelic heterogeneity are present in sequence data. Recently, group tests that analyze their collective frequency differences between cases and controls shift the current variant-by-variant analysis paradigm for GWAS of common variants to the collective test of multiple variants in the association analysis of rare variants. However, group tests ignore differences in genetic effects among SNPs at different genomic locations. As an alternative to group tests, we developed a novel genome-information content-based statistics for testing association of the entire allele frequency spectrum of genomic variation with the diseases. To evaluate the performance of the proposed statistics, we use large-scale simulations based on whole genome low coverage pilot data in the 1000 Genomes Project to calculate the type 1 error rates and power of seven alternative statistics: a genome-information content-based statistic, the generalized T(2), collapsing method, multivariate and collapsing (CMC) method, individual χ(2) test, weighted-sum statistic, and variable threshold statistic. Finally, we apply the seven statistics to published resequencing dataset from ANGPTL3, ANGPTL4, ANGPTL5, and ANGPTL6 genes in the Dallas Heart Study. We report that the genome-information content-based statistic has significantly improved type 1 error rates and higher power than the other six statistics in both simulated and empirical datasets.

  20. A Matlab user interface for the statistically assisted fluid registration algorithm and tensor-based morphometry

    NASA Astrophysics Data System (ADS)

    Yepes-Calderon, Fernando; Brun, Caroline; Sant, Nishita; Thompson, Paul; Lepore, Natasha

    2015-01-01

    Tensor-Based Morphometry (TBM) is an increasingly popular method for group analysis of brain MRI data. The main steps in the analysis consist of a nonlinear registration to align each individual scan to a common space, and a subsequent statistical analysis to determine morphometric differences, or difference in fiber structure between groups. Recently, we implemented the Statistically-Assisted Fluid Registration Algorithm or SAFIRA, which is designed for tracking morphometric differences among populations. To this end, SAFIRA allows the inclusion of statistical priors extracted from the populations being studied as regularizers in the registration. This flexibility and degree of sophistication limit the tool to expert use, even more so considering that SAFIRA was initially implemented in command line mode. Here, we introduce a new, intuitive, easy to use, Matlab-based graphical user interface for SAFIRA's multivariate TBM. The interface also generates different choices for the TBM statistics, including both the traditional univariate statistics on the Jacobian matrix, and comparison of the full deformation tensors. This software will be freely disseminated to the neuroimaging research community.

  1. Shock and Vibration Symposium (59th) Held in Albuquerque, New Mexico on 18-20 October 1988. Volume 3

    DTIC Science & Technology

    1988-10-01

    Table-of-contents excerpts: ... (N. F. Rieger); "Statistical Energy Analysis: An Overview of Its Development and Engineering Applications" (J. E. Manning); DATA BASES: "DOE/DOD Environmental..."; "Vibroacoustic Response Using the Finite Element Method and Statistical Energy Analysis" (F. L. Gloyna); "Study of Helium Effect on Spacecraft Random Vibration..."; "...Analysis" (S. A. Wilkerson); DYNAMIC ANALYSIS: "Modeling of Vibration Transmission in a Damped Beam Structure Using Statistical Energy Analysis" (S. S...)

  2. A Statistical Discrimination Experiment for Eurasian Events Using a Twenty-Seven-Station Network

    DTIC Science & Technology

    1980-07-08

    to test the effectiveness of a multivariate method of analysis for distinguishing earthquakes from explosions. The data base for the experiment...the weight assigned to each variable whenever a new one is added. Jennrich, R. I. (1977). Stepwise discriminant analysis, in Statistical Methods for

  3. A note on generalized Genome Scan Meta-Analysis statistics

    PubMed Central

    Koziol, James A; Feng, Anne C

    2005-01-01

    Background Wise et al. introduced a rank-based statistical technique for meta-analysis of genome scans, the Genome Scan Meta-Analysis (GSMA) method. Levinson et al. recently described two generalizations of the GSMA statistic: (i) a weighted version of the GSMA statistic, so that different studies could be ascribed different weights for analysis; and (ii) an order statistic approach, reflecting the fact that a GSMA statistic can be computed for each chromosomal region or bin width across the various genome scan studies. Results We provide an Edgeworth approximation to the null distribution of the weighted GSMA statistic, and we examine the limiting distribution of the GSMA statistics under the order statistic formulation, and quantify the relevance of the pairwise correlations of the GSMA statistics across different bins on this limiting distribution. We also remark on aggregate criteria and multiple testing for determining significance of GSMA results. Conclusion Theoretical considerations detailed herein can lead to clarification and simplification of testing criteria for generalizations of the GSMA statistic. PMID:15717930

  4. The Importance of Statistical Modeling in Data Analysis and Inference

    ERIC Educational Resources Information Center

    Rollins, Derrick, Sr.

    2017-01-01

    Statistical inference simply means to draw a conclusion based on information that comes from data. Error bars are the most commonly used tool for data analysis and inference in chemical engineering data studies. This work demonstrates, using common types of data collection studies, the importance of specifying the statistical model for sound…

  5. Online Statistical Modeling (Regression Analysis) for Independent Responses

    NASA Astrophysics Data System (ADS)

    Made Tirta, I.; Anggraeni, Dian; Pandutama, Martinus

    2017-06-01

    Regression analysis (statistical modelling) is among the statistical methods most frequently needed in analyzing quantitative data, especially to model the relationship between response and explanatory variables. Nowadays, statistical models have been developed in various directions to model various types of complex relationships in data. Rich varieties of advanced and recent statistical models are available in open source software (one of them is R). However, these advanced modelling tools are not very friendly to novice R users, since they are based on programming scripts or a command line interface. Our research aims to develop a web interface (based on R and shiny), so that the most recent and advanced statistical modelling is readily available, accessible and applicable on the web. We previously built an interface in the form of an e-tutorial for several modern and advanced statistical models in R, especially for independent responses (including linear models/LM, generalized linear models/GLM, generalized additive models/GAM and generalized additive models for location scale and shape/GAMLSS). In this research we unify them in the form of data analysis tools, including models using computer-intensive statistics (bootstrap and Markov chain Monte Carlo/MCMC). All are readily accessible in our online Virtual Statistics Laboratory. The web interface makes statistical modelling easier to apply and easier to compare in order to find the most appropriate model for the data.

  6. Randomization Procedures Applied to Analysis of Ballistic Data

    DTIC Science & Technology

    1991-06-01

    Keywords: data analysis; computationally intensive statistics; randomization tests; permutation tests; nonparametric statistics. ...be 0.13. Any reasonable statistical procedure would fail to support the notion of improvement of dynamic over standard indexing based on this data. ...Technical Report BRL-TR-3245 (AD-A238 389): Randomization Procedures Applied to Analysis of Ballistic Data, Malcolm S. Taylor and Barry A. Bodt, June 1991.

  7. The Shock and Vibration Digest, Volume 17, Number 8

    DTIC Science & Technology

    1985-08-01

    ...transmit, and radiate audible sound. Procedures are based on acoustic power flow, statistical energy analysis (SEA), and modal methods [22-28]. ...modified partition area... features of the acoustic field. ...85-1642: Statistical Energy Analysis, Structural Resonances, and Beam Networks, L.J. Lee, Heriot-Watt Univ., Chambers St., Edinburgh EH1 1HX, Scotland (keywords: energy methods, structural resonance). The statistical energy analysis method is...

  8. Interpolative modeling of GaAs FET S-parameter data bases for use in Monte Carlo simulations

    NASA Technical Reports Server (NTRS)

    Campbell, L.; Purviance, J.

    1992-01-01

    A statistical interpolation technique is presented for modeling GaAs FET S-parameter measurements for use in the statistical analysis and design of circuits. This is accomplished by interpolating among the measurements in a GaAs FET S-parameter data base in a statistically valid manner.

  9. Web-Based Statistical Sampling and Analysis

    ERIC Educational Resources Information Center

    Quinn, Anne; Larson, Karen

    2016-01-01

    Consistent with the Common Core State Standards for Mathematics (CCSSI 2010), the authors write that they have asked students to do statistics projects with real data. To obtain real data, their students use the free Web-based app, Census at School, created by the American Statistical Association (ASA) to help promote civic awareness among school…

  10. Security of statistical data bases: invasion of privacy through attribute correlational modeling

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Palley, M.A.

    This study develops, defines, and applies a statistical technique for the compromise of confidential information in a statistical data base. Attribute Correlational Modeling (ACM) recognizes that the information contained in a statistical data base represents real world statistical phenomena. As such, ACM assumes correlational behavior among the database attributes. ACM proceeds to compromise confidential information through creation of a regression model, where the confidential attribute is treated as the dependent variable. The typical statistical data base may preclude the direct application of regression. In this scenario, the research introduces the notion of a synthetic data base, created through legitimate queries of the actual data base, and through proportional random variation of responses to these queries. The synthetic data base is constructed to resemble the actual data base as closely as possible in a statistical sense. ACM then applies regression analysis to the synthetic data base, and utilizes the derived model to estimate confidential information in the actual database.
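
    The compromise idea can be sketched with invented data. In the sketch below the query-based construction of the synthetic database is compressed into a direct proportional perturbation of the records, so it illustrates the regression step only.

    ```python
    # Hypothetical, self-contained sketch of the ACM idea (invented attributes and
    # data; the query-based synthetic-database construction is simplified to a
    # ~2% proportional perturbation of the records).
    import numpy as np

    rng = np.random.default_rng(42)
    n = 1000
    age = rng.uniform(25, 65, n)
    years_service = np.clip(age - 25 - rng.uniform(0, 10, n), 0, None)
    salary = 20000 + 900 * years_service + 300 * (age - 25) + rng.normal(0, 3000, n)

    # "Synthetic" database resembling the protected one in a statistical sense
    noise = rng.normal(1.0, 0.02, size=(n, 3))
    synth = np.column_stack([age, years_service, salary]) * noise

    # Regress the confidential attribute (salary) on the released attributes
    X = np.column_stack([np.ones(n), synth[:, 0], synth[:, 1]])
    coef, *_ = np.linalg.lstsq(X, synth[:, 2], rcond=None)

    target_age, target_years = 50.0, 20.0   # quasi-identifiers of one individual
    estimate = coef @ np.array([1.0, target_age, target_years])
    print(f"estimated confidential salary ~ {estimate:,.0f}")
    ```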

  11. Statistical analysis of Thematic Mapper Simulator data for the geobotanical discrimination of rock types in southwest Oregon

    NASA Technical Reports Server (NTRS)

    Morrissey, L. A.; Weinstock, K. J.; Mouat, D. A.; Card, D. H.

    1984-01-01

    An evaluation of Thematic Mapper Simulator (TMS) data for the geobotanical discrimination of rock types based on vegetative cover characteristics is addressed in this research. A methodology for accomplishing this evaluation utilizing univariate and multivariate techniques is presented. TMS data acquired with a Daedalus DEI-1260 multispectral scanner were integrated with vegetation and geologic information for subsequent statistical analyses, which included a chi-square test, an analysis of variance, stepwise discriminant analysis, and Duncan's multiple range test. Results indicate that ultramafic rock types are spectrally separable from nonultramafics based on vegetative cover through the use of statistical analyses.

  12. Independent component analysis for automatic note extraction from musical trills

    NASA Astrophysics Data System (ADS)

    Brown, Judith C.; Smaragdis, Paris

    2004-05-01

    The method of principal component analysis, which is based on second-order statistics (or linear independence), has long been used for redundancy reduction of audio data. The more recent technique of independent component analysis, enforcing much stricter statistical criteria based on higher-order statistical independence, is introduced and shown to be far superior in separating independent musical sources. This theory has been applied to piano trills and a database of trill rates was assembled from experiments with a computer-driven piano, recordings of a professional pianist, and commercially available compact disks. The method of independent component analysis has thus been shown to be an outstanding, effective means of automatically extracting interesting musical information from a sea of redundant data.
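
    A minimal demonstration of the contrast drawn here, using synthetic signals rather than piano trills: FastICA recovers non-Gaussian sources from linear mixtures, something second-order decorrelation alone cannot do.

    ```python
    # Synthetic two-source demonstration (not the piano-trill data): FastICA
    # recovers independent, non-Gaussian sources from linear mixtures.
    import numpy as np
    from sklearn.decomposition import FastICA

    rng = np.random.default_rng(0)
    t = np.linspace(0, 1, 4000)
    s1 = np.sin(2 * np.pi * 440 * t)              # "note" 1
    s2 = np.sign(np.sin(2 * np.pi * 554 * t))     # "note" 2 (strongly non-Gaussian)
    S = np.column_stack([s1, s2])

    A = np.array([[1.0, 0.6], [0.4, 1.0]])        # mixing matrix
    X = S @ A.T                                    # two observed mixtures

    estimated = FastICA(n_components=2, random_state=0).fit_transform(X)
    # Correlation of each estimated component with the true sources (up to sign/order)
    print(np.round(np.corrcoef(estimated.T, S.T)[:2, 2:], 2))
    ```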

  13. Two Paradoxes in Linear Regression Analysis.

    PubMed

    Feng, Ge; Peng, Jing; Tu, Dongke; Zheng, Julia Z; Feng, Changyong

    2016-12-25

    Regression is one of the favorite tools in applied statistics. However, misuse and misinterpretation of results from regression analysis are common in biomedical research. In this paper we use statistical theory and simulation studies to clarify some paradoxes around this popular statistical method. In particular, we show that a widely used model selection procedure employed in many publications in top medical journals is wrong. Formal procedures based on solid statistical theory should be used in model selection.

  14. Tract-Based Spatial Statistics in Preterm-Born Neonates Predicts Cognitive and Motor Outcomes at 18 Months.

    PubMed

    Duerden, E G; Foong, J; Chau, V; Branson, H; Poskitt, K J; Grunau, R E; Synnes, A; Zwicker, J G; Miller, S P

    2015-08-01

    Adverse neurodevelopmental outcome is common in children born preterm. Early sensitive predictors of neurodevelopmental outcome such as MR imaging are needed. Tract-based spatial statistics, a diffusion MR imaging analysis method, performed at term-equivalent age (40 weeks) is a promising predictor of neurodevelopmental outcomes in children born very preterm. We sought to determine the association of tract-based spatial statistics findings before term-equivalent age with neurodevelopmental outcome at 18-months corrected age. Of 180 neonates (born at 24-32-weeks' gestation) enrolled, 153 had DTI acquired early at 32 weeks' postmenstrual age and 105 had DTI acquired later at 39.6 weeks' postmenstrual age. Voxelwise statistics were calculated by performing tract-based spatial statistics on DTI that was aligned to age-appropriate templates. At 18-month corrected age, 166 neonates underwent neurodevelopmental assessment by using the Bayley Scales of Infant Development, 3rd ed, and the Peabody Developmental Motor Scales, 2nd ed. Tract-based spatial statistics analysis applied to early-acquired scans (postmenstrual age of 30-33 weeks) indicated a limited significant positive association between motor skills and axial diffusivity and radial diffusivity values in the corpus callosum, internal and external/extreme capsules, and midbrain (P < .05, corrected). In contrast, for term scans (postmenstrual age of 37-41 weeks), tract-based spatial statistics analysis showed a significant relationship between both motor and cognitive scores with fractional anisotropy in the corpus callosum and corticospinal tracts (P < .05, corrected). Tract-based spatial statistics in a limited subset of neonates (n = 22) scanned at <30 weeks did not significantly predict neurodevelopmental outcomes. The strength of the association between fractional anisotropy values and neurodevelopmental outcome scores increased from early-to-late-acquired scans in preterm-born neonates, consistent with brain dysmaturation in this population.

  15. Statistics on gene-based laser speckles with a small number of scatterers: implications for the detection of polymorphism in the Chlamydia trachomatis omp1 gene

    NASA Astrophysics Data System (ADS)

    Ulyanov, Sergey S.; Ulianova, Onega V.; Zaytsev, Sergey S.; Saltykov, Yury V.; Feodorova, Valentina A.

    2018-04-01

    The transformation mechanism for a nucleotide sequence of the Chlamydia trachomatis gene into a speckle pattern has been considered. The first and second-order statistics of gene-based speckles have been analyzed. It has been demonstrated that gene-based speckles do not obey Gaussian statistics and belong to the class of speckles with a small number of scatterers. It has been shown that gene polymorphism can be easily detected through analysis of the statistical characteristics of gene-based speckles.

  16. Aspects of First Year Statistics Students' Reasoning When Performing Intuitive Analysis of Variance: Effects of Within- and Between-Group Variability

    ERIC Educational Resources Information Center

    Trumpower, David L.

    2015-01-01

    Making inferences about population differences based on samples of data, that is, performing intuitive analysis of variance (IANOVA), is common in everyday life. However, the intuitive reasoning of individuals when making such inferences (even following statistics instruction), often differs from the normative logic of formal statistics. The…

  17. 10 CFR 431.173 - Requirements applicable to all manufacturers.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... COMMERCIAL AND INDUSTRIAL EQUIPMENT Provisions for Commercial Heating, Ventilating, Air-Conditioning and... is based on engineering or statistical analysis, computer simulation or modeling, or other analytic... method or methods used; (B) The mathematical model, the engineering or statistical analysis, computer...

  18. Two Paradoxes in Linear Regression Analysis

    PubMed Central

    FENG, Ge; PENG, Jing; TU, Dongke; ZHENG, Julia Z.; FENG, Changyong

    2016-01-01

    Summary Regression is one of the favorite tools in applied statistics. However, misuse and misinterpretation of results from regression analysis are common in biomedical research. In this paper we use statistical theory and simulation studies to clarify some paradoxes around this popular statistical method. In particular, we show that a widely used model selection procedure employed in many publications in top medical journals is wrong. Formal procedures based on solid statistical theory should be used in model selection. PMID:28638214

  19. Performance Analysis of Live-Virtual-Constructive and Distributed Virtual Simulations: Defining Requirements in Terms of Temporal Consistency

    DTIC Science & Technology

    2009-12-01

    events. Work associated with aperiodic tasks has the same statistical behavior and the same timing requirements. The timing deadlines are soft. Sporadic...answers, but it is possible to calculate how precise the estimates are. Simulation-based performance analysis of a model includes a statistical ...to evaluate all possible states in a timely manner. This is the principal reason for resorting to simulation and statistical analysis to evaluate

  20. Monte Carlo based statistical power analysis for mediation models: methods and software.

    PubMed

    Zhang, Zhiyong

    2014-12-01

    The existing literature on statistical power analysis for mediation models often assumes data normality and is based on a less powerful Sobel test instead of the more powerful bootstrap test. This study proposes to estimate statistical power to detect mediation effects on the basis of the bootstrap method through Monte Carlo simulation. Nonnormal data with excessive skewness and kurtosis are allowed in the proposed method. A free R package called bmem is developed to conduct the power analysis discussed in this study. Four examples, including a simple mediation model, a multiple-mediator model with a latent mediator, a multiple-group mediation model, and a longitudinal mediation model, are provided to illustrate the proposed method.
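
    The general Monte Carlo-plus-bootstrap recipe can be sketched as below (plain Python with normal errors, not the bmem package; the proposed method also covers nonnormal data and more complex mediation models).

    ```python
    # Sketch of the general idea: simulate X -> M -> Y, test the indirect effect
    # a*b with a percentile bootstrap in each replication, and report the
    # proportion of significant replications as estimated power.
    import numpy as np

    def ols_slope(x, y):
        x = x - x.mean()
        return np.sum(x * (y - y.mean())) / np.sum(x ** 2)

    def one_replication(n, a, b, rng, n_boot=200):
        x = rng.normal(size=n)
        m = a * x + rng.normal(size=n)
        y = b * m + rng.normal(size=n)
        boot = []
        for _ in range(n_boot):
            idx = rng.integers(0, n, n)
            boot.append(ols_slope(x[idx], m[idx]) * ols_slope(m[idx], y[idx]))
        lo, hi = np.percentile(boot, [2.5, 97.5])
        return not (lo <= 0.0 <= hi)               # significant if the CI excludes 0

    rng = np.random.default_rng(7)
    power = np.mean([one_replication(100, 0.3, 0.3, rng) for _ in range(200)])
    print(f"estimated power ~ {power:.2f}")
    ```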

  1. Some Statistics for Assessing Person-Fit Based on Continuous-Response Models

    ERIC Educational Resources Information Center

    Ferrando, Pere Joan

    2010-01-01

    This article proposes several statistics for assessing individual fit based on two unidimensional models for continuous responses: linear factor analysis and Samejima's continuous response model. Both models are approached using a common framework based on underlying response variables and are formulated at the individual level as fixed regression…

  2. Nonlinear multi-analysis of agent-based financial market dynamics by epidemic system

    NASA Astrophysics Data System (ADS)

    Lu, Yunfan; Wang, Jun; Niu, Hongli

    2015-10-01

    Based on the epidemic dynamical system, we construct a new agent-based financial time series model. In order to check and verify its rationality, we compare the statistical properties of the time series model with the real stock market indices, Shanghai Stock Exchange Composite Index and Shenzhen Stock Exchange Component Index. For analyzing the statistical properties, we combine the multi-parameter analysis with the tail distribution analysis, the modified rescaled range analysis, and the multifractal detrended fluctuation analysis. For a better perspective, the three-dimensional diagrams are used to present the analysis results. The empirical research in this paper indicates that the long-range dependence property and the multifractal phenomenon exist in the real returns and the proposed model. Therefore, the new agent-based financial model can reproduce some important features of real stock markets.

  3. Application of Ontology Technology in Health Statistic Data Analysis.

    PubMed

    Guo, Minjiang; Hu, Hongpu; Lei, Xingyun

    2017-01-01

    Research Purpose: to establish a health management ontology for the analysis of health statistic data. Proposed Methods: this paper established a health management ontology based on the analysis of the concepts in the China Health Statistics Yearbook, and used Protégé to define the syntactic and semantic structure of health statistical data. Six classes of top-level ontology concepts and their subclasses were extracted, and the object properties and data properties were defined to establish the construction of these classes. By ontology instantiation, we can integrate multi-source heterogeneous data and enable administrators to have an overall understanding and analysis of the health statistic data. Ontology technology provides a comprehensive and unified information integration structure for the health management domain and lays a foundation for the efficient analysis of multi-source and heterogeneous health system management data and enhancement of management efficiency.

  4. INTERFACING SAS TO ORACLE IN THE UNIX ENVIRONMENT

    EPA Science Inventory

    SAS is an EPA standard data and statistical analysis software package while ORACLE is EPA's standard data base management system software package. ORACLE has the advantage over SAS in data retrieval and storage capabilities but has limited data and statistical analysis capability....

  5. DEIVA: a web application for interactive visual analysis of differential gene expression profiles.

    PubMed

    Harshbarger, Jayson; Kratz, Anton; Carninci, Piero

    2017-01-07

    Differential gene expression (DGE) analysis is a technique to identify statistically significant differences in RNA abundance for genes or arbitrary features between different biological states. The result of a DGE test is typically further analyzed using statistical software, spreadsheets or custom ad hoc algorithms. We identified a need for a web-based system to share DGE statistical test results, and locate and identify genes in DGE statistical test results with a very low barrier of entry. We have developed DEIVA, a free and open source, browser-based single page application (SPA) with a strong emphasis on being user friendly that enables locating and identifying single or multiple genes in an immediate, interactive, and intuitive manner. By design, DEIVA scales with very large numbers of users and datasets. Compared to existing software, DEIVA offers a unique combination of design decisions that enable inspection and analysis of DGE statistical test results with an emphasis on ease of use.

  6. Economic and statistical analysis of time limitations for spotting fluids and fishing operations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Keller, P.S.; Brinkmann, P.E.; Taneja, P.K.

    1984-05-01

    This paper reviews the statistics of "Spotting Fluids" to free stuck drill pipe as well as the economics and statistics of drill string fishing operations. Data were taken from Mobil Oil Exploration and Producing Southeast Inc.'s (MOEPSI) records from 1970-1981. Only those events which occur after a drill string becomes stuck are discussed. The data collected were categorized as Directional Wells and Straight Wells. Bar diagrams are presented to show the Success Ratio vs. Soaking Time for each of the two categories. An analysis was made to identify the elapsed time limit to place the spotting fluid for maximum probability of success. Also determined was the statistical minimum soaking time and the maximum soaking time. For determining the time limit for fishing operations, the following criteria were used: 1. The Risked "Economic Breakeven Analysis" concept was developed based on the work of Harrison. 2. Statistical Probability of Success based on MOEPSI's records from 1970-1981.

  7. The Use of a Context-Based Information Retrieval Technique

    DTIC Science & Technology

    2009-07-01

    ...provided in context. Latent Semantic Analysis (LSA) is a statistical technique for inferring contextual and structural information, and previous studies...WAIS). ...LSA, which is also known as latent semantic indexing (LSI), uses a statistical and... In contrast, natural language models apply algorithms that combine statistical information with semantic information...
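
    As an aside, the LSA pipeline this excerpt refers to is straightforward to sketch with standard components; the toy three-document corpus below is invented and not taken from the report.

    ```python
    # Toy LSA/LSI sketch: TF-IDF -> truncated SVD -> cosine similarity.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.decomposition import TruncatedSVD
    from sklearn.metrics.pairwise import cosine_similarity

    docs = ["the pilot flew the aircraft",
            "the aviator operated the plane",
            "statistics of rainfall trends"]
    tfidf = TfidfVectorizer().fit_transform(docs)              # sparse term matrix
    lsa = TruncatedSVD(n_components=2, random_state=0).fit_transform(tfidf)
    print(cosine_similarity(lsa).round(2))                     # pairwise document similarity
    ```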

  8. Travelogue--a newcomer encounters statistics and the computer.

    PubMed

    Bruce, Peter

    2011-11-01

    Computer-intensive methods have revolutionized statistics, giving rise to new areas of analysis and expertise in predictive analytics, image processing, pattern recognition, machine learning, genomic analysis, and more. Interest naturally centers on the new capabilities the computer allows the analyst to bring to the table. This article, instead, focuses on the account of how computer-based resampling methods, with their relative simplicity and transparency, enticed one individual, untutored in statistics or mathematics, on a long journey into learning statistics, then teaching it, then starting an education institution.

  9. Comparison of future and base precipitation anomalies by SimCLIM statistical projection through ensemble approach in Pakistan

    NASA Astrophysics Data System (ADS)

    Amin, Asad; Nasim, Wajid; Mubeen, Muhammad; Kazmi, Dildar Hussain; Lin, Zhaohui; Wahid, Abdul; Sultana, Syeda Refat; Gibbs, Jim; Fahad, Shah

    2017-09-01

    Unpredictable precipitation trends, largely influenced by climate change, have prolonged droughts and floods in South Asia. Statistical analysis of monthly, seasonal, and annual precipitation trends was carried out for different temporal (1996-2015 and 2041-2060) and spatial scales (39 meteorological stations) in Pakistan. The statistical downscaling model SimCLIM was used for future precipitation projection (2041-2060), and the projections were analyzed statistically. An ensemble approach combined with representative concentration pathways (RCPs) at the medium level was used for future projections. The magnitude and slope of trends were derived by applying the Mann-Kendall and Sen's slope statistical approaches. A geo-statistical application was used to generate precipitation trend maps. Base and projected precipitation were compared by statistical analysis and represented by maps and graphical visualizations, which facilitate trend detection. Results show that the precipitation trend was increasing at more than 70% of weather stations for February, March, April, August, and September in the base years. The precipitation trend decreased from February to April but increased from July to October in the projected years. The strongest decreasing trend was reported in January for the base years, which also decreased in the projected years. Greater variation in precipitation trends between projected and base years was reported from February to April. Variations in the projected precipitation trend for Punjab and Baluchistan were most pronounced in March and April. Seasonal analysis shows large variation in winter, with an increasing trend at more than 30% of weather stations in the base years, approaching 40% for projected precipitation. High risk was reported in the base-year pre-monsoon season, where 90% of weather stations showed an increasing trend, but in the projected years this fell to 33%. Finally, the annual precipitation trend increased at more than 90% of meteorological stations in the base period (1996-2015) but decreased to 76% of stations for the projected period (2041-2060). These results reveal that the overall precipitation trend is decreasing in future years, which may prolong drought at 14% of the weather stations under study.
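
    The two trend statistics named in this abstract are standard and can be sketched directly; the code below implements the Mann-Kendall Z statistic (no-ties variance) and Sen's slope on made-up annual rainfall values, and is not the SimCLIM workflow.

    ```python
    # Standard Mann-Kendall trend test (normal approximation, no ties) and
    # Sen's slope, applied to invented annual rainfall values.
    import numpy as np
    from itertools import combinations
    from math import erf, sqrt

    def mann_kendall_z(x):
        x = np.asarray(x, dtype=float)
        s = sum(np.sign(xj - xi) for i, xi in enumerate(x) for xj in x[i + 1:])
        n = len(x)
        var_s = n * (n - 1) * (2 * n + 5) / 18.0      # variance of S without ties
        if s > 0:
            z = (s - 1) / sqrt(var_s)
        elif s < 0:
            z = (s + 1) / sqrt(var_s)
        else:
            z = 0.0
        p = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))   # two-sided p-value
        return z, p

    def sens_slope(x):
        x = np.asarray(x, dtype=float)
        slopes = [(x[j] - x[i]) / (j - i) for i, j in combinations(range(len(x)), 2)]
        return float(np.median(slopes))

    rain = np.array([310, 298, 305, 322, 330, 341, 335, 352, 360, 371], dtype=float)
    print(mann_kendall_z(rain), sens_slope(rain))
    ```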

  10. A PLSPM-Based Test Statistic for Detecting Gene-Gene Co-Association in Genome-Wide Association Study with Case-Control Design

    PubMed Central

    Zhang, Xiaoshuai; Yang, Xiaowei; Yuan, Zhongshang; Liu, Yanxun; Li, Fangyu; Peng, Bin; Zhu, Dianwen; Zhao, Jinghua; Xue, Fuzhong

    2013-01-01

    For genome-wide association data analysis, two genes in any pathway, two SNPs in the two linked gene regions respectively or in the two linked exons respectively within one gene are often correlated with each other. We therefore proposed the concept of gene-gene co-association, which refers to the effects not only due to the traditional interaction under nearly independent condition but the correlation between two genes. Furthermore, we constructed a novel statistic for detecting gene-gene co-association based on Partial Least Squares Path Modeling (PLSPM). Through simulation, the relationship between traditional interaction and co-association was highlighted under three different types of co-association. Both simulation and real data analysis demonstrated that the proposed PLSPM-based statistic has better performance than single SNP-based logistic model, PCA-based logistic model, and other gene-based methods. PMID:23620809

  11. A PLSPM-based test statistic for detecting gene-gene co-association in genome-wide association study with case-control design.

    PubMed

    Zhang, Xiaoshuai; Yang, Xiaowei; Yuan, Zhongshang; Liu, Yanxun; Li, Fangyu; Peng, Bin; Zhu, Dianwen; Zhao, Jinghua; Xue, Fuzhong

    2013-01-01

    For genome-wide association data analysis, two genes in any pathway, two SNPs in the two linked gene regions respectively or in the two linked exons respectively within one gene are often correlated with each other. We therefore proposed the concept of gene-gene co-association, which refers to the effects not only due to the traditional interaction under nearly independent condition but the correlation between two genes. Furthermore, we constructed a novel statistic for detecting gene-gene co-association based on Partial Least Squares Path Modeling (PLSPM). Through simulation, the relationship between traditional interaction and co-association was highlighted under three different types of co-association. Both simulation and real data analysis demonstrated that the proposed PLSPM-based statistic has better performance than single SNP-based logistic model, PCA-based logistic model, and other gene-based methods.

  12. Hybrid Evidence Theory-based Finite Element/Statistical Energy Analysis method for mid-frequency analysis of built-up systems with epistemic uncertainties

    NASA Astrophysics Data System (ADS)

    Yin, Shengwen; Yu, Dejie; Yin, Hui; Lü, Hui; Xia, Baizhan

    2017-09-01

    Considering the epistemic uncertainties within the hybrid Finite Element/Statistical Energy Analysis (FE/SEA) model when it is used for the response analysis of built-up systems in the mid-frequency range, the hybrid Evidence Theory-based Finite Element/Statistical Energy Analysis (ETFE/SEA) model is established by introducing evidence theory. Based on the hybrid ETFE/SEA model and the sub-interval perturbation technique, the hybrid Sub-interval Perturbation and Evidence Theory-based Finite Element/Statistical Energy Analysis (SIP-ETFE/SEA) approach is proposed. In the hybrid ETFE/SEA model, the uncertainty in the SEA subsystem is modeled by a non-parametric ensemble, while the uncertainty in the FE subsystem is described by the focal element and basic probability assignment (BPA) and handled with evidence theory. Within the hybrid SIP-ETFE/SEA approach, the mid-frequency response of interest, such as the ensemble average of the energy response and the cross-spectrum response, is calculated analytically by using the conventional hybrid FE/SEA method. Inspired by probability theory, the intervals of the mean value, variance and cumulative distribution are used to describe the distribution characteristics of mid-frequency responses of built-up systems with epistemic uncertainties. In order to alleviate the computational burden of the extreme value analysis, the sub-interval perturbation technique based on the first-order Taylor series expansion is used in the ETFE/SEA model to acquire the lower and upper bounds of the mid-frequency responses over each focal element. Three numerical examples are given to illustrate the feasibility and effectiveness of the proposed method.

  13. A new strategy for statistical analysis-based fingerprint establishment: Application to quality assessment of Semen sojae praeparatum.

    PubMed

    Guo, Hui; Zhang, Zhen; Yao, Yuan; Liu, Jialin; Chang, Ruirui; Liu, Zhao; Hao, Hongyuan; Huang, Taohong; Wen, Jun; Zhou, Tingting

    2018-08-30

    Semen sojae praeparatum, with homology of medicine and food, is a famous traditional Chinese medicine. A simple and effective quality fingerprint analysis, coupled with chemometrics methods, was developed for quality assessment of Semen sojae praeparatum. First, similarity analysis (SA) and hierarchical clustering analysis (HCA) were applied to select the qualitative markers that obviously influence the quality of Semen sojae praeparatum. Twenty-one chemicals were selected and characterized by high-resolution ion trap/time-of-flight mass spectrometry (LC-IT-TOF-MS). Subsequently, principal component analysis (PCA) and orthogonal partial least squares discriminant analysis (OPLS-DA) were conducted to select the quantitative markers of Semen sojae praeparatum samples from different origins. Moreover, 11 compounds with statistical significance were determined quantitatively, which provided accurate and informative data for quality evaluation. This study proposes a new strategy for "statistical analysis-based fingerprint establishment", which would be a valuable reference for further study. Copyright © 2018 Elsevier Ltd. All rights reserved.
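
    As a rough illustration of the chemometric steps named above (HCA to group samples, PCA to find influential peaks), the following Python sketch runs both on a hypothetical peak-area matrix; the sample count, peak count, and simulated origin effect are invented and this is not the authors' workflow.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)

# Hypothetical peak-area matrix: 12 samples x 21 fingerprint peaks (log-normal intensities);
# the last six samples get inflated areas on five peaks to mimic a second origin
X = rng.lognormal(mean=2.0, sigma=0.3, size=(12, 21))
X[6:, :5] *= 1.8
Xs = StandardScaler().fit_transform(X)

# Hierarchical clustering (Ward linkage) to group samples
Z = linkage(Xs, method="ward")
labels = fcluster(Z, t=2, criterion="maxclust")
print("HCA cluster labels:", labels)

# PCA: which peaks carry the largest absolute loadings on the first component
pca = PCA(n_components=2).fit(Xs)
print("explained variance ratios:", np.round(pca.explained_variance_ratio_, 2))
print("top PC1 peaks:", np.argsort(np.abs(pca.components_[0]))[::-1][:5])
```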

  14. Mass detection, localization and estimation for wind turbine blades based on statistical pattern recognition

    NASA Astrophysics Data System (ADS)

    Colone, L.; Hovgaard, M. K.; Glavind, L.; Brincker, R.

    2018-07-01

    A method for mass change detection on wind turbine blades using natural frequencies is presented. The approach is based on two statistical tests. The first test decides if there is a significant mass change and the second test is a statistical group classification based on Linear Discriminant Analysis. The frequencies are identified by means of Operational Modal Analysis using natural excitation. Based on the assumption of Gaussianity of the frequencies, a multi-class statistical model is developed by combining finite element model sensitivities in 10 classes of change location on the blade, the smallest area being 1/5 of the span. The method is experimentally validated for a full scale wind turbine blade in a test setup and loaded by natural wind. Mass change from natural causes was imitated with sand bags and the algorithm was observed to perform well with an experimental detection rate of 1, localization rate of 0.88 and mass estimation rate of 0.72.
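
    A minimal sketch of the second stage, classifying a new frequency-shift measurement into one of several mass-change location classes with Linear Discriminant Analysis, is given below; the class means, noise levels, and ten-class setup are assumptions made only for illustration and do not reproduce the study's finite element sensitivities.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(2)
n_classes, n_per_class, n_freqs = 10, 40, 6

# Hypothetical training set: small relative shifts of 6 natural frequencies,
# simulated separately for each of 10 mass-change location classes
class_means = rng.normal(0.0, 1e-3, size=(n_classes, n_freqs))
X = np.vstack([m + rng.normal(0.0, 2e-4, size=(n_per_class, n_freqs)) for m in class_means])
y = np.repeat(np.arange(n_classes), n_per_class)

lda = LinearDiscriminantAnalysis().fit(X, y)

# A new measurement of frequency shifts is assigned to the most probable location class
new_shift = (class_means[3] + rng.normal(0.0, 2e-4, size=n_freqs)).reshape(1, -1)
print("predicted location class:", lda.predict(new_shift)[0])
print("posterior probabilities:", np.round(lda.predict_proba(new_shift), 2))
```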

  15. Hybrid statistics-simulations based method for atom-counting from ADF STEM images.

    PubMed

    De Wael, Annelies; De Backer, Annick; Jones, Lewys; Nellist, Peter D; Van Aert, Sandra

    2017-06-01

    A hybrid statistics-simulations based method for atom-counting from annular dark field scanning transmission electron microscopy (ADF STEM) images of monotype crystalline nanostructures is presented. Different atom-counting methods already exist for model-like systems. However, the increasing relevance of radiation damage in the study of nanostructures demands a method that allows atom-counting from low dose images with a low signal-to-noise ratio. Therefore, the hybrid method directly includes prior knowledge from image simulations into the existing statistics-based method for atom-counting, and accounts in this manner for possible discrepancies between actual and simulated experimental conditions. It is shown by means of simulations and experiments that this hybrid method outperforms the statistics-based method, especially for low electron doses and small nanoparticles. The analysis of a simulated low dose image of a small nanoparticle suggests that this method allows for far more reliable quantitative analysis of beam-sensitive materials. Copyright © 2017 Elsevier B.V. All rights reserved.

  16. Cloud-based solution to identify statistically significant MS peaks differentiating sample categories.

    PubMed

    Ji, Jun; Ling, Jeffrey; Jiang, Helen; Wen, Qiaojun; Whitin, John C; Tian, Lu; Cohen, Harvey J; Ling, Xuefeng B

    2013-03-23

    Mass spectrometry (MS) has evolved to become the primary high-throughput tool for proteomics-based biomarker discovery. Until now, multiple challenges in protein MS data analysis remain: large-scale and complex data set management; MS peak identification and indexing; and high-dimensional peak differential analysis with false discovery rate (FDR) control for the concurrent statistical tests. "Turnkey" solutions are needed for biomarker investigations to rapidly process MS data sets and identify statistically significant peaks for subsequent validation. Here we present an efficient and effective solution, which provides experimental biologists easy access to "cloud" computing capabilities to analyze MS data. The web portal can be accessed at http://transmed.stanford.edu/ssa/. The presented web application supports large-scale online MS data uploading and analysis through a simple user interface. This bioinformatic tool will facilitate the discovery of potential protein biomarkers using MS.
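
    For readers unfamiliar with the concurrent-testing step, the sketch below shows one common way to flag significant peaks: per-peak Welch t-tests followed by Benjamini-Hochberg FDR control. It is a generic illustration on simulated intensities, not the portal's actual pipeline.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n_peaks, n_per_group = 500, 20

# Hypothetical log-intensity matrix for two sample categories; the first 25 peaks truly differ
group_a = rng.normal(0.0, 1.0, size=(n_peaks, n_per_group))
group_b = rng.normal(0.0, 1.0, size=(n_peaks, n_per_group))
group_b[:25] += 1.5

# Per-peak Welch t-tests (unequal variances), one p-value per peak
p = stats.ttest_ind(group_a, group_b, axis=1, equal_var=False).pvalue

# Benjamini-Hochberg step-up procedure at FDR level q
q = 0.05
order = np.argsort(p)
ranked = p[order]
thresholds = q * np.arange(1, n_peaks + 1) / n_peaks
passing = np.nonzero(ranked <= thresholds)[0]
n_sig = passing.max() + 1 if passing.size else 0
print(f"{n_sig} peaks declared significant at FDR {q}")
print("first few significant peak indices:", np.sort(order[:n_sig])[:10])
```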

  17. Space station software reliability analysis based on failures observed during testing at the multisystem integration facility

    NASA Technical Reports Server (NTRS)

    Tamayo, Tak Chai

    1987-01-01

    The quality of software is not only vital to the successful operation of the space station; it is also an important factor in establishing testing requirements, the time needed for software verification and integration, and launch schedules for the space station. Defense of management decisions can be greatly strengthened by combining engineering judgments with statistical analysis. Unlike hardware, software has the characteristics of no wearout and costly redundancies, making traditional statistical analysis unsuitable for evaluating software reliability. A statistical model was developed to represent the number as well as the types of failures that occur during software testing and verification. From this model, quantitative measures of software reliability based on failure history during testing are derived. Criteria to terminate testing based on reliability objectives and methods to estimate the expected number of fixes required are also presented.

  18. Teaching statistics in biology: using inquiry-based learning to strengthen understanding of statistical analysis in biology laboratory courses.

    PubMed

    Metz, Anneke M

    2008-01-01

    There is an increasing need for students in the biological sciences to build a strong foundation in quantitative approaches to data analyses. Although most science, engineering, and math field majors are required to take at least one statistics course, statistical analysis is poorly integrated into undergraduate biology course work, particularly at the lower-division level. Elements of statistics were incorporated into an introductory biology course, including a review of statistics concepts and opportunity for students to perform statistical analysis in a biological context. Learning gains were measured with an 11-item statistics learning survey instrument developed for the course. Students showed a statistically significant 25% (p < 0.005) increase in statistics knowledge after completing introductory biology. Students improved their scores on the survey after completing introductory biology, even if they had previously completed an introductory statistics course (9%, improvement p < 0.005). Students retested 1 yr after completing introductory biology showed no loss of their statistics knowledge as measured by this instrument, suggesting that the use of statistics in biology course work may aid long-term retention of statistics knowledge. No statistically significant differences in learning were detected between male and female students in the study.

  19. Multivariate statistical analysis software technologies for astrophysical research involving large data bases

    NASA Technical Reports Server (NTRS)

    Djorgovski, George

    1993-01-01

    The existing and forthcoming data bases from NASA missions contain an abundance of information whose complexity cannot be efficiently tapped with simple statistical techniques. Powerful multivariate statistical methods already exist which can be used to harness much of the richness of these data. Automatic classification techniques have been developed to solve the problem of identifying known types of objects in multiparameter data sets, in addition to leading to the discovery of new physical phenomena and classes of objects. We propose an exploratory study and integration of promising techniques in the development of a general and modular classification/analysis system for very large data bases, which would enhance and optimize data management and the use of human research resource.

  20. Multivariate statistical analysis software technologies for astrophysical research involving large data bases

    NASA Technical Reports Server (NTRS)

    Djorgovski, Stanislav

    1992-01-01

    The existing and forthcoming data bases from NASA missions contain an abundance of information whose complexity cannot be efficiently tapped with simple statistical techniques. Powerful multivariate statistical methods already exist which can be used to harness much of the richness of these data. Automatic classification techniques have been developed to solve the problem of identifying known types of objects in multi parameter data sets, in addition to leading to the discovery of new physical phenomena and classes of objects. We propose an exploratory study and integration of promising techniques in the development of a general and modular classification/analysis system for very large data bases, which would enhance and optimize data management and the use of human research resources.

  1. Manipulating measurement scales in medical statistical analysis and data mining: A review of methodologies

    PubMed Central

    Marateb, Hamid Reza; Mansourian, Marjan; Adibi, Peyman; Farina, Dario

    2014-01-01

    Background: selecting the correct statistical test and data mining method depends highly on the measurement scale of the data, the type of variables, and the purpose of the analysis. Different measurement scales are studied in detail, and statistical comparison, modeling, and data mining methods are reviewed using several medical examples. We present two ordinal-variable clustering examples, ordinal variables being among the more challenging variable types in analysis, using the Wisconsin Breast Cancer Data (WBCD). Ordinal-to-interval scale conversion example: a breast cancer database of nine 10-level ordinal variables for 683 patients was analyzed by two ordinal-scale clustering methods. The performance of the clustering methods was assessed by comparison with the gold standard groups of malignant and benign cases that had been identified by clinical tests. Results: the sensitivity and accuracy of the two clustering methods were 98% and 96%, respectively. Their specificity was comparable. Conclusion: by using a clustering algorithm appropriate to the measurement scale of the variables in the study, high performance is ensured. Moreover, descriptive and inferential statistics, as well as the modeling approach, must be selected based on the scale of the variables. PMID:24672565

  2. Statistical flaws in design and analysis of fertility treatment studies on cryopreservation raise doubts on the conclusions

    PubMed Central

    van Gelder, P.H.A.J.M.; Nijs, M.

    2011-01-01

    Decisions about pharmacotherapy are taken by medical doctors and authorities based on comparative studies on the use of medications. In studies on fertility treatments in particular, the methodological quality is of utmost importance in the application of evidence-based medicine and systematic reviews. Nevertheless, flaws and omissions appear quite regularly in these types of studies. The current study aims to present an overview of some of the typical statistical flaws, illustrated by a number of example studies that have been published in peer-reviewed journals. Based on an investigation of eleven randomly selected studies on fertility treatments with cryopreservation, it appeared that the methodological quality of these studies often did not fulfil the required statistical criteria. The following statistical flaws were identified: flaws in study design, patient selection, and units of analysis or in the definition of the primary endpoints. Other errors could be found in p-value and power calculations or in critical p-value definitions. Proper interpretation of the results and/or use of these study results in a meta-analysis should therefore be conducted with care. PMID:24753877

  3. Statistical flaws in design and analysis of fertility treatment studies on cryopreservation raise doubts on the conclusions.

    PubMed

    van Gelder, P H A J M; Nijs, M

    2011-01-01

    Decisions about pharmacotherapy are taken by medical doctors and authorities based on comparative studies on the use of medications. In studies on fertility treatments in particular, the methodological quality is of utmost importance in the application of evidence-based medicine and systematic reviews. Nevertheless, flaws and omissions appear quite regularly in these types of studies. The current study aims to present an overview of some of the typical statistical flaws, illustrated by a number of example studies that have been published in peer-reviewed journals. Based on an investigation of eleven randomly selected studies on fertility treatments with cryopreservation, it appeared that the methodological quality of these studies often did not fulfil the required statistical criteria. The following statistical flaws were identified: flaws in study design, patient selection, and units of analysis or in the definition of the primary endpoints. Other errors could be found in p-value and power calculations or in critical p-value definitions. Proper interpretation of the results and/or use of these study results in a meta-analysis should therefore be conducted with care.

  4. ISSUES IN THE STATISTICAL ANALYSIS OF SMALL-AREA HEALTH DATA. (R825173)

    EPA Science Inventory

    The availability of geographically indexed health and population data, together with advances in computing, geographical information systems and statistical methodology, has opened the way for serious exploration of small area health statistics based on routine data. Such analyses may be...

  5. The Practicality of Statistical Physics Handout Based on KKNI and the Constructivist Approach

    NASA Astrophysics Data System (ADS)

    Sari, S. Y.; Afrizon, R.

    2018-04-01

    Statistical physics lectures show that: 1) the performance of lecturers, the social climate, students' competence, and the soft skills needed at work are in the 'sufficient' category, 2) students find statistical physics lectures difficult to follow because the material is abstract, 3) 40.72% of students need more reinforcement in the form of repetition, practice questions, and structured tasks, and 4) the depth of the statistical physics material needs to be improved gradually and in a structured way. This indicates that learning materials in accordance with the Indonesian National Qualification Framework, or Kerangka Kualifikasi Nasional Indonesia (KKNI), together with an appropriate learning approach, are needed to help lecturers and students in lectures. The author has designed statistical physics handouts which meet very valid criteria (90.89%) according to expert judgment. In addition, the practical level of the handouts also needs to be considered so that they are easy to use, interesting, and efficient in lectures. The purpose of this research is to determine the practical level of a statistical physics handout based on KKNI and a constructivist approach. This research is part of a research and development effort with the 4-D model developed by Thiagarajan, and has reached the development-testing part of the Development stage. Data were collected using a questionnaire distributed to lecturers and students, and were analyzed using descriptive techniques in the form of percentages. The analysis of the questionnaire shows that the statistical physics handout meets very practical criteria. The conclusion of this study is that statistical physics handouts based on KKNI and a constructivist approach are practical for use in lectures.

  6. The Ontology of Biological and Clinical Statistics (OBCS) for standardized and reproducible statistical analysis.

    PubMed

    Zheng, Jie; Harris, Marcelline R; Masci, Anna Maria; Lin, Yu; Hero, Alfred; Smith, Barry; He, Yongqun

    2016-09-14

    Statistics play a critical role in biological and clinical research. However, most reports of scientific results in the published literature make it difficult for the reader to reproduce the statistical analyses performed in achieving those results because they provide inadequate documentation of the statistical tests and algorithms applied. The Ontology of Biological and Clinical Statistics (OBCS) is put forward here as a step towards solving this problem. The terms in OBCS, including 'data collection', 'data transformation in statistics', 'data visualization', 'statistical data analysis', and 'drawing a conclusion based on data', cover the major types of statistical processes used in basic biological research and clinical outcome studies. OBCS is aligned with the Basic Formal Ontology (BFO) and extends the Ontology of Biomedical Investigations (OBI), an OBO (Open Biological and Biomedical Ontologies) Foundry ontology supported by over 20 research communities. Currently, OBCS comprises 878 terms, representing 20 BFO classes, 403 OBI classes, 229 OBCS-specific classes, and 122 classes imported from ten other OBO ontologies. We discuss two examples illustrating how the ontology is being applied. In the first (biological) use case, we describe how OBCS was applied to represent the high-throughput microarray data analysis of immunological transcriptional profiles in human subjects vaccinated with an influenza vaccine. In the second (clinical outcomes) use case, we applied OBCS to represent the processing of electronic health care data to determine the associations between hospital staffing levels and patient mortality. Our case studies were designed to show how OBCS can be used for the consistent representation of statistical analysis pipelines under two different research paradigms. Other ongoing projects using OBCS for statistical data processing are also discussed. The OBCS source code and documentation are available at: https://github.com/obcs/obcs . The Ontology of Biological and Clinical Statistics (OBCS) is a community-based open source ontology in the domain of biological and clinical statistics. OBCS is a timely ontology that represents statistics-related terms and their relations in a rigorous fashion, facilitates standard data analysis and integration, and supports reproducible biological and clinical research.

  7. A Guideline to Univariate Statistical Analysis for LC/MS-Based Untargeted Metabolomics-Derived Data

    PubMed Central

    Vinaixa, Maria; Samino, Sara; Saez, Isabel; Duran, Jordi; Guinovart, Joan J.; Yanes, Oscar

    2012-01-01

    Several metabolomic software programs provide methods for peak picking, retention time alignment and quantification of metabolite features in LC/MS-based metabolomics. Statistical analysis, however, is needed in order to discover those features significantly altered between samples. By comparing the retention time and MS/MS data of a model compound to that from the altered feature of interest in the research sample, metabolites can then be unequivocally identified. This paper reports on a comprehensive overview of a workflow for statistical analysis to rank relevant metabolite features that will be selected for further MS/MS experiments. We focus on univariate data analysis applied in parallel on all detected features. Characteristics and challenges of this analysis are discussed and illustrated using four different real LC/MS untargeted metabolomic datasets. We demonstrate the influence of considering or violating mathematical assumptions on which univariate statistical tests rely, using high-dimensional LC/MS datasets. Issues in data analysis such as determination of sample size, analytical variation, assumption of normality and homoscedasticity, or correction for multiple testing are discussed and illustrated in the context of our four untargeted LC/MS working examples. PMID:24957762

  8. A Guideline to Univariate Statistical Analysis for LC/MS-Based Untargeted Metabolomics-Derived Data.

    PubMed

    Vinaixa, Maria; Samino, Sara; Saez, Isabel; Duran, Jordi; Guinovart, Joan J; Yanes, Oscar

    2012-10-18

    Several metabolomic software programs provide methods for peak picking, retention time alignment and quantification of metabolite features in LC/MS-based metabolomics. Statistical analysis, however, is needed in order to discover those features significantly altered between samples. By comparing the retention time and MS/MS data of a model compound to that from the altered feature of interest in the research sample, metabolites can then be unequivocally identified. This paper reports on a comprehensive overview of a workflow for statistical analysis to rank relevant metabolite features that will be selected for further MS/MS experiments. We focus on univariate data analysis applied in parallel on all detected features. Characteristics and challenges of this analysis are discussed and illustrated using four different real LC/MS untargeted metabolomic datasets. We demonstrate the influence of considering or violating mathematical assumptions on which univariate statistical tests rely, using high-dimensional LC/MS datasets. Issues in data analysis such as determination of sample size, analytical variation, assumption of normality and homoscedasticity, or correction for multiple testing are discussed and illustrated in the context of our four untargeted LC/MS working examples.

  9. Background Information and User’s Guide for MIL-F-9490

    DTIC Science & Technology

    1975-01-01

    requirements, although different analysis results will apply to each requirement. Basic differences between the two reliability requirements are: MIL-F-8785B...provides the rationale for establishing such limits. The specific risk analysis comprises the same data which formed the average risk analysis, except...statistical analysis will be based on statistical data taken using limited exposure times of components and equipment. The exposure times and resulting

  10. A Statistical Analysis of Brain Morphology Using Wild Bootstrapping

    PubMed Central

    Ibrahim, Joseph G.; Tang, Niansheng; Rowe, Daniel B.; Hao, Xuejun; Bansal, Ravi; Peterson, Bradley S.

    2008-01-01

    Methods for the analysis of brain morphology, including voxel-based morphology and surface-based morphometries, have been used to detect associations between brain structure and covariates of interest, such as diagnosis, severity of disease, age, IQ, and genotype. The statistical analysis of morphometric measures usually involves two statistical procedures: 1) invoking a statistical model at each voxel (or point) on the surface of the brain or brain subregion, followed by mapping test statistics (e.g., t test) or their associated p values at each of those voxels; 2) correction for the multiple statistical tests conducted across all voxels on the surface of the brain region under investigation. We propose the use of new statistical methods for each of these procedures. We first use a heteroscedastic linear model to test the associations between the morphological measures at each voxel on the surface of the specified subregion (e.g., cortical or subcortical surfaces) and the covariates of interest. Moreover, we develop a robust test procedure that is based on a resampling method, called wild bootstrapping. This procedure assesses the statistical significance of the associations between a measure of given brain structure and the covariates of interest. The value of this robust test procedure lies in its computational simplicity and in its applicability to a wide range of imaging data, including data from both anatomical and functional magnetic resonance imaging (fMRI). Simulation studies demonstrate that this robust test procedure can accurately control the family-wise error rate. We demonstrate the application of this robust test procedure to the detection of statistically significant differences in the morphology of the hippocampus over time across gender groups in a large sample of healthy subjects. PMID:17649909
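
    The wild bootstrap idea can be illustrated at a single voxel: refit the null model, flip the null residuals with Rademacher weights, and compare the observed test statistic with its bootstrap null distribution. The sketch below does this for a toy heteroscedastic regression; the covariates, effect sizes, and noise model are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(4)
n, n_boot = 60, 2000

# Hypothetical single-voxel morphometric measure regressed on a group indicator and age;
# the noise is heteroscedastic on purpose, which is the setting wild bootstrapping handles
group = rng.integers(0, 2, size=n).astype(float)
age = rng.normal(40.0, 10.0, size=n)
y = 0.3 * group + 0.02 * age + rng.normal(0.0, 1.0, size=n) * (1.0 + 0.5 * group)

X = np.column_stack([np.ones(n), group, age])   # full model: intercept, group, age
xtx_inv_11 = np.linalg.inv(X.T @ X)[1, 1]

def t_for_group(y_vec):
    """t statistic for the group coefficient in the full model."""
    b = np.linalg.lstsq(X, y_vec, rcond=None)[0]
    r = y_vec - X @ b
    se = np.sqrt(np.sum(r**2) / (n - X.shape[1]) * xtx_inv_11)
    return b[1] / se

t_obs = t_for_group(y)

# Wild bootstrap under the null of no group effect: fit without the group column,
# then regenerate responses by flipping the null residuals with Rademacher weights
X0 = X[:, [0, 2]]
b0 = np.linalg.lstsq(X0, y, rcond=None)[0]
resid0 = y - X0 @ b0
t_null = np.array([
    t_for_group(X0 @ b0 + resid0 * rng.choice([-1.0, 1.0], size=n))
    for _ in range(n_boot)
])

p_boot = np.mean(np.abs(t_null) >= np.abs(t_obs))
print(f"observed t = {t_obs:.2f}, wild-bootstrap p = {p_boot:.3f}")
```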

  11. TRAPR: R Package for Statistical Analysis and Visualization of RNA-Seq Data.

    PubMed

    Lim, Jae Hyun; Lee, Soo Youn; Kim, Ju Han

    2017-03-01

    High-throughput transcriptome sequencing, also known as RNA sequencing (RNA-Seq), is a standard technology for measuring gene expression with unprecedented accuracy. Numerous bioconductor packages have been developed for the statistical analysis of RNA-Seq data. However, these tools focus on specific aspects of the data analysis pipeline, and are difficult to appropriately integrate with one another due to their disparate data structures and processing methods. They also lack visualization methods to confirm the integrity of the data and the process. In this paper, we propose an R-based RNA-Seq analysis pipeline called TRAPR, an integrated tool that facilitates the statistical analysis and visualization of RNA-Seq expression data. TRAPR provides various functions for data management, the filtering of low-quality data, normalization, transformation, statistical analysis, data visualization, and result visualization that allow researchers to build customized analysis pipelines.

  12. Multivariate meta-analysis: a robust approach based on the theory of U-statistic.

    PubMed

    Ma, Yan; Mazumdar, Madhu

    2011-10-30

    Meta-analysis is the methodology for combining findings from similar research studies asking the same question. When the question of interest involves multiple outcomes, multivariate meta-analysis is used to synthesize the outcomes simultaneously, taking into account the correlation between the outcomes. Likelihood-based approaches, in particular the restricted maximum likelihood (REML) method, are commonly utilized in this context. REML assumes a multivariate normal distribution for the random-effects model. This assumption is difficult to verify, especially for meta-analyses with a small number of component studies. The use of REML also requires iterative estimation between parameters, needing moderately high computation time, especially when the dimension of outcomes is large. A multivariate method of moments (MMM) is available and has been shown to perform as well as REML. However, there is a lack of information on the performance of these two methods when the true data distribution is far from normality. In this paper, we propose a new nonparametric and non-iterative method for multivariate meta-analysis on the basis of the theory of U-statistics and compare the properties of these three procedures under both normal and skewed data through simulation studies. It is shown that the effect of non-normal data distributions on REML estimates is marginal and that the estimates from the MMM and U-statistic-based approaches are very similar. Therefore, we conclude that for performing multivariate meta-analysis, the U-statistic estimation procedure is a viable alternative to REML and MMM. Easy implementation of all three methods is illustrated by their application to data from two published meta-analyses from the fields of hip fracture and periodontal disease. We discuss ideas for future research based on U-statistics for testing the significance of between-study heterogeneity and for extending the work to the meta-regression setting. Copyright © 2011 John Wiley & Sons, Ltd.

  13. Statistical significance of task related deep brain EEG dynamic changes in the time-frequency domain.

    PubMed

    Chládek, J; Brázdil, M; Halámek, J; Plešinger, F; Jurák, P

    2013-01-01

    We present an off-line analysis procedure for exploring brain activity recorded from intra-cerebral electroencephalographic data (SEEG). The objective is to determine the statistical differences between different types of stimulations in the time-frequency domain. The procedure is based on computing relative signal power change and subsequent statistical analysis. An example of characteristic statistically significant event-related de/synchronization (ERD/ERS) detected across different frequency bands following different oddball stimuli is presented. The method is used for off-line functional classification of different brain areas.

  14. Coordinate based random effect size meta-analysis of neuroimaging studies.

    PubMed

    Tench, C R; Tanasescu, Radu; Constantinescu, C S; Auer, D P; Cottam, W J

    2017-06-01

    Low power in neuroimaging studies can make them difficult to interpret, and Coordinate based meta-analysis (CBMA) may go some way to mitigating this issue. CBMA has been used in many analyses to detect where published functional MRI or voxel-based morphometry studies testing similar hypotheses report significant summary results (coordinates) consistently. Only the reported coordinates and possibly t statistics are analysed, and statistical significance of clusters is determined by coordinate density. Here a method of performing coordinate based random effect size meta-analysis and meta-regression is introduced. The algorithm (ClusterZ) analyses both coordinates and reported t statistic or Z score, standardised by the number of subjects. Statistical significance is determined not by coordinate density, but by a random effects meta-analyses of reported effects performed cluster-wise using standard statistical methods and taking account of censoring inherent in the published summary results. Type 1 error control is achieved using the false cluster discovery rate (FCDR), which is based on the false discovery rate. This controls both the family wise error rate under the null hypothesis that coordinates are randomly drawn from a standard stereotaxic space, and the proportion of significant clusters that are expected under the null. Such control is necessary to avoid propagating and even amplifying the very issues motivating the meta-analysis in the first place. ClusterZ is demonstrated on both numerically simulated data and on real data from reports of grey matter loss in multiple sclerosis (MS) and syndromes suggestive of MS, and of painful stimulus in healthy controls. The software implementation is available to download and use freely. Copyright © 2017 Elsevier Inc. All rights reserved.

  15. Analysis of a Rocket Based Combined Cycle Engine during Rocket Only Operation

    NASA Technical Reports Server (NTRS)

    Smith, T. D.; Steffen, C. J., Jr.; Yungster, S.; Keller, D. J.

    1998-01-01

    The all rocket mode of operation is a critical factor in the overall performance of a rocket based combined cycle (RBCC) vehicle. However, outside of performing experiments or a full three dimensional analysis, there are no first order parametric models to estimate performance. As a result, an axisymmetric RBCC engine was used to analytically determine specific impulse efficiency values based upon both full flow and gas generator configurations. Design of experiments methodology was used to construct a test matrix and statistical regression analysis was used to build parametric models. The main parameters investigated in this study were: rocket chamber pressure, rocket exit area ratio, percent of injected secondary flow, mixer-ejector inlet area, mixer-ejector area ratio, and mixer-ejector length-to-inject diameter ratio. A perfect gas computational fluid dynamics analysis was performed to obtain values of vacuum specific impulse. Statistical regression analysis was performed based on both full flow and gas generator engine cycles. Results were also found to be dependent upon the entire cycle assumptions. The statistical regression analysis determined that there were five significant linear effects, six interactions, and one second-order effect. Two parametric models were created to provide performance assessments of an RBCC engine in the all rocket mode of operation.

  16. RooStatsCms: A tool for analysis modelling, combination and statistical studies

    NASA Astrophysics Data System (ADS)

    Piparo, D.; Schott, G.; Quast, G.

    2010-04-01

    RooStatsCms is an object oriented statistical framework based on the RooFit technology. Its scope is to allow the modelling, statistical analysis and combination of multiple search channels for new phenomena in High Energy Physics. It provides a variety of methods described in literature implemented as classes, whose design is oriented to the execution of multiple CPU intensive jobs on batch systems or on the Grid.

  17. metaCCA: summary statistics-based multivariate meta-analysis of genome-wide association studies using canonical correlation analysis.

    PubMed

    Cichonska, Anna; Rousu, Juho; Marttinen, Pekka; Kangas, Antti J; Soininen, Pasi; Lehtimäki, Terho; Raitakari, Olli T; Järvelin, Marjo-Riitta; Salomaa, Veikko; Ala-Korpela, Mika; Ripatti, Samuli; Pirinen, Matti

    2016-07-01

    A dominant approach to genetic association studies is to perform univariate tests between genotype-phenotype pairs. However, analyzing related traits together increases statistical power, and certain complex associations become detectable only when several variants are tested jointly. Currently, modest sample sizes of individual cohorts, and restricted availability of individual-level genotype-phenotype data across the cohorts, limit conducting multivariate tests. We introduce metaCCA, a computational framework for summary statistics-based analysis of a single or multiple studies that allows multivariate representation of both genotype and phenotype. It extends the statistical technique of canonical correlation analysis to the setting where original individual-level records are not available, and employs a covariance shrinkage algorithm to achieve robustness. Multivariate meta-analysis of two Finnish studies of nuclear magnetic resonance metabolomics by metaCCA, using standard univariate output from the program SNPTEST, shows an excellent agreement with the pooled individual-level analysis of original data. Motivated by strong multivariate signals in the lipid genes tested, we envision that multivariate association testing using metaCCA has a great potential to provide novel insights from already published summary statistics from high-throughput phenotyping technologies. Code is available at https://github.com/aalto-ics-kepaco. Contact: anna.cichonska@helsinki.fi or matti.pirinen@helsinki.fi. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
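
    As a reminder of what canonical correlation analysis computes at the individual level (metaCCA itself works from summary statistics and covariance shrinkage, which is not reproduced here), the following sketch runs a standard CCA on simulated genotype and trait matrices; all dimensions and effect sizes are hypothetical.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(5)
n = 500

# Hypothetical individual-level data: dosages for 3 SNPs and 2 correlated traits,
# with the first SNP driving a shared component of both traits
genotypes = rng.binomial(2, 0.3, size=(n, 3)).astype(float)
shared = 0.4 * genotypes[:, 0]
traits = np.column_stack([
    shared + rng.normal(0.0, 1.0, n),
    0.5 * shared + rng.normal(0.0, 1.0, n),
])

# Leading canonical correlation between the genotype block and the trait block
cca = CCA(n_components=1).fit(genotypes, traits)
u, v = cca.transform(genotypes, traits)
r = np.corrcoef(u[:, 0], v[:, 0])[0, 1]
print(f"leading canonical correlation: {r:.3f}")
```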

  18. metaCCA: summary statistics-based multivariate meta-analysis of genome-wide association studies using canonical correlation analysis

    PubMed Central

    Cichonska, Anna; Rousu, Juho; Marttinen, Pekka; Kangas, Antti J.; Soininen, Pasi; Lehtimäki, Terho; Raitakari, Olli T.; Järvelin, Marjo-Riitta; Salomaa, Veikko; Ala-Korpela, Mika; Ripatti, Samuli; Pirinen, Matti

    2016-01-01

    Motivation: A dominant approach to genetic association studies is to perform univariate tests between genotype-phenotype pairs. However, analyzing related traits together increases statistical power, and certain complex associations become detectable only when several variants are tested jointly. Currently, modest sample sizes of individual cohorts, and restricted availability of individual-level genotype-phenotype data across the cohorts limit conducting multivariate tests. Results: We introduce metaCCA, a computational framework for summary statistics-based analysis of a single or multiple studies that allows multivariate representation of both genotype and phenotype. It extends the statistical technique of canonical correlation analysis to the setting where original individual-level records are not available, and employs a covariance shrinkage algorithm to achieve robustness. Multivariate meta-analysis of two Finnish studies of nuclear magnetic resonance metabolomics by metaCCA, using standard univariate output from the program SNPTEST, shows an excellent agreement with the pooled individual-level analysis of original data. Motivated by strong multivariate signals in the lipid genes tested, we envision that multivariate association testing using metaCCA has a great potential to provide novel insights from already published summary statistics from high-throughput phenotyping technologies. Availability and implementation: Code is available at https://github.com/aalto-ics-kepaco Contacts: anna.cichonska@helsinki.fi or matti.pirinen@helsinki.fi Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27153689

  19. On-Line Analysis of Southern FIA Data

    Treesearch

    Michael P. Spinney; Paul C. Van Deusen; Francis A. Roesch

    2006-01-01

    The Southern On-Line Estimator (SOLE) is a web-based FIA database analysis tool designed with an emphasis on modularity. The Java-based user interface is simple and intuitive to use and the R-based analysis engine is fast and stable. Each component of the program (data retrieval, statistical analysis and output) can be individually modified to accommodate major...

  20. Data analysis report on ATS-F COMSAT millimeter wave propagation experiment, part 1. [effects of hydrometeors on ground to satellite communication

    NASA Technical Reports Server (NTRS)

    Hyde, G.

    1976-01-01

    The 13/18 GHz COMSAT Propagation Experiment (CPE) was performed to measure attenuation caused by hydrometeors along slant paths from transmitting terminals on the ground to the ATS-6 satellite. The effectiveness of site diversity in overcoming this impairment was also studied. Problems encountered in assembling a valid data base of rain induced attenuation data for statistical analysis are considered. The procedures used to obtain the various statistics are then outlined. The graphs and tables of statistical data for the 15 dual frequency (13 and 18 GHz) site diversity locations are discussed. Cumulative rain rate statistics for the Fayetteville and Boston sites based on point rainfall data collected are presented along with extrapolations of the attenuation and point rainfall data.

  1. Relationship between teacher preparedness and inquiry-based instructional practices to students' science achievement: Evidence from TIMSS 2007

    NASA Astrophysics Data System (ADS)

    Martin, Lynn A.

    The purpose of this study was to examine the relationship of teachers' self-reported preparedness for teaching science content and their instructional practices to the science achievement of eighth grade science students in the United States, as demonstrated by TIMSS 2007. Six hundred eighty-seven eighth grade science teachers in the United States, representing 7,377 students, responded to the TIMSS 2007 questionnaire about their instructional preparedness and their instructional practices. Quantitative data were reported. Through correlation analysis, the researcher found that statistically significant positive relationships emerged between eighth grade science teachers' main area of study and their self-reported beliefs about their preparedness to teach that same content area. Another correlation analysis found that a statistically significant negative relationship existed between teachers' self-reported use of inquiry-based instruction and preparedness to teach chemistry, physics and earth science. A further correlation analysis discovered that a statistically significant positive relationship existed between physics preparedness and student science achievement. Finally, a correlation analysis found that a statistically significant positive relationship existed between science teachers' self-reported implementation of inquiry-based instructional practices and student achievement. The findings support the conclusion that teachers who feel prepared to teach science content and who implement more inquiry-based instruction and less didactic instruction produce higher-achieving science students. As science teachers obtain the appropriate knowledge in science content and pedagogy, they will feel prepared and will implement inquiry-based instruction in science classrooms.

  2. Comprehension-Based versus Production-Based Grammar Instruction: A Meta-Analysis of Comparative Studies

    ERIC Educational Resources Information Center

    Shintani, Natsuko; Li, Shaofeng; Ellis, Rod

    2013-01-01

    This article reports a meta-analysis of studies that investigated the relative effectiveness of comprehension-based instruction (CBI) and production-based instruction (PBI). The meta-analysis only included studies that featured a direct comparison of CBI and PBI in order to ensure methodological and statistical robustness. A total of 35 research…

  3. A Weibull statistics-based lignocellulose saccharification model and a built-in parameter accurately predict lignocellulose hydrolysis performance.

    PubMed

    Wang, Mingyu; Han, Lijuan; Liu, Shasha; Zhao, Xuebing; Yang, Jinghua; Loh, Soh Kheang; Sun, Xiaomin; Zhang, Chenxi; Fang, Xu

    2015-09-01

    Renewable energy from lignocellulosic biomass has been deemed an alternative to depleting fossil fuels. In order to improve this technology, we aim to develop robust mathematical models for the enzymatic lignocellulose degradation process. By analyzing 96 groups of previously published and newly obtained lignocellulose saccharification results and fitting them to the Weibull distribution, we discovered that Weibull statistics can accurately predict lignocellulose saccharification data, regardless of the type of substrates, enzymes and saccharification conditions. A mathematical model for enzymatic lignocellulose degradation was subsequently constructed based on Weibull statistics. Further analysis of the mathematical structure of the model and experimental saccharification data showed the significance of the two parameters in this model. In particular, the λ value, defined as the characteristic time, represents the overall performance of the saccharification system. This suggestion was further supported by statistical analysis of experimental saccharification data and analysis of the glucose production levels when the λ and n values change. In conclusion, the constructed Weibull statistics-based model can accurately predict lignocellulose hydrolysis behavior, and the λ parameter can be used to assess the overall performance of enzymatic lignocellulose degradation. Advantages and potential applications of the model and the λ value in saccharification performance assessment were discussed. Copyright © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
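
    A Weibull-type saccharification time course of the general form Y(t) = Y_max (1 - exp(-(t/λ)^n)) can be fitted with ordinary nonlinear least squares, as sketched below; the time points and yields are invented for illustration, and the exact parameterization used by the authors may differ.

```python
import numpy as np
from scipy.optimize import curve_fit

def weibull_yield(t, y_max, lam, n):
    """Cumulative hydrolysis yield following a Weibull-type time course."""
    return y_max * (1.0 - np.exp(-(t / lam) ** n))

# Hypothetical saccharification time course: hours vs. glucose yield (%)
t = np.array([2, 4, 8, 12, 24, 48, 72], dtype=float)
y = np.array([8, 15, 27, 36, 55, 70, 76], dtype=float)

popt, _ = curve_fit(weibull_yield, t, y, p0=[80.0, 20.0, 1.0], maxfev=10000)
y_max, lam, n = popt
print(f"fitted y_max = {y_max:.1f}%, characteristic time lambda = {lam:.1f} h, shape n = {n:.2f}")
```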

  4. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bennett, Janine Camille; Thompson, David; Pebay, Philippe Pierre

    Statistical analysis is typically used to reduce the dimensionality of and infer meaning from data. A key challenge of any statistical analysis package aimed at large-scale, distributed data is to address the orthogonal issues of parallel scalability and numerical stability. Many statistical techniques, e.g., descriptive statistics or principal component analysis, are based on moments and co-moments and, using robust online update formulas, can be computed in an embarrassingly parallel manner, amenable to a map-reduce style implementation. In this paper we focus on contingency tables, through which numerous derived statistics such as joint and marginal probability, point-wise mutual information, information entropy, and χ2 independence statistics can be directly obtained. However, contingency tables can become large as data size increases, requiring a correspondingly large amount of communication between processors. This potential increase in communication prevents optimal parallel speedup and is the main difference with moment-based statistics (which we discussed in [1]) where the amount of inter-processor communication is independent of data size. Here we present the design trade-offs which we made to implement the computation of contingency tables in parallel. We also study the parallel speedup and scalability properties of our open source implementation. In particular, we observe optimal speed-up and scalability when the contingency statistics are used in their appropriate context, namely, when the data input is not quasi-diffuse.
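
    The derived statistics mentioned (joint and marginal probabilities, point-wise mutual information, entropy, and the chi-squared independence test) are cheap to compute once the table is assembled, as the serial sketch below shows on a small hypothetical table; the parallel aggregation across processors discussed in the record is not reproduced here.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical contingency table: counts over two categorical variables (3 x 4 levels)
table = np.array([[25, 30, 10, 5],
                  [20, 40, 25, 15],
                  [5, 10, 30, 35]], dtype=float)

n = table.sum()
joint = table / n                       # joint probability estimate
px = joint.sum(axis=1, keepdims=True)   # row marginals
py = joint.sum(axis=0, keepdims=True)   # column marginals

# Point-wise mutual information and Shannon entropy of the joint distribution
# (all cells are positive in this example; zero cells would need special handling)
pmi = np.log2(joint / (px * py))
entropy = -np.sum(joint * np.log2(joint))

# Chi-squared test of independence
chi2, p, dof, _ = chi2_contingency(table)
print(f"entropy = {entropy:.2f} bits, chi2 = {chi2:.1f} (dof = {dof}, p = {p:.2e})")
print("PMI matrix:\n", np.round(pmi, 2))
```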

  5. The analysis of morphometric data on rocky mountain wolves and arctic wolves using statistical method

    NASA Astrophysics Data System (ADS)

    Ammar Shafi, Muhammad; Saifullah Rusiman, Mohd; Hamzah, Nor Shamsidah Amir; Nor, Maria Elena; Ahmad, Noor’ani; Azia Hazida Mohamad Azmi, Nur; Latip, Muhammad Faez Ab; Hilmi Azman, Ahmad

    2018-04-01

    Morphometrics is a quantitative analysis depending on the shape and size of several specimens. Morphometric quantitative analyses are commonly used to analyse the fossil record, the shape and size of specimens, and other features. The aim of the study is to find the differences between rocky mountain wolves and arctic wolves based on gender. The sample utilised secondary data which included seven independent variables and two dependent variables. Statistical modelling was used in the analysis, namely the analysis of variance (ANOVA) and multivariate analysis of variance (MANOVA). The results showed that differences exist between arctic wolves and rocky mountain wolves based on the independent factors and gender.

  6. Statistical analysis of hydrological response in urbanising catchments based on adaptive sampling using inter-amount times

    NASA Astrophysics Data System (ADS)

    ten Veldhuis, Marie-Claire; Schleiss, Marc

    2017-04-01

    In this study, we introduced an alternative approach for analysis of hydrological flow time series, using an adaptive sampling framework based on inter-amount times (IATs). The main difference with conventional flow time series is the rate at which low and high flows are sampled: the unit of analysis for IATs is a fixed flow amount, instead of a fixed time window. We analysed statistical distributions of flows and IATs across a wide range of sampling scales to investigate the sensitivity of statistical properties such as quantiles, variance, skewness, scaling parameters and flashiness indicators to the sampling scale. We did this based on streamflow time series for 17 (semi)urbanised basins in North Carolina, US, ranging from 13 km2 to 238 km2 in size. Results showed that adaptive sampling of flow time series based on inter-amounts leads to a more balanced representation of low flow and peak flow values in the statistical distribution. While conventional sampling gives a lot of weight to low flows, as these are most ubiquitous in flow time series, IAT sampling gives relatively more weight to high flow values, when given flow amounts are accumulated in a shorter time. As a consequence, IAT sampling gives more information about the tail of the distribution associated with high flows, while conventional sampling gives relatively more information about low flow periods. We will present results of statistical analyses across a range of subdaily to seasonal scales and will highlight some interesting insights that can be derived from IAT statistics with respect to basin flashiness and the impact of urbanisation on hydrological response.
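
    A minimal sketch of inter-amount-time sampling is given below: the flow series is integrated, and the time needed to accumulate each successive fixed volume becomes the sample. The flow series, time step, and volume increment are hypothetical and chosen only to illustrate the resampling idea, not to reproduce the study's data.

```python
import numpy as np

def inter_amount_times(flow, dt, amount):
    """Times needed to accumulate successive fixed flow amounts.

    flow   : flow rates sampled at a fixed interval dt
    amount : fixed volume per sample (units of flow * dt)
    """
    t = np.arange(flow.size + 1) * dt
    cumvol = np.concatenate([[0.0], np.cumsum(flow) * dt])
    targets = np.arange(amount, cumvol[-1], amount)
    crossing_times = np.interp(targets, cumvol, t)   # linear interpolation of crossing times
    return np.diff(np.concatenate([[0.0], crossing_times]))

rng = np.random.default_rng(6)
# Hypothetical hourly streamflow (m^3/s): low base flow with occasional storm peaks
flow = 1.0 + rng.gamma(0.3, 5.0, size=24 * 90)

iats = inter_amount_times(flow, dt=1.0, amount=50.0)
print(f"{iats.size} inter-amount samples; median IAT = {np.median(iats):.1f} h, "
      f"shortest (peak-flow) IAT = {iats.min():.2f} h")
```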

  7. Connectivity-based fixel enhancement: Whole-brain statistical analysis of diffusion MRI measures in the presence of crossing fibres

    PubMed Central

    Raffelt, David A.; Smith, Robert E.; Ridgway, Gerard R.; Tournier, J-Donald; Vaughan, David N.; Rose, Stephen; Henderson, Robert; Connelly, Alan

    2015-01-01

    In brain regions containing crossing fibre bundles, voxel-average diffusion MRI measures such as fractional anisotropy (FA) are difficult to interpret, and lack within-voxel single fibre population specificity. Recent work has focused on the development of more interpretable quantitative measures that can be associated with a specific fibre population within a voxel containing crossing fibres (herein we use fixel to refer to a specific fibre population within a single voxel). Unfortunately, traditional 3D methods for smoothing and cluster-based statistical inference cannot be used for voxel-based analysis of these measures, since the local neighbourhood for smoothing and cluster formation can be ambiguous when adjacent voxels may have different numbers of fixels, or ill-defined when they belong to different tracts. Here we introduce a novel statistical method to perform whole-brain fixel-based analysis called connectivity-based fixel enhancement (CFE). CFE uses probabilistic tractography to identify structurally connected fixels that are likely to share underlying anatomy and pathology. Probabilistic connectivity information is then used for tract-specific smoothing (prior to the statistical analysis) and enhancement of the statistical map (using a threshold-free cluster enhancement-like approach). To investigate the characteristics of the CFE method, we assessed sensitivity and specificity using a large number of combinations of CFE enhancement parameters and smoothing extents, using simulated pathology generated with a range of test-statistic signal-to-noise ratios in five different white matter regions (chosen to cover a broad range of fibre bundle features). The results suggest that CFE input parameters are relatively insensitive to the characteristics of the simulated pathology. We therefore recommend a single set of CFE parameters that should give near optimal results in future studies where the group effect is unknown. We then demonstrate the proposed method by comparing apparent fibre density between motor neurone disease (MND) patients with control subjects. The MND results illustrate the benefit of fixel-specific statistical inference in white matter regions that contain crossing fibres. PMID:26004503

  8. [Comparison of application of Cochran-Armitage trend test and linear regression analysis for rate trend analysis in epidemiology study].

    PubMed

    Wang, D Z; Wang, C; Shen, C F; Zhang, Y; Zhang, H; Song, G D; Xue, X D; Xu, Z L; Zhang, S; Jiang, G H

    2017-05-10

    We described the time trend of the acute myocardial infarction (AMI) incidence rate in Tianjin from 1999 to 2013 with the Cochran-Armitage trend (CAT) test and linear regression analysis, and the results were compared. Based on the actual population, the CAT test had much stronger statistical power than linear regression analysis for both the overall incidence trend and the age-specific incidence trends (Cochran-Armitage trend P value
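
    A bare-bones version of the Cochran-Armitage trend test on ordered groups (here, calendar years) can be written directly from case counts and denominators, as below; the yearly counts and populations are invented, and the normal approximation is used without a continuity correction.

```python
import numpy as np
from scipy import stats

def cochran_armitage(cases, totals, scores=None):
    """Cochran-Armitage test for a linear trend in proportions across ordered groups
    (normal approximation, no continuity correction)."""
    cases = np.asarray(cases, dtype=float)
    totals = np.asarray(totals, dtype=float)
    scores = np.arange(cases.size, dtype=float) if scores is None else np.asarray(scores, float)
    n = totals.sum()
    p_hat = cases.sum() / n
    s_bar = np.sum(totals * scores) / n
    numerator = np.sum(cases * (scores - s_bar))
    variance = p_hat * (1.0 - p_hat) * np.sum(totals * (scores - s_bar) ** 2)
    z = numerator / np.sqrt(variance)
    return z, 2.0 * (1.0 - stats.norm.cdf(abs(z)))

# Hypothetical yearly AMI case counts and population denominators for five calendar years
cases = [120, 135, 150, 170, 190]
totals = [100000, 101000, 102500, 103000, 104000]
z, p = cochran_armitage(cases, totals)
print(f"Cochran-Armitage Z = {z:.2f}, two-sided p = {p:.4f}")
```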

  9. Handling nonnormality and variance heterogeneity for quantitative sublethal toxicity tests.

    PubMed

    Ritz, Christian; Van der Vliet, Leana

    2009-09-01

    The advantages of using regression-based techniques to derive endpoints from environmental toxicity data are clear, and slowly, this superior analytical technique is gaining acceptance. As use of regression-based analysis becomes more widespread, some of the associated nuances and potential problems come into sharper focus. Looking at data sets that cover a broad spectrum of standard test species, we noticed that some model fits to data failed to meet two key assumptions-variance homogeneity and normality-that are necessary for correct statistical analysis via regression-based techniques. Failure to meet these assumptions often is caused by reduced variance at the concentrations showing severe adverse effects. Although commonly used with linear regression analysis, transformation of the response variable only is not appropriate when fitting data using nonlinear regression techniques. Through analysis of sample data sets, including Lemna minor, Eisenia andrei (terrestrial earthworm), and algae, we show that both the so-called Box-Cox transformation and use of the Poisson distribution can help to correct variance heterogeneity and nonnormality and so allow nonlinear regression analysis to be implemented. Both the Box-Cox transformation and the Poisson distribution can be readily implemented into existing protocols for statistical analysis. By correcting for nonnormality and variance heterogeneity, these two statistical tools can be used to encourage the transition to regression-based analysis and the depreciation of less-desirable and less-flexible analytical techniques, such as linear interpolation.
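
    A Box-Cox transformation of a strictly positive response is one call with SciPy; the sketch below applies it to a hypothetical dose-response data set whose spread shrinks at high effect levels and compares group standard deviations before and after. The group means, spreads, and clipping floor are assumptions made only for this illustration, not values from the cited test species.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Hypothetical quantitative toxicity-test responses for 5 dose groups (10 replicates each);
# both the mean and the spread shrink at high doses, which breaks variance homogeneity
means = [100.0, 80.0, 50.0, 20.0, 5.0]
sds = [15.0, 12.0, 8.0, 3.0, 1.0]
responses = np.concatenate([rng.normal(m, s, size=10) for m, s in zip(means, sds)])
responses = np.clip(responses, 0.5, None)   # Box-Cox needs strictly positive data

# Maximum-likelihood Box-Cox transformation (lambda estimated from the data)
transformed, lam = stats.boxcox(responses)
print(f"estimated Box-Cox lambda = {lam:.2f}")

# Group standard deviations before and after: the transformation evens out the spread
print("group SDs before:", np.round([g.std(ddof=1) for g in np.split(responses, 5)], 2))
print("group SDs after :", np.round([g.std(ddof=1) for g in np.split(transformed, 5)], 2))
```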

  10. Power-law statistics of neurophysiological processes analyzed using short signals

    NASA Astrophysics Data System (ADS)

    Pavlova, Olga N.; Runnova, Anastasiya E.; Pavlov, Alexey N.

    2018-04-01

    We discuss the problem of quantifying power-law statistics of complex processes from short signals. Based on the analysis of electroencephalograms (EEG), we compare three interrelated approaches which enable characterization of the power spectral density (PSD) and show that applying detrended fluctuation analysis (DFA) or the wavelet-transform modulus maxima (WTMM) method represents a useful way of indirectly characterizing the PSD features from short data sets. We conclude that although DFA- and WTMM-based measures can be obtained from the estimated PSD, these tools outperform standard spectral analysis when characterization of the analyzed regime must be based on a very limited amount of data.
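
    A compact first-order DFA, which estimates the scaling exponent from the slope of log F(s) versus log s, is sketched below on white noise (expected exponent near 0.5); the window sizes and signal length are arbitrary choices, and the WTMM counterpart is not shown.

```python
import numpy as np

def dfa(signal, scales):
    """First-order DFA: RMS fluctuation F(s) of the integrated, window-wise
    linearly detrended profile at each window size s."""
    profile = np.cumsum(signal - np.mean(signal))
    fluctuations = []
    for s in scales:
        n_win = profile.size // s
        f2 = []
        for w in range(n_win):
            seg = profile[w * s:(w + 1) * s]
            t = np.arange(s)
            trend = np.polyval(np.polyfit(t, seg, 1), t)   # local linear trend
            f2.append(np.mean((seg - trend) ** 2))
        fluctuations.append(np.sqrt(np.mean(f2)))
    return np.array(fluctuations)

rng = np.random.default_rng(8)
x = rng.normal(size=4096)                     # white noise: expected exponent close to 0.5
scales = np.array([16, 32, 64, 128, 256, 512])
F = dfa(x, scales)
alpha = np.polyfit(np.log(scales), np.log(F), 1)[0]
print(f"DFA scaling exponent alpha = {alpha:.2f}")
```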

  11. QPROT: Statistical method for testing differential expression using protein-level intensity data in label-free quantitative proteomics.

    PubMed

    Choi, Hyungwon; Kim, Sinae; Fermin, Damian; Tsou, Chih-Chiang; Nesvizhskii, Alexey I

    2015-11-03

    We introduce QPROT, a statistical framework and computational tool for differential protein expression analysis using protein intensity data. QPROT is an extension of the QSPEC suite, originally developed for spectral count data, adapted for analysis using continuously measured protein-level intensity data. QPROT offers a new intensity normalization procedure and model-based differential expression analysis, both of which account for missing data. Determination of differential expression of each protein is based on a standardized Z-statistic derived from the posterior distribution of the log fold change parameter, guided by the false discovery rate estimated by a well-known Empirical Bayes method. We evaluated the classification performance of QPROT using the quantification calibration data from the clinical proteomic technology assessment for cancer (CPTAC) study and a recently published Escherichia coli benchmark dataset, with evaluation of FDR accuracy in the latter. QPROT is a statistical framework and computational software tool for comparative quantitative proteomics analysis. It features various extensions of the QSPEC method, originally built for spectral count data analysis, including probabilistic treatment of missing values in protein intensity data. With the increasing popularity of label-free quantitative proteomics data, the proposed method and accompanying software suite will be immediately useful for many proteomics laboratories. This article is part of a Special Issue entitled: Computational Proteomics. Copyright © 2015 Elsevier B.V. All rights reserved.

  12. Multivariate analysis: A statistical approach for computations

    NASA Astrophysics Data System (ADS)

    Michu, Sachin; Kaushik, Vandana

    2014-10-01

    Multivariate analysis is a statistical approach commonly used in automotive diagnosis, education, evaluating clusters in finance, and, more recently, the health-related professions. The objective of the paper is to provide a detailed exploratory discussion of factor analysis (FA) in image retrieval and correlation analysis (CA) of network traffic. Image retrieval methods aim to retrieve relevant images from a collected database based on their content. The problem is made more difficult by the high dimension of the variable space in which the images are represented. Multivariate correlation analysis provides an anomaly detection and analysis method based on the correlation coefficient matrix. Anomalous behaviors in the network include various attacks such as DDoS attacks and network scanning.

  13. A powerful score-based test statistic for detecting gene-gene co-association.

    PubMed

    Xu, Jing; Yuan, Zhongshang; Ji, Jiadong; Zhang, Xiaoshuai; Li, Hongkai; Wu, Xuesen; Xue, Fuzhong; Liu, Yanxun

    2016-01-29

    The genetic variants identified by genome-wide association studies (GWAS) can only account for a small proportion of the total heritability of complex diseases. The existence of gene-gene joint effects, which contain the main effects and their co-association, is one possible explanation for the "missing heritability" problem. Gene-gene co-association refers to the extent to which the joint effects of two genes differ from the main effects, not only because of traditional interaction under nearly independent conditions but also because of the correlation between genes. Generally, genes tend to work collaboratively within a specific pathway or network contributing to the disease, and the disease-associated loci will often be highly correlated (e.g., single nucleotide polymorphisms (SNPs) in linkage disequilibrium). Therefore, we proposed a novel score-based statistic (SBS) as a gene-based method for detecting gene-gene co-association. Various simulations illustrate that, under different sample sizes, marginal effects of causal SNPs, and co-association levels, the proposed SBS performs better than other existing methods, including single SNP-based and principal component analysis (PCA)-based logistic regression models, the statistic based on canonical correlations (CCU), kernel canonical correlation analysis (KCCU), partial least squares path modeling (PLSPM), and the delta-square (δ²) statistic. The real data analysis of rheumatoid arthritis (RA) further confirmed its advantages in practice. SBS is a powerful and efficient gene-based method for detecting gene-gene co-association.

  14. Teaching Statistics in Biology: Using Inquiry-based Learning to Strengthen Understanding of Statistical Analysis in Biology Laboratory Courses

    PubMed Central

    2008-01-01

    There is an increasing need for students in the biological sciences to build a strong foundation in quantitative approaches to data analyses. Although most science, engineering, and math field majors are required to take at least one statistics course, statistical analysis is poorly integrated into undergraduate biology course work, particularly at the lower-division level. Elements of statistics were incorporated into an introductory biology course, including a review of statistics concepts and opportunities for students to perform statistical analysis in a biological context. Learning gains were measured with an 11-item statistics learning survey instrument developed for the course. Students showed a statistically significant 25% (p < 0.005) increase in statistics knowledge after completing introductory biology. Students improved their scores on the survey after completing introductory biology, even if they had previously completed an introductory statistics course (9% improvement, p < 0.005). Students retested 1 yr after completing introductory biology showed no loss of their statistics knowledge as measured by this instrument, suggesting that the use of statistics in biology course work may aid long-term retention of statistics knowledge. No statistically significant differences in learning were detected between male and female students in the study. PMID:18765754

  15. Statistical analysis of Geopotential Height (GH) timeseries based on Tsallis non-extensive statistical mechanics

    NASA Astrophysics Data System (ADS)

    Karakatsanis, L. P.; Iliopoulos, A. C.; Pavlos, E. G.; Pavlos, G. P.

    2018-02-01

    In this paper, we perform statistical analysis of time series derived from Earth's climate. The time series concern Geopotential Height (GH) and correspond to temporal and spatial components of the global distribution of monthly average values during the period 1948-2012. The analysis is based on Tsallis non-extensive statistical mechanics, in particular the estimation of Tsallis' q-triplet, namely {qstat, qsens, qrel}, the reconstructed phase space, the estimation of the correlation dimension, and the Hurst exponent from rescaled range analysis (R/S). The deviation of the Tsallis q-triplet from unity indicates a non-Gaussian (Tsallis q-Gaussian) non-extensive character with heavy-tailed probability density functions (PDFs), multifractal behavior, and long-range dependence for all time series considered. Noticeable differences in the q-triplet estimates were also found among time series at distinct local or temporal regions. Moreover, phase-space reconstruction revealed a lower-dimensional fractal set in the GH dynamical phase space (strong self-organization), and the estimation of the Hurst exponent indicated multifractality, non-Gaussianity, and persistence. The analysis provides significant information for identifying and characterizing the dynamical characteristics of Earth's climate.

  16. The role of ensemble-based statistics in variational assimilation of cloud-affected observations from infrared imagers

    NASA Astrophysics Data System (ADS)

    Hacker, Joshua; Vandenberghe, Francois; Jung, Byoung-Jo; Snyder, Chris

    2017-04-01

    Effective assimilation of cloud-affected radiance observations from space-borne imagers, with the aim of improving cloud analysis and forecasting, has proven to be difficult. Large observation biases, nonlinear observation operators, and non-Gaussian innovation statistics present many challenges. Ensemble-variational data assimilation (EnVar) systems offer the benefits of flow-dependent background error statistics from an ensemble, and the ability of variational minimization to handle nonlinearity. The specific benefits of ensemble statistics, relative to static background errors more commonly used in variational systems, have not been quantified for the problem of assimilating cloudy radiances. A simple experiment framework is constructed with a regional NWP model and operational variational data assimilation system, to provide a basis for understanding the importance of ensemble statistics in cloudy radiance assimilation. Restricting the observations to those corresponding to clouds in the background forecast leads to innovations that are more Gaussian. The number of large innovations is reduced compared to the more general case of all observations, but not eliminated. The Huber norm is investigated to handle the fat tails of the distributions, and allow more observations to be assimilated without the need for strict background checks that eliminate them. Comparing assimilation using only ensemble background error statistics with assimilation using only static background error statistics elucidates the importance of the ensemble statistics. Although the cost functions in both experiments converge to similar values after sufficient outer-loop iterations, the resulting cloud water, ice, and snow content are greater in the ensemble-based analysis. The subsequent forecasts from the ensemble-based analysis also retain more condensed water species, indicating that the local environment is more supportive of clouds. In this presentation we provide details that explain the apparent benefit from using ensembles for cloudy radiance assimilation in an EnVar context.

  17. Reporting quality of statistical methods in surgical observational studies: protocol for systematic review.

    PubMed

    Wu, Robert; Glen, Peter; Ramsay, Tim; Martel, Guillaume

    2014-06-28

    Observational studies dominate the surgical literature. Statistical adjustment is an important strategy to account for confounders in observational studies. Research has shown that published articles are often poor in statistical quality, which may jeopardize their conclusions. The Statistical Analyses and Methods in the Published Literature (SAMPL) guidelines have been published to help establish standards for statistical reporting. This study will seek to determine whether the quality of statistical adjustment and the reporting of these methods are adequate in surgical observational studies. We hypothesize that incomplete reporting will be found in all surgical observational studies, and that the quality and reporting of these methods will be of lower quality in surgical journals when compared with medical journals. Finally, this work will seek to identify predictors of high-quality reporting. This work will examine the top five general surgical and medical journals, based on a 5-year impact factor (2007-2012). All observational studies investigating an intervention related to an essential component area of general surgery (defined by the American Board of Surgery), with an exposure, outcome, and comparator, will be included in this systematic review. Essential elements related to statistical reporting and quality were extracted from the SAMPL guidelines and include domains such as intent of analysis, primary analysis, multiple comparisons, numbers and descriptive statistics, association and correlation analyses, linear regression, logistic regression, Cox proportional hazard analysis, analysis of variance, survival analysis, propensity analysis, and independent and correlated analyses. Each article will be scored as a proportion based on fulfilling criteria in relevant analyses used in the study. A logistic regression model will be built to identify variables associated with high-quality reporting. A comparison will be made between the scores of surgical observational studies published in medical versus surgical journals. Secondary outcomes will pertain to individual domains of analysis. Sensitivity analyses will be conducted. This study will explore the reporting and quality of statistical analyses in surgical observational studies published in the most referenced surgical and medical journals in 2013 and examine whether variables (including the type of journal) can predict high-quality reporting.

  18. On the implications of the classical ergodic theorems: analysis of developmental processes has to focus on intra-individual variation.

    PubMed

    Molenaar, Peter C M

    2008-01-01

    It is argued that general mathematical-statistical theorems imply that standard statistical analysis techniques of inter-individual variation are invalid to investigate developmental processes. Developmental processes have to be analyzed at the level of individual subjects, using time series data characterizing the patterns of intra-individual variation. It is shown that standard statistical techniques based on the analysis of inter-individual variation appear to be insensitive to the presence of arbitrarily large degrees of inter-individual heterogeneity in the population. An important class of nonlinear epigenetic models of neural growth is described which can explain the occurrence of such heterogeneity in brain structures and behavior. Links with models of developmental instability are discussed. A simulation study based on a chaotic growth model illustrates the invalidity of standard analysis of inter-individual variation, whereas time series analysis of intra-individual variation is able to recover the true state of affairs. (c) 2007 Wiley Periodicals, Inc.

  19. Is math anxiety in the secondary classroom limiting physics mastery? A study of math anxiety and physics performance

    NASA Astrophysics Data System (ADS)

    Mercer, Gary J.

    This quantitative study examined the relationship between secondary students' math anxiety and physics performance in an inquiry-based constructivist classroom. The Revised Math Anxiety Rating Scale was used to evaluate math anxiety levels. The results were then compared to the performance on a physics standardized final examination. A simple correlation was performed, followed by a multivariate regression analysis to examine effects based on gender and prior math background. The correlation showed statistical significance between math anxiety and physics performance. The regression analysis showed statistical significance for math anxiety, physics performance, and prior math background, but did not show statistical significance for math anxiety, physics performance, and gender.

  20. Statistical power and optimal design in experiments in which samples of participants respond to samples of stimuli.

    PubMed

    Westfall, Jacob; Kenny, David A; Judd, Charles M

    2014-10-01

    Researchers designing experiments in which a sample of participants responds to a sample of stimuli are faced with difficult questions about optimal study design. The conventional procedures of statistical power analysis fail to provide appropriate answers to these questions because they are based on statistical models in which stimuli are not assumed to be a source of random variation in the data, models that are inappropriate for experiments involving crossed random factors of participants and stimuli. In this article, we present new methods of power analysis for designs with crossed random factors, and we give detailed, practical guidance to psychology researchers planning experiments in which a sample of participants responds to a sample of stimuli. We extensively examine 5 commonly used experimental designs, describe how to estimate statistical power in each, and provide power analysis results based on a reasonable set of default parameter values. We then develop general conclusions and formulate rules of thumb concerning the optimal design of experiments in which a sample of participants responds to a sample of stimuli. We show that in crossed designs, statistical power typically does not approach unity as the number of participants goes to infinity but instead approaches a maximum attainable power value that is possibly small, depending on the stimulus sample. We also consider the statistical merits of designs involving multiple stimulus blocks. Finally, we provide a simple and flexible Web-based power application to aid researchers in planning studies with samples of stimuli.

  1. [Regression on order statistics and its application in estimating nondetects for food exposure assessment].

    PubMed

    Yu, Xiaojin; Liu, Pei; Min, Jie; Chen, Qiguang

    2009-01-01

    To explore the application of regression on order statistics (ROS) in estimating nondetects for food exposure assessment. Regression on order statistics was applied to a cadmium residue data set from global food contaminant monitoring; the mean residue was estimated using SAS programming and compared with the results of substitution methods. The results show that the ROS method clearly outperforms substitution methods, being robust and convenient for subsequent analysis. Regression on order statistics is worth adopting, but more effort should be devoted to the details of its application.
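
    A simplified lognormal ROS calculation for a single detection limit might look like the sketch below; the Blom plotting positions, the synthetic residue values, and the helper name ros_mean are illustrative assumptions and do not reproduce the SAS implementation referenced in the abstract.

    ```python
    import numpy as np
    from scipy import stats

    def ros_mean(detects, n_censored, detection_limit):
        """Simplified lognormal regression on order statistics for one detection limit.
        Nondetects are assumed to fall below every detected value."""
        detects = np.sort(np.asarray(detects, dtype=float))
        n = len(detects) + n_censored
        # Blom plotting positions for all n ranks; the lowest ranks are the nondetects.
        pp = (np.arange(1, n + 1) - 0.375) / (n + 0.25)
        q = stats.norm.ppf(pp)
        q_cens, q_det = q[:n_censored], q[n_censored:]
        # Regress log(detected concentration) on the normal quantiles of the plotting positions.
        slope, intercept, *_ = stats.linregress(q_det, np.log(detects))
        imputed = np.exp(intercept + slope * q_cens)
        imputed = np.minimum(imputed, detection_limit)   # keep imputations below the limit
        return np.mean(np.concatenate([imputed, detects]))

    # Example: cadmium-like residues with 6 of 20 samples below a limit of 0.05
    detected = [0.06, 0.07, 0.08, 0.09, 0.11, 0.12, 0.15,
                0.18, 0.22, 0.25, 0.31, 0.40, 0.55, 0.70]
    print(round(ros_mean(detected, n_censored=6, detection_limit=0.05), 3))
    ```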

  2. The effect of project-based learning on students' statistical literacy levels for data representation

    NASA Astrophysics Data System (ADS)

    Koparan, Timur; Güven, Bülent

    2015-07-01

    The aim of this study is to determine the effect of a project-based learning approach on 8th-grade secondary-school students' statistical literacy levels for data representation. To achieve this goal, a test consisting of 12 open-ended questions was developed in accordance with the views of experts. Seventy 8th-grade secondary-school students, 35 in the experimental group and 35 in the control group, took this test twice, once before and once after the intervention. All raw scores were converted into linear measures using the Winsteps 3.72 modelling program, which performs Rasch analysis; t-tests and an ANCOVA analysis were then carried out on the linear measures. Based on the findings, it was concluded that the project-based learning approach increases students' level of statistical literacy for data representation. Students' levels of statistical literacy before and after the intervention were shown through the obtained person-item maps.

  3. A common base method for analysis of qPCR data and the application of simple blocking in qPCR experiments.

    PubMed

    Ganger, Michael T; Dietz, Geoffrey D; Ewing, Sarah J

    2017-12-01

    qPCR has established itself as the technique of choice for the quantification of gene expression. Procedures for conducting qPCR have received significant attention; however, more rigorous approaches to the statistical analysis of qPCR data are needed. Here we develop a mathematical model, termed the Common Base Method, for analysis of qPCR data based on threshold cycle values (Cq) and efficiencies of reactions (E). The Common Base Method keeps all calculations in the log scale as long as possible by working with log10(E) · Cq, which we call the efficiency-weighted Cq value; subsequent statistical analyses are then applied in the log scale. We show how efficiency-weighted Cq values may be analyzed using a simple paired or unpaired experimental design and develop blocking methods to help reduce unexplained variation. The Common Base Method has several advantages. It allows for the incorporation of well-specific efficiencies and multiple reference genes. The method does not necessitate the pairing of samples that must be performed using traditional analysis methods in order to calculate relative expression ratios. Our method is also simple enough to be implemented in any spreadsheet or statistical software without additional scripts or proprietary components.
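
    A minimal sketch of the efficiency-weighted Cq idea is shown below, assuming hypothetical Cq values and efficiencies for one target and one reference gene in two groups; the exact normalization and blocking machinery of the Common Base Method are not reproduced here, and the helper name weighted_cq is illustrative.

    ```python
    import numpy as np
    from scipy import stats

    def weighted_cq(cq, efficiency):
        """Efficiency-weighted Cq, i.e. log10(E) * Cq, kept in the log scale."""
        return np.log10(efficiency) * np.asarray(cq, dtype=float)

    # Hypothetical Cq values (target and reference gene) for control and treated samples.
    E_target, E_ref = 1.94, 1.98          # amplification efficiencies (2.0 = perfect doubling)
    ctrl_target  = [24.1, 24.5, 24.3, 24.8];  ctrl_ref  = [18.2, 18.4, 18.1, 18.5]
    treat_target = [22.0, 22.4, 21.8, 22.3];  treat_ref = [18.3, 18.1, 18.4, 18.2]

    # Per-sample log10 relative abundance of the target, normalized to the reference gene.
    ctrl_log  = weighted_cq(ctrl_ref,  E_ref) - weighted_cq(ctrl_target,  E_target)
    treat_log = weighted_cq(treat_ref, E_ref) - weighted_cq(treat_target, E_target)

    # Unpaired comparison carried out in the log scale, as the method recommends.
    t, p = stats.ttest_ind(treat_log, ctrl_log)
    log10_fc = treat_log.mean() - ctrl_log.mean()
    print(f"fold change ~ {10 ** log10_fc:.2f}, p = {p:.4f}")
    ```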

  4. Statistical Power Analysis with Microsoft Excel: Normal Tests for One or Two Means as a Prelude to Using Non-Central Distributions to Calculate Power

    ERIC Educational Resources Information Center

    Texeira, Antonio; Rosa, Alvaro; Calapez, Teresa

    2009-01-01

    This article presents statistical power analysis (SPA) based on the normal distribution using Excel, adopting textbook and SPA approaches. The objective is to present the latter in a comparative way within a framework that is familiar to textbook level readers, as a first step to understand SPA with other distributions. The analysis focuses on the…

  5. Materials of acoustic analysis: sustained vowel versus sentence.

    PubMed

    Moon, Kyung Ray; Chung, Sung Min; Park, Hae Sang; Kim, Han Su

    2012-09-01

    Sustained vowel is a widely used material of acoustic analysis. However, vowel phonation does not sufficiently demonstrate sentence-based real-life phonation, and biases may occur depending on the test subject's intent during pronunciation. The purpose of this study was to investigate the differences between the results of acoustic analysis using each material. An individual prospective study. Two hundred two individuals (87 men and 115 women) with normal findings in videostroboscopy were enrolled. Acoustic analysis was done using the speech pattern element acquisition and display program. Fundamental frequency (Fx), amplitude (Ax), contact quotient (Qx), jitter, and shimmer were measured with sustained vowel-based acoustic analysis. Average fundamental frequency (FxM), average amplitude (AxM), average contact quotient (QxM), Fx perturbation (CFx), and amplitude perturbation (CAx) were measured with sentence-based acoustic analysis. Corresponding data of the two methods were compared with each other. SPSS (Statistical Package for the Social Sciences, Version 12.0; SPSS, Inc., Chicago, IL) software was used for statistical analysis. FxM was higher than Fx in men (Fx, 124.45 Hz; FxM, 133.09 Hz; P=0.000). In women, FxM seemed to be lower than Fx, but the results were not statistically significant (Fx, 210.58 Hz; FxM, 208.34 Hz; P=0.065). There was no statistical significance between Ax and AxM in both the groups. QxM was higher than Qx in men and women. Jitter was lower in men, but CFx was lower in women. Both shimmer and CAx were higher in men. Sustained vowel phonation could not be a complete substitute for real-life phonation in acoustic analysis. Characteristics of acoustic materials should be considered when choosing the material for acoustic analysis and interpreting the results. Copyright © 2012 The Voice Foundation. Published by Mosby, Inc. All rights reserved.

  6. Collagen morphology and texture analysis: from statistics to classification

    PubMed Central

    Mostaço-Guidolin, Leila B.; Ko, Alex C.-T.; Wang, Fei; Xiang, Bo; Hewko, Mark; Tian, Ganghong; Major, Arkady; Shiomi, Masashi; Sowa, Michael G.

    2013-01-01

    In this study we present an image analysis methodology capable of quantifying morphological changes in tissue collagen fibril organization caused by pathological conditions. Texture analysis based on first-order statistics (FOS) and second-order statistics such as the gray level co-occurrence matrix (GLCM) was explored to extract second-harmonic generation (SHG) image features that are associated with the structural and biochemical changes of tissue collagen networks. Based on these extracted quantitative parameters, multi-group classification of SHG images was performed. With combined FOS and GLCM texture values, we achieved reliable classification of SHG collagen images acquired from atherosclerotic arteries with >90% accuracy, sensitivity and specificity. The proposed methodology can be applied to a wide range of conditions involving collagen remodeling, such as in skin disorders, different types of fibrosis and musculoskeletal diseases affecting ligaments and cartilage. PMID:23846580
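
    The sketch below illustrates, on a random stand-in image, the kind of first-order statistics and GLCM features the abstract refers to; the quantization to eight gray levels, the single pixel offset, the chosen feature set, and the function names are illustrative assumptions rather than the authors' SHG pipeline.

    ```python
    import numpy as np
    from scipy import stats

    def first_order_stats(img):
        """First-order statistics of the intensity distribution."""
        v = img.ravel().astype(float)
        return {"mean": v.mean(), "std": v.std(),
                "skewness": stats.skew(v), "kurtosis": stats.kurtosis(v)}

    def glcm_features(img, levels=8, dx=1, dy=0):
        """Gray-level co-occurrence matrix for one offset, with a few classic features."""
        q = np.floor(img.astype(float) / 256 * levels).astype(int)   # quantize to `levels` gray levels
        glcm = np.zeros((levels, levels), dtype=float)
        rows, cols = q.shape
        for r in range(rows - dy):
            for c in range(cols - dx):
                glcm[q[r, c], q[r + dy, c + dx]] += 1
        glcm /= glcm.sum()                                           # joint probability of level pairs
        i, j = np.indices(glcm.shape)
        return {"contrast": np.sum(glcm * (i - j) ** 2),
                "homogeneity": np.sum(glcm / (1.0 + np.abs(i - j))),
                "energy": np.sum(glcm ** 2)}

    rng = np.random.default_rng(3)
    image = rng.integers(0, 256, size=(64, 64))     # stand-in for an SHG collagen image
    print(first_order_stats(image))
    print(glcm_features(image))
    ```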

  7. Correlation of RNA secondary structure statistics with thermodynamic stability and applications to folding.

    PubMed

    Wu, Johnny C; Gardner, David P; Ozer, Stuart; Gutell, Robin R; Ren, Pengyu

    2009-08-28

    The accurate prediction of the secondary and tertiary structure of an RNA with different folding algorithms is dependent on several factors, including the energy functions. However, an RNA higher-order structure cannot be predicted accurately from its sequence based on a limited set of energy parameters. The inter- and intramolecular forces between this RNA and other small molecules and macromolecules, in addition to other factors in the cell such as pH, ionic strength, and temperature, influence the complex dynamics associated with transition of a single-stranded RNA to its secondary and tertiary structure. Since all of the factors that affect the formation of an RNA's 3D structure cannot be determined experimentally, statistically derived potential energy has been used in the prediction of protein structure. In the current work, we evaluate the statistical free energy of various secondary structure motifs, including base-pair stacks, hairpin loops, and internal loops, using their statistical frequency obtained from the comparative analysis of more than 50,000 RNA sequences stored in the RNA Comparative Analysis Database (rCAD) at the Comparative RNA Web (CRW) Site. Statistical energy was computed from the structural statistics for several datasets. While the statistical energy for a base-pair stack correlates with experimentally derived free energy values, suggesting a Boltzmann-like distribution, variation is observed between different molecules and their location on the phylogenetic tree of life. Our statistical energy values calculated for several structural elements were utilized in the Mfold RNA-folding algorithm. The combined statistical energy values for base-pair stacks, hairpins and internal loop flanks result in a significant improvement in the accuracy of secondary structure prediction; the hairpin flanks contribute the most.
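
    The core idea of a statistically derived (Boltzmann-like) energy can be sketched as below, assuming hypothetical motif counts; the gas-constant and temperature convention, the uniform reference distribution, and the helper name statistical_energy are illustrative choices, not the rCAD-derived statistics used in the paper.

    ```python
    import numpy as np

    R = 1.987e-3   # gas constant, kcal/(mol*K)
    T = 310.15     # temperature, K

    def statistical_energy(observed_counts, reference_counts):
        """Boltzmann-like pseudo-free energy from motif frequencies:
        dG_stat = -RT * ln(f_observed / f_reference)."""
        f_obs = observed_counts / observed_counts.sum()
        f_ref = reference_counts / reference_counts.sum()
        return -R * T * np.log(f_obs / f_ref)

    # Hypothetical counts of four base-pair stack types from an alignment database,
    # compared with the counts expected if the stacks occurred uniformly at random.
    observed = np.array([12000.0, 8000.0, 3000.0, 500.0])
    expected = np.array([6000.0, 6000.0, 6000.0, 6000.0])
    print(np.round(statistical_energy(observed, expected), 2))   # kcal/mol; lower = more favorable
    ```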

  8. Hydration sites of unpaired RNA bases: a statistical analysis of the PDB structures.

    PubMed

    Kirillova, Svetlana; Carugo, Oliviero

    2011-10-19

    Hydration is crucial for RNA structure and function. X-ray crystallography is the most commonly used method to determine RNA structures and hydration and, therefore, statistical surveys are based on crystallographic results, the number of which is quickly increasing. A statistical analysis of the water molecule distribution in high-resolution X-ray structures of unpaired RNA nucleotides showed that: different bases have the same penchant to be surrounded by water molecules; clusters of water molecules indicate possible hydration sites, which, in some cases, match those of the major and minor grooves of RNA and DNA double helices; complex hydrogen bond networks characterize the solvation of the nucleotides, resulting in a significant rigidity of the base and its surrounding water molecules. Interestingly, the hydration sites around unpaired RNA bases do not match, in general, the positions that are occupied by the second nucleotide when the base-pair is formed. The hydration sites around unpaired RNA bases were found. They do not replicate the atom positions of complementary bases in the Watson-Crick pairs.

  9. Hydration sites of unpaired RNA bases: a statistical analysis of the PDB structures

    PubMed Central

    2011-01-01

    Background Hydration is crucial for RNA structure and function. X-ray crystallography is the most commonly used method to determine RNA structures and hydration and, therefore, statistical surveys are based on crystallographic results, the number of which is quickly increasing. Results A statistical analysis of the water molecule distribution in high-resolution X-ray structures of unpaired RNA nucleotides showed that: different bases have the same penchant to be surrounded by water molecules; clusters of water molecules indicate possible hydration sites, which, in some cases, match those of the major and minor grooves of RNA and DNA double helices; complex hydrogen bond networks characterize the solvation of the nucleotides, resulting in a significant rigidity of the base and its surrounding water molecules. Interestingly, the hydration sites around unpaired RNA bases do not match, in general, the positions that are occupied by the second nucleotide when the base-pair is formed. Conclusions The hydration sites around unpaired RNA bases were found. They do not replicate the atom positions of complementary bases in the Watson-Crick pairs. PMID:22011380

  10. A Bootstrap Generalization of Modified Parallel Analysis for IRT Dimensionality Assessment

    ERIC Educational Resources Information Center

    Finch, Holmes; Monahan, Patrick

    2008-01-01

    This article introduces a bootstrap generalization to the Modified Parallel Analysis (MPA) method of test dimensionality assessment using factor analysis. This methodology, based on the use of Marginal Maximum Likelihood nonlinear factor analysis, provides for the calculation of a test statistic based on a parametric bootstrap using the MPA…

  11. Markov Random Fields, Stochastic Quantization and Image Analysis

    DTIC Science & Technology

    1990-01-01

    Markov random fields based on the lattice Z2 have been extensively used in image analysis in a Bayesian framework as a priori models for the...of Image Analysis can be given some fundamental justification then there is a remarkable connection between Probabilistic Image Analysis, Statistical Mechanics and Lattice-based Euclidean Quantum Field Theory.

  12. Approaching Career Criminals With An Intelligence Cycle

    DTIC Science & Technology

    2015-12-01

    including arrest statistics and “arrest statistics have been used as the main barometer of juvenile delinquent activity, (but) many juvenile... Statistical Briefing Book,” ... guided by theories about the causes of delinquent behavior, but there was no determination if those efforts achieved the...children.” However, the most evidence-based comparison of juvenile delinquency reduction programs is the statistical meta-analysis (a systematic

  13. Root Cause Analysis of Quality Defects Using HPLC-MS Fingerprint Knowledgebase for Batch-to-batch Quality Control of Herbal Drugs.

    PubMed

    Yan, Binjun; Fang, Zhonghua; Shen, Lijuan; Qu, Haibin

    2015-01-01

    The batch-to-batch quality consistency of herbal drugs has always been an important issue. The aim was to propose a methodology for batch-to-batch quality control based on HPLC-MS fingerprints and a process knowledgebase. The extraction process of Compound E-jiao Oral Liquid was taken as a case study. After establishing the HPLC-MS fingerprint analysis method, the fingerprints of the extract solutions produced under normal and abnormal operation conditions were obtained. Multivariate statistical models were built for fault detection and a discriminant analysis model was built using the probabilistic discriminant partial-least-squares method for fault diagnosis. Based on multivariate statistical analysis, process knowledge was acquired and the cause-effect relationship between process deviations and quality defects was revealed. The quality defects were detected successfully by multivariate statistical control charts and the types of process deviations were diagnosed correctly by discriminant analysis. This work has demonstrated the benefits of combining HPLC-MS fingerprints, process knowledge and multivariate analysis for the quality control of herbal drugs. Copyright © 2015 John Wiley & Sons, Ltd.

  14. Adverse effects of metallic artifacts on voxel-wise analysis and tract-based spatial statistics in diffusion tensor imaging.

    PubMed

    Goto, Masami; Abe, Osamu; Hata, Junichi; Fukunaga, Issei; Shimoji, Keigo; Kunimatsu, Akira; Gomi, Tsutomu

    2017-02-01

    Background Diffusion tensor imaging (DTI) is a magnetic resonance imaging (MRI) technique that reflects the Brownian motion of water molecules constrained within brain tissue. Fractional anisotropy (FA) is one of the most commonly measured DTI parameters, and can be applied to quantitative analysis of white matter as tract-based spatial statistics (TBSS) and voxel-wise analysis. Purpose To show an association between metallic implants and the results of statistical analysis (voxel-wise group comparison and TBSS) for fractional anisotropy (FA) mapping, in DTI of healthy adults. Material and Methods Sixteen healthy volunteers were scanned with 3-Tesla MRI. A magnetic keeper type of dental implant was used as the metallic implant. DTI was acquired three times in each participant: (i) without a magnetic keeper (FAnon1); (ii) with a magnetic keeper (FAimp); and (iii) without a magnetic keeper (FAnon2) as reproducibility of FAnon1. Group comparisons with paired t-test were performed as FAnon1 vs. FAnon2, and as FAnon1 vs. FAimp. Results Regions of significantly reduced and increased local FA values were revealed by voxel-wise group comparison analysis (a P value of less than 0.05, corrected with family-wise error), but not by TBSS. Conclusion Metallic implants existing outside the field of view produce artifacts that affect the statistical analysis (voxel-wise group comparisons) for FA mapping. When statistical analysis for FA mapping is conducted by researchers, it is important to pay attention to any dental implants present in the mouths of the participants.

  15. Mapping Quantitative Traits in Unselected Families: Algorithms and Examples

    PubMed Central

    Dupuis, Josée; Shi, Jianxin; Manning, Alisa K.; Benjamin, Emelia J.; Meigs, James B.; Cupples, L. Adrienne; Siegmund, David

    2009-01-01

    Linkage analysis has been widely used to identify from family data genetic variants influencing quantitative traits. Common approaches have both strengths and limitations. Likelihood ratio tests typically computed in variance component analysis can accommodate large families but are highly sensitive to departure from normality assumptions. Regression-based approaches are more robust but their use has primarily been restricted to nuclear families. In this paper, we develop methods for mapping quantitative traits in moderately large pedigrees. Our methods are based on the score statistic which in contrast to the likelihood ratio statistic, can use nonparametric estimators of variability to achieve robustness of the false positive rate against departures from the hypothesized phenotypic model. Because the score statistic is easier to calculate than the likelihood ratio statistic, our basic mapping methods utilize relatively simple computer code that performs statistical analysis on output from any program that computes estimates of identity-by-descent. This simplicity also permits development and evaluation of methods to deal with multivariate and ordinal phenotypes, and with gene-gene and gene-environment interaction. We demonstrate our methods on simulated data and on fasting insulin, a quantitative trait measured in the Framingham Heart Study. PMID:19278016

  16. Primer of statistics in dental research: part I.

    PubMed

    Shintani, Ayumi

    2014-01-01

    Statistics play essential roles in evidence-based dentistry (EBD) practice and research, ranging widely from formulating scientific questions, designing studies, and collecting and analyzing data to interpreting, reporting, and presenting study findings. Mastering statistical concepts appears to be an unreachable goal for many dental researchers, in part because statistical authorities struggle to explain statistical principles to health researchers without invoking complex mathematical concepts. This series of 2 articles aims to introduce dental researchers to 9 essential topics in statistics for conducting EBD, with intuitive examples. Part I of the series covers the first 5 topics: (1) statistical graphs, (2) how to deal with outliers, (3) p-values and confidence intervals, (4) testing equivalence, and (5) multiplicity adjustment. Part II will follow to cover the remaining topics, including (6) selecting the proper statistical tests, (7) repeated measures analysis, (8) epidemiological considerations for causal association, and (9) analysis of agreement. Copyright © 2014. Published by Elsevier Ltd.

  17. ASCS online fault detection and isolation based on an improved MPCA

    NASA Astrophysics Data System (ADS)

    Peng, Jianxin; Liu, Haiou; Hu, Yuhui; Xi, Junqiang; Chen, Huiyan

    2014-09-01

    Multi-way principal component analysis (MPCA) has received considerable attention and been widely used in process monitoring. A traditional MPCA algorithm unfolds multiple batches of historical data into a two-dimensional matrix and cuts the matrix along the time axis to form subspaces. However, low efficiency of the subspaces and difficult fault isolation are common disadvantages of the principal component model. This paper presents a new subspace construction method based on a kernel density estimation function that can effectively reduce the amount of stored subspace information. The MPCA model and the knowledge base are built on the new subspace. Fault detection and isolation with the squared prediction error (SPE) statistic and the Hotelling T² statistic are then realized in process monitoring. When a fault occurs, fault isolation based on the SPE statistic is achieved by residual contribution analysis of the different variables. For fault isolation based on the T² statistic, the relationship between the statistical indicator and the state variables is constructed, and constraint conditions are presented to check the validity of the fault isolation. To improve the robustness of fault isolation to unexpected disturbances, a statistical method is then adopted to relate single and multiple subspaces and so increase the rate of correct fault isolation. Finally, fault detection and isolation based on the improved MPCA are used to monitor the automatic shift control system (ASCS), demonstrating the correctness and effectiveness of the algorithm. The research proposes a new subspace construction method that reduces the required storage capacity and improves the robustness of the principal component model, and it establishes the relationship between the state variables and the fault detection indicators for fault isolation.
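
    A minimal sketch of PCA-based monitoring with the T² and SPE statistics, including per-variable SPE contributions for fault isolation, is given below. It uses ordinary PCA on synthetic data rather than the improved multi-way PCA and kernel-density subspaces proposed in the paper, and the threshold-free reporting and variable names are illustrative assumptions.

    ```python
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(7)

    # Normal operating data: 200 samples x 5 correlated process variables.
    latent = rng.normal(size=(200, 2))
    mix = rng.normal(size=(2, 5))
    X_train = latent @ mix + 0.1 * rng.normal(size=(200, 5))

    scaler = StandardScaler().fit(X_train)
    pca = PCA(n_components=2).fit(scaler.transform(X_train))

    def t2_and_spe(x):
        """Hotelling T^2 and squared prediction error (SPE) for one new sample."""
        z = scaler.transform(x.reshape(1, -1))
        scores = pca.transform(z)
        t2 = float(np.sum(scores ** 2 / pca.explained_variance_))
        residual = z - pca.inverse_transform(scores)
        spe = float(np.sum(residual ** 2))
        return t2, spe, residual.ravel() ** 2      # per-variable SPE contributions

    # A faulty sample: variable 3 drifts away from its correlated behaviour.
    fault = X_train[0].copy()
    fault[3] += 4.0
    t2, spe, contrib = t2_and_spe(fault)
    print(f"T2 = {t2:.1f}, SPE = {spe:.1f}, most suspect variable = {contrib.argmax()}")
    ```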

  18. Categorical data processing for real estate objects valuation using statistical analysis

    NASA Astrophysics Data System (ADS)

    Parygin, D. S.; Malikov, V. P.; Golubev, A. V.; Sadovnikova, N. P.; Petrova, T. M.; Finogeev, A. G.

    2018-05-01

    Theoretical and practical approaches to the use of statistical methods for studying various properties of infrastructure objects are analyzed in the paper. Methods of forecasting the value of objects are considered. A method for coding categorical variables describing properties of real estate objects is proposed. The analysis of the results of modeling the price of real estate objects using regression analysis and an algorithm based on a comparative approach is carried out.
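
    A baseline treatment of the categorical variables, using plain one-hot (dummy) coding and a linear regression, is sketched below; the listing attributes and prices are hypothetical, and the coding scheme actually proposed in the paper is not reproduced here.

    ```python
    import pandas as pd
    from sklearn.linear_model import LinearRegression

    # Hypothetical listings: one numeric feature and two categorical properties.
    df = pd.DataFrame({
        "area_m2":  [38, 54, 47, 80, 62, 45, 90, 33],
        "district": ["center", "north", "center", "south", "north", "south", "center", "north"],
        "material": ["brick", "panel", "brick", "brick", "panel", "panel", "brick", "panel"],
        "price_k":  [95, 88, 110, 160, 105, 78, 210, 60],
    })

    # One-hot (dummy) coding of the categorical variables; drop_first avoids collinearity.
    X = pd.get_dummies(df[["area_m2", "district", "material"]], drop_first=True)
    model = LinearRegression().fit(X, df["price_k"])

    print(dict(zip(X.columns, model.coef_.round(1))))
    print("predicted price:", model.predict(X.iloc[[0]]).round(1))
    ```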

  19. P-MartCancer-Interactive Online Software to Enable Analysis of Shotgun Cancer Proteomic Datasets.

    PubMed

    Webb-Robertson, Bobbie-Jo M; Bramer, Lisa M; Jensen, Jeffrey L; Kobold, Markus A; Stratton, Kelly G; White, Amanda M; Rodland, Karin D

    2017-11-01

    P-MartCancer is an interactive web-based software environment that enables statistical analyses of peptide or protein data, quantitated from mass spectrometry-based global proteomics experiments, without requiring in-depth knowledge of statistical programming. P-MartCancer offers a series of statistical modules associated with quality assessment, peptide and protein statistics, protein quantification, and exploratory data analyses driven by the user via customized workflows and interactive visualization. Currently, P-MartCancer offers access and the capability to analyze multiple cancer proteomic datasets generated through the Clinical Proteomics Tumor Analysis Consortium at the peptide, gene, and protein levels. P-MartCancer is deployed as a web service (https://pmart.labworks.org/cptac.html), alternatively available via Docker Hub (https://hub.docker.com/r/pnnl/pmart-web/). Cancer Res; 77(21); e47-50. ©2017 AACR . ©2017 American Association for Cancer Research.

  20. Early Millennials: The Sophomore Class of 2002 a Decade Later. Statistical Analysis Report. NCES 2017-437

    ERIC Educational Resources Information Center

    Chen, Xianglei; Lauff, Erich; Arbeit, Caren A.; Henke, Robin; Skomsvold, Paul; Hufford, Justine

    2017-01-01

    This Statistical Analysis Report tracks a cohort of 2002 high school sophomores over 10 years, examining the extent to which cohort members had reached such life course milestones as finishing school, starting a job, leaving home, getting married, and having children. The analyses in this report are based on data from the Education Longitudinal…

  1. Metamodels for Computer-Based Engineering Design: Survey and Recommendations

    NASA Technical Reports Server (NTRS)

    Simpson, Timothy W.; Peplinski, Jesse; Koch, Patrick N.; Allen, Janet K.

    1997-01-01

    The use of statistical techniques to build approximations of expensive computer analysis codes pervades much of today's engineering design. These statistical approximations, or metamodels, are used to replace the actual expensive computer analyses, facilitating multidisciplinary, multiobjective optimization and concept exploration. In this paper we review several of these techniques, including design of experiments, response surface methodology, Taguchi methods, neural networks, inductive learning, and kriging. We survey their existing application in engineering design and then address the dangers of applying traditional statistical techniques to approximate deterministic computer analysis codes. We conclude with recommendations for the appropriate use of statistical approximation techniques in given situations and how common pitfalls can be avoided.

  2. A Third Moment Adjusted Test Statistic for Small Sample Factor Analysis.

    PubMed

    Lin, Johnny; Bentler, Peter M

    2012-01-01

    Goodness of fit testing in factor analysis is based on the assumption that the test statistic is asymptotically chi-square, but this property may not hold in small samples even when the factors and errors are normally distributed in the population. Robust methods such as Browne's asymptotically distribution-free method and Satorra-Bentler's mean scaling statistic were developed under the presumption of non-normality in the factors and errors. This paper finds new application to the case where factors and errors are normally distributed in the population but the skewness of the obtained test statistic is still high due to sampling error in the observed indicators. An extension of Satorra-Bentler's statistic is proposed that not only scales the mean but also adjusts the degrees of freedom based on the skewness of the obtained test statistic in order to improve its robustness under small samples. A simple simulation study shows that this third moment adjusted statistic asymptotically performs on par with previously proposed methods, and at a very small sample size offers superior Type I error rates under a properly specified model. Data from Mardia, Kent and Bibby's study of students tested for their ability in five content areas that were either open or closed book were used to illustrate the real-world performance of this statistic.

  3. Why Are People Bad at Detecting Randomness? A Statistical Argument

    ERIC Educational Resources Information Center

    Williams, Joseph J.; Griffiths, Thomas L.

    2013-01-01

    Errors in detecting randomness are often explained in terms of biases and misconceptions. We propose and provide evidence for an account that characterizes the contribution of the inherent statistical difficulty of the task. Our account is based on a Bayesian statistical analysis, focusing on the fact that a random process is a special case of…

  4. Statistical performance and information content of time lag analysis and redundancy analysis in time series modeling.

    PubMed

    Angeler, David G; Viedma, Olga; Moreno, José M

    2009-11-01

    Time lag analysis (TLA) is a distance-based approach used to study temporal dynamics of ecological communities by measuring community dissimilarity over increasing time lags. Despite its increased use in recent years, its performance in comparison with other more direct methods (i.e., canonical ordination) has not been evaluated. This study fills this gap using extensive simulations and real data sets from experimental temporary ponds (true zooplankton communities) and landscape studies (landscape categories as pseudo-communities) that differ in community structure and anthropogenic stress history. Modeling time with a principal coordinate of neighborhood matrices (PCNM) approach, the canonical ordination technique (redundancy analysis; RDA) consistently outperformed the other statistical tests (i.e., TLAs, Mantel test, and RDA based on linear time trends) using all real data. In addition, the RDA-PCNM revealed different patterns of temporal change, and the strength of each individual time pattern, in terms of adjusted variance explained, could be evaluated. It also identified species contributions to these patterns of temporal change. This additional information is not provided by distance-based methods. The simulation study revealed better Type I error properties of the canonical ordination techniques compared with the distance-based approaches when no deterministic component of change was imposed on the communities. The simulation also revealed that strong emphasis on uniform deterministic change and low variability at other temporal scales is needed to result in decreased statistical power of the RDA-PCNM approach relative to the other methods. Based on the statistical performance of and information content provided by RDA-PCNM models, this technique serves ecologists as a powerful tool for modeling temporal change of ecological (pseudo-) communities.
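
    For contrast with the RDA-PCNM approach favored above, a bare-bones time lag analysis might look like the sketch below: Bray-Curtis dissimilarity for every pair of samples regressed on the square root of the time lag. The synthetic community matrix and the scipy calls are illustrative assumptions, not the simulation design of the study.

    ```python
    import numpy as np
    from scipy.spatial.distance import braycurtis
    from scipy import stats

    rng = np.random.default_rng(5)

    # Synthetic community matrix: 20 yearly samples x 10 species, with slow directional change.
    years = 20
    base = rng.poisson(20, size=10).astype(float)
    trend = np.linspace(0, 1, years)[:, None] * rng.normal(0, 5, size=10)
    community = np.clip(base + trend + rng.normal(0, 2, size=(years, 10)), 0, None)

    # Time lag analysis: dissimilarity for every pair of samples versus sqrt(time lag).
    lags, dissim = [], []
    for i in range(years):
        for j in range(i + 1, years):
            lags.append(np.sqrt(j - i))
            dissim.append(braycurtis(community[i], community[j]))

    slope, intercept, r, p, se = stats.linregress(lags, dissim)
    print(f"slope = {slope:.3f} (p = {p:.3g}); a positive slope suggests directional change")
    ```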

  5. AMOVA ["Accumulative Manifold Validation Analysis"]: An Advanced Statistical Methodology Designed to Measure and Test the Validity, Reliability, and Overall Efficacy of Inquiry-Based Psychometric Instruments

    ERIC Educational Resources Information Center

    Osler, James Edward, II

    2015-01-01

    This monograph provides an epistemological rationale for the Accumulative Manifold Validation Analysis [also referred to by the acronym "AMOVA"] statistical methodology designed to test psychometric instruments. This form of inquiry is a form of mathematical optimization in the discipline of linear stochastic modelling. AMOVA is an in-depth…

  6. Dark matter constraints from a joint analysis of dwarf Spheroidal galaxy observations with VERITAS

    DOE PAGES

    Archambault, S.; Archer, A.; Benbow, W.; ...

    2017-04-05

    We present constraints on the annihilation cross section of weakly interacting massive particles dark matter based on the joint statistical analysis of four dwarf galaxies with VERITAS. These results are derived from an optimized photon weighting statistical technique that improves on standard imaging atmospheric Cherenkov telescope (IACT) analyses by utilizing the spectral and spatial properties of individual photon events.

  7. Comparing the Fit of Item Response Theory and Factor Analysis Models

    ERIC Educational Resources Information Center

    Maydeu-Olivares, Alberto; Cai, Li; Hernandez, Adolfo

    2011-01-01

    Linear factor analysis (FA) models can be reliably tested using test statistics based on residual covariances. We show that the same statistics can be used to reliably test the fit of item response theory (IRT) models for ordinal data (under some conditions). Hence, the fit of an FA model and of an IRT model to the same data set can now be…

  8. Finding Groups Using Model-Based Cluster Analysis: Heterogeneous Emotional Self-Regulatory Processes and Heavy Alcohol Use Risk

    ERIC Educational Resources Information Center

    Mun, Eun Young; von Eye, Alexander; Bates, Marsha E.; Vaschillo, Evgeny G.

    2008-01-01

    Model-based cluster analysis is a new clustering procedure to investigate population heterogeneity utilizing finite mixture multivariate normal densities. It is an inferentially based, statistically principled procedure that allows comparison of nonnested models using the Bayesian information criterion to compare multiple models and identify the…

  9. A comparison of performance of automatic cloud coverage assessment algorithm for Formosat-2 image using clustering-based and spatial thresholding methods

    NASA Astrophysics Data System (ADS)

    Hsu, Kuo-Hsien

    2012-11-01

    Formosat-2 imagery is high-spatial-resolution (2 m GSD) remote sensing satellite data comprising one panchromatic band and four multispectral bands (blue, green, red, near-infrared). An essential step in the daily processing of received Formosat-2 images is to estimate the cloud statistic of each image using an Automatic Cloud Coverage Assessment (ACCA) algorithm. The cloud statistic is subsequently recorded as important metadata in the image product catalog. In this paper, we propose an ACCA method with two consecutive stages: pre-processing and post-processing analysis. For pre-processing analysis, unsupervised K-means classification, Sobel's method, a thresholding method, non-cloudy pixel reexamination, and a cross-band filter method are implemented in sequence to determine the cloud statistic. For post-processing analysis, the box-counting fractal method is implemented. In other words, the cloud statistic is first determined via pre-processing analysis, and the correctness of the cloud statistic for each spectral band is then cross-examined qualitatively and quantitatively via post-processing analysis. The selection of an appropriate thresholding method is critical to the result of the ACCA method. Therefore, in this work, we first conduct a series of experiments on clustering-based and spatial thresholding methods, including Otsu's, Local Entropy (LE), Joint Entropy (JE), Global Entropy (GE), and Global Relative Entropy (GRE) methods, for performance comparison. The results show that Otsu's and GE methods both perform better than the others for Formosat-2 imagery. Additionally, our proposed ACCA method, with Otsu's method selected as the thresholding method, successfully extracts the cloudy pixels of Formosat-2 images for accurate cloud statistic estimation.
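
    For reference, Otsu's method can be written in a few lines; the sketch below applies it to a synthetic bright-patch scene and reports a crude cloud-cover fraction. The synthetic data, the single-band thresholding, and the helper name otsu_threshold are illustrative assumptions, not the Formosat-2 ACCA pipeline.

    ```python
    import numpy as np

    def otsu_threshold(gray):
        """Otsu's method: choose the gray level that maximizes between-class variance."""
        hist, _ = np.histogram(gray.ravel(), bins=256, range=(0, 256))
        prob = hist / hist.sum()
        levels = np.arange(256)
        best_t, best_var = 0, -1.0
        for t in range(1, 256):
            w0, w1 = prob[:t].sum(), prob[t:].sum()
            if w0 == 0 or w1 == 0:
                continue
            mu0 = (levels[:t] * prob[:t]).sum() / w0
            mu1 = (levels[t:] * prob[t:]).sum() / w1
            between = w0 * w1 * (mu0 - mu1) ** 2
            if between > best_var:
                best_var, best_t = between, t
        return best_t

    # Synthetic stand-in for one band of a satellite scene: dark ground plus a bright "cloud" patch.
    rng = np.random.default_rng(11)
    scene = rng.normal(60, 10, size=(100, 100))
    scene[30:60, 30:60] = rng.normal(200, 10, size=(30, 30))
    scene = np.clip(scene, 0, 255)
    t = otsu_threshold(scene)
    cloud_fraction = (scene >= t).mean()
    print(f"Otsu threshold = {t}, cloud cover ~ {100 * cloud_fraction:.1f}%")
    ```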

  10. Augmenting Latent Dirichlet Allocation and Rank Threshold Detection with Ontologies

    DTIC Science & Technology

    2010-03-01

    Probabilistic Latent Semantic Indexing (PLSI) is an automated indexing information retrieval model [20]. It is based on a statistical latent class model which is...uses a statistical foundation that is more accurate in finding hidden semantic relationships [20]. The model uses factor analysis of count data, number...principle of statistical inference which asserts that all of the information in a sample is contained in the likelihood function [20]. The statistical

  11. Using assemblage data in ecological indicators: A comparison and evaluation of commonly available statistical tools

    USGS Publications Warehouse

    Smith, Joseph M.; Mather, Martha E.

    2012-01-01

    Ecological indicators are science-based tools used to assess how human activities have impacted environmental resources. For monitoring and environmental assessment, existing species assemblage data can be used to make these comparisons through time or across sites. An impediment to using assemblage data, however, is that these data are complex and need to be simplified in an ecologically meaningful way. Because multivariate statistics are mathematical relationships, statistical groupings may not make ecological sense and will not have utility as indicators. Our goal was to define a process to select defensible and ecologically interpretable statistical simplifications of assemblage data in which researchers and managers can have confidence. For this, we chose a suite of statistical methods, compared the groupings that resulted from these analyses, identified convergence among groupings, then we interpreted the groupings using species and ecological guilds. When we tested this approach using a statewide stream fish dataset, not all statistical methods worked equally well. For our dataset, logistic regression (Log), detrended correspondence analysis (DCA), cluster analysis (CL), and non-metric multidimensional scaling (NMDS) provided consistent, simplified output. Specifically, the Log, DCA, CL-1, and NMDS-1 groupings were ≥60% similar to each other, overlapped with the fluvial-specialist ecological guild, and contained a common subset of species. Groupings based on number of species (e.g., Log, DCA, CL and NMDS) outperformed groupings based on abundance [e.g., principal components analysis (PCA) and Poisson regression]. Although the specific methods that worked on our test dataset have generality, here we are advocating a process (e.g., identifying convergent groupings with redundant species composition that are ecologically interpretable) rather than the automatic use of any single statistical tool. We summarize this process in step-by-step guidance for the future use of these commonly available ecological and statistical methods in preparing assemblage data for use in ecological indicators.

  12. SWToolbox: A surface-water tool-box for statistical analysis of streamflow time series

    USGS Publications Warehouse

    Kiang, Julie E.; Flynn, Kate; Zhai, Tong; Hummel, Paul; Granato, Gregory

    2018-03-07

    This report is a user guide for the low-flow analysis methods provided with version 1.0 of the Surface Water Toolbox (SWToolbox) computer program. The software combines functionality from two software programs—U.S. Geological Survey (USGS) SWSTAT and U.S. Environmental Protection Agency (EPA) DFLOW. Both of these programs have been used primarily for computation of critical low-flow statistics. The main analysis methods are the computation of hydrologic frequency statistics such as the 7-day minimum flow that occurs on average only once every 10 years (7Q10), computation of design flows including biologically based flows, and computation of flow-duration curves and duration hydrographs. Other annual, monthly, and seasonal statistics can also be computed. The interface facilitates retrieval of streamflow discharge data from the USGS National Water Information System and outputs text reports for a record of the analysis. Tools for graphing data and screening tests are available to assist the analyst in conducting the analysis.
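
    As a rough illustration of the kind of statistic the toolbox computes, the sketch below derives annual 7-day minimum flows with a rolling mean and estimates a 7Q10 from a log-normal fit of the annual minima. The synthetic flow record and the log-normal assumption (practice often uses log-Pearson Type III) are illustrative and do not reproduce the SWToolbox algorithms.

    ```python
    import numpy as np
    import pandas as pd
    from scipy import stats

    # Hypothetical daily streamflow record (m^3/s) spanning 30 "years" of 365 days.
    rng = np.random.default_rng(4)
    days = pd.date_range("1990-01-01", periods=30 * 365, freq="D")
    flow = pd.Series(5 + 3 * np.sin(2 * np.pi * days.dayofyear / 365)
                     + rng.gamma(2.0, 1.0, size=len(days)), index=days)

    # Annual series of 7-day minimum flows (the "7Q" statistic).
    seven_day = flow.rolling(7).mean()
    annual_7q = seven_day.groupby(seven_day.index.year).min().dropna()

    # 7Q10: the 7-day low flow with a 10-year recurrence interval, i.e. the flow with a
    # 10% annual non-exceedance probability under the fitted log-normal distribution.
    log_q = np.log(annual_7q)
    q7_10 = np.exp(stats.norm.ppf(0.10, loc=log_q.mean(), scale=log_q.std(ddof=1)))
    print(f"7Q10 estimate: {q7_10:.2f} m^3/s")
    ```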

  13. Quantification of integrated HIV DNA by repetitive-sampling Alu-HIV PCR on the basis of Poisson statistics.

    PubMed

    De Spiegelaere, Ward; Malatinkova, Eva; Lynch, Lindsay; Van Nieuwerburgh, Filip; Messiaen, Peter; O'Doherty, Una; Vandekerckhove, Linos

    2014-06-01

    Quantification of integrated proviral HIV DNA by repetitive-sampling Alu-HIV PCR is a candidate virological tool to monitor the HIV reservoir in patients. However, the experimental procedures and data analysis of the assay are complex and hinder its widespread use. Here, we provide an improved and simplified data analysis method by adopting binomial and Poisson statistics. A modified analysis method on the basis of Poisson statistics was used to analyze the binomial data of positive and negative reactions from a 42-replicate Alu-HIV PCR by use of dilutions of an integration standard and on samples of 57 HIV-infected patients. Results were compared with the quantitative output of the previously described Alu-HIV PCR method. Poisson-based quantification of the Alu-HIV PCR was linearly correlated with the standard dilution series, indicating that absolute quantification with the Poisson method is a valid alternative for data analysis of repetitive-sampling Alu-HIV PCR data. Quantitative outputs of patient samples assessed by the Poisson method correlated with the previously described Alu-HIV PCR analysis, indicating that this method is a valid alternative for quantifying integrated HIV DNA. Poisson-based analysis of the Alu-HIV PCR data enables absolute quantification without the need of a standard dilution curve. Implementation of the CI estimation permits improved qualitative analysis of the data and provides a statistical basis for the required minimal number of technical replicates. © 2014 The American Association for Clinical Chemistry.
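
    The core Poisson step can be sketched as follows: with n technical replicates and k negative reactions, the mean number of integrated copies per reaction is -ln(k/n), and a confidence interval on the negative fraction propagates directly through this relation. The Clopper-Pearson interval, the example numbers, and the helper name poisson_quantification below are illustrative assumptions, not the exact analysis of the paper.

    ```python
    import numpy as np
    from scipy import stats

    def poisson_quantification(n_replicates, n_negative, conf=0.95):
        """Mean copies per reaction from the fraction of negative replicates,
        with a Clopper-Pearson interval propagated through the Poisson zero term."""
        k = n_negative
        alpha = 1.0 - conf
        # Clopper-Pearson bounds on the probability of a negative reaction.
        lo = stats.beta.ppf(alpha / 2, k, n_replicates - k + 1) if k > 0 else 0.0
        hi = stats.beta.ppf(1 - alpha / 2, k + 1, n_replicates - k) if k < n_replicates else 1.0
        p_neg = k / n_replicates
        # P(negative) = exp(-lambda)  =>  lambda = -ln(P(negative)); bounds swap order.
        lam = -np.log(p_neg)
        return lam, -np.log(hi), -np.log(lo)

    # Example: a 42-replicate Alu-HIV PCR run with 15 negative reactions.
    lam, lower, upper = poisson_quantification(42, 15)
    print(f"estimate = {lam:.2f} copies per reaction (95% CI {lower:.2f}-{upper:.2f})")
    ```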

  14. Novel Image Encryption Scheme Based on Chebyshev Polynomial and Duffing Map

    PubMed Central

    2014-01-01

    We present a novel image encryption algorithm using Chebyshev polynomial based on permutation and substitution and Duffing map based on substitution. Comprehensive security analysis has been performed on the designed scheme using key space analysis, visual testing, histogram analysis, information entropy calculation, correlation coefficient analysis, differential analysis, key sensitivity test, and speed test. The study demonstrates that the proposed image encryption algorithm shows advantages of more than 10^113 key space and desirable level of security based on the good statistical results and theoretical arguments. PMID:25143970

  15. Clustering, randomness and regularity in cloud fields. I - Theoretical considerations. II - Cumulus cloud fields

    NASA Technical Reports Server (NTRS)

    Weger, R. C.; Lee, J.; Zhu, Tianri; Welch, R. M.

    1992-01-01

    The current controversy regarding regularity versus clustering in cloud fields is examined by means of analysis and simulation studies based upon nearest-neighbor cumulative distribution statistics. It is shown that the Poisson representation of random point processes is superior to pseudorandom-number-generated models and that pseudorandom-number-generated models bias the observed nearest-neighbor statistics towards regularity. The interpretation of these nearest-neighbor statistics is discussed for many cases of superpositions of clustering, randomness, and regularity. A detailed analysis of cumulus cloud field spatial distributions based upon Landsat, AVHRR, and Skylab data shows that, when both large and small clouds are included in the cloud field distributions, the cloud field always has a strong clustering signal.
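
    A minimal illustration of the nearest-neighbor cumulative distribution approach is sketched below: nearest-neighbor distances for a simulated Poisson (completely random) point field are compared with the theoretical CDF 1 - exp(-λπr²). Edge effects are ignored, and the intensity, sample size, and helper name nn_distances are illustrative assumptions.

    ```python
    import numpy as np
    from scipy.spatial import cKDTree

    rng = np.random.default_rng(2)

    def nn_distances(points):
        """Distance from each point to its nearest neighbour."""
        tree = cKDTree(points)
        d, _ = tree.query(points, k=2)      # k=2 because the closest point is the point itself
        return d[:, 1]

    # Poisson (complete spatial randomness) "cloud field" on a unit square.
    n = 500
    pts = rng.uniform(size=(n, 2))
    d = np.sort(nn_distances(pts))

    # Theoretical CDF of the nearest-neighbour distance for a Poisson process of intensity n.
    theory = 1.0 - np.exp(-n * np.pi * d ** 2)
    empirical = np.arange(1, n + 1) / n
    print("max |empirical - Poisson| CDF gap:", round(np.abs(empirical - theory).max(), 3))
    # A clustered field shifts mass toward small distances; a regular field toward large ones.
    ```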

  16. Gene-Based Association Analysis for Censored Traits Via Fixed Effect Functional Regressions.

    PubMed

    Fan, Ruzong; Wang, Yifan; Yan, Qi; Ding, Ying; Weeks, Daniel E; Lu, Zhaohui; Ren, Haobo; Cook, Richard J; Xiong, Momiao; Swaroop, Anand; Chew, Emily Y; Chen, Wei

    2016-02-01

    Genetic studies of survival outcomes have been proposed and conducted recently, but statistical methods for identifying genetic variants that affect disease progression are rarely developed. Motivated by our ongoing real studies, here we develop Cox proportional hazard models using functional regression (FR) to perform gene-based association analysis of survival traits while adjusting for covariates. The proposed Cox models are fixed effect models where the genetic effects of multiple genetic variants are assumed to be fixed. We introduce likelihood ratio test (LRT) statistics to test for associations between the survival traits and multiple genetic variants in a genetic region. Extensive simulation studies demonstrate that the proposed Cox FR LRT statistics have well-controlled type I error rates. To evaluate power, we compare the Cox FR LRT with the previously developed burden test (BT) in a Cox model and sequence kernel association test (SKAT), which is based on mixed effect Cox models. The Cox FR LRT statistics have power higher than or similar to that of the Cox SKAT LRT except when 50%/50% causal variants had negative/positive effects and all causal variants are rare. In addition, the Cox FR LRT statistics have higher power than the Cox BT LRT. The models and related test statistics can be useful in whole-genome and whole-exome association studies. An age-related macular degeneration dataset was analyzed as an example. © 2016 WILEY PERIODICALS, INC.

  17. Gene-based Association Analysis for Censored Traits Via Fixed Effect Functional Regressions

    PubMed Central

    Fan, Ruzong; Wang, Yifan; Yan, Qi; Ding, Ying; Weeks, Daniel E.; Lu, Zhaohui; Ren, Haobo; Cook, Richard J; Xiong, Momiao; Swaroop, Anand; Chew, Emily Y.; Chen, Wei

    2015-01-01

    Summary Genetic studies of survival outcomes have been proposed and conducted recently, but statistical methods for identifying genetic variants that affect disease progression are rarely developed. Motivated by our ongoing real studies, we develop here Cox proportional hazard models using functional regression (FR) to perform gene-based association analysis of survival traits while adjusting for covariates. The proposed Cox models are fixed effect models where the genetic effects of multiple genetic variants are assumed to be fixed. We introduce likelihood ratio test (LRT) statistics to test for associations between the survival traits and multiple genetic variants in a genetic region. Extensive simulation studies demonstrate that the proposed Cox FR LRT statistics have well-controlled type I error rates. To evaluate power, we compare the Cox FR LRT with the previously developed burden test (BT) in a Cox model and sequence kernel association test (SKAT), which is based on mixed effect Cox models. The Cox FR LRT statistics have power higher than or similar to that of the Cox SKAT LRT except when 50%/50% causal variants had negative/positive effects and all causal variants are rare. In addition, the Cox FR LRT statistics have higher power than the Cox BT LRT. The models and related test statistics can be useful in whole-genome and whole-exome association studies. An age-related macular degeneration dataset was analyzed as an example. PMID:26782979

  18. A critique of Rasch residual fit statistics.

    PubMed

    Karabatsos, G

    2000-01-01

    In test analysis involving the Rasch model, a large degree of importance is placed on the "objective" measurement of individual abilities and item difficulties. The degree to which the objectivity properties are attained, of course, depends on the degree to which the data fit the Rasch model. It is therefore important to utilize fit statistics that accurately and reliably detect the person-item response inconsistencies that threaten the measurement objectivity of persons and items. Given this argument, it is somewhat surprising that far more emphasis is placed on the objective measurement of persons and items than on the measurement quality of Rasch fit statistics. This paper provides a critical analysis of the residual fit statistics of the Rasch model, arguably the most often used fit statistics, in an effort to illustrate that the task of Rasch fit analysis is not as simple and straightforward as it appears to be. The faulty statistical properties of the residual fit statistics do not allow either a convenient or a straightforward approach to Rasch fit analysis. For instance, given a residual fit statistic, the use of a single minimum critical value for misfit diagnosis across different testing situations, where the situations vary in sample and test properties, leads to both the overdetection and underdetection of misfit. To improve this situation, it is argued that psychometricians need to implement residual-free Rasch fit statistics that are based on the number of Guttman response errors, or use indices that are statistically optimal in detecting measurement disturbances.
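
    For context, the residual fit statistics under discussion are typically the unweighted (outfit) and information-weighted (infit) mean squares computed from standardized residuals. The sketch below shows how they are obtained for dichotomous data given person abilities and item difficulties; it is a generic illustration, not code from the paper.

    ```python
    import numpy as np

    def rasch_prob(theta, b):
        """P(X=1) under the dichotomous Rasch model for abilities theta and difficulties b."""
        return 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))

    def residual_fit_statistics(responses, theta, b):
        """Item outfit and infit mean squares from standardized residuals."""
        p = rasch_prob(theta, b)
        w = p * (1.0 - p)                            # binomial information (variance)
        z2 = (responses - p) ** 2 / w                # squared standardized residuals
        outfit = z2.mean(axis=0)                     # unweighted mean square per item
        infit = (w * z2).sum(axis=0) / w.sum(axis=0) # information-weighted mean square
        return outfit, infit

    rng = np.random.default_rng(1)
    theta = rng.normal(size=200)                     # person abilities
    b = np.linspace(-2, 2, 10)                       # item difficulties
    x = (rng.uniform(size=(200, 10)) < rasch_prob(theta, b)).astype(int)
    print(residual_fit_statistics(x, theta, b))
    ```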

  19. Linear retrieval and global measurements of wind speed from the Seasat SMMR

    NASA Technical Reports Server (NTRS)

    Pandey, P. C.

    1983-01-01

    Retrievals of wind speed (WS) from the Seasat Scanning Multichannel Microwave Radiometer (SMMR) were performed using a two-step statistical technique. Nine subsets of two to five SMMR channels were examined for wind speed retrieval. These subsets were derived by applying a leaps-and-bounds procedure, with the coefficient of determination as the selection criterion, to a statistical database of brightness temperatures and geophysical parameters. Analysis of Monsoon Experiment and ocean station PAPA data showed a strong correlation between sea surface temperature and water vapor. This relation was used in generating the statistical database. Global maps of WS were produced for one- and three-month periods.

  20. Rule-based statistical data mining agents for an e-commerce application

    NASA Astrophysics Data System (ADS)

    Qin, Yi; Zhang, Yan-Qing; King, K. N.; Sunderraman, Rajshekhar

    2003-03-01

    Intelligent data mining techniques have useful e-Business applications. Because an e-Commerce application is related to multiple domains such as statistical analysis, market competition, price comparison, profit improvement and personal preferences, this paper presents a hybrid knowledge-based e-Commerce system fusing intelligent techniques, statistical data mining, and personal information to enhance QoS (Quality of Service) of e-Commerce. A Web-based e-Commerce application software system, eDVD Web Shopping Center, is successfully implemented using Java servlets and an Oracle8i database server. Simulation results have shown that the hybrid intelligent e-Commerce system is able to make smart decisions for different customers.

  1. A Comparative Analysis of the Minuteman Education Programs as Currently Offered at Six SAC Bases.

    DTIC Science & Technology

    1980-06-01

    [Excerpt of flattened course-listing tables: prerequisite courses include Principles of Marketing (3), Business Statistics (3), Business Law (3), and Mathematics Methods I, with 26 total prerequisite hours; required graduate courses include Policy Formulation and Administration (3), Business and Economic Statistics (3), Intermediate Business and Economic Statistics (3), Principles of Management (3), Corporation Finance (3), and Principles of Marketing (3).]

  2. Investigation of Weibull statistics in fracture analysis of cast aluminum

    NASA Technical Reports Server (NTRS)

    Holland, Frederic A., Jr.; Zaretsky, Erwin V.

    1989-01-01

    The fracture strengths of two large batches of A357-T6 cast aluminum coupon specimens were compared by using two-parameter Weibull analysis. The minimum number of these specimens necessary to find the fracture strength of the material was determined. The applicability of three-parameter Weibull analysis was also investigated. A design methodology based on the combination of elementary stress analysis and Weibull statistical analysis is advanced and applied to the design of a spherical pressure vessel shell. The results from this design methodology are compared with results from the applicable ASME pressure vessel code.
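
    A minimal sketch of the two-parameter Weibull step, assuming median-rank regression rather than whatever fitting procedure the report actually used: the Weibull modulus is the slope of ln(-ln(1-F)) against ln(strength), and the characteristic strength follows from the intercept. The coupon strengths in the example are made up, not the A357-T6 data.

    ```python
    import numpy as np

    def weibull_two_parameter(strengths):
        """Estimate Weibull modulus m and characteristic strength sigma0
        by median-rank regression (a common alternative to maximum likelihood)."""
        s = np.sort(np.asarray(strengths, dtype=float))
        n = s.size
        ranks = np.arange(1, n + 1)
        f = (ranks - 0.3) / (n + 0.4)            # Bernard's median-rank estimate
        x = np.log(s)
        y = np.log(-np.log(1.0 - f))
        m, c = np.polyfit(x, y, 1)               # slope = Weibull modulus
        sigma0 = np.exp(-c / m)                  # characteristic (63.2 %) strength
        return m, sigma0

    def failure_probability(stress, m, sigma0):
        """Weibull cumulative failure probability at a given stress."""
        return 1.0 - np.exp(-(stress / sigma0) ** m)

    # illustrative coupon strengths in MPa (not the actual test data)
    print(weibull_two_parameter([255, 262, 270, 274, 281, 288, 290, 297, 305, 312]))
    ```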

  3. Statistical analysis of 4 types of neck whiplash injuries based on classical meridian theory.

    PubMed

    Chen, Yemeng; Zhao, Yan; Xue, Xiaolin; Li, Hui; Wu, Xiuyan; Zhang, Qunce; Zheng, Xin; Wang, Tianfang

    2015-01-01

    As one component of the Chinese medicine meridian system, the meridian sinew (Jingjin, (see text), tendino-musculo) is specially described as being for acupuncture treatment of the musculoskeletal system because of its dynamic attributes and tender point correlations. In recent decades, the therapeutic importance of the sinew meridian has become revalued in clinical application. Based on this theory, the authors have established therapeutic strategies of acupuncture treatment in Whiplash-Associated Disorders (WAD) by categorizing four types of neck symptom presentations. The advantage of this new system is to make it much easier for the clinician to find effective acupuncture points. This study attempts to prove the significance of the proposed therapeutic strategies by analyzing data collected from a clinical survey of various WAD using non-supervised statistical methods, such as correlation analysis, factor analysis, and cluster analysis. The clinical survey data have successfully verified discrete characteristics of four neck syndromes, based upon the range of motion (ROM) and tender point location findings. A summary of the relationships among the symptoms of the four neck syndromes has shown the correlation coefficient as having a statistical significance (P < 0.01 or P < 0.05), especially with regard to ROM. Furthermore, factor and cluster analyses resulted in a total of 11 categories of general symptoms, which implies syndrome factors are more related to the Liver, as originally described in classical theory. The hypothesis of meridian sinew syndromes in WAD is clearly supported by the statistical analysis of the clinical trials. This new discovery should be beneficial in improving therapeutic outcomes.

  4. In defence of model-based inference in phylogeography

    PubMed Central

    Beaumont, Mark A.; Nielsen, Rasmus; Robert, Christian; Hey, Jody; Gaggiotti, Oscar; Knowles, Lacey; Estoup, Arnaud; Panchal, Mahesh; Corander, Jukka; Hickerson, Mike; Sisson, Scott A.; Fagundes, Nelson; Chikhi, Lounès; Beerli, Peter; Vitalis, Renaud; Cornuet, Jean-Marie; Huelsenbeck, John; Foll, Matthieu; Yang, Ziheng; Rousset, Francois; Balding, David; Excoffier, Laurent

    2017-01-01

    Recent papers have promoted the view that model-based methods in general, and those based on Approximate Bayesian Computation (ABC) in particular, are flawed in a number of ways, and are therefore inappropriate for the analysis of phylogeographic data. These papers further argue that Nested Clade Phylogeographic Analysis (NCPA) offers the best approach in statistical phylogeography. In order to remove the confusion and misconceptions introduced by these papers, we justify and explain the reasoning behind model-based inference. We argue that ABC is a statistically valid approach, alongside other computational statistical techniques that have been successfully used to infer parameters and compare models in population genetics. We also examine the NCPA method and highlight numerous deficiencies, either when used with single or multiple loci. We further show that the ages of clades are carelessly used to infer ages of demographic events, and that these ages are estimated under a simple model of panmixia and population stationarity but are then used under different and unspecified models to test hypotheses, a usage that invalidates these testing procedures. We conclude by encouraging researchers to study and use model-based inference in population genetics. PMID:29284924

  5. Implementing a Web-Based Decision Support System to Spatially and Statistically Analyze Ecological Conditions of the Sierra Nevada

    NASA Astrophysics Data System (ADS)

    Nguyen, A.; Mueller, C.; Brooks, A. N.; Kislik, E. A.; Baney, O. N.; Ramirez, C.; Schmidt, C.; Torres-Perez, J. L.

    2014-12-01

    The Sierra Nevada is experiencing changes in hydrologic regimes, such as decreases in snowmelt and peak runoff, which affect forest health and the availability of water resources. Currently, the USDA Forest Service Region 5 is undergoing Forest Plan revisions to include climate change impacts into mitigation and adaptation strategies. However, there are few processes in place to conduct quantitative assessments of forest conditions in relation to mountain hydrology, while easily and effectively delivering that information to forest managers. To assist the USDA Forest Service, this study is the final phase of a three-term project to create a Decision Support System (DSS) to allow ease of access to historical and forecasted hydrologic, climatic, and terrestrial conditions for the entire Sierra Nevada. This data is featured within three components of the DSS: the Mapping Viewer, Statistical Analysis Portal, and Geospatial Data Gateway. Utilizing ArcGIS Online, the Sierra DSS Mapping Viewer enables users to visually analyze and locate areas of interest. Once the areas of interest are targeted, the Statistical Analysis Portal provides subbasin level statistics for each variable over time by utilizing a recently developed web-based data analysis and visualization tool called Plotly. This tool allows users to generate graphs and conduct statistical analyses for the Sierra Nevada without the need to download the dataset of interest. For more comprehensive analysis, users are also able to download datasets via the Geospatial Data Gateway. The third phase of this project focused on Python-based data processing, the adaptation of the multiple capabilities of ArcGIS Online and Plotly, and the integration of the three Sierra DSS components within a website designed specifically for the USDA Forest Service.

  6. BaTMAn: Bayesian Technique for Multi-image Analysis

    NASA Astrophysics Data System (ADS)

    Casado, J.; Ascasibar, Y.; García-Benito, R.; Guidi, G.; Choudhury, O. S.; Bellocchi, E.; Sánchez, S. F.; Díaz, A. I.

    2016-12-01

    Bayesian Technique for Multi-image Analysis (BaTMAn) characterizes any astronomical dataset containing spatial information and performs a tessellation based on the measurements and errors provided as input. The algorithm iteratively merges spatial elements as long as they are statistically consistent with carrying the same information (i.e. identical signal within the errors). The output segmentations successfully adapt to the underlying spatial structure, regardless of its morphology and/or the statistical properties of the noise. BaTMAn identifies (and keeps) all the statistically-significant information contained in the input multi-image (e.g. an IFS datacube). The main aim of the algorithm is to characterize spatially-resolved data prior to their analysis.

  7. The social construction of "evidence-based" drug prevention programs: a reanalysis of data from the Drug Abuse Resistance Education (DARE) program.

    PubMed

    Gorman, Dennis M; Huber, J Charles

    2009-08-01

    This study explores the possibility that any drug prevention program might be considered "evidence-based" given the use of data analysis procedures that optimize the chance of producing statistically significant results, by reanalyzing data from a Drug Abuse Resistance Education (DARE) program evaluation. The analysis produced a number of statistically significant differences between the DARE and control conditions on alcohol and marijuana use measures. Many of these differences occurred at cutoff points on the assessment scales for which post hoc meaningful labels were created. Our results are compared to those from evaluations of programs that appear on evidence-based drug prevention lists.

  8. 75 FR 73972 - Medicaid Program; Cost Limit for Providers Operated by Units of Government and Provisions To...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-11-30

    ... Based on Customary Charges In Sec. 447.271(a), DHHS is adding an introductory phrase to read ``Except as... hospital that is located outside of a Core-Based Statistical Area (for Medicaid) and outside a Metropolitan Statistical Area (for Medicare) and has fewer than 100 beds. DHHS is not preparing an analysis for section 1102...

  9. Identifiability of PBPK Models with Applications to Dimethylarsinic Acid Exposure

    EPA Science Inventory

    Any statistical model should be identifiable in order for estimates and tests using it to be meaningful. We consider statistical analysis of physiologically-based pharmacokinetic (PBPK) models in which parameters cannot be estimated precisely from available data, and discuss diff...

  10. Model-Based Linkage Analysis of a Quantitative Trait.

    PubMed

    Song, Yeunjoo E; Song, Sunah; Schnell, Audrey H

    2017-01-01

    Linkage Analysis is a family-based method of analysis to examine whether any typed genetic markers cosegregate with a given trait, in this case a quantitative trait. If linkage exists, this is taken as evidence in support of a genetic basis for the trait. Historically, linkage analysis was performed using a binary disease trait, but has been extended to include quantitative disease measures. Quantitative traits are desirable as they provide more information than binary traits. Linkage analysis can be performed using single-marker methods (one marker at a time) or multipoint (using multiple markers simultaneously). In model-based linkage analysis the genetic model for the trait of interest is specified. There are many software options for performing linkage analysis. Here, we use the program package Statistical Analysis for Genetic Epidemiology (S.A.G.E.). S.A.G.E. was chosen because it also includes programs to perform data cleaning procedures and to generate and test genetic models for a quantitative trait, in addition to performing linkage analysis. We demonstrate in detail the process of running the program LODLINK to perform single-marker analysis, and MLOD to perform multipoint analysis using output from SEGREG, where SEGREG was used to determine the best fitting statistical model for the trait.
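
    For readers unfamiliar with the quantity being maximized, model-based linkage evidence is summarized by the LOD score, the base-10 log of the likelihood ratio comparing a recombination fraction theta with free recombination (theta = 0.5). The toy phase-known, two-point sketch below illustrates the calculation only; it is not LODLINK or MLOD output.

    ```python
    import numpy as np

    def lod_score(recombinants, nonrecombinants, theta):
        """Phase-known two-point LOD score at recombination fraction theta."""
        n = recombinants + nonrecombinants
        loglik_theta = recombinants * np.log10(theta) + nonrecombinants * np.log10(1 - theta)
        loglik_null = n * np.log10(0.5)
        return loglik_theta - loglik_null

    thetas = np.linspace(0.01, 0.5, 50)
    scores = [lod_score(2, 18, t) for t in thetas]
    # maximum LOD and the recombination fraction at which it occurs
    print(max(scores), thetas[int(np.argmax(scores))])
    ```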

  11. Modified Bayesian Kriging for Noisy Response Problems for Reliability Analysis

    DTIC Science & Technology

    2015-01-01

    [The record text for this entry consists only of affiliation and reference fragments, including a University of Iowa Department of Statistics & Actuarial Science affiliation and citations of Forrester, A. I. J., & Keane, A. J. (2009), "Recent advances in surrogate-based optimization," Progress in Aerospace Sciences, 45(1-3), 50-79, and Sacks, J., Welch, W. J., Mitchell, T. J., & Wynn, H. P. (1989), "Design and analysis of computer experiments," Statistical Science, 4.]

  12. Status Quo and Outlook of the Studies of Entrepreneurship Education in China: Statistics and Analysis Based on Papers Indexed in CSSCI (2004-2013)

    ERIC Educational Resources Information Center

    Xia, Tian; Shumin, Zhang; Yifeng, Wu

    2016-01-01

    We utilized cross tabulation statistics, word frequency counts, and content analysis of research output to conduct a bibliometric study, and used CiteSpace software to depict a knowledge map for research on entrepreneurship education in China from 2004 to 2013. The study shows that, in this duration, the study of Chinese entrepreneurship education…

  13. Phenotypic mapping of metabolic profiles using self-organizing maps of high-dimensional mass spectrometry data.

    PubMed

    Goodwin, Cody R; Sherrod, Stacy D; Marasco, Christina C; Bachmann, Brian O; Schramm-Sapyta, Nicole; Wikswo, John P; McLean, John A

    2014-07-01

    A metabolic system is composed of inherently interconnected metabolic precursors, intermediates, and products. The analysis of untargeted metabolomics data has conventionally been performed through the use of comparative statistics or multivariate statistical analysis-based approaches; however, each falls short in representing the related nature of metabolic perturbations. Herein, we describe a complementary method for the analysis of large metabolite inventories using a data-driven approach based upon a self-organizing map algorithm. This workflow allows for the unsupervised clustering, and subsequent prioritization of, correlated features through Gestalt comparisons of metabolic heat maps. We describe this methodology in detail, including a comparison to conventional metabolomics approaches, and demonstrate the application of this method to the analysis of the metabolic repercussions of prolonged cocaine exposure in rat sera profiles.
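
    A minimal, from-scratch self-organizing map in the spirit of the workflow described above; the grid size, decay schedules, and the synthetic stand-in for metabolite feature intensities are all illustrative assumptions rather than the authors' settings.

    ```python
    import numpy as np

    def train_som(data, grid=(10, 10), epochs=20, lr0=0.5, sigma0=3.0, seed=0):
        """Minimal self-organizing map: returns a (rows, cols, n_features) codebook."""
        rng = np.random.default_rng(seed)
        rows, cols = grid
        weights = rng.normal(size=(rows, cols, data.shape[1]))
        yy, xx = np.mgrid[0:rows, 0:cols]
        n_steps = epochs * len(data)
        step = 0
        for _ in range(epochs):
            for x in rng.permutation(data):
                # linearly decaying learning rate and neighbourhood radius
                frac = step / n_steps
                lr = lr0 * (1.0 - frac)
                sigma = sigma0 * (1.0 - frac) + 0.5
                # best-matching unit
                d2 = ((weights - x) ** 2).sum(axis=2)
                bi, bj = np.unravel_index(np.argmin(d2), d2.shape)
                # Gaussian neighbourhood update around the best-matching unit
                h = np.exp(-((yy - bi) ** 2 + (xx - bj) ** 2) / (2 * sigma ** 2))
                weights += lr * h[:, :, None] * (x - weights)
                step += 1
        return weights

    # stand-in for metabolite feature intensities (samples x features)
    features = np.random.default_rng(1).normal(size=(300, 8))
    codebook = train_som(features)
    ```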

  14. Fully Bayesian inference for structural MRI: application to segmentation and statistical analysis of T2-hypointensities.

    PubMed

    Schmidt, Paul; Schmid, Volker J; Gaser, Christian; Buck, Dorothea; Bührlen, Susanne; Förschler, Annette; Mühlau, Mark

    2013-01-01

    Aiming at iron-related T2-hypointensity, which is related to normal aging and neurodegenerative processes, we here present two practicable approaches, based on Bayesian inference, for preprocessing and statistical analysis of a complex set of structural MRI data. In particular, Markov Chain Monte Carlo methods were used to simulate posterior distributions. First, we rendered a segmentation algorithm that uses outlier detection based on model checking techniques within a Bayesian mixture model. Second, we rendered an analytical tool comprising a Bayesian regression model with smoothness priors (in the form of Gaussian Markov random fields) mitigating the necessity to smooth data prior to statistical analysis. For validation, we used simulated data and MRI data of 27 healthy controls (age: [Formula: see text]; range, [Formula: see text]). We first observed robust segmentation of both simulated T2-hypointensities and gray-matter regions known to be T2-hypointense. Second, simulated data and images of segmented T2-hypointensity were analyzed. We found not only robust identification of simulated effects but also a biologically plausible age-related increase of T2-hypointensity primarily within the dentate nucleus but also within the globus pallidus, substantia nigra, and red nucleus. Our results indicate that fully Bayesian inference can successfully be applied for preprocessing and statistical analysis of structural MRI data.

  15. Surface flaw reliability analysis of ceramic components with the SCARE finite element postprocessor program

    NASA Technical Reports Server (NTRS)

    Gyekenyesi, John P.; Nemeth, Noel N.

    1987-01-01

    The SCARE (Structural Ceramics Analysis and Reliability Evaluation) computer program on statistical fast fracture reliability analysis with quadratic elements for volume distributed imperfections is enhanced to include the use of linear finite elements and the capability of designing against concurrent surface flaw induced ceramic component failure. The SCARE code is presently coupled as a postprocessor to the MSC/NASTRAN general purpose, finite element analysis program. The improved version now includes the Weibull and Batdorf statistical failure theories for both surface and volume flaw based reliability analysis. The program uses the two-parameter Weibull fracture strength cumulative failure probability distribution model with the principle of independent action for poly-axial stress states, and Batdorf's shear-sensitive as well as shear-insensitive statistical theories. The shear-sensitive surface crack configurations include the Griffith crack and Griffith notch geometries, using the total critical coplanar strain energy release rate criterion to predict mixed-mode fracture. Weibull material parameters based on both surface and volume flaw induced fracture can also be calculated from modulus of rupture bar tests, using the least squares method with known specimen geometry and grouped fracture data. The statistical fast fracture theories for surface flaw induced failure, along with selected input and output formats and options, are summarized. An example problem to demonstrate various features of the program is included.

  16. A Third Moment Adjusted Test Statistic for Small Sample Factor Analysis

    PubMed Central

    Lin, Johnny; Bentler, Peter M.

    2012-01-01

    Goodness of fit testing in factor analysis is based on the assumption that the test statistic is asymptotically chi-square; but this property may not hold in small samples even when the factors and errors are normally distributed in the population. Robust methods such as Browne’s asymptotically distribution-free method and Satorra Bentler’s mean scaling statistic were developed under the presumption of non-normality in the factors and errors. This paper finds new application to the case where factors and errors are normally distributed in the population but the skewness of the obtained test statistic is still high due to sampling error in the observed indicators. An extension of Satorra Bentler’s statistic is proposed that not only scales the mean but also adjusts the degrees of freedom based on the skewness of the obtained test statistic in order to improve its robustness under small samples. A simple simulation study shows that this third moment adjusted statistic asymptotically performs on par with previously proposed methods, and at a very small sample size offers superior Type I error rates under a properly specified model. Data from Mardia, Kent and Bibby’s study of students tested for their ability in five content areas that were either open or closed book were used to illustrate the real-world performance of this statistic. PMID:23144511

  17. Improved Diagnostic Accuracy of SPECT Through Statistical Analysis and the Detection of Hot Spots at the Primary Sensorimotor Area for the Diagnosis of Alzheimer Disease in a Community-Based Study: "The Osaki-Tajiri Project".

    PubMed

    Kaneta, Tomohiro; Nakatsuka, Masahiro; Nakamura, Kei; Seki, Takashi; Yamaguchi, Satoshi; Tsuboi, Masahiro; Meguro, Kenichi

    2016-01-01

    SPECT is an important diagnostic tool for dementia. Recently, statistical analysis of SPECT has been commonly used for dementia research. In this study, we evaluated the accuracy of visual SPECT evaluation and/or statistical analysis for the diagnosis (Dx) of Alzheimer disease (AD) and other forms of dementia in our community-based study "The Osaki-Tajiri Project." Eighty-nine consecutive outpatients with dementia were enrolled and underwent brain perfusion SPECT with 99mTc-ECD. Diagnostic accuracy of SPECT was tested using 3 methods: visual inspection (SPECT Dx), automated diagnostic tool using statistical analysis with easy Z-score imaging system (eZIS Dx), and visual inspection plus eZIS (integrated Dx). Integrated Dx showed the highest sensitivity, specificity, and accuracy, whereas eZIS was the second most accurate method. We also observed that a higher than expected rate of SPECT images indicated false-negative cases of AD. Among these, 50% showed hypofrontality and were diagnosed as frontotemporal lobar degeneration. These cases typically showed regional "hot spots" in the primary sensorimotor cortex (ie, a sensorimotor hot spot sign), which we determined were associated with AD rather than frontotemporal lobar degeneration. We concluded that the diagnostic abilities were improved by the integrated use of visual assessment and statistical analysis. In addition, the detection of a sensorimotor hot spot sign was useful to detect AD when hypofrontality is present and improved the ability to properly diagnose AD.

  18. Comparison of a non-stationary voxelation-corrected cluster-size test with TFCE for group-level MRI inference.

    PubMed

    Li, Huanjie; Nickerson, Lisa D; Nichols, Thomas E; Gao, Jia-Hong

    2017-03-01

    Two powerful methods for statistical inference on MRI brain images have been proposed recently, a non-stationary voxelation-corrected cluster-size test (CST) based on random field theory and threshold-free cluster enhancement (TFCE) based on calculating the level of local support for a cluster, then using permutation testing for inference. Unlike other statistical approaches, these two methods do not rest on the assumptions of a uniform and high degree of spatial smoothness of the statistic image. Thus, they are strongly recommended for group-level fMRI analysis compared to other statistical methods. In this work, the non-stationary voxelation-corrected CST and TFCE methods for group-level analysis were evaluated for both stationary and non-stationary images under varying smoothness levels, degrees of freedom and signal to noise ratios. Our results suggest that both methods provide adequate control for the number of voxel-wise statistical tests being performed during inference on fMRI data and they are both superior to current CSTs implemented in popular MRI data analysis software packages. However, TFCE is more sensitive and stable for group-level analysis of VBM data. Thus, the voxelation-corrected CST approach may confer some advantages by being computationally less demanding for fMRI data analysis than TFCE with permutation testing and by also being applicable for single-subject fMRI analyses, while the TFCE approach is advantageous for VBM data. Hum Brain Mapp 38:1269-1280, 2017. © 2016 Wiley Periodicals, Inc.

  19. Systems and methods for detection of blowout precursors in combustors

    DOEpatents

    Lieuwen, Tim C.; Nair, Suraj

    2006-08-15

    The present invention comprises systems and methods for detecting flame blowout precursors in combustors. The blowout precursor detection system comprises a combustor, a pressure measuring device, and blowout precursor detection unit. A combustion controller may also be used to control combustor parameters. The methods of the present invention comprise receiving pressure data measured by an acoustic pressure measuring device, performing one or a combination of spectral analysis, statistical analysis, and wavelet analysis on received pressure data, and determining the existence of a blowout precursor based on such analyses. The spectral analysis, statistical analysis, and wavelet analysis further comprise their respective sub-methods to determine the existence of blowout precursors.
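
    As a sketch of the kind of spectral and statistical features such a detection unit might compute from the pressure signal, consider the example below; the patent's actual feature set, thresholds, and wavelet stage are not reproduced, and the decision rule and parameter values are illustrative assumptions.

    ```python
    import numpy as np
    from scipy import signal, stats

    def blowout_precursor_features(pressure, fs):
        """Simple spectral and statistical features of a combustor pressure trace."""
        p = pressure - np.mean(pressure)
        freqs, psd = signal.welch(p, fs=fs, nperseg=1024)
        band = (freqs > 10) & (freqs < 1000)
        peak_freq = freqs[band][np.argmax(psd[band])]      # dominant acoustic tone
        return {
            "rms": float(np.sqrt(np.mean(p ** 2))),
            "kurtosis": float(stats.kurtosis(p)),          # intermittency indicator
            "skewness": float(stats.skew(p)),
            "peak_frequency_hz": float(peak_freq),
        }

    def near_blowout(features, kurtosis_threshold=1.0):
        """Illustrative decision rule only: flag elevated intermittency."""
        return features["kurtosis"] > kurtosis_threshold

    fs = 10_000.0
    t = np.arange(0, 1.0, 1.0 / fs)
    pressure = np.sin(2 * np.pi * 210 * t) + 0.3 * np.random.default_rng(0).normal(size=t.size)
    feats = blowout_precursor_features(pressure, fs)
    print(feats, near_blowout(feats))
    ```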

  20. Improved Statistics for Genome-Wide Interaction Analysis

    PubMed Central

    Ueki, Masao; Cordell, Heather J.

    2012-01-01

    Recently, Wu and colleagues [1] proposed two novel statistics for genome-wide interaction analysis using case/control or case-only data. In computer simulations, their proposed case/control statistic outperformed competing approaches, including the fast-epistasis option in PLINK and logistic regression analysis under the correct model; however, reasons for its superior performance were not fully explored. Here we investigate the theoretical properties and performance of Wu et al.'s proposed statistics and explain why, in some circumstances, they outperform competing approaches. Unfortunately, we find minor errors in the formulae for their statistics, resulting in tests that have higher than nominal type 1 error. We also find minor errors in PLINK's fast-epistasis and case-only statistics, although theory and simulations suggest that these errors have only negligible effect on type 1 error. We propose adjusted versions of all four statistics that, both theoretically and in computer simulations, maintain correct type 1 error rates under the null hypothesis. We also investigate statistics based on correlation coefficients that maintain similar control of type 1 error. Although designed to test specifically for interaction, we show that some of these previously-proposed statistics can, in fact, be sensitive to main effects at one or both loci, particularly in the presence of linkage disequilibrium. We propose two new “joint effects” statistics that, provided the disease is rare, are sensitive only to genuine interaction effects. In computer simulations we find, in most situations considered, that highest power is achieved by analysis under the correct genetic model. Such an analysis is unachievable in practice, as we do not know this model. However, generally high power over a wide range of scenarios is exhibited by our joint effects and adjusted Wu statistics. We recommend use of these alternative or adjusted statistics and urge caution when using Wu et al.'s originally-proposed statistics, on account of the inflated error rate that can result. PMID:22496670

  1. Statistical analysis plan for the family-led rehabilitation after stroke in India (ATTEND) trial: A multicenter randomized controlled trial of a new model of stroke rehabilitation compared to usual care.

    PubMed

    Billot, Laurent; Lindley, Richard I; Harvey, Lisa A; Maulik, Pallab K; Hackett, Maree L; Murthy, Gudlavalleti Vs; Anderson, Craig S; Shamanna, Bindiganavale R; Jan, Stephen; Walker, Marion; Forster, Anne; Langhorne, Peter; Verma, Shweta J; Felix, Cynthia; Alim, Mohammed; Gandhi, Dorcas Bc; Pandian, Jeyaraj Durai

    2017-02-01

    Background In low- and middle-income countries, few patients receive organized rehabilitation after stroke, yet the burden of chronic diseases such as stroke is increasing in these countries. Affordable models of effective rehabilitation could have a major impact. The ATTEND trial is evaluating a family-led caregiver delivered rehabilitation program after stroke. Objective To publish the detailed statistical analysis plan for the ATTEND trial prior to trial unblinding. Methods Based upon the published registration and protocol, the blinded steering committee and management team, led by the trial statistician, have developed a statistical analysis plan. The plan has been informed by the chosen outcome measures, the data collection forms and knowledge of key baseline data. Results The resulting statistical analysis plan is consistent with best practice and will allow open and transparent reporting. Conclusions Publication of the trial statistical analysis plan reduces potential bias in trial reporting, and clearly outlines pre-specified analyses. Clinical Trial Registrations India CTRI/2013/04/003557; Australian New Zealand Clinical Trials Registry ACTRN1261000078752; Universal Trial Number U1111-1138-6707.

  2. Statistical analysis of vehicle crashes in Mississippi based on crash data from 2010 to 2014.

    DOT National Transportation Integrated Search

    2017-08-15

    Traffic crash data from 2010 to 2014 were collected by Mississippi Department of Transportation (MDOT) and extracted for the study. Three tasks were conducted in this study: (1) geographic distribution of crashes; (2) descriptive statistics of crash ...

  3. STATWIZ - AN ELECTRONIC STATISTICAL TOOL (ABSTRACT)

    EPA Science Inventory

    StatWiz is a web-based, interactive, and dynamic statistical tool for researchers. It will allow researchers to input information and/or data and then receive experimental design options, or outputs from data analysis. StatWiz is envisioned as an expert system that will walk rese...

  4. Enhancing efficiency and quality of statistical estimation of immunogenicity assay cut points through standardization and automation.

    PubMed

    Su, Cheng; Zhou, Lei; Hu, Zheng; Weng, Winnie; Subramani, Jayanthi; Tadkod, Vineet; Hamilton, Kortney; Bautista, Ami; Wu, Yu; Chirmule, Narendra; Zhong, Zhandong Don

    2015-10-01

    Biotherapeutics can elicit immune responses, which can alter the exposure, safety, and efficacy of the therapeutics. A well-designed and robust bioanalytical method is critical for the detection and characterization of relevant anti-drug antibody (ADA) and the success of an immunogenicity study. As a fundamental criterion in immunogenicity testing, assay cut points need to be statistically established with a risk-based approach to reduce subjectivity. This manuscript describes the development of a validated, web-based, multi-tier customized assay statistical tool (CAST) for assessing cut points of ADA assays. The tool provides an intuitive web interface that allows users to import experimental data generated from a standardized experimental design, select the assay factors, run the standardized analysis algorithms, and generate tables, figures, and listings (TFL). It allows bioanalytical scientists to perform complex statistical analysis at a click of the button to produce reliable assay parameters in support of immunogenicity studies. Copyright © 2015 Elsevier B.V. All rights reserved.

  5. SimHap GUI: an intuitive graphical user interface for genetic association analysis.

    PubMed

    Carter, Kim W; McCaskie, Pamela A; Palmer, Lyle J

    2008-12-25

    Researchers wishing to conduct genetic association analysis involving single nucleotide polymorphisms (SNPs) or haplotypes are often confronted with the lack of user-friendly graphical analysis tools, requiring sophisticated statistical and informatics expertise to perform relatively straightforward tasks. Tools, such as the SimHap package for the R statistics language, provide the necessary statistical operations to conduct sophisticated genetic analysis, but lack a graphical user interface that would allow anyone other than a professional statistician to utilise the tool effectively. We have developed SimHap GUI, a cross-platform integrated graphical analysis tool for conducting epidemiological, single SNP and haplotype-based association analysis. SimHap GUI features a novel workflow interface that guides the user through each logical step of the analysis process, making it accessible to both novice and advanced users. This tool provides a seamless interface to the SimHap R package, while providing enhanced functionality such as sophisticated data checking, automated data conversion, and real-time estimations of haplotype simulation progress. SimHap GUI provides a novel, easy-to-use, cross-platform solution for conducting a range of genetic and non-genetic association analyses. This provides a free alternative to commercial statistics packages that is specifically designed for genetic association analysis.

  6. Implementation and evaluation of an efficient secure computation system using ‘R’ for healthcare statistics

    PubMed Central

    Chida, Koji; Morohashi, Gembu; Fuji, Hitoshi; Magata, Fumihiko; Fujimura, Akiko; Hamada, Koki; Ikarashi, Dai; Yamamoto, Ryuichi

    2014-01-01

    Background and objective While the secondary use of medical data has gained attention, its adoption has been constrained due to protection of patient privacy. Making medical data secure by de-identification can be problematic, especially when the data concerns rare diseases. We require rigorous security management measures. Materials and methods Using secure computation, an approach from cryptography, our system can compute various statistics over encrypted medical records without decrypting them. An issue of secure computation is that the amount of processing time required is immense. We implemented a system that securely computes healthcare statistics from the statistical computing software ‘R’ by effectively combining secret-sharing-based secure computation with original computation. Results Testing confirmed that our system could correctly complete computation of average and unbiased variance of approximately 50 000 records of dummy insurance claim data in a little over a second. Computation including conditional expressions and/or comparison of values, for example, t test and median, could also be correctly completed in several tens of seconds to a few minutes. Discussion If medical records are simply encrypted, the risk of leaks exists because decryption is usually required during statistical analysis. Our system possesses high-level security because medical records remain in encrypted state even during statistical analysis. Also, our system can securely compute some basic statistics with conditional expressions using ‘R’ that works interactively while secure computation protocols generally require a significant amount of processing time. Conclusions We propose a secure statistical analysis system using ‘R’ for medical data that effectively integrates secret-sharing-based secure computation and original computation. PMID:24763677
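
    The cryptographic core of such systems is secret sharing: each value is split into random shares so that no single party learns the data, yet sums (and hence means) can be reconstructed from locally aggregated shares. The toy additive-sharing sketch below illustrates only that idea; it omits the comparison protocols, the 'R' integration, and all communication and security machinery described in the paper, and the modulus and record values are arbitrary.

    ```python
    import secrets

    PRIME = 2_147_483_647  # modulus for the shares (illustrative choice)

    def share(value, n_parties=3):
        """Split an integer into n additive shares modulo PRIME."""
        shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
        shares.append((value - sum(shares)) % PRIME)
        return shares

    def reconstruct(shares):
        return sum(shares) % PRIME

    # each party holds one share of every record; only the aggregate is revealed
    records = [137, 254, 99, 310]            # e.g. encoded claim amounts
    per_party = list(zip(*[share(v) for v in records]))
    partial_sums = [sum(p) % PRIME for p in per_party]
    total = reconstruct(partial_sums)
    print(total, sum(records))               # both equal 800
    ```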

  7. Implementation and evaluation of an efficient secure computation system using 'R' for healthcare statistics.

    PubMed

    Chida, Koji; Morohashi, Gembu; Fuji, Hitoshi; Magata, Fumihiko; Fujimura, Akiko; Hamada, Koki; Ikarashi, Dai; Yamamoto, Ryuichi

    2014-10-01

    While the secondary use of medical data has gained attention, its adoption has been constrained due to protection of patient privacy. Making medical data secure by de-identification can be problematic, especially when the data concerns rare diseases. We require rigorous security management measures. Using secure computation, an approach from cryptography, our system can compute various statistics over encrypted medical records without decrypting them. An issue of secure computation is that the amount of processing time required is immense. We implemented a system that securely computes healthcare statistics from the statistical computing software 'R' by effectively combining secret-sharing-based secure computation with original computation. Testing confirmed that our system could correctly complete computation of average and unbiased variance of approximately 50,000 records of dummy insurance claim data in a little over a second. Computation including conditional expressions and/or comparison of values, for example, t test and median, could also be correctly completed in several tens of seconds to a few minutes. If medical records are simply encrypted, the risk of leaks exists because decryption is usually required during statistical analysis. Our system possesses high-level security because medical records remain in encrypted state even during statistical analysis. Also, our system can securely compute some basic statistics with conditional expressions using 'R' that works interactively while secure computation protocols generally require a significant amount of processing time. We propose a secure statistical analysis system using 'R' for medical data that effectively integrates secret-sharing-based secure computation and original computation. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.

  8. The sumLINK statistic for genetic linkage analysis in the presence of heterogeneity.

    PubMed

    Christensen, G B; Knight, S; Camp, N J

    2009-11-01

    We present the "sumLINK" statistic--the sum of multipoint LOD scores for the subset of pedigrees with nominally significant linkage evidence at a given locus--as an alternative to common methods to identify susceptibility loci in the presence of heterogeneity. We also suggest the "sumLOD" statistic (the sum of positive multipoint LOD scores) as a companion to the sumLINK. sumLINK analysis identifies genetic regions of extreme consistency across pedigrees without regard to negative evidence from unlinked or uninformative pedigrees. Significance is determined by an innovative permutation procedure based on genome shuffling that randomizes linkage information across pedigrees. This procedure for generating the empirical null distribution may be useful for other linkage-based statistics as well. Using 500 genome-wide analyses of simulated null data, we show that the genome shuffling procedure results in the correct type 1 error rates for both the sumLINK and sumLOD. The power of the statistics was tested using 100 sets of simulated genome-wide data from the alternative hypothesis from GAW13. Finally, we illustrate the statistics in an analysis of 190 aggressive prostate cancer pedigrees from the International Consortium for Prostate Cancer Genetics, where we identified a new susceptibility locus. We propose that the sumLINK and sumLOD are ideal for collaborative projects and meta-analyses, as they do not require any sharing of identifiable data between contributing institutions. Further, loci identified with the sumLINK have good potential for gene localization via statistical recombinant mapping, as, by definition, several linked pedigrees contribute to each peak.
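
    A minimal sketch of the two statistics themselves, assuming the common convention that "nominally significant" linkage corresponds to a pedigree LOD of at least 0.588 (pointwise p of about 0.05); the threshold, function name, and example LODs are illustrative, and the genome-shuffling permutation test that supplies significance is not shown.

    ```python
    import numpy as np

    def sumlink_sumlod(lod_by_pedigree, nominal_lod=0.588):
        """sumLINK: sum of LODs over pedigrees exceeding a nominal threshold.
        sumLOD: sum of all positive LODs.  Threshold value is an assumption."""
        lods = np.asarray(lod_by_pedigree, dtype=float)
        sumlink = lods[lods >= nominal_lod].sum()
        sumlod = lods[lods > 0].sum()
        return sumlink, sumlod

    # multipoint LODs at one locus for 8 pedigrees (made-up values)
    print(sumlink_sumlod([1.2, -0.4, 0.7, 0.1, -1.0, 2.3, 0.0, 0.6]))
    ```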

  9. K-nearest neighbors based methods for identification of different gear crack levels under different motor speeds and loads: Revisited

    NASA Astrophysics Data System (ADS)

    Wang, Dong

    2016-03-01

    Gears are the most commonly used components in mechanical transmission systems. Their failures may cause transmission system breakdown and result in economic loss. Identification of different gear crack levels is important to prevent any unexpected gear failure because gear cracks lead to gear tooth breakage. Signal processing based methods mainly require expertise to interpret gear fault signatures, which is usually not easy for ordinary users. In order to automatically identify different gear crack levels, intelligent gear crack identification methods should be developed. The previous case studies experimentally proved that K-nearest neighbors based methods exhibit high prediction accuracies for identification of 3 different gear crack levels under different motor speeds and loads. In this short communication, to further enhance prediction accuracies of existing K-nearest neighbors based methods and extend identification of 3 different gear crack levels to identification of 5 different gear crack levels, redundant statistical features are constructed by using Daubechies 44 (db44) binary wavelet packet transform at different wavelet decomposition levels, prior to the use of a K-nearest neighbors method. The dimensionality of redundant statistical features is 620, which provides richer gear fault signatures. Since many of these statistical features are redundant and highly correlated with each other, dimensionality reduction of redundant statistical features is conducted to obtain new significant statistical features. Finally, the K-nearest neighbors method is used to identify 5 different gear crack levels under different motor speeds and loads. A case study including 3 experiments is investigated to demonstrate that the developed method provides higher prediction accuracies than the existing K-nearest neighbors based methods for recognizing different gear crack levels under different motor speeds and loads. Based on the new significant statistical features, some other popular statistical models, including linear discriminant analysis, quadratic discriminant analysis, classification and regression trees, and the naive Bayes classifier, are compared with the developed method. The results show that the developed method has the highest prediction accuracies among these statistical models. Additionally, selection of the number of new significant features and parameter selection of K-nearest neighbors are thoroughly investigated.
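
    A compact sketch of the classification stage only: pre-extracted statistical features are standardized, reduced, and passed to a k-nearest-neighbors classifier with cross-validation. The synthetic feature matrix, the use of PCA for dimensionality reduction, and all parameter choices are illustrative assumptions; the wavelet-packet feature extraction and the paper's specific reduction method are not reproduced.

    ```python
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(0)
    # stand-in for 620 redundant statistical features per vibration record,
    # with 5 gear crack levels as labels (synthetic data, for illustration only)
    X = rng.normal(size=(250, 620))
    y = np.repeat(np.arange(5), 50)
    X += 0.5 * y[:, None]                      # inject a class-dependent shift

    model = make_pipeline(StandardScaler(),
                          PCA(n_components=20),            # reduce redundant features
                          KNeighborsClassifier(n_neighbors=5))
    scores = cross_val_score(model, X, y, cv=5)
    print(scores.mean())
    ```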

  10. Finding differentially expressed genes in high dimensional data: Rank based test statistic via a distance measure.

    PubMed

    Mathur, Sunil; Sadana, Ajit

    2015-12-01

    We present a rank-based test statistic for the identification of differentially expressed genes using a distance measure. The proposed test statistic is highly robust against extreme values and does not assume a specific distribution for the parent population. Simulation studies show that the proposed test is more powerful than some of the commonly used methods, such as the paired t-test, the Wilcoxon signed rank test, and significance analysis of microarrays (SAM), under certain non-normal distributions. The asymptotic distribution of the test statistic and the p-value function are discussed. The application of the proposed method is shown using a real-life data set. © The Author(s) 2011.

  11. The Analysis of Organizational Diagnosis Based on the Six Box Model in Universities

    ERIC Educational Resources Information Center

    Hamid, Rahimi; Siadat, Sayyed Ali; Reza, Hoveida; Arash, Shahin; Ali, Nasrabadi Hasan; Azizollah, Arbabisarjou

    2011-01-01

    Purpose: To analyze organizational diagnosis based on the six box model at universities. Research method: The research method was descriptive survey. The statistical population consisted of 1544 faculty members of universities, from which 218 persons were chosen as the sample through stratified random sampling. The research instrument was an organizational…

  12. Analysis of vector wind change with respect to time for Vandenberg Air Force Base, California

    NASA Technical Reports Server (NTRS)

    Adelfang, S. I.

    1978-01-01

    A statistical analysis of the temporal variability of wind vectors at 1 km altitude intervals from 0 to 27 km altitude, taken from a 10-year data sample of twice-daily rawinsonde wind measurements over Vandenberg Air Force Base, California, is presented.

  13. "Magnitude-based inference": a statistical review.

    PubMed

    Welsh, Alan H; Knight, Emma J

    2015-04-01

    We consider "magnitude-based inference" and its interpretation by examining in detail its use in the problem of comparing two means. We extract from the spreadsheets, which are provided to users of the analysis (http://www.sportsci.org/), a precise description of how "magnitude-based inference" is implemented. We compare the implemented version of the method with general descriptions of it and interpret the method in familiar statistical terms. We show that "magnitude-based inference" is not a progressive improvement on modern statistics. The additional probabilities introduced are not directly related to the confidence interval but, rather, are interpretable either as P values for two different nonstandard tests (for different null hypotheses) or as approximate Bayesian calculations, which also lead to a type of test. We also discuss sample size calculations associated with "magnitude-based inference" and show that the substantial reduction in sample sizes claimed for the method (30% of the sample size obtained from standard frequentist calculations) is not justifiable so the sample size calculations should not be used. Rather than using "magnitude-based inference," a better solution is to be realistic about the limitations of the data and use either confidence intervals or a fully Bayesian analysis.

  14. Conditional statistics in a turbulent premixed flame derived from direct numerical simulation

    NASA Technical Reports Server (NTRS)

    Mantel, Thierry; Bilger, Robert W.

    1994-01-01

    The objective of this paper is to briefly introduce conditional moment closure (CMC) methods for premixed systems and to derive the transport equation for the conditional species mass fraction conditioned on the progress variable based on the enthalpy. Our statistical analysis will be based on the 3-D DNS database of Trouve and Poinsot available at the Center for Turbulence Research. The initial conditions and characteristics (turbulence, thermo-diffusive properties) as well as the numerical method utilized in the DNS of Trouve and Poinsot are presented, and some details concerning our statistical analysis are also given. From the analysis of DNS results, the effects of the position in the flame brush, of the Damkoehler and Lewis numbers on the conditional mean scalar dissipation, and conditional mean velocity are presented and discussed. Information concerning unconditional turbulent fluxes are also presented. The anomaly found in previous studies of counter-gradient diffusion for the turbulent flux of the progress variable is investigated.
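
    Conditional means of this kind are essentially bin averages over the DNS fields, conditioned on the progress variable. The sketch below shows the generic computation with synthetic stand-in data; it is not tied to the Trouve and Poinsot database or to the enthalpy-based progress variable used in the paper.

    ```python
    import numpy as np

    def conditional_mean(quantity, progress_variable, n_bins=20):
        """Mean of `quantity` conditioned on binned values of the progress variable."""
        edges = np.linspace(0.0, 1.0, n_bins + 1)
        centres = 0.5 * (edges[:-1] + edges[1:])
        idx = np.clip(np.digitize(progress_variable, edges) - 1, 0, n_bins - 1)
        sums = np.bincount(idx, weights=quantity, minlength=n_bins)
        counts = np.bincount(idx, minlength=n_bins)
        return centres, sums / np.maximum(counts, 1)

    rng = np.random.default_rng(2)
    c = rng.uniform(0, 1, 100_000)                            # progress variable field
    chi = 4.0 * c * (1 - c) + 0.1 * rng.normal(size=c.size)   # stand-in for scalar dissipation
    centres, cond = conditional_mean(chi, c)
    print(cond[:5])
    ```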

  15. How to interpret the results of medical time series data analysis: Classical statistical approaches versus dynamic Bayesian network modeling.

    PubMed

    Onisko, Agnieszka; Druzdzel, Marek J; Austin, R Marshall

    2016-01-01

    Classical statistics is a well-established approach in the analysis of medical data. While the medical community seems to be familiar with the concept of a statistical analysis and its interpretation, the Bayesian approach, argued by many of its proponents to be superior to the classical frequentist approach, is still not well-recognized in the analysis of medical data. The goal of this study is to encourage data analysts to use the Bayesian approach, such as modeling with graphical probabilistic networks, as an insightful alternative to classical statistical analysis of medical data. This paper offers a comparison of two approaches to analysis of medical time series data: (1) classical statistical approach, such as the Kaplan-Meier estimator and the Cox proportional hazards regression model, and (2) dynamic Bayesian network modeling. Our comparison is based on time series cervical cancer screening data collected at Magee-Womens Hospital, University of Pittsburgh Medical Center over 10 years. The main outcomes of our comparison are cervical cancer risk assessments produced by the three approaches. However, our analysis discusses also several aspects of the comparison, such as modeling assumptions, model building, dealing with incomplete data, individualized risk assessment, results interpretation, and model validation. Our study shows that the Bayesian approach is (1) much more flexible in terms of modeling effort, and (2) it offers an individualized risk assessment, which is more cumbersome for classical statistical approaches.

  16. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Apte, A; Veeraraghavan, H; Oh, J

    Purpose: To present an open source and free platform to facilitate radiomics research: the “Radiomics toolbox” in CERR. Method: There is a scarcity of open source tools that support end-to-end modeling of image features to predict patient outcomes. The “Radiomics toolbox” strives to fill the need for such a software platform. The platform supports (1) import of various kinds of image modalities like CT, PET, MR, SPECT, US; (2) contouring tools to delineate structures of interest; (3) extraction and storage of image-based features such as first-order statistics, gray-level co-occurrence and zone-size matrix based texture features, and shape features; and (4) statistical analysis. Statistical analysis of the extracted features is supported with basic functionality that includes univariate correlations and Kaplan-Meier curves, and advanced functionality that includes feature reduction and multivariate modeling. The graphical user interface and the data management are performed with Matlab for ease of development and code readability for a wide audience. Open-source software developed with other programming languages is integrated to enhance various components of this toolbox, for example, Java-based DCM4CHE for import of DICOM and R for statistical analysis. Results: The Radiomics toolbox will be distributed as open source, GNU-copyrighted software. The toolbox was prototyped for modeling an oropharyngeal PET dataset at MSKCC. The analysis will be presented in a separate paper. Conclusion: The Radiomics Toolbox provides an extensible platform for extracting and modeling image features. To emphasize new uses of CERR for radiomics and image-based research, we have changed the name from the “Computational Environment for Radiotherapy Research” to the “Computational Environment for Radiological Research”.

  17. A Statistical Method for Synthesizing Mediation Analyses Using the Product of Coefficient Approach Across Multiple Trials

    PubMed Central

    Huang, Shi; MacKinnon, David P.; Perrino, Tatiana; Gallo, Carlos; Cruden, Gracelyn; Brown, C Hendricks

    2016-01-01

    Mediation analysis often requires larger sample sizes than main effect analysis to achieve the same statistical power. Combining results across similar trials may be the only practical option for increasing statistical power for mediation analysis in some situations. In this paper, we propose a method to estimate: 1) marginal means for mediation path a, the relation of the independent variable to the mediator; 2) marginal means for path b, the relation of the mediator to the outcome, across multiple trials; and 3) the between-trial level variance-covariance matrix based on a bivariate normal distribution. We present the statistical theory and an R computer program to combine regression coefficients from multiple trials to estimate a combined mediated effect and confidence interval under a random effects model. Values of coefficients a and b, along with their standard errors from each trial are the input for the method. This marginal likelihood based approach with Monte Carlo confidence intervals provides more accurate inference than the standard meta-analytic approach. We discuss computational issues, apply the method to two real-data examples and make recommendations for the use of the method in different settings. PMID:28239330
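
    The sketch below illustrates only the final step, the Monte Carlo confidence interval for the product of coefficients: combined path estimates a and b (which in the paper come from the marginal-likelihood, random-effects step) are resampled from normal sampling distributions and the product quantiles form the interval. The coefficient values and standard errors are made up for illustration.

    ```python
    import numpy as np

    def mc_mediated_effect(a, se_a, b, se_b, n_draws=100_000, alpha=0.05, seed=0):
        """Monte Carlo confidence interval for the mediated effect a*b,
        drawing a and b from independent normal sampling distributions."""
        rng = np.random.default_rng(seed)
        ab = rng.normal(a, se_a, n_draws) * rng.normal(b, se_b, n_draws)
        lo, hi = np.quantile(ab, [alpha / 2, 1 - alpha / 2])
        return a * b, (lo, hi)

    # combined (marginal-mean) path coefficients, e.g. pooled across trials
    print(mc_mediated_effect(a=0.30, se_a=0.08, b=0.45, se_b=0.10))
    ```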

  18. [The main directions of reforming the service of medical statistics in Ukraine].

    PubMed

    Golubchykov, Mykhailo V; Orlova, Nataliia M; Bielikova, Inna V

    2018-01-01

    Introduction: Implementation of new methods of information support for managerial decision-making should ensure effective health system reform and create conditions for improving the quality of operational management, sound planning of medical care, and more efficient use of system resources. Reform of the Medical Statistics Service of Ukraine should be considered only in the context of the reform of the entire health system. The aim: This work analyzes the current situation and justifies the main directions for reforming the Medical Statistics Service of Ukraine. Material and methods: The work uses a range of methods: content analysis, bibliosemantic analysis, and a systematic approach. The information base of the research comprised WHO strategic and program documents and data of the Medical Statistics Center of the Ministry of Health of Ukraine. Review: The Medical Statistics Service of Ukraine has a complete and effective structure, headed by the State Institution "Medical Statistics Center of the Ministry of Health of Ukraine." This institution reports on behalf of the Ministry of Health of Ukraine to the State Statistical Service of Ukraine, the WHO European Office and other international organizations. An analysis of the current situation showed that, to achieve this goal, it is necessary to: improve the system of statistical indicators for an adequate assessment of the performance of health institutions, including in the economic aspect; create a developed medical and statistical base for administrative territories; change the existing technologies for the formation of information resources; strengthen the material and technical base of the structural units of the Medical Statistics Service; improve the system of training and retraining of personnel for the medical statistics service; develop international cooperation in the methodology and practice of medical statistics and implement internationally accepted methods for collecting, processing, analyzing and disseminating medical and statistical information; and create a medical and statistical service that is adapted to the specifics of market relations in health care and is flexible and responsive to changes in international methodologies and standards. Conclusions: The data of medical statistics are the basis for managerial decisions by managers at all levels of health care. Reform of the Medical Statistics Service of Ukraine should be considered only in the context of the reform of the entire health system. The main directions of the reform of the medical statistics service in Ukraine are: the introduction of information technologies, improved training of personnel for the service, improved material and technical equipment, and maximum reuse of the data obtained, which requires the unification of primary data and of the system of indicators. The most difficult area is the formation of information funds and the introduction of modern information technologies.

  20. Gene- and pathway-based association tests for multiple traits with GWAS summary statistics.

    PubMed

    Kwak, Il-Youp; Pan, Wei

    2017-01-01

    To identify novel genetic variants associated with complex traits and to shed new insights on underlying biology, in addition to the most popular single SNP-single trait association analysis, it would be useful to explore multiple correlated (intermediate) traits at the gene- or pathway-level by mining existing single GWAS or meta-analyzed GWAS data. For this purpose, we present an adaptive gene-based test and a pathway-based test for association analysis of multiple traits with GWAS summary statistics. The proposed tests are adaptive at both the SNP- and trait-levels; that is, they account for possibly varying association patterns (e.g. signal sparsity levels) across SNPs and traits, thus maintaining high power across a wide range of situations. Furthermore, the proposed methods are general: they can be applied to mixed types of traits, and to Z-statistics or P-values as summary statistics obtained from either a single GWAS or a meta-analysis of multiple GWAS. Our numerical studies with simulated and real data demonstrated the promising performance of the proposed methods. The methods are implemented in the R package aSPU, freely and publicly available at https://cran.r-project.org/web/packages/aSPU/. Contact: weip@biostat.umn.edu. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  1. Structure-Specific Statistical Mapping of White Matter Tracts

    PubMed Central

    Yushkevich, Paul A.; Zhang, Hui; Simon, Tony; Gee, James C.

    2008-01-01

    We present a new model-based framework for the statistical analysis of diffusion imaging data associated with specific white matter tracts. The framework takes advantage of the fact that several of the major white matter tracts are thin sheet-like structures that can be effectively modeled by medial representations. The approach involves segmenting major tracts and fitting them with deformable geometric medial models. The medial representation makes it possible to average and combine tensor-based features along directions locally perpendicular to the tracts, thus reducing data dimensionality and accounting for errors in normalization. The framework enables the analysis of individual white matter structures, and provides a range of possibilities for computing statistics and visualizing differences between cohorts. The framework is demonstrated in a study of white matter differences in pediatric chromosome 22q11.2 deletion syndrome. PMID:18407524

  2. IGESS: a statistical approach to integrating individual-level genotype data and summary statistics in genome-wide association studies.

    PubMed

    Dai, Mingwei; Ming, Jingsi; Cai, Mingxuan; Liu, Jin; Yang, Can; Wan, Xiang; Xu, Zongben

    2017-09-15

    Results from genome-wide association studies (GWAS) suggest that a complex phenotype is often affected by many variants with small effects, known as 'polygenicity'. Tens of thousands of samples are often required to ensure statistical power for identifying these variants with small effects. However, it is often the case that a research group can only get approval for access to individual-level genotype data with a limited sample size (e.g. a few hundred or thousand). Meanwhile, summary statistics generated using single-variant-based analysis are becoming publicly available, and the sample sizes associated with these summary statistics datasets are usually quite large. How to make the most efficient use of existing abundant data resources largely remains an open question. In this study, we propose a statistical approach, IGESS, to increase statistical power of identifying risk variants and to improve accuracy of risk prediction by integrating individual-level genotype data and summary statistics. An efficient algorithm based on variational inference is developed to handle the genome-wide analysis. Through comprehensive simulation studies, we demonstrated the advantages of IGESS over methods which take either individual-level data or summary statistics data as input. We applied IGESS to perform integrative analysis of Crohn's disease data from WTCCC and summary statistics from other studies. IGESS was able to significantly increase the statistical power of identifying risk variants and to improve the risk prediction accuracy from 63.2% (±0.4%) to 69.4% (±0.1%) using about 240,000 variants. The IGESS software is available at https://github.com/daviddaigithub/IGESS . zbxu@xjtu.edu.cn or xwan@comp.hkbu.edu.hk or eeyang@hkbu.edu.hk. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  3. K-means cluster analysis of tourist destination in special region of Yogyakarta using spatial approach and social network analysis (a case study: post of @explorejogja instagram account in 2016)

    NASA Astrophysics Data System (ADS)

    Iswandhani, N.; Muhajir, M.

    2018-03-01

    This research was conducted in the Department of Statistics, Islamic University of Indonesia. The data used are primary data obtained from posts of the @explorejogja instagram account from January to December 2016. The @explorejogja instagram account features many tourist destinations that can be visited by both domestic and foreign tourists; it is therefore useful to form clusters of the existing tourist destinations based on the number of likes from instagram users, taken as a measure of popularity. The purpose of this research is to determine the distribution of the most popular tourist spots, the cluster structure of tourist destinations, and the central popularity of tourist destinations based on the @explorejogja instagram account in 2016. The statistical analyses used are descriptive statistics, k-means clustering, and social network analysis. The results of this research were the top 10 most popular destinations in Yogyakarta, an html-based map of tourist destination distribution consisting of 121 tourist destination points, three clusters (cluster 1 with 52 destinations, cluster 2 with 9 destinations, and cluster 3 with 60 destinations), and the central popularity of tourist destinations in the special region of Yogyakarta by district.
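
    The following sketch illustrates the k-means step with scikit-learn on a handful of made-up destinations; the coordinates and like counts are fabricated, and the standardization step is an added assumption not stated in the record.

```python
# Illustrative sketch only: clustering destinations by location and popularity
# (like counts) with k-means, as described above; all values are invented.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical features per destination: latitude, longitude, number of likes
X = np.array([
    [-7.801, 110.364, 1500],
    [-7.606, 110.203,  320],
    [-8.012, 110.293, 2100],
    [-7.750, 110.491,  450],
    [-7.938, 110.328,  980],
    [-7.792, 110.366, 1750],
])

X_scaled = StandardScaler().fit_transform(X)   # put features on a common scale
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_scaled)

print(kmeans.labels_)            # cluster assignment of each destination
print(kmeans.cluster_centers_)   # centroids (in standardized feature space)
```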

  4. MODEL ANALYSIS OF RIPARIAN BUFFER EFFECTIVENESS FOR REDUCING NUTRIENT INPUTS TO STREAMS IN AGRICULTURAL LANDSCAPES

    EPA Science Inventory

    Federal and state agencies responsible for protecting water quality rely mainly on statistically-based methods to assess and manage risks to the nation's streams, lakes and estuaries. Although statistical approaches provide valuable information on current trends in water quality...

  5. Event coincidence analysis for quantifying statistical interrelationships between event time series. On the role of flood events as triggers of epidemic outbreaks

    NASA Astrophysics Data System (ADS)

    Donges, J. F.; Schleussner, C.-F.; Siegmund, J. F.; Donner, R. V.

    2016-05-01

    Studying event time series is a powerful approach for analyzing the dynamics of complex dynamical systems in many fields of science. In this paper, we describe the method of event coincidence analysis to provide a framework for quantifying the strength, directionality and time lag of statistical interrelationships between event series. Event coincidence analysis makes it possible to formulate and test null hypotheses on the origin of the observed interrelationships, including tests based on Poisson processes or, more generally, stochastic point processes with a prescribed inter-event time distribution and other higher-order properties. Applying the framework to country-level observational data yields evidence that flood events have acted as triggers of epidemic outbreaks globally since the 1950s. Facing projected future changes in the statistics of climatic extreme events, statistical techniques such as event coincidence analysis will be relevant for investigating the impacts of anthropogenic climate change on human societies and ecosystems worldwide.
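
    A bare-bones illustration of the coincidence-rate idea, assuming yearly event timestamps and a fixed tolerance window; the flood and outbreak years are invented, and the significance test against a Poisson null described above is omitted.

```python
# Hedged sketch of a basic (precursor) event coincidence rate: the fraction of
# events in series B (e.g. epidemic outbreaks) preceded within a tolerance
# window delta_t by at least one event in series A (e.g. floods).
import numpy as np

def precursor_coincidence_rate(events_a, events_b, delta_t):
    events_a = np.asarray(events_a, dtype=float)
    hits = 0
    for t_b in events_b:
        # does any A-event fall in the window [t_b - delta_t, t_b]?
        if np.any((events_a >= t_b - delta_t) & (events_a <= t_b)):
            hits += 1
    return hits / len(events_b)

floods    = [1953, 1961, 1974, 1988, 1999, 2007]   # hypothetical flood years
outbreaks = [1954, 1975, 1993, 2008]               # hypothetical outbreak years
print(precursor_coincidence_rate(floods, outbreaks, delta_t=2))  # -> 0.75
```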

  6. Landing Site Dispersion Analysis and Statistical Assessment for the Mars Phoenix Lander

    NASA Technical Reports Server (NTRS)

    Bonfiglio, Eugene P.; Adams, Douglas; Craig, Lynn; Spencer, David A.; Strauss, William; Seelos, Frank P.; Seelos, Kimberly D.; Arvidson, Ray; Heet, Tabatha

    2008-01-01

    The Mars Phoenix Lander launched on August 4, 2007 and successfully landed on Mars 10 months later on May 25, 2008. Landing ellipse predictions and hazard maps were key in selecting safe surface targets for Phoenix. Hazard maps were based on terrain slopes, geomorphology maps and automated rock counts from MRO's High Resolution Imaging Science Experiment (HiRISE) images. The expected landing dispersion which led to the selection of Phoenix's surface target is discussed, as well as the actual landing dispersion predictions determined during operations in the weeks, days, and hours before landing. A statistical assessment of these dispersions is performed, comparing the actual landing-safety probabilities to criteria levied by the project. Also discussed are applications of this statistical analysis which were used by the Phoenix project. These include using the statistical analysis to verify the effectiveness of a pre-planned maneuver menu and to calculate the probability of future maneuvers.

  7. Parametric Analysis to Study the Influence of Aerogel-Based Renders' Components on Thermal and Mechanical Performance.

    PubMed

    Ximenes, Sofia; Silva, Ana; Soares, António; Flores-Colen, Inês; de Brito, Jorge

    2016-05-04

    Statistical models using multiple linear regression are some of the most widely used methods to study the influence of independent variables in a given phenomenon. This study's objective is to understand the influence of the various components of aerogel-based renders on their thermal and mechanical performance, namely cement (three types), fly ash, aerial lime, silica sand, expanded clay, type of aerogel, expanded cork granules, expanded perlite, air entrainers, resins (two types), and rheological agent. The statistical analysis was performed using SPSS (Statistical Package for Social Sciences), based on 85 mortar mixes produced in the laboratory and on their values of thermal conductivity and compressive strength obtained using tests in small-scale samples. The results showed that aerial lime assumes the main role in improving the thermal conductivity of the mortars. Aerogel type, fly ash, expanded perlite and air entrainers are also relevant components for a good thermal conductivity. Expanded clay can improve the mechanical behavior and aerogel has the opposite effect.
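
    For readers unfamiliar with the approach, the sketch below runs a multiple linear regression of thermal conductivity on a few mix components using statsmodels; the component fractions and conductivities are fabricated stand-ins for the 85 laboratory mixes, and the variable names are assumptions.

```python
# Minimal sketch of a multiple linear regression relating render composition
# to thermal conductivity (fabricated data, not the study's 85 mixes).
import pandas as pd
import statsmodels.api as sm

mixes = pd.DataFrame({
    "aerial_lime":      [0.10, 0.20, 0.00, 0.15, 0.25, 0.05],
    "fly_ash":          [0.05, 0.00, 0.10, 0.05, 0.10, 0.00],
    "expanded_perlite": [0.00, 0.05, 0.05, 0.10, 0.00, 0.10],
    "thermal_cond":     [0.032, 0.027, 0.035, 0.029, 0.025, 0.034],  # W/(m.K)
})

X = sm.add_constant(mixes[["aerial_lime", "fly_ash", "expanded_perlite"]])
model = sm.OLS(mixes["thermal_cond"], X).fit()

print(model.params)                      # estimated effect of each component
print(f"R^2 = {model.rsquared:.3f}")     # share of variance explained
```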

  8. Automated Cognitive Health Assessment From Smart Home-Based Behavior Data.

    PubMed

    Dawadi, Prafulla Nath; Cook, Diane Joyce; Schmitter-Edgecombe, Maureen

    2016-07-01

    Smart home technologies offer potential benefits for assisting clinicians by automating health monitoring and well-being assessment. In this paper, we examine the actual benefits of smart home-based analysis by monitoring daily behavior in the home and predicting clinical scores of the residents. To accomplish this goal, we propose a clinical assessment using activity behavior (CAAB) approach to model a smart home resident's daily behavior and predict the corresponding clinical scores. CAAB uses statistical features that describe characteristics of a resident's daily activity performance to train machine learning algorithms that predict the clinical scores. We evaluate the performance of CAAB utilizing smart home sensor data collected from 18 smart homes over two years. We obtain a statistically significant correlation (r = 0.72) between CAAB-predicted and clinician-provided cognitive scores and a statistically significant correlation (r = 0.45) between CAAB-predicted and clinician-provided mobility scores. These prediction results suggest that it is feasible to predict clinical scores using smart home sensor data and learning-based data analysis.

  9. Parametric Analysis to Study the Influence of Aerogel-Based Renders’ Components on Thermal and Mechanical Performance

    PubMed Central

    Ximenes, Sofia; Silva, Ana; Soares, António; Flores-Colen, Inês; de Brito, Jorge

    2016-01-01

    Statistical models using multiple linear regression are some of the most widely used methods to study the influence of independent variables in a given phenomenon. This study’s objective is to understand the influence of the various components of aerogel-based renders on their thermal and mechanical performance, namely cement (three types), fly ash, aerial lime, silica sand, expanded clay, type of aerogel, expanded cork granules, expanded perlite, air entrainers, resins (two types), and rheological agent. The statistical analysis was performed using SPSS (Statistical Package for Social Sciences), based on 85 mortar mixes produced in the laboratory and on their values of thermal conductivity and compressive strength obtained using tests in small-scale samples. The results showed that aerial lime assumes the main role in improving the thermal conductivity of the mortars. Aerogel type, fly ash, expanded perlite and air entrainers are also relevant components for a good thermal conductivity. Expanded clay can improve the mechanical behavior and aerogel has the opposite effect. PMID:28773460

  10. Haplotype-based association analysis of general cognitive ability in Generation Scotland, the English Longitudinal Study of Ageing, and UK Biobank.

    PubMed

    Howard, David M; Adams, Mark J; Clarke, Toni-Kim; Wigmore, Eleanor M; Zeng, Yanni; Hagenaars, Saskia P; Lyall, Donald M; Thomson, Pippa A; Evans, Kathryn L; Porteous, David J; Nagy, Reka; Hayward, Caroline; Haley, Chris S; Smith, Blair H; Murray, Alison D; Batty, G David; Deary, Ian J; McIntosh, Andrew M

    2017-01-01

    Cognitive ability is a heritable trait with a polygenic architecture, for which several associated variants have been identified using genotype-based and candidate gene approaches. Haplotype-based analyses are a complementary technique that take phased genotype data into account, and potentially provide greater statistical power to detect lower frequency variants. In the present analysis, three cohort studies (total n = 48,002) were utilised: Generation Scotland: Scottish Family Health Study (GS:SFHS), the English Longitudinal Study of Ageing (ELSA), and the UK Biobank. A genome-wide haplotype-based meta-analysis of cognitive ability was performed, as well as a targeted meta-analysis of several gene coding regions. None of the assessed haplotypes provided evidence of a statistically significant association with cognitive ability in either the individual cohorts or the meta-analysis. Within the meta-analysis, the haplotype with the lowest observed P-value overlapped with the D-amino acid oxidase activator (DAOA) gene coding region. This coding region has previously been associated with bipolar disorder, schizophrenia and Alzheimer's disease, which have all been shown to impact upon cognitive ability. Another potentially interesting region highlighted within the current genome-wide association analysis (GS:SFHS: P = 4.09 x 10^-7) was the butyrylcholinesterase (BCHE) gene coding region. The protein encoded by BCHE has been shown to influence the progression of Alzheimer's disease and its role in cognitive ability merits further investigation. Although no evidence was found for any haplotypes with a statistically significant association with cognitive ability, our results did provide further evidence that the genetic variants contributing to the variance of cognitive ability are likely to be of small effect.

  11. A Dynamic Intrusion Detection System Based on Multivariate Hotelling's T2 Statistics Approach for Network Environments

    PubMed Central

    Avalappampatty Sivasamy, Aneetha; Sundan, Bose

    2015-01-01

    The ever expanding communication requirements in today's world demand extensive and efficient network systems with equally efficient and reliable security features integrated for safe, confident, and secured communication and data transfer. Providing effective security protocols for any network environment, therefore, assumes paramount importance. Attempts are made continuously for designing more efficient and dynamic network intrusion detection models. In this work, an approach based on Hotelling's T2 method, a multivariate statistical analysis technique, has been employed for intrusion detection, especially in network environments. Components such as preprocessing, multivariate statistical analysis, and attack detection have been incorporated in developing the multivariate Hotelling's T2 statistical model and necessary profiles have been generated based on the T-square distance metrics. With a threshold range obtained using the central limit theorem, observed traffic profiles have been classified either as normal or attack types. Performance of the model, as evaluated through validation and testing using KDD Cup'99 dataset, has shown very high detection rates for all classes with low false alarm rates. Accuracy of the model presented in this work, in comparison with the existing models, has been found to be much better. PMID:26357668
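
    A compact sketch of the central computation only, assuming a three-feature traffic profile and a simple percentile threshold in place of the central-limit-theorem range used in the paper; the feature names and data are invented, and preprocessing and evaluation on KDD Cup'99 are not shown.

```python
# Sketch: Hotelling's T^2 distance of a new traffic observation from a baseline
# profile of normal traffic, flagged as an attack if it exceeds a threshold.
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical normal traffic: e.g. bytes transferred, error rate, duration
normal_traffic = rng.normal(loc=[100.0, 0.2, 5.0],
                            scale=[10.0, 0.05, 1.0],
                            size=(500, 3))

mu = normal_traffic.mean(axis=0)                              # baseline mean vector
S_inv = np.linalg.inv(np.cov(normal_traffic, rowvar=False))   # inverse covariance

def t_squared(x):
    d = x - mu
    return float(d @ S_inv @ d)

# Simple empirical threshold (the paper derives its range via the CLT instead)
threshold = np.percentile([t_squared(x) for x in normal_traffic], 99)

new_obs = np.array([160.0, 0.55, 9.0])                        # suspicious observation
t2_new = t_squared(new_obs)
print(f"T^2 = {t2_new:.1f} ->", "attack" if t2_new > threshold else "normal")
```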

  12. A Dynamic Intrusion Detection System Based on Multivariate Hotelling's T2 Statistics Approach for Network Environments.

    PubMed

    Sivasamy, Aneetha Avalappampatty; Sundan, Bose

    2015-01-01

    The ever expanding communication requirements in today's world demand extensive and efficient network systems with equally efficient and reliable security features integrated for safe, confident, and secured communication and data transfer. Providing effective security protocols for any network environment, therefore, assumes paramount importance. Attempts are made continuously for designing more efficient and dynamic network intrusion detection models. In this work, an approach based on Hotelling's T(2) method, a multivariate statistical analysis technique, has been employed for intrusion detection, especially in network environments. Components such as preprocessing, multivariate statistical analysis, and attack detection have been incorporated in developing the multivariate Hotelling's T(2) statistical model and necessary profiles have been generated based on the T-square distance metrics. With a threshold range obtained using the central limit theorem, observed traffic profiles have been classified either as normal or attack types. Performance of the model, as evaluated through validation and testing using KDD Cup'99 dataset, has shown very high detection rates for all classes with low false alarm rates. Accuracy of the model presented in this work, in comparison with the existing models, has been found to be much better.

  13. A spatial scan statistic for survival data based on Weibull distribution.

    PubMed

    Bhatt, Vijaya; Tiwari, Neeraj

    2014-05-20

    The spatial scan statistic has been developed as a geographical cluster detection analysis tool for different types of data sets such as Bernoulli, Poisson, ordinal, normal and exponential. We propose a scan statistic for survival data based on Weibull distribution. It may also be used for other survival distributions, such as exponential, gamma, and log normal. The proposed method is applied on the survival data of tuberculosis patients for the years 2004-2005 in Nainital district of Uttarakhand, India. Simulation studies reveal that the proposed method performs well for different survival distribution functions. Copyright © 2013 John Wiley & Sons, Ltd.

  14. Error Analysis for RADAR Neighbor Matching Localization in Linear Logarithmic Strength Varying Wi-Fi Environment

    PubMed Central

    Tian, Zengshan; Xu, Kunjie; Yu, Xiang

    2014-01-01

    This paper studies the statistical errors for the fingerprint-based RADAR neighbor matching localization with the linearly calibrated reference points (RPs) in logarithmic received signal strength (RSS) varying Wi-Fi environment. To the best of our knowledge, little comprehensive analysis work has appeared on the error performance of neighbor matching localization with respect to the deployment of RPs. However, in order to achieve the efficient and reliable location-based services (LBSs) as well as the ubiquitous context-awareness in Wi-Fi environment, much attention has to be paid to highly accurate and cost-efficient localization systems. To this end, the statistical errors of the widely used neighbor matching localization are discussed in detail in this paper to examine the inherent mathematical relations between the localization errors and the locations of RPs, using a basic linear logarithmic strength varying model. Furthermore, based on the mathematical demonstrations and some testing results, the closed-form solutions to the statistical errors of RADAR neighbor matching localization can be an effective tool to explore alternative deployment of fingerprint-based neighbor matching localization systems in the future. PMID:24683349

  15. Error analysis for RADAR neighbor matching localization in linear logarithmic strength varying Wi-Fi environment.

    PubMed

    Zhou, Mu; Tian, Zengshan; Xu, Kunjie; Yu, Xiang; Wu, Haibo

    2014-01-01

    This paper studies the statistical errors for the fingerprint-based RADAR neighbor matching localization with the linearly calibrated reference points (RPs) in logarithmic received signal strength (RSS) varying Wi-Fi environment. To the best of our knowledge, little comprehensive analysis work has appeared on the error performance of neighbor matching localization with respect to the deployment of RPs. However, in order to achieve the efficient and reliable location-based services (LBSs) as well as the ubiquitous context-awareness in Wi-Fi environment, much attention has to be paid to highly accurate and cost-efficient localization systems. To this end, the statistical errors of the widely used neighbor matching localization are discussed in detail in this paper to examine the inherent mathematical relations between the localization errors and the locations of RPs, using a basic linear logarithmic strength varying model. Furthermore, based on the mathematical demonstrations and some testing results, the closed-form solutions to the statistical errors of RADAR neighbor matching localization can be an effective tool to explore alternative deployment of fingerprint-based neighbor matching localization systems in the future.

  16. Interpretation of correlations in clinical research.

    PubMed

    Hung, Man; Bounsanga, Jerry; Voss, Maren Wright

    2017-11-01

    Critically analyzing research is a key skill in evidence-based practice and requires knowledge of research methods, results interpretation, and applications, all of which rely on a foundation based in statistics. Evidence-based practice makes high demands on trained medical professionals to interpret an ever-expanding array of research evidence. As clinical training emphasizes medical care rather than statistics, it is useful to review the basics of statistical methods and what they mean for interpreting clinical studies. We reviewed the basic concepts of correlational associations, violations of normality, unobserved variable bias, sample size, and alpha inflation. The foundations of causal inference were discussed and sound statistical analyses were examined. We discuss four ways in which correlational analysis is misused, including causal inference overreach, over-reliance on significance, alpha inflation, and sample size bias. Recently published studies in the medical field provide evidence of causal assertion overreach drawn from correlational findings. The findings present a primer on the assumptions and nature of correlational methods of analysis and urge clinicians to exercise appropriate caution as they critically analyze the evidence before them and evaluate evidence that supports practice. Critically analyzing new evidence requires statistical knowledge in addition to clinical knowledge. Studies can overstate relationships, expressing causal assertions when only correlational evidence is available. Failure to account for the effect of sample size in the analyses tends to overstate the importance of predictive variables. It is important not to overemphasize statistical significance without consideration of effect size and whether differences could be considered clinically meaningful.

  17. No-Reference Video Quality Assessment Based on Statistical Analysis in 3D-DCT Domain.

    PubMed

    Li, Xuelong; Guo, Qun; Lu, Xiaoqiang

    2016-05-13

    It is an important task to design models for universal no-reference video quality assessment (NR-VQA) in multiple video processing and computer vision applications. However, most existing NR-VQA metrics are designed for specific distortion types, which are often not known in practical applications. A further deficiency is that the spatial and temporal information of videos is hardly considered simultaneously. In this paper, we propose a new NR-VQA metric based on the spatiotemporal natural video statistics (NVS) in the 3D discrete cosine transform (3D-DCT) domain. In the proposed method, a set of features is first extracted based on the statistical analysis of 3D-DCT coefficients to characterize the spatiotemporal statistics of videos in different views. These features are then used to predict the perceived video quality via an efficient linear support vector regression (SVR) model. The contributions of this paper are: 1) we explore the spatiotemporal statistics of videos in the 3D-DCT domain, which has an inherent spatiotemporal encoding advantage over other widely used 2D transformations; 2) we extract a small set of simple but effective statistical features for video visual quality prediction; and 3) the proposed method is universal for multiple types of distortions and robust across different databases. The proposed method is tested on four widely used video databases. Extensive experimental results demonstrate that the proposed method is competitive with state-of-the-art NR-VQA metrics and the top-performing FR-VQA and RR-VQA metrics.

  18. Application of multivariate statistical techniques for differentiation of ripe banana flour based on the composition of elements.

    PubMed

    Alkarkhi, Abbas F M; Ramli, Saifullah Bin; Easa, Azhar Mat

    2009-01-01

    Major (sodium, potassium, calcium, magnesium) and minor elements (iron, copper, zinc, manganese) and one heavy metal (lead) of Cavendish banana flour and Dream banana flour were determined, and data were analyzed using multivariate statistical techniques of factor analysis and discriminant analysis. Factor analysis yielded four factors explaining more than 81% of the total variance: the first factor explained 28.73%, comprising magnesium, sodium, and iron; the second factor explained 21.47%, comprising only manganese and copper; the third factor explained 15.66%, comprising zinc and lead; while the fourth factor explained 15.50%, comprising potassium. Discriminant analysis showed that magnesium and sodium exhibited a strong contribution in discriminating the two types of banana flour, affording 100% correct assignation. This study presents the usefulness of multivariate statistical techniques for analysis and interpretation of complex mineral content data from banana flour of different varieties.

  19. Technology, Data Bases and System Analysis for Space-to-Ground Optical Communications

    NASA Technical Reports Server (NTRS)

    Lesh, James

    1995-01-01

    Optical communications is becoming an increasingly important option for designers of space-to-ground communications links, whether for government or commercial applications. In this paper the technology being developed by NASA for use in space-to-ground optical communications is presented. Next, a program that is collecting a long-term database of atmospheric visibility statistics for optical propagation through the atmosphere is described. Finally, a methodology for utilizing the statistics of the atmospheric database in the analysis of space-to-ground links is presented. This methodology takes into account the effects of station availability, is useful when comparing optical communications with microwave systems, and provides a rationale for establishing the recommended link margin.

  20. Towards evidence-based computational statistics: lessons from clinical research on the role and design of real-data benchmark studies.

    PubMed

    Boulesteix, Anne-Laure; Wilson, Rory; Hapfelmeier, Alexander

    2017-09-09

    The goal of medical research is to develop interventions that are in some sense superior, with respect to patient outcome, to interventions currently in use. Similarly, the goal of research in methodological computational statistics is to develop data analysis tools that are themselves superior to the existing tools. The methodology of the evaluation of medical interventions continues to be discussed extensively in the literature and it is now well accepted that medicine should be at least partly "evidence-based". Although we statisticians are convinced of the importance of unbiased, well-thought-out study designs and evidence-based approaches in the context of clinical research, we tend to ignore these principles when designing our own studies for evaluating statistical methods in the context of our methodological research. In this paper, we draw an analogy between clinical trials and real-data-based benchmarking experiments in methodological statistical science, with datasets playing the role of patients and methods playing the role of medical interventions. Through this analogy, we suggest directions for improvement in the design and interpretation of studies which use real data to evaluate statistical methods, in particular with respect to dataset inclusion criteria and the reduction of various forms of bias. More generally, we discuss the concept of "evidence-based" statistical research, its limitations and its impact on the design and interpretation of real-data-based benchmark experiments. We suggest that benchmark studies, a method of assessing statistical methods using real-world datasets, might benefit from adopting (some) concepts from evidence-based medicine towards the goal of more evidence-based statistical research.

  1. Analysis of uncertainties and convergence of the statistical quantities in turbulent wall-bounded flows by means of a physically based criterion

    NASA Astrophysics Data System (ADS)

    Andrade, João Rodrigo; Martins, Ramon Silva; Thompson, Roney Leon; Mompean, Gilmar; da Silveira Neto, Aristeu

    2018-04-01

    The present paper provides an analysis of the statistical uncertainties associated with direct numerical simulation (DNS) results and experimental data for turbulent channel and pipe flows, showing a new physically based quantification of these errors, to improve the determination of the statistical deviations between DNSs and experiments. The analysis is carried out using a recently proposed criterion by Thompson et al. ["A methodology to evaluate statistical errors in DNS data of plane channel flows," Comput. Fluids 130, 1-7 (2016)] for fully turbulent plane channel flows, where the mean velocity error is estimated by considering the Reynolds stress tensor, and using the balance of the mean force equation. It also presents how the residual error evolves in time for a DNS of a plane channel flow, and the influence of the Reynolds number on its convergence rate. The root mean square of the residual error is shown in order to capture a single quantitative value of the error associated with the dimensionless averaging time. The evolution in time of the error norm is compared with the final error provided by DNS data of similar Reynolds numbers available in the literature. A direct consequence of this approach is that it was possible to compare different numerical results and experimental data, providing an improved understanding of the convergence of the statistical quantities in turbulent wall-bounded flows.

  2. Objective research of auscultation signals in Traditional Chinese Medicine based on wavelet packet energy and support vector machine.

    PubMed

    Yan, Jianjun; Shen, Xiaojing; Wang, Yiqin; Li, Fufeng; Xia, Chunming; Guo, Rui; Chen, Chunfeng; Shen, Qingwei

    2010-01-01

    This study aims at utilising the Wavelet Packet Transform (WPT) and the Support Vector Machine (SVM) algorithm to provide objective analysis and quantitative research for auscultation in Traditional Chinese Medicine (TCM) diagnosis. First, Wavelet Packet Decomposition (WPD) at level 6 was employed to split the auscultation signals into finer frequency bands. Then statistical analysis was performed on the Wavelet Packet Energy (WPE) features extracted from the WPD coefficients. Furthermore, pattern recognition with SVM was used to distinguish the statistical feature values of the mixed-subject sample groups. Finally, the experimental results showed that the classification accuracies were at a high level.
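
    A hedged sketch of the pipeline described above using PyWavelets and scikit-learn; the 'db4' mother wavelet and the synthetic signals standing in for auscultation recordings are assumptions, as the record does not specify them.

```python
# Sketch: level-6 wavelet packet energy features followed by an SVM classifier.
import numpy as np
import pywt
from sklearn.svm import SVC

def wpe_features(signal, level=6, wavelet="db4"):
    """Relative wavelet packet energy per frequency band at the given level."""
    wp = pywt.WaveletPacket(data=signal, wavelet=wavelet,
                            mode="symmetric", maxlevel=level)
    energies = np.array([np.sum(node.data ** 2)
                         for node in wp.get_level(level, order="freq")])
    return energies / energies.sum()

rng = np.random.default_rng(0)
signals = [rng.normal(size=2048) for _ in range(20)]   # placeholder recordings
labels = [0] * 10 + [1] * 10                           # placeholder group labels

X = np.vstack([wpe_features(s) for s in signals])      # 64 energy features each
clf = SVC(kernel="rbf").fit(X, labels)
print(clf.predict(X[:3]))
```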

  3. Soil erosion assessment and its correlation with landslide events using remote sensing data and GIS: a case study at Penang Island, Malaysia.

    PubMed

    Pradhan, Biswajeet; Chaudhari, Amruta; Adinarayana, J; Buchroithner, Manfred F

    2012-01-01

    In this paper, an attempt has been made to assess, forecast and observe the dynamics of soil erosion using the universal soil loss equation (USLE) method at Penang Island, Malaysia. Multi-source (map-, space- and ground-based) datasets were used to obtain both static and dynamic factors of the USLE, and an integrated analysis was carried out in the raster format of a GIS. A landslide location map was generated on the basis of image-element interpretation from aerial photos, satellite data and field observations, and was used to validate soil erosion intensity in the study area. Further, a statistics-based frequency ratio analysis was carried out in the study area for correlation purposes. The results of the statistical correlation showed a satisfactory agreement between the prepared USLE-based soil erosion map and landslide events/locations, which are directly proportional to each other. Prognostic analysis of soil erosion helps user agencies and decision makers design proper conservation planning programs to reduce soil erosion. Temporal statistics on soil erosion amid the dynamic and rapid development of Penang Island indicate the co-existence and balance of the ecosystem.

  4. Longitudinal Assessment of Self-Reported Recent Back Pain and Combat Deployment in the Millennium Cohort Study

    DTIC Science & Technology

    2016-11-15

    participants who were followed for the development of back pain for an average of 3.9 years. Methods: Descriptive statistics and longitudinal... health, military personnel, occupational health, outcome assessment, statistics, survey methodology. Level of Evidence: 3. Spine 2016;41:1754–1763... based on the National Health and Nutrition Examination Survey. Statistical Analysis: Descriptive and univariate analyses compared characteristics

  5. [The principal components analysis--method to classify the statistical variables with applications in medicine].

    PubMed

    Dascălu, Cristina Gena; Antohe, Magda Ecaterina

    2009-01-01

    Based on the analysis of eigenvalues and eigenvectors, principal component analysis has the purpose of identifying the subspace of the main components of a set of parameters which is sufficient to characterize the whole set of parameters. Interpreting the data for analysis as a cloud of points, we find, through geometrical transformations, the directions along which the cloud's dispersion is maximal: the lines that pass through the cloud's center of weight and have a maximal density of points around them (found by defining an appropriate criterion function and minimizing it). This method can be successfully used to simplify the statistical analysis of questionnaires, because it helps us select from a set of items only the most relevant ones, those that cover the variations of the whole set of data. For instance, in the presented sample we started from a questionnaire with 28 items and, applying principal component analysis, we identified 7 principal components, or main items, a fact that simplifies significantly the subsequent statistical analysis of the data.
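
    The eigenvalue/eigenvector view described above can be sketched in a few lines of NumPy; the questionnaire matrix below is simulated rather than the 28-item instrument from the study.

```python
# Small sketch of PCA via eigendecomposition of the covariance matrix of a
# hypothetical questionnaire (respondents x items); scores are fabricated.
import numpy as np

rng = np.random.default_rng(0)
items = rng.normal(size=(100, 8))                          # 100 respondents, 8 items
items[:, 1] = items[:, 0] + 0.1 * rng.normal(size=100)     # two strongly correlated items

X = items - items.mean(axis=0)                  # center on the cloud's "center of weight"
cov = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)          # eigenpairs in ascending order
order = np.argsort(eigvals)[::-1]               # sort descending by variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

explained = eigvals / eigvals.sum()
print(np.cumsum(explained))                     # how many components are "enough"
scores = X @ eigvecs[:, :2]                     # projection onto the first two components
```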

  6. Effects of atmospheric turbulence on microwave and millimeter wave satellite communications systems. [attenuation statistics and antenna design

    NASA Technical Reports Server (NTRS)

    Devasirvatham, D. M. J.; Hodge, D. B.

    1981-01-01

    A model of the microwave and millimeter wave link in the presence of atmospheric turbulence is presented with emphasis on satellite communications systems. The analysis is based on standard methods of statistical theory. The results are directly usable by the design engineer.

  7. STATISTICAL TECHNIQUES FOR DETERMINATION AND PREDICTION OF FUNDAMENTAL FISH ASSEMBLAGES OF THE MID-ATLANTIC HIGHLANDS

    EPA Science Inventory

    A statistical software tool, Stream Fish Community Predictor (SFCP), based on EMAP stream sampling in the mid-Atlantic Highlands, was developed to predict stream fish communities using stream and watershed characteristics. Step one in the tool development was a cluster analysis t...

  8. Which statistics should tropical biologists learn?

    PubMed

    Loaiza Velásquez, Natalia; González Lutz, María Isabel; Monge-Nájera, Julián

    2011-09-01

    Tropical biologists study the richest and most endangered biodiversity in the planet, and in these times of climate change and mega-extinctions, the need for efficient, good quality research is more pressing than in the past. However, the statistical component in research published by tropical authors sometimes suffers from poor quality in data collection, mediocre or bad experimental design, and a rigid and outdated view of data analysis. To suggest improvements in their statistical education, we listed all the statistical tests and other quantitative analyses used in two leading tropical journals, the Revista de Biología Tropical and Biotropica, during a year. The 12 most frequent tests in the articles were: Analysis of Variance (ANOVA), Chi-Square Test, Student's T Test, Linear Regression, Pearson's Correlation Coefficient, Mann-Whitney U Test, Kruskal-Wallis Test, Shannon's Diversity Index, Tukey's Test, Cluster Analysis, Spearman's Rank Correlation Test and Principal Component Analysis. We conclude that statistical education for tropical biologists must abandon the old syllabus based on the mathematical side of statistics and concentrate on the correct selection of these and other procedures and tests, on their biological interpretation and on the use of reliable and friendly freeware. We think that their time will be better spent understanding and protecting tropical ecosystems than trying to learn the mathematical foundations of statistics: in most cases, a well designed one-semester course should be enough for their basic requirements.

  9. Statistical power analysis of cardiovascular safety pharmacology studies in conscious rats.

    PubMed

    Bhatt, Siddhartha; Li, Dingzhou; Flynn, Declan; Wisialowski, Todd; Hemkens, Michelle; Steidl-Nichols, Jill

    2016-01-01

    Cardiovascular (CV) toxicity and related attrition are a major challenge for novel therapeutic entities and identifying CV liability early is critical for effective derisking. CV safety pharmacology studies in rats are a valuable tool for early investigation of CV risk. Thorough understanding of data analysis techniques and statistical power of these studies is currently lacking and is imperative for enabling sound decision-making. Data from 24 crossover and 12 parallel design CV telemetry rat studies were used for statistical power calculations. Average values of telemetry parameters (heart rate, blood pressure, body temperature, and activity) were logged every 60 s (from 1 h predose to 24 h post-dose) and reduced to 15-min mean values. These data were subsequently binned into super intervals for statistical analysis. A repeated measure analysis of variance was used for statistical analysis of crossover studies and a repeated measure analysis of covariance was used for parallel studies. Statistical power analysis was performed to generate power curves and establish relationships between detectable CV (blood pressure and heart rate) changes and statistical power. Additionally, data from a crossover CV study with phentolamine at 4, 20 and 100 mg/kg are reported as a representative example of data analysis methods. Phentolamine produced a CV profile characteristic of alpha adrenergic receptor antagonism, evidenced by a dose-dependent decrease in blood pressure and reflex tachycardia. Detectable blood pressure changes at 80% statistical power for crossover studies (n=8) were 4-5 mmHg. For parallel studies (n=8), detectable changes at 80% power were 6-7 mmHg. Detectable heart rate changes for both study designs were 20-22 bpm. Based on our results, the conscious rat CV model is a sensitive tool to detect and mitigate CV risk in early safety studies. Furthermore, these results will enable informed selection of appropriate models and study design for early stage CV studies. Copyright © 2016 Elsevier Inc. All rights reserved.

  10. Diagnosis of students' ability in a statistical course based on Rasch probabilistic outcome

    NASA Astrophysics Data System (ADS)

    Mahmud, Zamalia; Ramli, Wan Syahira Wan; Sapri, Shamsiah; Ahmad, Sanizah

    2017-06-01

    Measuring students' ability and performance are important in assessing how well students have learned and mastered the statistical courses. Any improvement in learning will depend on the student's approaches to learning, which are relevant to some factors of learning, namely assessment methods comprising quizzes, tests, assignments and a final examination. This study has attempted an alternative approach to measure students' ability in an undergraduate statistical course based on the Rasch probabilistic model. Firstly, this study aims to explore the learning outcome patterns of students in a statistics course (Applied Probability and Statistics) based on an Entrance-Exit survey. This is followed by investigating students' perceived learning ability based on four Course Learning Outcomes (CLOs) and students' actual learning ability based on their final examination scores. Rasch analysis revealed that students perceived themselves as lacking the ability to understand about 95% of the statistics concepts at the beginning of the class but eventually they had a good understanding at the end of the 14-week class. In terms of students' performance in their final examination, their ability in understanding the topics varies at different probability values given the ability of the students and the difficulty of the questions. The majority found the probability and counting rules topic to be the most difficult to learn.

  11. Point-by-point compositional analysis for atom probe tomography.

    PubMed

    Stephenson, Leigh T; Ceguerra, Anna V; Li, Tong; Rojhirunsakool, Tanaporn; Nag, Soumya; Banerjee, Rajarshi; Cairney, Julie M; Ringer, Simon P

    2014-01-01

    This new alternate approach to data processing for analyses that traditionally employed grid-based counting methods is necessary because it removes a user-imposed coordinate system that not only limits an analysis but also may introduce errors. We have modified the widely used "binomial" analysis for APT data by replacing grid-based counting with coordinate-independent nearest neighbour identification, improving the measurements and the statistics obtained, and allowing quantitative analysis of smaller datasets and of datasets from non-dilute solid solutions. It also allows better visualisation of compositional fluctuations in the data. Our modifications include: (1) using spherical k-atom blocks identified by each detected atom's first k nearest neighbours; (2) 3D data visualisation of block composition and nearest neighbour anisotropy; and (3) using z-statistics to directly compare experimental and expected composition curves. Similar modifications may be made to other grid-based counting analyses (contingency table, Langer-Bar-on-Miller, sinusoidal model) and could be instrumental in developing novel data visualisation options.
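
    A rough sketch of the grid-free block idea under simplifying assumptions (fabricated atom positions, a random solid solution, and blocks treated as independent for the z-statistics); it is a toy stand-in, not the authors' implementation.

```python
# Sketch: for every detected atom, take its k nearest neighbours as a block,
# count solute atoms per block, and compare the observed distribution of block
# compositions with the binomial expectation for a random solid solution.
import numpy as np
from scipy.spatial import cKDTree
from scipy import stats

rng = np.random.default_rng(0)
k, n_atoms, solute_frac = 50, 20_000, 0.05
positions = rng.uniform(0, 50, size=(n_atoms, 3))       # nm, hypothetical volume
is_solute = rng.random(n_atoms) < solute_frac           # random solution in this toy

tree = cKDTree(positions)
# k+1 neighbours because the first neighbour returned is the atom itself
_, idx = tree.query(positions, k=k + 1)
block_counts = is_solute[idx[:, 1:]].sum(axis=1)        # solute atoms per k-atom block

observed = np.bincount(block_counts, minlength=k + 1)   # blocks with m solute atoms
expected_p = stats.binom.pmf(np.arange(k + 1), k, solute_frac)
n_blocks = len(block_counts)
expected = n_blocks * expected_p

# Per-bin z-statistics comparing observed and expected composition curves
# (treating blocks as independent, which overlapping blocks only approximate).
with np.errstate(divide="ignore", invalid="ignore"):
    z = (observed - expected) / np.sqrt(n_blocks * expected_p * (1 - expected_p))
print(z[:6])
```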

  12. Effects of Instructional Design with Mental Model Analysis on Learning.

    ERIC Educational Resources Information Center

    Hong, Eunsook

    This paper presents a model for systematic instructional design that includes mental model analysis together with the procedures used in developing computer-based instructional materials in the area of statistical hypothesis testing. The instructional design model is based on the premise that the objective for learning is to achieve expert-like…

  13. Wastewater-Based Epidemiology of Stimulant Drugs: Functional Data Analysis Compared to Traditional Statistical Methods.

    PubMed

    Salvatore, Stefania; Bramness, Jørgen Gustav; Reid, Malcolm J; Thomas, Kevin Victor; Harman, Christopher; Røislien, Jo

    2015-01-01

    Wastewater-based epidemiology (WBE) is a new methodology for estimating the drug load in a population. Simple summary statistics and specification tests have typically been used to analyze WBE data, comparing differences between weekday and weekend loads. Such standard statistical methods may, however, overlook important nuanced information in the data. In this study, we apply functional data analysis (FDA) to WBE data and compare the results to those obtained from more traditional summary measures. We analysed temporal WBE data from 42 European cities, using sewage samples collected daily for one week in March 2013. For each city, the main temporal features of two selected drugs were extracted using functional principal component (FPC) analysis, along with simpler measures such as the area under the curve (AUC). The individual cities' scores on each of the temporal FPCs were then used as outcome variables in multiple linear regression analysis with various city and country characteristics as predictors. The results were compared to those of functional analysis of variance (FANOVA). The first three FPCs explained more than 99% of the temporal variation. The first component (FPC1) represented the level of the drug load, while the second and third temporal components represented the level and the timing of a weekend peak. AUC was highly correlated with FPC1, but other temporal characteristics were not captured by the simple summary measures. FANOVA was less flexible than the FPCA-based regression, although it showed concordant results. Geographical location was the main predictor for the general level of the drug load. FDA of WBE data extracts more detailed information about drug load patterns during the week that are not identified by more traditional statistical methods. Results also suggest that regression based on FPC results is a valuable addition to FANOVA for estimating associations between temporal patterns and covariate information.
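
    A discretised stand-in for the FPCA step, using a plain SVD of mean-centered weekly profiles; the city-by-day load matrix below is simulated, not the 42-city WBE dataset, and smoothing of the curves is omitted.

```python
# Hedged sketch: functional PCA on weekly drug-load profiles (cities x 7 days),
# recovering temporal components and per-city scores from a simulated matrix.
import numpy as np

rng = np.random.default_rng(0)
n_cities, n_days = 42, 7
base = rng.gamma(shape=5.0, scale=100.0, size=(n_cities, 1))          # overall level
weekend_shape = np.array([0, 0, 0, 0, 1.0, 1.5, 0.8])                 # Fri-Sun bump
weekend_bump = weekend_shape * rng.gamma(2.0, 30.0, size=(n_cities, 1))
loads = base + weekend_bump + rng.normal(0, 20, size=(n_cities, n_days))

mean_curve = loads.mean(axis=0)
centered = loads - mean_curve
U, s, Vt = np.linalg.svd(centered, full_matrices=False)

explained = s**2 / np.sum(s**2)
print(explained[:3])              # share of temporal variation per component
fpc_curves = Vt[:3]               # first three temporal components (FPC1-FPC3)
scores = centered @ Vt[:3].T      # city scores, usable as regression outcomes
```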

  14. A Powerful Procedure for Pathway-Based Meta-analysis Using Summary Statistics Identifies 43 Pathways Associated with Type II Diabetes in European Populations.

    PubMed

    Zhang, Han; Wheeler, William; Hyland, Paula L; Yang, Yifan; Shi, Jianxin; Chatterjee, Nilanjan; Yu, Kai

    2016-06-01

    Meta-analysis of multiple genome-wide association studies (GWAS) has become an effective approach for detecting single nucleotide polymorphism (SNP) associations with complex traits. However, it is difficult to integrate the readily accessible SNP-level summary statistics from a meta-analysis into more powerful multi-marker testing procedures, which generally require individual-level genetic data. We developed a general procedure called Summary based Adaptive Rank Truncated Product (sARTP) for conducting gene and pathway meta-analysis that uses only SNP-level summary statistics in combination with genotype correlation estimated from a panel of individual-level genetic data. We demonstrated the validity and power advantage of sARTP through empirical and simulated data. We conducted a comprehensive pathway-based meta-analysis with sARTP on type 2 diabetes (T2D) by integrating SNP-level summary statistics from two large studies consisting of 19,809 T2D cases and 111,181 controls with European ancestry. Among 4,713 candidate pathways from which genes in neighborhoods of 170 GWAS established T2D loci were excluded, we detected 43 T2D globally significant pathways (with Bonferroni corrected p-values < 0.05), which included the insulin signaling pathway and T2D pathway defined by KEGG, as well as the pathways defined according to specific gene expression patterns on pancreatic adenocarcinoma, hepatocellular carcinoma, and bladder carcinoma. Using summary data from 8 eastern Asian T2D GWAS with 6,952 cases and 11,865 controls, we showed that 7 out of the 43 pathways identified in European populations remained significant in eastern Asians at a false discovery rate of 0.1. We created an R package and a web-based tool for sARTP with the capability to analyze pathways with thousands of genes and tens of thousands of SNPs.

  15. A Powerful Procedure for Pathway-Based Meta-analysis Using Summary Statistics Identifies 43 Pathways Associated with Type II Diabetes in European Populations

    PubMed Central

    Zhang, Han; Wheeler, William; Hyland, Paula L.; Yang, Yifan; Shi, Jianxin; Chatterjee, Nilanjan; Yu, Kai

    2016-01-01

    Meta-analysis of multiple genome-wide association studies (GWAS) has become an effective approach for detecting single nucleotide polymorphism (SNP) associations with complex traits. However, it is difficult to integrate the readily accessible SNP-level summary statistics from a meta-analysis into more powerful multi-marker testing procedures, which generally require individual-level genetic data. We developed a general procedure called Summary based Adaptive Rank Truncated Product (sARTP) for conducting gene and pathway meta-analysis that uses only SNP-level summary statistics in combination with genotype correlation estimated from a panel of individual-level genetic data. We demonstrated the validity and power advantage of sARTP through empirical and simulated data. We conducted a comprehensive pathway-based meta-analysis with sARTP on type 2 diabetes (T2D) by integrating SNP-level summary statistics from two large studies consisting of 19,809 T2D cases and 111,181 controls with European ancestry. Among 4,713 candidate pathways from which genes in neighborhoods of 170 GWAS established T2D loci were excluded, we detected 43 T2D globally significant pathways (with Bonferroni corrected p-values < 0.05), which included the insulin signaling pathway and T2D pathway defined by KEGG, as well as the pathways defined according to specific gene expression patterns on pancreatic adenocarcinoma, hepatocellular carcinoma, and bladder carcinoma. Using summary data from 8 eastern Asian T2D GWAS with 6,952 cases and 11,865 controls, we showed that 7 out of the 43 pathways identified in European populations remained significant in eastern Asians at a false discovery rate of 0.1. We created an R package and a web-based tool for sARTP with the capability to analyze pathways with thousands of genes and tens of thousands of SNPs. PMID:27362418

  16. Cognition, comprehension and application of biostatistics in research by Indian postgraduate students in periodontics.

    PubMed

    Swetha, Jonnalagadda Laxmi; Arpita, Ramisetti; Srikanth, Chintalapani; Nutalapati, Rajasekhar

    2014-01-01

    Biostatistics is an integral part of research protocols. In any field of inquiry or investigation, the data obtained are subsequently classified, analyzed and tested for accuracy by statistical methods. Statistical analysis of collected data thus forms the basis for all evidence-based conclusions. The aim of this study is to evaluate the cognition, comprehension and application of biostatistics in research among postgraduate students in periodontics in India. A total of 391 postgraduate students registered for a master's course in periodontics at various dental colleges across India were included in the survey. Data regarding the level of knowledge, understanding and its application in the design and conduct of research protocols were collected using a dichotomous questionnaire. Descriptive statistics were used for data analysis. Nearly 79.2% of students were aware of the importance of biostatistics in research, 55-65% were familiar with MS-EXCEL spreadsheets for graphical representation of data and with the statistical software available on the internet, 26.0% had biostatistics as a mandatory subject in their curriculum, 9.5% tried to perform statistical analysis on their own, while 3.0% were successful in performing statistical analysis of their studies on their own. Biostatistics should play a central role in the planning, conduct, interim analysis, final analysis and reporting of periodontal research, especially by postgraduate students. Indian postgraduate students in periodontics are aware of the importance of biostatistics in research, but the level of understanding and application is still basic and needs to be addressed.

  17. A framework for incorporating DTI Atlas Builder registration into Tract-Based Spatial Statistics and a simulated comparison to standard TBSS.

    PubMed

    Leming, Matthew; Steiner, Rachel; Styner, Martin

    2016-02-27

    Tract-based spatial statistics (TBSS) is a software pipeline widely employed in comparative analysis of white matter integrity from diffusion tensor imaging (DTI) datasets. In this study, we seek to evaluate the relationship between different methods of atlas registration for use with TBSS and different measurements of DTI (fractional anisotropy, FA; axial diffusivity, AD; radial diffusivity, RD; and mean diffusivity, MD). To do so, we have developed a novel tool that builds on existing diffusion atlas building software, integrating it into an adapted version of TBSS called DAB-TBSS (DTI Atlas Builder-Tract-Based Spatial Statistics) by using the advanced registration offered in DTI Atlas Builder. To compare the effectiveness of these two versions of TBSS, we also propose a framework for simulating population differences in diffusion tensor imaging data, providing a more substantive means of empirically comparing DTI group analysis programs such as TBSS. In this study, we used 33 diffusion tensor imaging datasets and simulated group-wise changes in these data by increasing, in three different simulations, the principal eigenvalue (directly altering AD), the second and third eigenvalues (RD), and all three eigenvalues (MD) in the genu, the right uncinate fasciculus, and the left IFO. Additionally, we assessed the benefits of comparing the tensors directly using a functional analysis of diffusion tensor tract statistics (FADTTS). Our results indicate comparable levels of FA-based detection between DAB-TBSS and TBSS, with standard TBSS registration reporting a higher rate of false positives in the other DTI measurements. Within the simulated changes investigated here, this study suggests that the use of DTI Atlas Builder's registration enhances TBSS group-based studies.

  18. On the Application of Syntactic Methodologies in Automatic Text Analysis.

    ERIC Educational Resources Information Center

    Salton, Gerard; And Others

    1990-01-01

    Summarizes various linguistic approaches proposed for document analysis in information retrieval environments. Topics discussed include syntactic analysis; use of machine-readable dictionary information; knowledge base construction; the PLNLP English Grammar (PEG) system; phrase normalization; and statistical and syntactic phrase evaluation used…

  19. Antiviral treatment of Bell's palsy based on baseline severity: a systematic review and meta-analysis.

    PubMed

    Turgeon, Ricky D; Wilby, Kyle J; Ensom, Mary H H

    2015-06-01

    We conducted a systematic review with meta-analysis to evaluate the efficacy of antiviral agents on complete recovery of Bell's palsy. We searched CENTRAL, Embase, MEDLINE, International Pharmaceutical Abstracts, and sources of unpublished literature to November 1, 2014. Primary and secondary outcomes were complete and satisfactory recovery, respectively. To evaluate statistical heterogeneity, we performed subgroup analysis of baseline severity of Bell's palsy and between-study sensitivity analyses based on risk of allocation and detection bias. The 10 included randomized controlled trials (2419 patients; 807 with severe Bell's palsy at onset) had variable risk of bias, with 9 trials having a high risk of bias in at least 1 domain. Complete recovery was not statistically significantly greater with antiviral use versus no antiviral use in the random-effects meta-analysis of 6 trials (relative risk, 1.06; 95% confidence interval, 0.97-1.16; I(2) = 65%). Conversely, random-effects meta-analysis of 9 trials showed a statistically significant difference in satisfactory recovery (relative risk, 1.10; 95% confidence interval, 1.02-1.18; I(2) = 63%). Response to antiviral agents did not differ visually or statistically between patients with severe symptoms at baseline and those with milder disease (test for interaction, P = .11). Sensitivity analyses did not show a clear effect of bias on outcomes. Antiviral agents are not efficacious in increasing the proportion of patients with Bell's palsy who achieved complete recovery, regardless of baseline symptom severity. Copyright © 2015 Elsevier Inc. All rights reserved.
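
    As a hedged illustration of the random-effects pooling reported above, the sketch below implements the DerSimonian-Laird estimator for log relative risks; the input numbers are placeholders, not the Bell's palsy trial data.

```python
import numpy as np

def dersimonian_laird(log_rr, se):
    """Random-effects pooled relative risk from per-study log RRs and standard errors."""
    log_rr, se = np.asarray(log_rr, float), np.asarray(se, float)
    w = 1.0 / se**2                                   # fixed-effect weights
    fixed = np.sum(w * log_rr) / np.sum(w)
    q = np.sum(w * (log_rr - fixed) ** 2)             # Cochran's Q
    df = len(log_rr) - 1
    c = np.sum(w) - np.sum(w**2) / np.sum(w)
    tau2 = max(0.0, (q - df) / c)                     # between-study variance
    w_star = 1.0 / (se**2 + tau2)                     # random-effects weights
    pooled = np.sum(w_star * log_rr) / np.sum(w_star)
    se_pooled = np.sqrt(1.0 / np.sum(w_star))
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    ci = np.exp(pooled + np.array([-1.96, 1.96]) * se_pooled)
    return np.exp(pooled), ci, i2

# Illustrative (made-up) inputs: log RR and standard error for six trials.
rr, ci, i2 = dersimonian_laird([0.05, 0.10, -0.02, 0.15, 0.00, 0.08],
                               [0.06, 0.09, 0.07, 0.12, 0.05, 0.10])
print(f"pooled RR {rr:.2f}, 95% CI {ci[0]:.2f}-{ci[1]:.2f}, I^2 {i2:.0f}%")
```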

  20. Comparing statistical and process-based flow duration curve models in ungauged basins and changing rain regimes

    NASA Astrophysics Data System (ADS)

    Müller, M. F.; Thompson, S. E.

    2016-02-01

    The prediction of flow duration curves (FDCs) in ungauged basins remains an important task for hydrologists given the practical relevance of FDCs for water management and infrastructure design. Predicting FDCs in ungauged basins typically requires spatial interpolation of statistical or model parameters. This task is complicated if climate becomes non-stationary, as the prediction challenge now also requires extrapolation through time. In this context, process-based models for FDCs that mechanistically link the streamflow distribution to climate and landscape factors may have an advantage over purely statistical methods to predict FDCs. This study compares a stochastic (process-based) and statistical method for FDC prediction in both stationary and non-stationary contexts, using Nepal as a case study. Under contemporary conditions, both models perform well in predicting FDCs, with Nash-Sutcliffe coefficients above 0.80 in 75 % of the tested catchments. The main drivers of uncertainty differ between the models: parameter interpolation was the main source of error for the statistical model, while violations of the assumptions of the process-based model represented the main source of its error. The process-based approach performed better than the statistical approach in numerical simulations with non-stationary climate drivers. The predictions of the statistical method under non-stationary rainfall conditions were poor if (i) local runoff coefficients were not accurately determined from the gauge network, or (ii) streamflow variability was strongly affected by changes in rainfall. A Monte Carlo analysis shows that the streamflow regimes in catchments characterized by frequent wet-season runoff and a rapid, strongly non-linear hydrologic response are particularly sensitive to changes in rainfall statistics. In these cases, process-based prediction approaches are favored over statistical models.
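
    A minimal sketch of the two quantities referenced above, an empirical flow duration curve and the Nash-Sutcliffe efficiency used to score predicted FDCs, is given below; the plotting-position convention is one common choice, not necessarily the one used by the authors.

```python
import numpy as np

def flow_duration_curve(q):
    """Return (exceedance probability, sorted flow) for an empirical FDC."""
    q = np.sort(np.asarray(q, float))[::-1]                 # flows in descending order
    exceed = np.arange(1, q.size + 1) / (q.size + 1.0)      # Weibull plotting position
    return exceed, q

def nash_sutcliffe(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - SSE / variance of the observations."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)
```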

  1. Model-based reconstruction of synthetic promoter library in Corynebacterium glutamicum.

    PubMed

    Zhang, Shuanghong; Liu, Dingyu; Mao, Zhitao; Mao, Yufeng; Ma, Hongwu; Chen, Tao; Zhao, Xueming; Wang, Zhiwen

    2018-05-01

    To develop an efficient synthetic promoter library for fine-tuned expression of target genes in Corynebacterium glutamicum. A synthetic promoter library for C. glutamicum was developed based on conserved sequences of the −10 and −35 regions. The synthetic promoter library covered a wide range of strengths, ranging from 1 to 193% of the tac promoter. Sixty-eight promoters were selected and sequenced for correlation analysis between promoter sequence and strength with a statistical model. A new promoter library was then reconstructed with improved promoter strength and coverage based on the results of the correlation analysis. Tandem promoter P70 was finally constructed, with strength increased by 121% over the tac promoter. The promoter library developed in this study shows great potential for applications in metabolic engineering and synthetic biology for the optimization of metabolic networks. To the best of our knowledge, this is the first reconstruction of a synthetic promoter library based on statistical analysis in C. glutamicum.

  2. Western classical music development: a statistical analysis of composers similarity, differentiation and evolution.

    PubMed

    Georges, Patrick

    2017-01-01

    This paper proposes a statistical analysis that captures similarities and differences between classical music composers with the eventual aim to understand why particular composers 'sound' different even if their 'lineages' (influences network) are similar or why they 'sound' alike if their 'lineages' are different. In order to do this we use statistical methods and measures of association or similarity (based on presence/absence of traits such as specific 'ecological' characteristics and personal musical influences) that have been developed in biosystematics, scientometrics, and bibliographic coupling. This paper also represents a first step towards a more ambitious goal of developing an evolutionary model of Western classical music.

  3. Statistical analysis of flight times for space shuttle ferry flights

    NASA Technical Reports Server (NTRS)

    Graves, M. E.; Perlmutter, M.

    1974-01-01

    Markov chain and Monte Carlo analysis techniques are applied to the simulated Space Shuttle Orbiter Ferry flights to obtain statistical distributions of flight time duration between Edwards Air Force Base and Kennedy Space Center. The two methods are compared, and are found to be in excellent agreement. The flights are subjected to certain operational and meteorological requirements, or constraints, which cause eastbound and westbound trips to yield different results. Persistence of events theory is applied to the occurrence of inclement conditions to find their effect upon the statistical flight time distribution. In a sensitivity test, some of the constraints are varied to observe the corresponding changes in the results.
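
    The sketch below illustrates, under an assumed two-state weather persistence model and placeholder transition probabilities, how a Monte Carlo flight-time distribution of the kind described above can be generated; it is not the reference mission model.

```python
import numpy as np

def simulate_ferry_days(p_stay_bad=0.6, p_go_to_bad=0.2, legs=3, n_sim=10000, seed=0):
    """Distribution of total ferry duration (days) when each leg requires a 'go' weather day.

    Weather follows a two-state Markov chain so that inclement conditions persist.
    """
    rng = np.random.default_rng(seed)
    totals = np.empty(n_sim)
    for s in range(n_sim):
        days, bad = 0, False
        for _ in range(legs):
            while True:
                days += 1
                bad = rng.random() < (p_stay_bad if bad else p_go_to_bad)
                if not bad:          # a 'go' day: fly this leg
                    break
        totals[s] = days
    return totals

t = simulate_ferry_days()
print(f"mean {t.mean():.1f} days, 95th percentile {np.percentile(t, 95):.0f} days")
```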

  4. 14 CFR 417.203 - Compliance.

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ... analysis method is based on accurate data and scientific principles and is statistically valid. The FAA... safety analysis must also meet the requirements for methods of analysis contained in appendices A and B... from an identical or similar launch if the analysis still applies to the later launch. (b) Method of...

  5. 14 CFR 417.203 - Compliance.

    Code of Federal Regulations, 2014 CFR

    2014-01-01

    ... analysis method is based on accurate data and scientific principles and is statistically valid. The FAA... safety analysis must also meet the requirements for methods of analysis contained in appendices A and B... from an identical or similar launch if the analysis still applies to the later launch. (b) Method of...

  6. 14 CFR 417.203 - Compliance.

    Code of Federal Regulations, 2013 CFR

    2013-01-01

    ... analysis method is based on accurate data and scientific principles and is statistically valid. The FAA... safety analysis must also meet the requirements for methods of analysis contained in appendices A and B... from an identical or similar launch if the analysis still applies to the later launch. (b) Method of...

  7. Compression Algorithm Analysis of In-Situ (S)TEM Video: Towards Automatic Event Detection and Characterization

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Teuton, Jeremy R.; Griswold, Richard L.; Mehdi, Beata L.

    Precise analysis of both (S)TEM images and video is a time- and labor-intensive process. As an example, determining when crystal growth and shrinkage occur during the dynamic process of Li dendrite deposition and stripping involves manually scanning through each frame in the video to extract a specific set of frames/images. For large numbers of images, this process can be very time consuming, so a fast and accurate automated method is desirable. Given this need, we developed software that uses analysis of video compression statistics for detecting and characterizing events in large data sets. This software works by converting the data into a series of images which it compresses into an MPEG-2 video using the open source “avconv” utility [1]. The software does not use the video itself, but rather analyzes the video statistics from the first pass of the video encoding that avconv records in the log file. This file contains statistics for each frame of the video including the frame quality, intra-texture and predicted-texture bits, and forward and backward motion vector resolution, among others. In all, avconv records 15 statistics for each frame. By combining different statistics, we have been able to detect events in various types of data. We have developed an interactive tool for exploring the data and the statistics that aids the analyst in selecting useful statistics for each analysis. Going forward, an algorithm for detecting and possibly describing events automatically can be written based on the statistic(s) for each data type.
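
    The abstract does not spell out the detection rule, so the sketch below shows one plausible, simplified approach: flag frames whose per-frame encoder statistic departs from a rolling robust baseline. The column name `intra_bits` and the CSV layout are assumptions for illustration, not the actual avconv log format.

```python
import numpy as np
import pandas as pd

def flag_events(frame_stats, column="intra_bits", window=51, n_mad=6.0):
    """Flag frames whose statistic deviates from a rolling median by more than n_mad MADs."""
    s = frame_stats[column].astype(float)
    med = s.rolling(window, center=True, min_periods=1).median()
    mad = (s - med).abs().rolling(window, center=True, min_periods=1).median()
    score = (s - med).abs() / (1.4826 * mad + 1e-9)     # robust z-score per frame
    return frame_stats.index[score > n_mad]

# frame_stats = pd.read_csv("encode_stats.csv")   # one row per frame (assumed layout)
# print(flag_events(frame_stats))
```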

  8. Evaluation of standardized and applied variables in predicting treatment outcomes of polytrauma patients.

    PubMed

    Aksamija, Goran; Mulabdic, Adi; Rasic, Ismar; Muhovic, Samir; Gavric, Igor

    2011-01-01

    Polytrauma is defined as an injury affecting at least two different organ systems or body regions, with at least one of the injuries being life-threatening. Given the multilevel model of care for polytrauma patients within KCUS, weaknesses in the management of this category of patients are inevitable. The aims were to determine the dynamics of existing procedures in the treatment of polytrauma patients on admission to KCUS and, based on statistical analysis of the applied variables, to determine and define the factors that influence the final outcome of treatment and their mutual relationship, which may help eliminate flaws in the approach to the problem. The study was based on 263 polytrauma patients. Parametric and non-parametric statistical methods were used. Basic statistics were calculated and, based on the calculated parameters, multicorrelation analysis, image analysis, discriminant analysis and multifactorial analysis were used to achieve the research objectives. From the universe of variables for this study we selected a sample of n = 25 variables, of which the first two are modular, while the others belong to the common measurement space (n = 23) and are defined in this paper as the system of variables for methods, procedures and assessments of polytrauma patients. After the multicorrelation analysis, since the image analysis gave reliable measurement results, we proceeded to the analysis of eigenvalues, that is, to defining the factors from which information is obtained on how the existing model addresses the problem and on its correlation with treatment outcome. The study singled out the essential factors that determine the current organizational model of care, which may affect the treatment and improve the outcome of polytrauma patients. This analysis revealed the maximum correlative relationships between these practices and contributed to the development of guidelines defined by the isolated factors.

  9. Feature-Based Statistical Analysis of Combustion Simulation Data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bennett, J; Krishnamoorthy, V; Liu, S

    2011-11-18

    We present a new framework for feature-based statistical analysis of large-scale scientific data and demonstrate its effectiveness by analyzing features from Direct Numerical Simulations (DNS) of turbulent combustion. Turbulent flows are ubiquitous and account for transport and mixing processes in combustion, astrophysics, fusion, and climate modeling among other disciplines. They are also characterized by coherent structure or organized motion, i.e. nonlocal entities whose geometrical features can directly impact molecular mixing and reactive processes. While traditional multi-point statistics provide correlative information, they lack nonlocal structural information, and hence, fail to provide mechanistic causality information between organized fluid motion and mixing and reactive processes. Hence, it is of great interest to capture and track flow features and their statistics together with their correlation with relevant scalar quantities, e.g. temperature or species concentrations. In our approach we encode the set of all possible flow features by pre-computing merge trees augmented with attributes, such as statistical moments of various scalar fields, e.g. temperature, as well as length-scales computed via spectral analysis. The computation is performed in an efficient streaming manner in a pre-processing step and results in a collection of meta-data that is orders of magnitude smaller than the original simulation data. This meta-data is sufficient to support a fully flexible and interactive analysis of the features, allowing for arbitrary thresholds, providing per-feature statistics, and creating various global diagnostics such as Cumulative Density Functions (CDFs), histograms, or time-series. We combine the analysis with a rendering of the features in a linked-view browser that enables scientists to interactively explore, visualize, and analyze the equivalent of one terabyte of simulation data. We highlight the utility of this new framework for combustion science; however, it is applicable to many other science domains.

  10. A statistically derived index for classifying East Coast fever reactions in cattle challenged with Theileria parva under experimental conditions.

    PubMed

    Rowlands, G J; Musoke, A J; Morzaria, S P; Nagda, S M; Ballingall, K T; McKeever, D J

    2000-04-01

    A statistically derived disease reaction index based on parasitological, clinical and haematological measurements observed in 309 Boran cattle aged 5 to 8 months following laboratory challenge with Theileria parva is described. Principal component analysis was applied to 13 measures including first appearance of schizonts, first appearance of piroplasms and first occurrence of pyrexia, together with the duration and severity of these symptoms, and white blood cell count. The first principal component, which was based on approximately equal contributions of the 13 variables, provided the definition of the disease reaction index, defined on a scale of 0-10. As well as providing a more objective measure of the severity of the reaction, the continuous nature of the index score enables more powerful statistical analysis of the data than has previously been possible with the clinically derived categories of non-, mild, moderate and severe reactions.
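
    A minimal sketch of turning a first principal component into a 0-10 index, as described above, assuming a matrix with one row per animal and one column per measurement; the sign convention and any clinical anchoring of the published index are not reproduced.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def reaction_index(measurements):
    """Map the first principal component of standardized measurements to a 0-10 scale."""
    z = StandardScaler().fit_transform(measurements)     # animals x 13 variables
    pc1 = PCA(n_components=1).fit_transform(z).ravel()
    return 10 * (pc1 - pc1.min()) / (pc1.max() - pc1.min())
```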

  11. Baseline estimation in flame's spectra by using neural networks and robust statistics

    NASA Astrophysics Data System (ADS)

    Garces, Hugo; Arias, Luis; Rojas, Alejandro

    2014-09-01

    This work presents a baseline estimation method for flame spectra based on an artificial intelligence structure, a neural network, combining robust statistics with multivariate analysis to automatically discriminate measured wavelengths belonging to the continuous feature for model adaptation, thereby overcoming the restriction of having to measure the target baseline for training. The main contributions of this paper are: to analyze a flame spectra database by computing Jolliffe statistics from Principal Component Analysis, detecting wavelengths that are not correlated with most of the measured data and therefore correspond to the baseline; to systematically determine the optimal number of neurons in hidden layers based on Akaike's Final Prediction Error; to estimate the baseline over the full wavelength range of the sampled measured spectra; and to train an artificial intelligence structure, a Neural Network, which allows the relation between measured and baseline spectra to be generalized. The main application of our research is to compute total radiation with baseline information, allowing the combustion process state to be diagnosed for optimization in early stages.

  12. Application of a data-mining method based on Bayesian networks to lesion-deficit analysis

    NASA Technical Reports Server (NTRS)

    Herskovits, Edward H.; Gerring, Joan P.

    2003-01-01

    Although lesion-deficit analysis (LDA) has provided extensive information about structure-function associations in the human brain, LDA has suffered from the difficulties inherent to the analysis of spatial data, i.e., there are many more variables than subjects, and data may be difficult to model using standard distributions, such as the normal distribution. We herein describe a Bayesian method for LDA; this method is based on data-mining techniques that employ Bayesian networks to represent structure-function associations. These methods are computationally tractable, and can represent complex, nonlinear structure-function associations. When applied to the evaluation of data obtained from a study of the psychiatric sequelae of traumatic brain injury in children, this method generates a Bayesian network that demonstrates complex, nonlinear associations among lesions in the left caudate, right globus pallidus, right side of the corpus callosum, right caudate, and left thalamus, and subsequent development of attention-deficit hyperactivity disorder, confirming and extending our previous statistical analysis of these data. Furthermore, analysis of simulated data indicates that methods based on Bayesian networks may be more sensitive and specific for detecting associations among categorical variables than methods based on chi-square and Fisher exact statistics.

  13. Biological Parametric Mapping: A Statistical Toolbox for Multi-Modality Brain Image Analysis

    PubMed Central

    Casanova, Ramon; Ryali, Srikanth; Baer, Aaron; Laurienti, Paul J.; Burdette, Jonathan H.; Hayasaka, Satoru; Flowers, Lynn; Wood, Frank; Maldjian, Joseph A.

    2006-01-01

    In recent years multiple brain MR imaging modalities have emerged; however, analysis methodologies have mainly remained modality specific. In addition, when comparing across imaging modalities, most researchers have been forced to rely on simple region-of-interest type analyses, which do not allow the voxel-by-voxel comparisons necessary to answer more sophisticated neuroscience questions. To overcome these limitations, we developed a toolbox for multimodal image analysis called biological parametric mapping (BPM), based on a voxel-wise use of the general linear model. The BPM toolbox incorporates information obtained from other modalities as regressors in a voxel-wise analysis, thereby permitting investigation of more sophisticated hypotheses. The BPM toolbox has been developed in MATLAB with a user friendly interface for performing analyses, including voxel-wise multimodal correlation, ANCOVA, and multiple regression. It has a high degree of integration with the SPM (statistical parametric mapping) software relying on it for visualization and statistical inference. Furthermore, statistical inference for a correlation field, rather than a widely-used T-field, has been implemented in the correlation analysis for more accurate results. An example with in-vivo data is presented demonstrating the potential of the BPM methodology as a tool for multimodal image analysis. PMID:17070709

  14. SimHap GUI: An intuitive graphical user interface for genetic association analysis

    PubMed Central

    Carter, Kim W; McCaskie, Pamela A; Palmer, Lyle J

    2008-01-01

    Background Researchers wishing to conduct genetic association analysis involving single nucleotide polymorphisms (SNPs) or haplotypes are often confronted with the lack of user-friendly graphical analysis tools, requiring sophisticated statistical and informatics expertise to perform relatively straightforward tasks. Tools, such as the SimHap package for the R statistics language, provide the necessary statistical operations to conduct sophisticated genetic analysis, but lack a graphical user interface that would allow anyone other than a professional statistician to effectively utilise the tool. Results We have developed SimHap GUI, a cross-platform integrated graphical analysis tool for conducting epidemiological, single SNP and haplotype-based association analysis. SimHap GUI features a novel workflow interface that guides the user through each logical step of the analysis process, making it accessible to both novice and advanced users. This tool provides a seamless interface to the SimHap R package, while providing enhanced functionality such as sophisticated data checking, automated data conversion, and real-time estimations of haplotype simulation progress. Conclusion SimHap GUI provides a novel, easy-to-use, cross-platform solution for conducting a range of genetic and non-genetic association analyses. This provides a free alternative to commercial statistics packages that is specifically designed for genetic association analysis. PMID:19109877

  15. EHME: a new word database for research in Basque language.

    PubMed

    Acha, Joana; Laka, Itziar; Landa, Josu; Salaburu, Pello

    2014-11-14

    This article presents EHME, the frequency dictionary of Basque structure, an online program that enables researchers in psycholinguistics to extract word and nonword stimuli, based on a broad range of statistics concerning the properties of Basque words. The database consists of 22.7 million tokens, and the properties available include morphological structure frequency and word-similarity measures, in addition to classical indexes: word frequency, orthographic structure, orthographic similarity, bigram and biphone frequency, and syllable-based measures. Measures are indexed at the lemma, morpheme and word level. We include reliability and validation analyses. The application is freely available, and enables the user to extract words based on concrete statistical criteria, as well as to obtain statistical characteristics from a list of words.

  16. Statistical analysis of multivariate atmospheric variables. [cloud cover

    NASA Technical Reports Server (NTRS)

    Tubbs, J. D.

    1979-01-01

    Topics covered include: (1) estimation in discrete multivariate distributions; (2) a procedure to predict cloud cover frequencies in the bivariate case; (3) a program to compute conditional bivariate normal parameters; (4) the transformation of nonnormal multivariate to near-normal; (5) test of fit for the extreme value distribution based upon the generalized minimum chi-square; (6) test of fit for continuous distributions based upon the generalized minimum chi-square; (7) effect of correlated observations on confidence sets based upon chi-square statistics; and (8) generation of random variates from specified distributions.

  17. A Framework for Establishing Standard Reference Scale of Texture by Multivariate Statistical Analysis Based on Instrumental Measurement and Sensory Evaluation.

    PubMed

    Zhi, Ruicong; Zhao, Lei; Xie, Nan; Wang, Houyin; Shi, Bolin; Shi, Jingye

    2016-01-13

    A framework for establishing a standard reference scale (texture) is proposed by multivariate statistical analysis according to instrumental measurement and sensory evaluation. Multivariate statistical analysis is conducted to rapidly select typical reference samples with characteristics of universality, representativeness, stability, substitutability, and traceability. The reasonableness of the framework method is verified by establishing a standard reference scale for the texture attribute hardness with well-known Chinese foods. More than 100 food products in 16 categories were tested using instrumental measurement (TPA test), and the results were analyzed with clustering analysis, principal component analysis, relative standard deviation, and analysis of variance. As a result, nine kinds of foods were selected to construct the hardness standard reference scale. The results indicate that the regression coefficient between the estimated sensory value and the instrumentally measured value is significant (R(2) = 0.9765), which fits well with Stevens's theory. The research provides a reliable theoretical basis and practical guide for establishing quantitative standard reference scales for food texture characteristics.

  18. Statistics and Machine Learning based Outlier Detection Techniques for Exoplanets

    NASA Astrophysics Data System (ADS)

    Goel, Amit; Montgomery, Michele

    2015-08-01

    Architectures of planetary systems are observable snapshots in time that can indicate the formation and dynamic evolution of planets. The observable key parameters that we consider are planetary mass and orbital period. If planet masses are significantly less than their host star masses, then Keplerian motion is described by P^2 = a^3, where P is the orbital period in units of years and a is the semi-major axis in units of Astronomical Units (AU). Keplerian motion works on small scales such as the size of the Solar System but not on large scales such as the size of the Milky Way Galaxy. In this work, for confirmed exoplanets of known stellar mass, planetary mass, orbital period, and stellar age, we analyze the Keplerian motion of systems based on stellar age to determine whether Keplerian motion has an age dependency and to identify outliers. For detecting outliers, we apply several techniques based on statistical and machine learning methods such as probabilistic, linear, and proximity-based models. In probabilistic and statistical models of outliers, the parameters of a closed-form probability distribution are learned in order to detect the outliers. Linear models use regression-analysis-based techniques for detecting outliers. Proximity-based models use distance-based algorithms such as k-nearest neighbour, clustering algorithms such as k-means, or density-based algorithms such as kernel density estimation. In this work, we use unsupervised learning algorithms with only the proximity-based models. In addition, we explore the relative strengths and weaknesses of the various techniques by validating the outliers. The validation criterion for the outliers is that the ratio of planetary mass to stellar mass is less than 0.001. In this work, we present our statistical analysis of the outliers thus detected.
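
    Two small illustrative pieces follow: a residual from Kepler's third law (assuming a roughly solar-mass host, which the study does not require) and a simple k-nearest-neighbour proximity score of the kind mentioned above; both are sketches, not the authors' pipeline.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def kepler_residual(period_yr, a_au):
    """Log-scale residual from Kepler's third law P^2 = a^3 (solar-mass host assumed)."""
    return 2 * np.log10(period_yr) - 3 * np.log10(a_au)

def knn_outlier_score(x, k=5):
    """Proximity-based outlier score: mean distance to the k nearest neighbours."""
    x = np.asarray(x, float).reshape(-1, 1)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(x)
    dist, _ = nn.kneighbors(x)
    return dist[:, 1:].mean(axis=1)      # drop the zero self-distance
```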

  19. Image encryption based on a delayed fractional-order chaotic logistic system

    NASA Astrophysics Data System (ADS)

    Wang, Zhen; Huang, Xia; Li, Ning; Song, Xiao-Na

    2012-05-01

    A new image encryption scheme is proposed based on a delayed fractional-order chaotic logistic system. In the process of generating a key stream, the time-varying delay and fractional derivative are embedded in the proposed scheme to improve the security. Such a scheme is described in detail with security analyses including correlation analysis, information entropy analysis, run statistic analysis, mean-variance gray value analysis, and key sensitivity analysis. Experimental results show that the newly proposed image encryption scheme possesses high security.

  20. Microscopic saw mark analysis: an empirical approach.

    PubMed

    Love, Jennifer C; Derrick, Sharon M; Wiersema, Jason M; Peters, Charles

    2015-01-01

    Microscopic saw mark analysis is a well published and generally accepted qualitative analytical method. However, little research has focused on identifying and mitigating potential sources of error associated with the method. The presented study proposes the use of classification trees and random forest classifiers as an optimal, statistically sound approach to mitigate the potential for variability and outcome error in microscopic saw mark analysis. The statistical model was applied to 58 experimental saw marks created with four types of saws. The saw marks were made in fresh human femurs obtained through anatomical gift and were analyzed using a Keyence digital microscope. The statistical approach weighted the variables based on discriminatory value and produced decision trees with an associated outcome error rate of 8.62-17.82%. © 2014 American Academy of Forensic Sciences.
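
    A minimal sketch of the random-forest step described above, with the out-of-bag error standing in for the reported outcome error rate; the feature matrix and class labels are placeholders.

```python
from sklearn.ensemble import RandomForestClassifier

def fit_saw_classifier(features, saw_type):
    """Fit a random forest and report out-of-bag error and feature importances.

    features : array-like, one row per saw mark, one column per microscopic trait
    saw_type : class label (saw type) for each mark
    """
    rf = RandomForestClassifier(n_estimators=500, oob_score=True, random_state=0)
    rf.fit(features, saw_type)
    return 1.0 - rf.oob_score_, rf.feature_importances_
```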

  1. Statistical analysis of fNIRS data: a comprehensive review.

    PubMed

    Tak, Sungho; Ye, Jong Chul

    2014-01-15

    Functional near-infrared spectroscopy (fNIRS) is a non-invasive method to measure brain activities using the changes of optical absorption in the brain through the intact skull. fNIRS has many advantages over other neuroimaging modalities such as positron emission tomography (PET), functional magnetic resonance imaging (fMRI), or magnetoencephalography (MEG), since it can directly measure blood oxygenation level changes related to neural activation with high temporal resolution. However, fNIRS signals are highly corrupted by measurement noises and physiology-based systemic interference. Careful statistical analyses are therefore required to extract neuronal activity-related signals from fNIRS data. In this paper, we provide an extensive review of historical developments of statistical analyses of fNIRS signal, which include motion artifact correction, short source-detector separation correction, principal component analysis (PCA)/independent component analysis (ICA), false discovery rate (FDR), serially-correlated errors, as well as inference techniques such as the standard t-test, F-test, analysis of variance (ANOVA), and statistical parameter mapping (SPM) framework. In addition, to provide a unified view of various existing inference techniques, we explain a linear mixed effect model with restricted maximum likelihood (ReML) variance estimation, and show that most of the existing inference methods for fNIRS analysis can be derived as special cases. Some of the open issues in statistical analysis are also described. Copyright © 2013 Elsevier Inc. All rights reserved.

  2. Assessing population exposure for landslide risk analysis using dasymetric cartography

    NASA Astrophysics Data System (ADS)

    Garcia, Ricardo A. C.; Oliveira, Sergio C.; Zezere, Jose L.

    2015-04-01

    Exposed population is a major topic that needs to be taken into account in a full landslide risk analysis. Usually, risk analysis is based on counts of inhabitants or inhabitant density, applied over statistical or administrative terrain units, such as NUTS or parishes. However, this kind of approach may skew the obtained results, underestimating the importance of population, mainly in territorial units with a predominance of rural occupation. Furthermore, the landslide susceptibility scores calculated for each terrain unit are frequently more detailed and accurate than the location of the exposed population inside each territorial unit based on Census data. These drawbacks are not ideal when landslide risk analysis is performed for urban management and emergency planning. Dasymetric cartography, which uses a parameter or set of parameters to restrict the spatial distribution of a particular phenomenon, is a methodology that may help to enhance the resolution of Census data and therefore give a more realistic representation of the population distribution. Therefore, this work aims to map and compare the population distribution based on a traditional approach (population per administrative terrain unit) and based on dasymetric cartography (population by building). The study is developed in the Region North of Lisbon using 2011 population data and follows three main steps: i) landslide susceptibility assessment based on independently validated statistical models; ii) evaluation of the population distribution (absolute and density) for different administrative territorial units (parishes and BGRI, the basic statistical unit in the Portuguese Census); and iii) dasymetric population cartography based on building areal weighting. Preliminary results show that in sparsely populated administrative units, population density differs by more than a factor of two depending on whether the traditional approach or dasymetric cartography is applied. This work was supported by the FCT - Portuguese Foundation for Science and Technology.
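
    A minimal sketch of building-based areal weighting, the dasymetric step described above: each census unit's population is redistributed to its buildings in proportion to footprint area. Column names are assumptions for illustration, not the Portuguese Census schema.

```python
import pandas as pd

def dasymetric_population(buildings: pd.DataFrame) -> pd.Series:
    """Redistribute each census unit's population to its buildings by footprint area.

    Expects columns: 'unit_id', 'unit_population', 'building_area' (one row per building).
    """
    unit_area = buildings.groupby("unit_id")["building_area"].transform("sum")
    share = buildings["building_area"] / unit_area
    return buildings["unit_population"] * share
```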

  3. Gis-Based Spatial Statistical Analysis of College Graduates Employment

    NASA Astrophysics Data System (ADS)

    Tang, R.

    2012-07-01

    It is urgently necessary to be aware of the distribution and employment status of college graduates for the proper allocation of human resources and the overall arrangement of strategic industries. This study provides empirical evidence regarding the use of geocoding and spatial analysis for the distribution and employment status of college graduates, based on data from the 2004-2008 Wuhan Municipal Human Resources and Social Security Bureau, China. The spatio-temporal distribution of employment units was analyzed with geocoding using ArcGIS software, and the stepwise multiple linear regression method via SPSS software was used to predict employment and to identify spatially associated enterprise and professional demand in the future. The results show that the number of enterprises in the Wuhan East Lake high and new technology development zone increased dramatically from 2004 to 2008 and tended to be distributed southeastward. Furthermore, the models built by statistical analysis suggest that the specialty graduates majored in has an important impact on the number employed and on the number of graduates engaging in pillar industries. In conclusion, the combination of GIS and statistical analysis, which helps to simulate the spatial distribution of employment status, is a potential tool for human resource development research.

  4. Demographic and health situation of children in conditions of economic destabilization in the Ukraine.

    PubMed

    Pantyley, Viktoriya

    2014-01-01

    In the new conditions of socio-economic development in the Ukraine, the health of the child population is considered the most reliable indicator of the socio-economic development of the country. The primary goal of the study was analysis of the effect of contemporary socio-economic transformations, their scope, and the strength of their effect on the demographic and social situation of children in various regions of the Ukraine. The methodological objectives of the study were as follows: development of a synthetic measure of the state of health of the child population, based on Hellwig's method, and selection of districts in the Ukraine according to the present health-demographic situation of children. The study was based on statistical data from the State Statistics Service of Ukraine, the Centre of Medical Statistics in Kiev, the Ukrainian Ministry of Defence, and the Ministry of Education and Science, Youth and Sports of Ukraine. The following research methods were used: analysis of literature and Internet sources, selection and analysis of statistical materials, and cartographic and statistical methods. Basic indices of the demographic and health situation of the child population were analyzed, as well as the factors of a socio-economic nature which affect this situation. A set of variables was developed for the synthetic evaluation of the state of health of the child population. A typology of the Ukrainian districts was performed according to the state of health of the child population, based on Hellwig's taxonomic method. Deterioration of selected quality parameters was observed, as well as a change in the strength and direction of the effect of factors of an organizational-institutional, socio-economic, historical and cultural nature on the potential of the child population.

  5. Statistical analysis of water-quality data containing multiple detection limits II: S-language software for nonparametric distribution modeling and hypothesis testing

    USGS Publications Warehouse

    Lee, L.; Helsel, D.

    2007-01-01

    Analysis of low concentrations of trace contaminants in environmental media often results in left-censored data that are below some limit of analytical precision. Interpretation of values becomes complicated when there are multiple detection limits in the data-perhaps as a result of changing analytical precision over time. Parametric and semi-parametric methods, such as maximum likelihood estimation and robust regression on order statistics, can be employed to model distributions of multiply censored data and provide estimates of summary statistics. However, these methods are based on assumptions about the underlying distribution of data. Nonparametric methods provide an alternative that does not require such assumptions. A standard nonparametric method for estimating summary statistics of multiply-censored data is the Kaplan-Meier (K-M) method. This method has seen widespread usage in the medical sciences within a general framework termed "survival analysis" where it is employed with right-censored time-to-failure data. However, K-M methods are equally valid for the left-censored data common in the geosciences. Our S-language software provides an analytical framework based on K-M methods that is tailored to the needs of the earth and environmental sciences community. This includes routines for the generation of empirical cumulative distribution functions, prediction or exceedance probabilities, and related confidence limits computation. Additionally, our software contains K-M-based routines for nonparametric hypothesis testing among an unlimited number of grouping variables. A primary characteristic of K-M methods is that they do not perform extrapolation and interpolation. Thus, these routines cannot be used to model statistics beyond the observed data range or when linear interpolation is desired. For such applications, the aforementioned parametric and semi-parametric methods must be used.
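
    The routines described above are written in the S language; as a hedged illustration of the same idea in Python, the sketch below flips left-censored concentrations about a constant so that a standard right-censoring Kaplan-Meier fit (here from the `lifelines` package) can be applied, then maps the survival function back to an ECDF of concentration. It is a hypothetical re-implementation, not the authors' software.

```python
import numpy as np
from lifelines import KaplanMeierFitter

def km_left_censored(values, detected, flip_at=None):
    """Kaplan-Meier ECDF for left-censored data (nondetects) via the flipping trick.

    values   : measured concentration, or the detection limit where not detected
    detected : boolean array, True if the value is an actual detection
    """
    values = np.asarray(values, float)
    flip_at = flip_at if flip_at is not None else values.max() + 1.0
    flipped = flip_at - values                      # left-censoring becomes right-censoring
    kmf = KaplanMeierFitter().fit(flipped, event_observed=np.asarray(detected))
    surv = kmf.survival_function_
    conc = flip_at - surv.index.values              # back-transform the timeline
    ecdf = surv.iloc[:, 0].values                   # S(t) = P(concentration < flip_at - t)
    order = np.argsort(conc)
    return conc[order], ecdf[order]
```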

  6. “Magnitude-based Inference”: A Statistical Review

    PubMed Central

    Welsh, Alan H.; Knight, Emma J.

    2015-01-01

    ABSTRACT Purpose We consider “magnitude-based inference” and its interpretation by examining in detail its use in the problem of comparing two means. Methods We extract from the spreadsheets, which are provided to users of the analysis (http://www.sportsci.org/), a precise description of how “magnitude-based inference” is implemented. We compare the implemented version of the method with general descriptions of it and interpret the method in familiar statistical terms. Results and Conclusions We show that “magnitude-based inference” is not a progressive improvement on modern statistics. The additional probabilities introduced are not directly related to the confidence interval but, rather, are interpretable either as P values for two different nonstandard tests (for different null hypotheses) or as approximate Bayesian calculations, which also lead to a type of test. We also discuss sample size calculations associated with “magnitude-based inference” and show that the substantial reduction in sample sizes claimed for the method (30% of the sample size obtained from standard frequentist calculations) is not justifiable so the sample size calculations should not be used. Rather than using “magnitude-based inference,” a better solution is to be realistic about the limitations of the data and use either confidence intervals or a fully Bayesian analysis. PMID:25051387

  7. Automated Clinical Assessment from Smart home-based Behavior Data

    PubMed Central

    Dawadi, Prafulla Nath; Cook, Diane Joyce; Schmitter-Edgecombe, Maureen

    2016-01-01

    Smart home technologies offer potential benefits for assisting clinicians by automating health monitoring and well-being assessment. In this paper, we examine the actual benefits of smart home-based analysis by monitoring daily behaviour in the home and predicting standard clinical assessment scores of the residents. To accomplish this goal, we propose a Clinical Assessment using Activity Behavior (CAAB) approach to model a smart home resident’s daily behavior and predict the corresponding standard clinical assessment scores. CAAB uses statistical features that describe characteristics of a resident’s daily activity performance to train machine learning algorithms that predict the clinical assessment scores. We evaluate the performance of CAAB utilizing smart home sensor data collected from 18 smart homes over two years using prediction and classification-based experiments. In the prediction-based experiments, we obtain a statistically significant correlation (r = 0.72) between CAAB-predicted and clinician-provided cognitive assessment scores and a statistically significant correlation (r = 0.45) between CAAB-predicted and clinician-provided mobility scores. Similarly, for the classification-based experiments, we find CAAB has a classification accuracy of 72% while classifying cognitive assessment scores and 76% while classifying mobility scores. These prediction and classification results suggest that it is feasible to predict standard clinical scores using smart home sensor data and learning-based data analysis. PMID:26292348

  8. Evaluation of calcium ion, hydroxyl ion release and pH levels in various calcium hydroxide based intracanal medicaments: An in vitro study

    PubMed Central

    Fulzele, Punit; Baliga, Sudhindra; Thosar, Nilima; Pradhan, Debaprya

    2011-01-01

    Aims: Evaluation of calcium ion, hydroxyl ion release and pH levels in various calcium hydroxide based intracanal medicaments. Objective: The purpose of this study was to evaluate calcium and hydroxyl ion release and pH levels of calcium hydroxide based products, namely, RC Cal, Metapex, calcium hydroxide with distilled water, along with the new gutta-percha points with calcium hydroxide. Materials and Methods: The materials were inserted in polyethylene tubes and immersed in deionized water. The pH variation, Ca++ and OH- release were monitored periodically for 1 week. Statistical Analysis Used: Statistical analysis was carried out using one-way analysis of variance and Tukey's post hoc tests with PASW Statistics version 18 software to compare the statistical difference. Results: After 1 week, calcium hydroxide with distilled water and RC Cal raised the pH to 12.7 and 11.8, respectively, while a small change was observed for Metapex, calcium hydroxide gutta-percha points. The calcium released after 1 week was 15.36 mg/dL from RC Cal, followed by 13.04, 1.296, 3.064 mg/dL from calcium hydroxide with sterile water, Metapex and calcium hydroxide gutta-percha points, respectively. Conclusions: Calcium hydroxide with sterile water and RC Cal pastes liberate significantly more calcium and hydroxyl ions and raise the pH higher than Metapex and calcium hydroxide gutta-percha points. PMID:22346155
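
    A minimal sketch, in Python rather than the PASW (SPSS) software named above, of a one-way ANOVA followed by Tukey's post hoc comparison across medicament groups; the pH values are placeholders, not the study data.

```python
import numpy as np
from scipy.stats import f_oneway
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Placeholder pH readings for three medicament groups (not the study data).
groups = {"RC_Cal": [11.6, 11.8, 11.9],
          "Metapex": [7.4, 7.6, 7.5],
          "CaOH_water": [12.6, 12.7, 12.8]}

f_stat, p_value = f_oneway(*groups.values())
values = np.concatenate(list(groups.values()))
labels = np.repeat(list(groups.keys()), [len(v) for v in groups.values()])
print(f"one-way ANOVA: F = {f_stat:.1f}, p = {p_value:.3g}")
print(pairwise_tukeyhsd(values, labels))      # all pairwise group comparisons
```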

  9. Publishing in "SERJ": An Analysis of Papers from 2002-2009

    ERIC Educational Resources Information Center

    Zieffler, Andrew; Garfield, Joan; delMas, Robert C.; Le, Laura; Isaak, Rebekah; Bjornsdottir, Audbjorg; Park, Jiyoon

    2011-01-01

    "SERJ" has provided a high quality professional publication venue for researchers in statistics education for close to a decade. This paper presents a review of the articles published to explore what they suggest about the field of statistics education, the researchers, the questions addressed, and the growing knowledge base on teaching and…

  10. Statistical properties of alternative national forest inventory area estimators

    Treesearch

    Francis Roesch; John Coulston; Andrew D. Hill

    2012-01-01

    The statistical properties of potential estimators of forest area for the USDA Forest Service's Forest Inventory and Analysis (FIA) program are presented and discussed. The current FIA area estimator is compared and contrasted with a weighted mean estimator and an estimator based on the Polya posterior, in the presence of nonresponse. Estimator optimality is...

  11. The Power of 'Evidence': Reliable Science or a Set of Blunt Tools?

    ERIC Educational Resources Information Center

    Wrigley, Terry

    2018-01-01

    In response to the increasing emphasis on 'evidence-based teaching', this article examines the privileging of randomised controlled trials and their statistical synthesis (meta-analysis). It also pays particular attention to two third-level statistical syntheses: John Hattie's "Visible learning" project and the EEF's "Teaching and…

  12. Marigold (Calendula officinalis L.): an evidence-based systematic review by the Natural Standard Research Collaboration.

    PubMed

    Basch, Ethan; Bent, Steve; Foppa, Ivo; Haskmi, Sadaf; Kroll, David; Mele, Michelle; Szapary, Philippe; Ulbricht, Catherine; Vora, Mamta; Yong, Sophanna

    2006-01-01

    An evidence-based systematic review including written and statistical analysis of scientific literature, expert opinion, folkloric precedent, history, pharmacology, kinetics/dynamics, interactions, adverse effects, toxicology and dosing.

  13. An evidence-based systematic review of saffron (Crocus sativus) by the Natural Standard Research Collaboration.

    PubMed

    Ulbricht, Catherine; Conquer, Julie; Costa, Dawn; Hollands, Whitney; Iannuzzi, Carmen; Isaac, Richard; Jordan, Joseph K; Ledesma, Natalie; Ostroff, Cathy; Serrano, Jill M Grimes; Shaffer, Michael D; Varghese, Minney

    2011-03-01

    An evidence-based systematic review including written and statistical analysis of scientific literature, expert opinion, folkloric precedent, history, pharmacology, kinetics/dynamics, interactions, adverse effects, toxicology, and dosing.

  14. Evidence-based systematic review of saw palmetto by the Natural Standard Research Collaboration.

    PubMed

    Ulbricht, Catherine; Basch, Ethan; Bent, Steve; Boon, Heather; Corrado, Michelle; Foppa, Ivo; Hashmi, Sadaf; Hammerness, Paul; Kingsbury, Eileen; Smith, Michael; Szapary, Philippe; Vora, Mamta; Weissner, Wendy

    2006-01-01

    Here presented is an evidence-based systematic review including written and statistical analysis of scientific literature, expert opinion, folkloric precedent, history, pharmacology, kinetics/dynamics, interactions, adverse effects, toxicology, and dosing.

  15. Statistical properties of DNA sequences

    NASA Technical Reports Server (NTRS)

    Peng, C. K.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Mantegna, R. N.; Simons, M.; Stanley, H. E.

    1995-01-01

    We review evidence supporting the idea that the DNA sequence in genes containing non-coding regions is correlated, and that the correlation is remarkably long range--indeed, nucleotides thousands of base pairs distant are correlated. We do not find such a long-range correlation in the coding regions of the gene. We resolve the problem of the "non-stationarity" feature of the sequence of base pairs by applying a new algorithm called detrended fluctuation analysis (DFA). We address the claim of Voss that there is no difference in the statistical properties of coding and non-coding regions of DNA by systematically applying the DFA algorithm, as well as standard FFT analysis, to every DNA sequence (33301 coding and 29453 non-coding) in the entire GenBank database. Finally, we describe briefly some recent work showing that the non-coding sequences have certain statistical features in common with natural and artificial languages. Specifically, we adapt to DNA the Zipf approach to analyzing linguistic texts. These statistical properties of non-coding sequences support the possibility that non-coding regions of DNA may carry biological information.
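
    A minimal sketch of detrended fluctuation analysis as described above, applied to a random base sequence mapped to a ±1 purine/pyrimidine walk (so the expected scaling exponent is about 0.5); the windowing and fitting details of the original method are simplified.

```python
import numpy as np

def dfa(x, scales=(4, 8, 16, 32, 64, 128)):
    """Return (scales, fluctuation F(n), scaling exponent alpha) for a 1D series."""
    y = np.cumsum(np.asarray(x, float) - np.mean(x))   # integrated profile
    fluct = []
    for n in scales:
        n_seg = y.size // n
        segs = y[:n_seg * n].reshape(n_seg, n)
        t = np.arange(n)
        f2 = []
        for seg in segs:
            coef = np.polyfit(t, seg, 1)               # local linear trend
            f2.append(np.mean((seg - np.polyval(coef, t)) ** 2))
        fluct.append(np.sqrt(np.mean(f2)))
    alpha = np.polyfit(np.log(scales), np.log(fluct), 1)[0]
    return np.array(scales), np.array(fluct), alpha

# Example: a random base sequence mapped to a +/-1 walk (uncorrelated, alpha ~ 0.5).
rng = np.random.default_rng(0)
walk = np.where(np.isin(rng.choice(list("ACGT"), 20000), list("AG")), 1, -1)
print(dfa(walk)[2])
```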

  16. A statistical-based scheduling algorithm in automated data path synthesis

    NASA Technical Reports Server (NTRS)

    Jeon, Byung Wook; Lursinsap, Chidchanok

    1992-01-01

    In this paper, we propose a new heuristic scheduling algorithm based on the statistical analysis of the cumulative frequency distribution of operations among control steps. It has a tendency to escape from local minima and therefore to reach a globally optimal solution. The presented algorithm considers real-world constraints such as chained operations, multicycle operations, and pipelined data paths. The result of the experiment shows that it gives optimal solutions, even though it is greedy in nature.

  17. Navigation analysis for Viking 1979, option B

    NASA Technical Reports Server (NTRS)

    Mitchell, P. H.

    1971-01-01

    A parametric study performed for 48 trans-Mars reference missions in support of the Viking program is reported. The launch dates cover several months in the year 1979, and each launch date has multiple arrival dates in 1980. A plot of launch versus arrival dates with case numbers designated for reference purposes is included. The analysis consists of the computation of statistical covariance matrices based on certain assumptions about the ground-based tracking systems. The error model statistics are listed in tables. Tracking systems were assumed at three sites: Goldstone, California; Canberra, Australia; and Madrid, Spain. The tracking data consisted of range and Doppler measurements taken during the tracking intervals starting at E-30(d) and ending at E-10(d) for the control data and ending at E-18(h) for the knowledge data. The control and knowledge covariance matrices were delivered to the Planetary Mission Analysis Branch for inputs into a delta V dispersion analysis.

  18. Integrated Assessment and Improvement of the Quality Assurance System for the Cosworth Casting Process

    NASA Astrophysics Data System (ADS)

    Yousif, Dilon

    The purpose of this study was to improve the Quality Assurance (QA) System at the Nemak Windsor Aluminum Plant (WAP). The project used the Six Sigma method based on Define, Measure, Analyze, Improve, and Control (DMAIC). Analysis of the in-process melt at WAP was based on chemical, thermal, and mechanical testing. The control limits for the W319 Al Alloy were statistically recalculated using the composition measured under stable conditions. The "Chemistry Viewer" software was developed for statistical analysis of alloy composition. This software features the Silicon Equivalency (SiBQ) developed by the IRC. The Melt Sampling Device (MSD) was designed and evaluated at WAP to overcome traditional sampling limitations. The Thermal Analysis "Filters" software was developed for cooling curve analysis of the 3XX Al Alloy(s) using IRC techniques. The impact of low melting point impurities on the start of melting was evaluated using the Universal Metallurgical Simulator and Analyzer (UMSA).

  19. THE MEASUREMENT OF BONE QUALITY USING GRAY LEVEL CO-OCCURRENCE MATRIX TEXTURAL FEATURES.

    PubMed

    Shirvaikar, Mukul; Huang, Ning; Dong, Xuanliang Neil

    2016-10-01

    In this paper, statistical methods for the estimation of bone quality to predict the risk of fracture are reported. Bone mineral density and bone architecture properties are the main contributors of bone quality. Dual-energy X-ray Absorptiometry (DXA) is the traditional clinical measurement technique for bone mineral density, but does not include architectural information to enhance the prediction of bone fragility. Other modalities are not practical due to cost and access considerations. This study investigates statistical parameters based on the Gray Level Co-occurrence Matrix (GLCM) extracted from two-dimensional projection images and explores links with architectural properties and bone mechanics. Data analysis was conducted on Micro-CT images of 13 trabecular bones (with an in-plane spatial resolution of about 50μm). Ground truth data for bone volume fraction (BV/TV), bone strength and modulus were available based on complex 3D analysis and mechanical tests. Correlation between the statistical parameters and biomechanical test results was studied using regression analysis. The results showed Cluster-Shade was strongly correlated with the microarchitecture of the trabecular bone and related to mechanical properties. Once the principal thesis of utilizing second-order statistics is established, it can be extended to other modalities, providing cost and convenience advantages for patients and doctors.
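
    The sketch below illustrates one way to compute a cluster-shade feature from a GLCM in Python; `graycomatrix` is from scikit-image (spelled `greycomatrix` in older releases), and the cluster-shade formula is written out because it is not among the built-in `graycoprops` properties. The exact feature definitions used in the paper may differ.

```python
import numpy as np
from skimage.feature import graycomatrix

def cluster_shade(image, distance=1, angle=0.0, levels=256):
    """Cluster shade of a normalized GLCM (third moment about the GLCM row/column means).

    `image` must contain integer gray levels in [0, levels).
    """
    glcm = graycomatrix(image, [distance], [angle], levels=levels,
                        symmetric=True, normed=True)[:, :, 0, 0]
    i, j = np.meshgrid(np.arange(levels), np.arange(levels), indexing="ij")
    mu_i = np.sum(i * glcm)
    mu_j = np.sum(j * glcm)
    return np.sum(((i + j - mu_i - mu_j) ** 3) * glcm)
```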

  20. THE MEASUREMENT OF BONE QUALITY USING GRAY LEVEL CO-OCCURRENCE MATRIX TEXTURAL FEATURES

    PubMed Central

    Shirvaikar, Mukul; Huang, Ning; Dong, Xuanliang Neil

    2016-01-01

    In this paper, statistical methods for the estimation of bone quality to predict the risk of fracture are reported. Bone mineral density and bone architecture properties are the main contributors of bone quality. Dual-energy X-ray Absorptiometry (DXA) is the traditional clinical measurement technique for bone mineral density, but does not include architectural information to enhance the prediction of bone fragility. Other modalities are not practical due to cost and access considerations. This study investigates statistical parameters based on the Gray Level Co-occurrence Matrix (GLCM) extracted from two-dimensional projection images and explores links with architectural properties and bone mechanics. Data analysis was conducted on Micro-CT images of 13 trabecular bones (with an in-plane spatial resolution of about 50μm). Ground truth data for bone volume fraction (BV/TV), bone strength and modulus were available based on complex 3D analysis and mechanical tests. Correlation between the statistical parameters and biomechanical test results was studied using regression analysis. The results showed that Cluster Shade was strongly correlated with the microarchitecture of the trabecular bone and related to mechanical properties. Once the principal thesis of utilizing second-order statistics is established, it can be extended to other modalities, providing cost and convenience advantages for patients and doctors. PMID:28042512

  1. Multivariate analysis, mass balance techniques, and statistical tests as tools in igneous petrology: application to the Sierra de las Cruces volcanic range (Mexican Volcanic Belt).

    PubMed

    Velasco-Tapia, Fernando

    2014-01-01

    Magmatic processes have usually been identified and evaluated using qualitative or semiquantitative geochemical or isotopic tools based on a restricted number of variables. However, a more complete and quantitative view could be reached applying multivariate analysis, mass balance techniques, and statistical tests. As an example, in this work a statistical and quantitative scheme is applied to analyze the geochemical features for the Sierra de las Cruces (SC) volcanic range (Mexican Volcanic Belt). In this locality, the volcanic activity (3.7 to 0.5 Ma) was dominantly dacitic, but the presence of spheroidal andesitic enclaves and/or diverse disequilibrium features in the majority of lavas confirms the operation of magma mixing/mingling. New discriminant-function-based multidimensional diagrams were used to discriminate tectonic setting. Statistical tests of discordancy and significance were applied to evaluate the influence of the subducting Cocos plate, which seems to be rather negligible for the SC magmas in relation to several major and trace elements. A cluster analysis following Ward's linkage rule was carried out to classify the SC volcanic rocks into geochemical groups. Finally, two mass-balance schemes were applied for the quantitative evaluation of the proportion of the end-member components (dacitic and andesitic magmas) in the comingled lavas (binary mixtures).
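
    A minimal sketch of agglomerative clustering with Ward's linkage rule on major-element data follows; the oxide values below are invented for illustration and are not the SC analyses.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical whole-rock analyses: columns SiO2, MgO, K2O (wt.%).
X = np.array([[63.1, 2.1, 2.8],
              [64.0, 1.9, 3.0],
              [57.5, 4.8, 1.9],
              [58.2, 4.5, 2.0],
              [62.7, 2.3, 2.7]])

Z = linkage(X, method="ward")                     # Ward's minimum-variance linkage
groups = fcluster(Z, t=2, criterion="maxclust")   # cut the dendrogram into two groups
print(groups)
```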

  2. mapDIA: Preprocessing and statistical analysis of quantitative proteomics data from data independent acquisition mass spectrometry.

    PubMed

    Teo, Guoshou; Kim, Sinae; Tsou, Chih-Chiang; Collins, Ben; Gingras, Anne-Claude; Nesvizhskii, Alexey I; Choi, Hyungwon

    2015-11-03

    Data independent acquisition (DIA) mass spectrometry is an emerging technique that offers more complete detection and quantification of peptides and proteins across multiple samples. DIA allows fragment-level quantification, which can be considered as repeated measurements of the abundance of the corresponding peptides and proteins in the downstream statistical analysis. However, few statistical approaches are available for aggregating these complex fragment-level data into peptide- or protein-level statistical summaries. In this work, we describe a software package, mapDIA, for statistical analysis of differential protein expression using DIA fragment-level intensities. The workflow consists of three major steps: intensity normalization, peptide/fragment selection, and statistical analysis. First, mapDIA offers normalization of fragment-level intensities by total intensity sums as well as a novel alternative normalization by local intensity sums in retention time space. Second, mapDIA removes outlier observations and selects peptides/fragments that preserve the major quantitative patterns across all samples for each protein. Last, using the selected fragments and peptides, mapDIA performs model-based statistical significance analysis of protein-level differential expression between specified groups of samples. Using a comprehensive set of simulation datasets, we show that mapDIA detects differentially expressed proteins with accurate control of the false discovery rates. We also describe the analysis procedure in detail using two recently published DIA datasets generated for 14-3-3β dynamic interaction network and prostate cancer glycoproteome. The software was written in C++ language and the source code is available for free through SourceForge website http://sourceforge.net/projects/mapdia/. This article is part of a Special Issue entitled: Computational Proteomics. Copyright © 2015 Elsevier B.V. All rights reserved.
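
    The total-intensity-sum normalization in the first step can be sketched as follows; this is an illustrative Python reimplementation on a made-up intensity matrix, not the mapDIA C++ code.

```python
import numpy as np

# Hypothetical fragment-level intensity matrix: rows = fragments, columns = samples.
intensity = np.array([[1.2e5, 0.9e5, 1.5e5],
                      [3.4e4, 2.8e4, 4.1e4],
                      [8.0e3, 6.5e3, 9.2e3]])

col_sums = intensity.sum(axis=0)              # total intensity per sample
target = col_sums.mean()                      # common scale to normalize to
normalized = intensity * (target / col_sums)  # rescale each sample (column)

print(normalized.sum(axis=0))                 # all columns now share the same total
```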

  3. Chemical discrimination of lubricant marketing types using direct analysis in real time time-of-flight mass spectrometry.

    PubMed

    Maric, Mark; Harvey, Lauren; Tomcsak, Maren; Solano, Angelique; Bridge, Candice

    2017-06-30

    In comparison to other violent crimes, sexual assaults suffer from very low prosecution and conviction rates, especially in the absence of DNA evidence. As a result, the forensic community needs to utilize other forms of trace contact evidence, like lubricant evidence, in order to provide a link between the victim and the assailant. In this study, 90 personal bottled and condom lubricants from the three main marketing types, silicone-based, water-based and condoms, were characterized by direct analysis in real time time-of-flight mass spectrometry (DART-TOFMS). The instrumental data were analyzed by multivariate statistics including hierarchical cluster analysis, principal component analysis, and linear discriminant analysis. By interpreting the mass spectral data with multivariate statistics, 12 discrete groupings were identified, indicating inherent chemical diversity not only between but within the three main marketing groups. A number of unique chemical markers, both major and minor, were identified, other than the three main chemical components (i.e. PEG, PDMS and nonoxynol-9) currently used for lubricant classification. The data were validated by a stratified 20% withheld cross-validation which demonstrated that there was minimal overlap between the groupings. Based on the groupings identified and unique features of each group, a highly discriminating statistical model was then developed that aims to provide the foundation for the development of a forensic lubricant database that may eventually be applied to casework. Copyright © 2017 John Wiley & Sons, Ltd.
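
    A hedged sketch of the chemometric workflow described above (principal component analysis for dimension reduction followed by linear discriminant analysis, validated on a stratified 20% hold-out) is shown below; the random spectra and labels are placeholders for the DART-TOFMS data.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(90, 200))      # placeholder mass-spectral intensities (90 lubricants)
y = rng.integers(0, 3, size=90)     # placeholder marketing-type labels

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)   # stratified 20% withheld

model = make_pipeline(PCA(n_components=10),
                      LinearDiscriminantAnalysis())
model.fit(X_tr, y_tr)
print("hold-out accuracy:", model.score(X_te, y_te))
```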

  4. Sole: Online Analysis of Southern FIA Data

    Treesearch

    Michael P. Spinney; Paul C. Van Deusen; Francis A. Roesch

    2006-01-01

    The Southern On Line Estimator (SOLE) is a flexible modular software program for analyzing U.S. Department of Agriculture Forest Service Forest Inventory and Analysis data. SOLE produces statistical tables, figures, maps, and portable document format reports based on user selected area and variables. SOLE's Java-based graphical user interface is easy to use, and its R-...

  5. Understanding the Relationship between School-Based Management, Emotional Intelligence and Performance of Religious Upper Secondary School Principals in Banten Province

    ERIC Educational Resources Information Center

    Muslihah, Oleh Eneng

    2015-01-01

    The research examines the correlation between the understanding of school-based management, emotional intelligences and headmaster performance. Data was collected, using quantitative methods. The statistical analysis used was the Pearson Correlation, and multivariate regression analysis. The results of this research suggest firstly that there is…

  6. Cost-Effectiveness Analysis: a proposal of new reporting standards in statistical analysis

    PubMed Central

    Bang, Heejung; Zhao, Hongwei

    2014-01-01

    Cost-effectiveness analysis (CEA) is a method for evaluating the outcomes and costs of competing strategies designed to improve health, and has been applied to a variety of different scientific fields. Yet, there are inherent complexities in cost estimation and CEA from statistical perspectives (e.g., skewness, bi-dimensionality, and censoring). The incremental cost-effectiveness ratio that represents the additional cost per one unit of outcome gained by a new strategy has served as the most widely accepted methodology in the CEA. In this article, we call for expanded perspectives and reporting standards reflecting a more comprehensive analysis that can elucidate different aspects of available data. Specifically, we propose that mean and median-based incremental cost-effectiveness ratios and average cost-effectiveness ratios be reported together, along with relevant summary and inferential statistics as complementary measures for informed decision making. PMID:24605979
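
    For reference, the incremental cost-effectiveness ratio is simply the additional cost per unit of outcome gained; the sketch below computes mean-based and median-based ICERs from hypothetical per-patient costs and effects, in the spirit of the complementary reporting proposed above.

```python
import numpy as np

# Hypothetical per-patient costs (dollars) and effects (QALYs) for two strategies.
cost_new = np.array([12000, 15000, 9000, 20000])
eff_new = np.array([1.9, 2.2, 1.7, 2.5])
cost_old = np.array([8000, 9000, 7000, 11000])
eff_old = np.array([1.6, 1.8, 1.5, 2.0])

icer_mean = (cost_new.mean() - cost_old.mean()) / (eff_new.mean() - eff_old.mean())
icer_median = (np.median(cost_new) - np.median(cost_old)) / (np.median(eff_new) - np.median(eff_old))

print(f"mean-based ICER   = {icer_mean:,.0f} $/QALY")
print(f"median-based ICER = {icer_median:,.0f} $/QALY")
```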

  7. Research Update: Spatially resolved mapping of electronic structure on atomic level by multivariate statistical analysis

    NASA Astrophysics Data System (ADS)

    Belianinov, Alex; Ganesh, Panchapakesan; Lin, Wenzhi; Sales, Brian C.; Sefat, Athena S.; Jesse, Stephen; Pan, Minghu; Kalinin, Sergei V.

    2014-12-01

    Atomic level spatial variability of electronic structure in Fe-based superconductor FeTe0.55Se0.45 (Tc = 15 K) is explored using current-imaging tunneling-spectroscopy. Multivariate statistical analysis of the data differentiates regions of dissimilar electronic behavior that can be identified with the segregation of chalcogen atoms, as well as boundaries between terminations and near neighbor interactions. Subsequent clustering analysis allows identification of the spatial localization of these dissimilar regions. Similar statistical analysis of modeled calculated density of states of chemically inhomogeneous FeTe1-xSex structures further confirms that the two types of chalcogens, i.e., Te and Se, can be identified by their electronic signature and differentiated by their local chemical environment. This approach allows detailed chemical discrimination of the scanning tunneling microscopy data including separation of atomic identities, proximity, and local configuration effects and can be universally applicable to chemically and electronically inhomogeneous surfaces.

  8. Review and statistical analysis of the use of ultrasonic velocity for estimating the porosity fraction in polycrystalline materials

    NASA Technical Reports Server (NTRS)

    Roth, D. J.; Swickard, S. M.; Stang, D. B.; Deguire, M. R.

    1991-01-01

    A review and statistical analysis of the ultrasonic velocity method for estimating the porosity fraction in polycrystalline materials is presented. Initially, a semiempirical model is developed showing the origin of the linear relationship between ultrasonic velocity and porosity fraction. Then, from a compilation of data produced by many researchers, scatter plots of velocity versus percent porosity data are shown for Al2O3, MgO, porcelain-based ceramics, PZT, SiC, Si3N4, steel, tungsten, UO2,(U0.30Pu0.70)C, and YBa2Cu3O(7-x). Linear regression analysis produces predicted slope, intercept, correlation coefficient, level of significance, and confidence interval statistics for the data. Velocity values predicted from regression analysis of fully-dense materials are in good agreement with those calculated from elastic properties.
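
    The slope, intercept, and correlation statistics mentioned above can be reproduced for any velocity-porosity data set with an ordinary least-squares fit; here is a minimal sketch using invented data points, not the compiled literature data.

```python
import numpy as np
from scipy import stats

# Hypothetical ultrasonic velocity (km/s) versus porosity fraction data.
porosity = np.array([0.00, 0.05, 0.10, 0.15, 0.20, 0.25])
velocity = np.array([10.8, 10.1, 9.5, 8.8, 8.3, 7.6])

fit = stats.linregress(porosity, velocity)
print(f"slope = {fit.slope:.2f} km/s per unit porosity fraction")
print(f"intercept (predicted fully dense velocity) = {fit.intercept:.2f} km/s")
print(f"r = {fit.rvalue:.3f}, p = {fit.pvalue:.2e}")
```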

  9. Review and statistical analysis of the ultrasonic velocity method for estimating the porosity fraction in polycrystalline materials

    NASA Technical Reports Server (NTRS)

    Roth, D. J.; Swickard, S. M.; Stang, D. B.; Deguire, M. R.

    1990-01-01

    A review and statistical analysis of the ultrasonic velocity method for estimating the porosity fraction in polycrystalline materials is presented. Initially, a semi-empirical model is developed showing the origin of the linear relationship between ultrasonic velocity and porosity fraction. Then, from a compilation of data produced by many researchers, scatter plots of velocity versus percent porosity data are shown for Al2O3, MgO, porcelain-based ceramics, PZT, SiC, Si3N4, steel, tungsten, UO2,(U0.30Pu0.70)C, and YBa2Cu3O(7-x). Linear regression analysis produced predicted slope, intercept, correlation coefficient, level of significance, and confidence interval statistics for the data. Velocity values predicted from regression analysis for fully-dense materials are in good agreement with those calculated from elastic properties.

  10. The Statistical Value of Raw Fluorescence Signal in Luminex xMAP Based Multiplex Immunoassays

    PubMed Central

    Breen, Edmond J.; Tan, Woei; Khan, Alamgir

    2016-01-01

    Tissue samples (plasma, saliva, serum or urine) from 169 patients classified as either normal or having one of seven possible diseases are analysed across three 96-well plates for the presence of 37 analytes using cytokine inflammation multiplexed immunoassay panels. Censoring of concentration data caused problems for analysis of the low-abundance analytes. Using fluorescence analysis rather than concentration-based analysis allowed analysis of these low-abundance analytes. Mixed-effects analysis on the resulting fluorescence and concentration responses reveals that the combination of censoring and mapping the fluorescence responses to concentration values, through a 5PL curve, changed the observed analyte concentrations. Simulation verifies this by showing a dependence of the observed analyte concentration levels on the mean fluorescence response and its distribution. Differences from normality in the fluorescence responses can lead to differences in concentration estimates and unreliable probabilities for treatment effects. When fluorescence responses are normally distributed, probabilities of treatment effects from fluorescence-based t-tests have greater statistical power than the same probabilities from concentration-based t-tests. We add evidence that the fluorescence response, unlike concentration values, does not require censoring, and we show, with respect to differential analysis on the fluorescence responses, that background correction is not required. PMID:27243383

  11. Orchestrating high-throughput genomic analysis with Bioconductor

    PubMed Central

    Huber, Wolfgang; Carey, Vincent J.; Gentleman, Robert; Anders, Simon; Carlson, Marc; Carvalho, Benilton S.; Bravo, Hector Corrada; Davis, Sean; Gatto, Laurent; Girke, Thomas; Gottardo, Raphael; Hahne, Florian; Hansen, Kasper D.; Irizarry, Rafael A.; Lawrence, Michael; Love, Michael I.; MacDonald, James; Obenchain, Valerie; Oleś, Andrzej K.; Pagès, Hervé; Reyes, Alejandro; Shannon, Paul; Smyth, Gordon K.; Tenenbaum, Dan; Waldron, Levi; Morgan, Martin

    2015-01-01

    Bioconductor is an open-source, open-development software project for the analysis and comprehension of high-throughput data in genomics and molecular biology. The project aims to enable interdisciplinary research, collaboration and rapid development of scientific software. Based on the statistical programming language R, Bioconductor comprises 934 interoperable packages contributed by a large, diverse community of scientists. Packages cover a range of bioinformatic and statistical applications. They undergo formal initial review and continuous automated testing. We present an overview for prospective users and contributors. PMID:25633503

  12. Nateglinide versus repaglinide for type 2 diabetes mellitus in China.

    PubMed

    Li, Chanjuan; Xia, Jielai; Zhang, Gaokui; Wang, Suzhen; Wang, Ling

    2009-12-01

    The purpose of this study is to evaluate the efficacy and safety of nateglinide tablets in comparison with repaglinide tablets as a control in treating type 2 diabetes mellitus in China. Pooled analysis with an analysis of covariance (ANCOVA) method was applied to assess efficacy and safety based on original data collected from four independent randomized clinical trials with similar research protocols. In addition, meta-analysis was applied based on the outcomes of the four studies, and the meta-analysis results were comparable to those obtained by pooled analysis. The means of HbA(1c) and fasting blood glucose in both the nateglinide and repaglinide groups were reduced significantly after 12 weeks, with no statistically significant difference in reduction between the two groups. The adverse reaction rates were 9.89 and 6.51% in the nateglinide and repaglinide groups, respectively, with the rate difference showing no statistical significance; the odds ratio of the adverse reaction rate (95% confidence interval) was 1.59 (0.99, 2.55). Both nateglinide and repaglinide administration have similarly significant effects on reducing HbA(1c) and FBG. The adverse reaction rate in the nateglinide group is numerically higher than that in the repaglinide group, but the difference is not statistically significant in the four clinical trials.
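
    The reported odds ratio and confidence interval can be reconstructed from a 2x2 table of adverse-reaction counts using the standard Wald interval on the log odds ratio; the counts below are hypothetical (the abstract gives only the rates), so the sketch only illustrates the calculation.

```python
import numpy as np

# Illustrative 2x2 table: rows = drug group, columns = (adverse reaction, none).
a, b = 54, 492   # nateglinide: events, non-events (hypothetical counts)
c, d = 36, 517   # repaglinide: events, non-events (hypothetical counts)

or_ = (a * d) / (b * c)                       # odds ratio
se_log_or = np.sqrt(1/a + 1/b + 1/c + 1/d)    # standard error of log(OR)
lo = np.exp(np.log(or_) - 1.96 * se_log_or)
hi = np.exp(np.log(or_) + 1.96 * se_log_or)

print(f"OR = {or_:.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```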

  13. Practicality of Elementary Statistics Module Based on CTL Completed by Instructions on Using Software R

    NASA Astrophysics Data System (ADS)

    Delyana, H.; Rismen, S.; Handayani, S.

    2018-04-01

    This research is a development study using the 4-D design model (define, design, develop, and disseminate). The results of the define stage are analyzed for the following needs: syllabus analysis, textbook analysis, student characteristics analysis and literature analysis. The textbook analysis showed that students still have difficulty understanding the two textbooks they are required to own, that the form of presentation does not help students learn independently to discover the concepts, and that the textbooks are not equipped with data-processing guidance using the software R. The developed module is considered valid by the experts. Field trials were then conducted to determine practicality and effectiveness. The trial was conducted with four randomly selected students of the Mathematics Education Study Program of STKIP PGRI who had not yet taken the Basic Statistics course. The practicality aspects assessed were ease of use, time efficiency, ease of interpretation, and equivalence, with scores of 3.7, 3.79, 3.7 and 3.78, respectively. Based on the trial results, students considered the module very practical for use in learning. This means that the developed module can be used by students in Elementary Statistics learning.

  14. Combination of statistical and physically based methods to assess shallow slide susceptibility at the basin scale

    NASA Astrophysics Data System (ADS)

    Oliveira, Sérgio C.; Zêzere, José L.; Lajas, Sara; Melo, Raquel

    2017-07-01

    Approaches used to assess shallow slide susceptibility at the basin scale are conceptually different depending on the use of statistical or physically based methods. The former are based on the assumption that the same causes are more likely to produce the same effects, whereas the latter are based on the comparison between forces which tend to promote movement along the slope and the counteracting forces that are resistant to motion. Within this general framework, this work tests two hypotheses: (i) although conceptually and methodologically distinct, the statistical and deterministic methods generate similar shallow slide susceptibility results regarding the model's predictive capacity and spatial agreement; and (ii) the combination of shallow slide susceptibility maps obtained with statistical and physically based methods, for the same study area, generate a more reliable susceptibility model for shallow slide occurrence. These hypotheses were tested at a small test site (13.9 km2) located north of Lisbon (Portugal), using a statistical method (the information value method, IV) and a physically based method (the infinite slope method, IS). The landslide susceptibility maps produced with the statistical and deterministic methods were combined into a new landslide susceptibility map. The latter was based on a set of integration rules defined by the cross tabulation of the susceptibility classes of both maps and analysis of the corresponding contingency tables. The results demonstrate a higher predictive capacity of the new shallow slide susceptibility map, which combines the independent results obtained with statistical and physically based models. Moreover, the combination of the two models allowed the identification of areas where the results of the information value and the infinite slope methods are contradictory. Thus, these areas were classified as uncertain and deserve additional investigation at a more detailed scale.
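
    For context, the physically based component (the infinite slope method) evaluates a factor of safety in each slope cell; below is a minimal sketch of the classical infinite-slope formula with hypothetical soil parameters, noting that the study's own parameterization may differ.

```python
import numpy as np

def factor_of_safety(c, phi_deg, gamma, z, beta_deg, m=0.0, gamma_w=9.81):
    """Classical infinite-slope factor of safety.

    c        effective cohesion (kPa)
    phi_deg  effective friction angle (degrees)
    gamma    soil unit weight (kN/m^3)
    z        depth of the potential failure surface (m)
    beta_deg slope angle (degrees)
    m        saturated fraction of the soil column (0..1)
    """
    beta = np.radians(beta_deg)
    phi = np.radians(phi_deg)
    u = m * gamma_w * z * np.cos(beta) ** 2                       # pore pressure on the slip surface
    resisting = c + (gamma * z * np.cos(beta) ** 2 - u) * np.tan(phi)
    driving = gamma * z * np.sin(beta) * np.cos(beta)
    return resisting / driving

print(factor_of_safety(c=5.0, phi_deg=30.0, gamma=18.0, z=1.5, beta_deg=25.0, m=0.5))
```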

  15. Spatial analysis of relative humidity during ungauged periods in a mountainous region

    NASA Astrophysics Data System (ADS)

    Um, Myoung-Jin; Kim, Yeonjoo

    2017-08-01

    Although atmospheric humidity influences environmental and agricultural conditions, thereby influencing plant growth, human health, and air pollution, efforts to develop spatial maps of atmospheric humidity using statistical approaches have thus far been limited. This study therefore aims to develop statistical approaches for inferring the spatial distribution of relative humidity (RH) for a mountainous island, for which data are not uniformly available across the region. A multiple regression analysis based on various mathematical models was used to identify the optimal model for estimating monthly RH by incorporating not only temperature but also location and elevation. Based on the regression analysis, we extended the monthly RH data from weather stations to cover the ungauged periods when no RH observations were available. Then, two different types of station-based data, the observational data and the data extended via the regression model, were used to form grid-based data with a resolution of 100 m. The grid-based data that used the extended station-based data captured the increasing RH trend along an elevation gradient. Furthermore, annual RH values averaged over the regions were examined. Decreasing temporal trends were found in most cases, with magnitudes varying based on the season and region.

  16. Statistical analysis and application of quasi experiments to antimicrobial resistance intervention studies.

    PubMed

    Shardell, Michelle; Harris, Anthony D; El-Kamary, Samer S; Furuno, Jon P; Miller, Ram R; Perencevich, Eli N

    2007-10-01

    Quasi-experimental study designs are frequently used to assess interventions that aim to limit the emergence of antimicrobial-resistant pathogens. However, previous studies using these designs have often used suboptimal statistical methods, which may result in researchers making spurious conclusions. Methods used to analyze quasi-experimental data include 2-group tests, regression analysis, and time-series analysis, and they all have specific assumptions, data requirements, strengths, and limitations. An example of a hospital-based intervention to reduce methicillin-resistant Staphylococcus aureus infection rates and reduce overall length of stay is used to explore these methods.
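
    As an illustration of the time-series option, a segmented (interrupted time series) regression can separate the pre-intervention trend from level and slope changes after the intervention; the sketch below uses simulated monthly infection rates, not the hospital data from the example.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
months = np.arange(36)                        # 36 monthly observations
post = (months >= 18).astype(float)           # intervention introduced at month 18
time_after = np.where(post == 1, months - 18, 0)

# Simulated infection rate with a level drop and a slope change after the intervention.
rate = 10 + 0.05 * months - 2.0 * post - 0.10 * time_after + rng.normal(0, 0.5, 36)

X = sm.add_constant(np.column_stack([months, post, time_after]))
fit = sm.OLS(rate, X).fit()
print(fit.params)   # [baseline level, pre-trend, level change, slope change]
```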

  17. BCM: toolkit for Bayesian analysis of Computational Models using samplers.

    PubMed

    Thijssen, Bram; Dijkstra, Tjeerd M H; Heskes, Tom; Wessels, Lodewyk F A

    2016-10-21

    Computational models in biology are characterized by a large degree of uncertainty. This uncertainty can be analyzed with Bayesian statistics; however, the sampling algorithms that are frequently used for calculating Bayesian statistical estimates are computationally demanding, and each algorithm has unique advantages and disadvantages. It is typically unclear, before starting an analysis, which algorithm will perform well on a given computational model. We present BCM, a toolkit for the Bayesian analysis of Computational Models using samplers. It provides efficient, multithreaded implementations of eleven algorithms for sampling from posterior probability distributions and for calculating marginal likelihoods. BCM includes tools to simplify the process of model specification and scripts for visualizing the results. The flexible architecture allows it to be used on diverse types of biological computational models. In an example inference task using a model of the cell cycle based on ordinary differential equations, BCM is significantly more efficient than existing software packages, allowing more challenging inference problems to be solved. BCM represents an efficient one-stop-shop for computational modelers wishing to use sampler-based Bayesian statistics.
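
    The sampler-based idea can be illustrated with a few lines of random-walk Metropolis sampling from a posterior; this toy example (a Gaussian likelihood with a flat prior on the mean) is only a sketch of the general approach, not BCM's multithreaded C++ implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.0, size=50)   # simulated observations

def log_posterior(mu):
    # Gaussian likelihood with known unit variance and a flat prior on mu.
    return -0.5 * np.sum((data - mu) ** 2)

samples, mu = [], 0.0
for _ in range(5000):
    proposal = mu + rng.normal(scale=0.5)        # random-walk proposal
    if np.log(rng.uniform()) < log_posterior(proposal) - log_posterior(mu):
        mu = proposal                            # accept the move
    samples.append(mu)

print("posterior mean estimate:", np.mean(samples[1000:]))   # discard burn-in
```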

  18. Recognizing stationary and locomotion activities using combinational of spectral analysis with statistical descriptors features

    NASA Astrophysics Data System (ADS)

    Zainudin, M. N. Shah; Sulaiman, Md Nasir; Mustapha, Norwati; Perumal, Thinagaran

    2017-10-01

    Pervasive computing has recently garnered a lot of attention due to high demand in various application domains. Human activity recognition (HAR) is one of its widely explored applications because it provides valuable information about people. Accelerometer-based approaches are commonly used in HAR research since the sensors are small and already built into various types of smartphones. However, high inter-class similarity tends to degrade recognition performance. Hence, this work presents an activity recognition method using proposed features that combine spectral analysis with statistical descriptors, which is able to tackle the issue of differentiating stationary and locomotion activities. The signal is first denoised using the Fourier transform, and two different groups of features are then extracted: spectral frequency features and statistical descriptors. The extracted features are classified using random forest ensemble classifier models. The recognition results show good accuracy for stationary and locomotion activities on the USC-HAD dataset.
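
    A minimal sketch of the feature combination described above follows: statistical descriptors plus spectral (FFT) features from windowed accelerometer signals, classified with a random forest. The windows and labels here are synthetic placeholders, not the USC-HAD recordings.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

def features(window):
    spectrum = np.abs(np.fft.rfft(window))                # spectral analysis
    descriptors = [window.mean(), window.std(),           # statistical descriptors
                   window.min(), window.max()]
    return np.concatenate([descriptors, spectrum[:10]])   # keep the first 10 spectral bins

# Synthetic accelerometer windows: class 0 ~ stationary, class 1 ~ locomotion.
X, y = [], []
for label, freq in [(0, 0.0), (1, 2.0)]:
    for _ in range(100):
        t = np.linspace(0, 2.56, 128)
        sig = np.sin(2 * np.pi * freq * t) + rng.normal(0, 0.3, t.size)
        X.append(features(sig))
        y.append(label)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print("training accuracy:", clf.score(X, y))
```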

  19. Method and system of Jones-matrix mapping of blood plasma films with "fuzzy" analysis in differentiation of breast pathology changes

    NASA Astrophysics Data System (ADS)

    Zabolotna, Natalia I.; Radchenko, Kostiantyn O.; Karas, Oleksandr V.

    2018-01-01

    A method for diagnosing breast fibroadenoma using statistical analysis (determination and analysis of statistical moments of the 1st-4th order) of the obtained polarization images of the imaginary Jones-matrix elements of optically thin (attenuation coefficient τ ≤ 0.1) blood plasma films, with further intelligent differentiation based on the method of "fuzzy" logic and discriminant analysis, is proposed. The accuracy of the intelligent differentiation of blood plasma samples into "norm" and "fibroadenoma" of the breast was 82.7% for the linear discriminant analysis method and 95.3% for the "fuzzy" logic method. The obtained results confirm the potentially high reliability of the differentiation method based on "fuzzy" analysis.
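
    The 1st- to 4th-order statistical moments used as diagnostic criteria can be computed directly from the distribution of image values; the sketch below does this for a random placeholder array rather than an actual polarization image.

```python
import numpy as np
from scipy import stats

# Placeholder for one coordinate distribution of a Jones-matrix element image.
values = np.random.default_rng(0).normal(size=10_000)

m1 = values.mean()              # 1st moment: mean
m2 = values.std(ddof=1)         # 2nd moment: standard deviation
m3 = stats.skew(values)         # 3rd moment: skewness (asymmetry)
m4 = stats.kurtosis(values)     # 4th moment: excess kurtosis

print(m1, m2, m3, m4)
```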

  20. Teaching Research Methods and Statistics in eLearning Environments: Pedagogy, Practical Examples, and Possible Futures

    PubMed Central

    Rock, Adam J.; Coventry, William L.; Morgan, Methuen I.; Loi, Natasha M.

    2016-01-01

    Generally, academic psychologists are mindful of the fact that, for many students, the study of research methods and statistics is anxiety provoking (Gal et al., 1997). Given the ubiquitous and distributed nature of eLearning systems (Nof et al., 2015), teachers of research methods and statistics need to cultivate an understanding of how to effectively use eLearning tools to inspire psychology students to learn. Consequently, the aim of the present paper is to discuss critically how using eLearning systems might engage psychology students in research methods and statistics. First, we critically appraise definitions of eLearning. Second, we examine numerous important pedagogical principles associated with effectively teaching research methods and statistics using eLearning systems. Subsequently, we provide practical examples of our own eLearning-based class activities designed to engage psychology students to learn statistical concepts such as Factor Analysis and Discriminant Function Analysis. Finally, we discuss general trends in eLearning and possible futures that are pertinent to teachers of research methods and statistics in psychology. PMID:27014147

  1. Teaching Research Methods and Statistics in eLearning Environments: Pedagogy, Practical Examples, and Possible Futures.

    PubMed

    Rock, Adam J; Coventry, William L; Morgan, Methuen I; Loi, Natasha M

    2016-01-01

    Generally, academic psychologists are mindful of the fact that, for many students, the study of research methods and statistics is anxiety provoking (Gal et al., 1997). Given the ubiquitous and distributed nature of eLearning systems (Nof et al., 2015), teachers of research methods and statistics need to cultivate an understanding of how to effectively use eLearning tools to inspire psychology students to learn. Consequently, the aim of the present paper is to discuss critically how using eLearning systems might engage psychology students in research methods and statistics. First, we critically appraise definitions of eLearning. Second, we examine numerous important pedagogical principles associated with effectively teaching research methods and statistics using eLearning systems. Subsequently, we provide practical examples of our own eLearning-based class activities designed to engage psychology students to learn statistical concepts such as Factor Analysis and Discriminant Function Analysis. Finally, we discuss general trends in eLearning and possible futures that are pertinent to teachers of research methods and statistics in psychology.

  2. Object Classification Based on Analysis of Spectral Characteristics of Seismic Signal Envelopes

    NASA Astrophysics Data System (ADS)

    Morozov, Yu. V.; Spektor, A. A.

    2017-11-01

    A method for classifying moving objects having a seismic effect on the ground surface is proposed which is based on statistical analysis of the envelopes of received signals. The values of the components of the amplitude spectrum of the envelopes obtained applying Hilbert and Fourier transforms are used as classification criteria. Examples illustrating the statistical properties of spectra and the operation of the seismic classifier are given for an ensemble of objects of four classes (person, group of people, large animal, vehicle). It is shown that the computational procedures for processing seismic signals are quite simple and can therefore be used in real-time systems with modest requirements for computational resources.
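
    A sketch of the envelope-spectrum feature extraction: the analytic-signal envelope from the Hilbert transform, followed by the amplitude spectrum of that envelope. The synthetic amplitude-modulated signal below merely stands in for a recorded seismic trace.

```python
import numpy as np
from scipy.signal import hilbert

fs = 500.0                                   # sampling rate, Hz
t = np.arange(0, 4, 1 / fs)
# Synthetic seismic-like signal: a 40 Hz carrier amplitude-modulated at ~2 Hz (footstep rate).
signal = (1 + 0.8 * np.sin(2 * np.pi * 2 * t)) * np.sin(2 * np.pi * 40 * t)

envelope = np.abs(hilbert(signal))           # Hilbert-transform envelope
spectrum = np.abs(np.fft.rfft(envelope - envelope.mean()))
freqs = np.fft.rfftfreq(envelope.size, d=1 / fs)

print("dominant envelope frequency:", freqs[spectrum.argmax()], "Hz")
```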

  3. System of Mueller-Jones matrix polarizing mapping of blood plasma films in breast pathology

    NASA Astrophysics Data System (ADS)

    Zabolotna, Natalia I.; Radchenko, Kostiantyn O.; Tarnovskiy, Mykola H.

    2017-08-01

    A combined method of Jones-Mueller matrix mapping and blood plasma film analysis, based on the system proposed in this paper, is described. Based on the obtained data about the structure and state of the blood plasma samples, diagnostic conclusions can be made about the state of breast cancer patients ("normal" or "pathology"). Statistical and correlation moments are then obtained for every coordinate distribution by statistical analysis; these indicators serve as diagnostic criteria. The final step is to compare results and choose the most effective diagnostic indicators. The paper presents the results of Mueller-Jones matrix mapping of optically thin (attenuation coefficient τ ≤ 0.1) blood plasma layers.

  4. SU-E-J-261: Statistical Analysis and Chaotic Dynamics of Respiratory Signal of Patients in BodyFix

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Michalski, D; Huq, M; Bednarz, G

    Purpose: To quantify the respiratory signal of patients in BodyFix undergoing 4DCT scans with and without the immobilization cover. Methods: 20 pairs of respiratory tracks recorded with the RPM system during 4DCT scans were analyzed. Descriptive statistics were applied to selected parameters of the exhale-inhale decomposition. Standardized signals were used with the delay method to build orbits in embedded space. Nonlinear behavior was tested with surrogate data. Sample entropy SE, Lempel-Ziv complexity LZC and the largest Lyapunov exponents LLE were compared. Results: Statistical tests show a difference between scans for inspiration time and its variability, which is bigger for scans without the cover. The same is true for the variability of the end of exhalation and inhalation. Other parameters fail to show a difference. For both scans the respiratory signals show determinism and nonlinear stationarity. Statistical tests on surrogate data reveal their nonlinearity. LLEs show the signals' chaotic nature and its correlation with the breathing period and its embedding delay time. SE, LZC and LLE measure respiratory signal complexity. Nonlinear characteristics do not differ between scans. Conclusion: Contrary to expectation, the cover applied to patients in BodyFix appears to have a limited effect on signal parameters. Analysis based on trajectories of delay vectors shows the respiratory system's nonlinear character and its sensitive dependence on initial conditions. Reproducibility of the respiratory signal can be evaluated with measures of signal complexity and its predictability window. A longer respiratory period is conducive to signal reproducibility as shown by these gauges. Statistical independence of the exhale and inhale times is also supported by the magnitude of LLE. The nonlinear parameters seem more appropriate for gauging respiratory signal complexity given its deterministic chaotic nature. This contrasts with measures based on harmonic analysis that are blind to nonlinear features. The dynamics of breathing, so crucial for 4D-based clinical technologies, can be better controlled if nonlinear-based methodology, which reflects respiration characteristics, is applied. Funding provided by Varian Medical Systems via an Investigator Initiated Research Project.

  5. Testing homogeneity of proportion ratios for stratified correlated bilateral data in two-arm randomized clinical trials.

    PubMed

    Pei, Yanbo; Tian, Guo-Liang; Tang, Man-Lai

    2014-11-10

    Stratified data analysis is an important research topic in many biomedical studies and clinical trials. In this article, we develop five test statistics for testing the homogeneity of proportion ratios for stratified correlated bilateral binary data based on an equal correlation model assumption. Bootstrap procedures based on these test statistics are also considered. To evaluate the performance of these statistics and procedures, we conduct Monte Carlo simulations to study their empirical sizes and powers under various scenarios. Our results suggest that the procedure based on score statistic performs well generally and is highly recommended. When the sample size is large, procedures based on the commonly used weighted least square estimate and logarithmic transformation with Mantel-Haenszel estimate are recommended as they do not involve any computation of maximum likelihood estimates requiring iterative algorithms. We also derive approximate sample size formulas based on the recommended test procedures. Finally, we apply the proposed methods to analyze a multi-center randomized clinical trial for scleroderma patients. Copyright © 2014 John Wiley & Sons, Ltd.

  6. DCL System Research Using Advanced Approaches for Land-based or Ship-based Real-Time Recognition and Localization of Marine Mammals

    DTIC Science & Technology

    2012-09-30

    recognition. Algorithm design and statistical analysis and feature analysis. Post-Doctoral Associate, Cornell University, Bioacoustics Research...short. The HPC-ADA was designed based on fielded systems [1-4, 6] that offer a variety of desirable attributes, specifically dynamic resource...The software package was designed to utilize parallel and distributed processing for running recognition and other advanced algorithms. DeLMA

  7. Assessment and prediction of inter-joint upper limb movement correlations based on kinematic analysis and statistical regression

    NASA Astrophysics Data System (ADS)

    Toth-Tascau, Mirela; Balanean, Flavia; Krepelka, Mircea

    2013-10-01

    Musculoskeletal impairment of the upper limb can cause difficulties in performing basic daily activities. Three-dimensional motion analyses can provide valuable data on arm movement in order to precisely determine arm movement and inter-joint coordination. The purpose of this study was to develop a method to evaluate the degree of impairment based on the influence of shoulder movements on the amplitude of elbow flexion and extension, under the assumption that a lack of motion of the elbow joint will be compensated by increased shoulder activity. In order to develop and validate a statistical model, one healthy young volunteer was involved in the study. The chosen activity simulated blowing the nose, starting from a slight flexion of the elbow, raising the hand until the middle finger touched the tip of the nose, and returning to the start position. Inter-joint coordination between the elbow and shoulder movements showed significant correlation. Statistical regression was used to fit an equation model describing the influence of shoulder movements on elbow mobility. The study provides a brief description of the kinematic analysis protocol and statistical models that may be useful in describing the relation between inter-joint movements of daily activities.

  8. A hierarchical fuzzy rule-based approach to aphasia diagnosis.

    PubMed

    Akbarzadeh-T, Mohammad-R; Moshtagh-Khorasani, Majid

    2007-10-01

    Aphasia diagnosis is a particularly challenging medical diagnostic task due to the linguistic uncertainty and vagueness, inconsistencies in the definition of aphasic syndromes, large number of measurements with imprecision, natural diversity and subjectivity in test objects as well as in opinions of experts who diagnose the disease. To efficiently address this diagnostic process, a hierarchical fuzzy rule-based structure is proposed here that considers the effect of different features of aphasia by statistical analysis in its construction. This approach can be efficient for diagnosis of aphasia and possibly other medical diagnostic applications due to its fuzzy and hierarchical reasoning construction. Initially, the symptoms of the disease which each consists of different features are analyzed statistically. The measured statistical parameters from the training set are then used to define membership functions and the fuzzy rules. The resulting two-layered fuzzy rule-based system is then compared with a back propagating feed-forward neural network for diagnosis of four Aphasia types: Anomic, Broca, Global and Wernicke. In order to reduce the number of required inputs, the technique is applied and compared on both comprehensive and spontaneous speech tests. Statistical t-test analysis confirms that the proposed approach uses fewer Aphasia features while also presenting a significant improvement in terms of accuracy.

  9. Assessment of statistical methods used in library-based approaches to microbial source tracking.

    PubMed

    Ritter, Kerry J; Carruthers, Ethan; Carson, C Andrew; Ellender, R D; Harwood, Valerie J; Kingsley, Kyle; Nakatsu, Cindy; Sadowsky, Michael; Shear, Brian; West, Brian; Whitlock, John E; Wiggins, Bruce A; Wilbur, Jayson D

    2003-12-01

    Several commonly used statistical methods for fingerprint identification in microbial source tracking (MST) were examined to assess the effectiveness of pattern-matching algorithms to correctly identify sources. Although numerous statistical methods have been employed for source identification, no widespread consensus exists as to which is most appropriate. A large-scale comparison of several MST methods, using identical fecal sources, presented a unique opportunity to assess the utility of several popular statistical methods. These included discriminant analysis, nearest neighbour analysis, maximum similarity and average similarity, along with several measures of distance or similarity. Threshold criteria for excluding uncertain or poorly matched isolates from final analysis were also examined for their ability to reduce false positives and increase prediction success. Six independent libraries used in the study were constructed from indicator bacteria isolated from fecal materials of humans, seagulls, cows and dogs. Three of these libraries were constructed using the rep-PCR technique and three relied on antibiotic resistance analysis (ARA). Five of the libraries were constructed using Escherichia coli and one using Enterococcus spp. (ARA). Overall, the outcome of this study suggests a high degree of variability across statistical methods. Despite large differences in correct classification rates among the statistical methods, no single statistical approach emerged as superior. Thresholds failed to consistently increase rates of correct classification and improvement was often associated with substantial effective sample size reduction. Recommendations are provided to aid in selecting appropriate analyses for these types of data.

  10. Stochastic or statistic? Comparing flow duration curve models in ungauged basins and changing climates

    NASA Astrophysics Data System (ADS)

    Müller, M. F.; Thompson, S. E.

    2015-09-01

    The prediction of flow duration curves (FDCs) in ungauged basins remains an important task for hydrologists given the practical relevance of FDCs for water management and infrastructure design. Predicting FDCs in ungauged basins typically requires spatial interpolation of statistical or model parameters. This task is complicated if climate becomes non-stationary, as the prediction challenge now also requires extrapolation through time. In this context, process-based models for FDCs that mechanistically link the streamflow distribution to climate and landscape factors may have an advantage over purely statistical methods to predict FDCs. This study compares a stochastic (process-based) and statistical method for FDC prediction in both stationary and non-stationary contexts, using Nepal as a case study. Under contemporary conditions, both models perform well in predicting FDCs, with Nash-Sutcliffe coefficients above 0.80 in 75% of the tested catchments. The main drivers of uncertainty differ between the models: parameter interpolation was the main source of error for the statistical model, while violations of the assumptions of the process-based model represented the main source of its error. The process-based approach performed better than the statistical approach in numerical simulations with non-stationary climate drivers. The predictions of the statistical method under non-stationary rainfall conditions were poor if (i) local runoff coefficients were not accurately determined from the gauge network, or (ii) streamflow variability was strongly affected by changes in rainfall. A Monte Carlo analysis shows that the streamflow regimes in catchments characterized by a strong wet-season runoff and a rapid, strongly non-linear hydrologic response are particularly sensitive to changes in rainfall statistics. In these cases, process-based prediction approaches are strongly favored over statistical models.
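
    For reference, the statistical object being predicted, the flow duration curve, is simply the exceedance probability of streamflow; a minimal sketch of computing an empirical FDC from a gauged record is shown below with synthetic daily flows.

```python
import numpy as np

rng = np.random.default_rng(0)
flows = rng.lognormal(mean=1.0, sigma=0.8, size=365)   # synthetic daily streamflow (m^3/s)

sorted_flows = np.sort(flows)[::-1]                            # descending order
exceedance = np.arange(1, flows.size + 1) / (flows.size + 1)   # Weibull plotting position

# Example index: Q95, the flow exceeded 95% of the time (a common low-flow statistic).
q95 = np.interp(0.95, exceedance, sorted_flows)
print("Q95 =", round(q95, 2), "m^3/s")
```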

  11. Cognition, comprehension and application of biostatistics in research by Indian postgraduate students in periodontics

    PubMed Central

    Swetha, Jonnalagadda Laxmi; Arpita, Ramisetti; Srikanth, Chintalapani; Nutalapati, Rajasekhar

    2014-01-01

    Background: Biostatistics is an integral part of research protocols. In any field of inquiry or investigation, data obtained is subsequently classified, analyzed and tested for accuracy by statistical methods. Statistical analysis of collected data, thus, forms the basis for all evidence-based conclusions. Aim: The aim of this study is to evaluate the cognition, comprehension and application of biostatistics in research among post graduate students in Periodontics, in India. Materials and Methods: A total of 391 post graduate students registered for a master's course in periodontics at various dental colleges across India were included in the survey. Data regarding the level of knowledge, understanding and its application in design and conduct of the research protocol was collected using a dichotomous questionnaire. Descriptive statistics were used for data analysis. Results: Nearly 79.2% of students were aware of the importance of biostatistics in research, 55-65% were familiar with the MS-EXCEL spreadsheet for graphical representation of data and with the statistical software available on the internet, 26.0% had biostatistics as a mandatory subject in their curriculum, 9.5% tried to perform statistical analysis on their own while 3.0% were successful in performing statistical analysis of their studies on their own. Conclusion: Biostatistics should play a central role in planning, conduct, interim analysis, final analysis and reporting of periodontal research especially by the postgraduate students. Indian postgraduate students in periodontics are aware of the importance of biostatistics in research but the level of understanding and application is still basic and needs to be addressed. PMID:24744547

  12. Statistical description of non-Gaussian samples in the F2 layer of the ionosphere during heliogeophysical disturbances

    NASA Astrophysics Data System (ADS)

    Sergeenko, N. P.

    2017-11-01

    An adequate statistical method should be developed in order to probabilistically predict the range of ionospheric parameters. This problem is solved in this paper. The time series of the critical frequency of the F2 layer, foF2(t), were subjected to statistical processing. For the obtained samples {δfoF2}, statistical distributions and invariants up to the fourth order are calculated. The analysis shows that the distributions differ from the Gaussian law during disturbances. At sufficiently small probability levels, there are arbitrarily large deviations from the model of the normal process. Therefore, an attempt is made to describe the statistical samples {δfoF2} based on the Poisson model. For the studied samples, an exponential characteristic function is selected under the assumption that the time series are a superposition of deterministic and random processes. Using the Fourier transform, the characteristic function is transformed into a nonholomorphic, excessive-asymmetric probability-density function. The statistical distributions of the samples {δfoF2} calculated for the disturbed periods are compared with the obtained model distribution function. According to Kolmogorov's criterion, the probabilities of coincidence of the a posteriori distributions with the theoretical ones are P = 0.7-0.9. The conducted analysis makes it possible to draw a conclusion about the applicability of a model based on a Poisson random process for the statistical description and probabilistic estimation of the variations {δfoF2} during heliogeophysical disturbances.

  13. Visual Survey of Infantry Troops. Part 1. Visual Acuity, Refractive Status, Interpupillary Distance and Visual Skills

    DTIC Science & Technology

    1989-06-01

    letters on one line and several letters on the next line, there is no accurate way to credit these extra letters for statistical analysis. The decimal and...contains the descriptive statistics of the objective refractive error components of infantrymen. Figures 8-11 show the frequency distributions for sphere...equivalents. Nonspectacle wearers Table 12 contains the descriptive statistics for non-spectacle wearers. Based on these refractive error data, about 30

  14. Agriculture, population growth, and statistical analysis of the radiocarbon record.

    PubMed

    Zahid, H Jabran; Robinson, Erick; Kelly, Robert L

    2016-01-26

    The human population has grown significantly since the onset of the Holocene about 12,000 y ago. Despite decades of research, the factors determining prehistoric population growth remain uncertain. Here, we examine measurements of the rate of growth of the prehistoric human population based on statistical analysis of the radiocarbon record. We find that, during most of the Holocene, human populations worldwide grew at a long-term annual rate of 0.04%. Statistical analysis of the radiocarbon record shows that transitioning farming societies experienced the same rate of growth as contemporaneous foraging societies. The same rate of growth measured for populations dwelling in a range of environments and practicing a variety of subsistence strategies suggests that the global climate and/or endogenous biological factors, not adaptability to local environment or subsistence practices, regulated the long-term growth of the human population during most of the Holocene. Our results demonstrate that statistical analyses of large ensembles of radiocarbon dates are robust and valuable for quantitatively investigating the demography of prehistoric human populations worldwide.
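
    To put the reported 0.04% long-term annual rate in perspective, the implied population doubling time follows from simple exponential-growth arithmetic:

```python
import math

r = 0.0004                                   # 0.04% annual growth rate
doubling_time = math.log(2) / math.log(1 + r)
print(round(doubling_time), "years")         # roughly 1,700 years to double
```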

  15. Statistical analysis for understanding and predicting battery degradations in real-life electric vehicle use

    NASA Astrophysics Data System (ADS)

    Barré, Anthony; Suard, Frédéric; Gérard, Mathias; Montaru, Maxime; Riu, Delphine

    2014-01-01

    This paper describes the statistical analysis of recorded data parameters of electrical battery ageing during electric vehicle use. These data permit traditional battery ageing investigation based on the evolution of the capacity fade and resistance rise. The measured variables are examined in order to explain the correlation between battery ageing and operating conditions during the experiments. Such a study enables us to identify the main ageing factors. Detailed statistical dependency explorations then reveal the factors responsible for battery ageing phenomena. Predictive battery ageing models are built from this approach. The results thereby demonstrate and quantify a relationship between the variables and global observations of battery ageing, and also allow accurate battery ageing diagnosis through predictive models.

  16. Anomalous heat transfer modes of nanofluids: a review based on statistical analysis

    NASA Astrophysics Data System (ADS)

    Sergis, Antonis; Hardalupas, Yannis

    2011-05-01

    This paper contains the results of a concise statistical review analysis of a large amount of publications regarding the anomalous heat transfer modes of nanofluids. The application of nanofluids as coolants is a novel practice with no established physical foundations explaining the observed anomalous heat transfer. As a consequence, traditional methods of performing a literature review may not be adequate in presenting objectively the results representing the bulk of the available literature. The current literature review analysis aims to resolve the problems faced by researchers in the past by employing an unbiased statistical analysis to present and reveal the current trends and general belief of the scientific community regarding the anomalous heat transfer modes of nanofluids. The thermal performance analysis indicated that statistically there exists a variable enhancement for conduction, convection/mixed heat transfer, pool boiling heat transfer and critical heat flux modes. The most popular proposed mechanisms in the literature to explain heat transfer in nanofluids are revealed, as well as possible trends between nanofluid properties and thermal performance. The review also suggests future experimentation to provide more conclusive answers to the control mechanisms and influential parameters of heat transfer in nanofluids.

  17. Anomalous heat transfer modes of nanofluids: a review based on statistical analysis.

    PubMed

    Sergis, Antonis; Hardalupas, Yannis

    2011-05-19

    This paper contains the results of a concise statistical review analysis of a large amount of publications regarding the anomalous heat transfer modes of nanofluids. The application of nanofluids as coolants is a novel practice with no established physical foundations explaining the observed anomalous heat transfer. As a consequence, traditional methods of performing a literature review may not be adequate in presenting objectively the results representing the bulk of the available literature. The current literature review analysis aims to resolve the problems faced by researchers in the past by employing an unbiased statistical analysis to present and reveal the current trends and general belief of the scientific community regarding the anomalous heat transfer modes of nanofluids. The thermal performance analysis indicated that statistically there exists a variable enhancement for conduction, convection/mixed heat transfer, pool boiling heat transfer and critical heat flux modes. The most popular proposed mechanisms in the literature to explain heat transfer in nanofluids are revealed, as well as possible trends between nanofluid properties and thermal performance. The review also suggests future experimentation to provide more conclusive answers to the control mechanisms and influential parameters of heat transfer in nanofluids.

  18. Anomalous heat transfer modes of nanofluids: a review based on statistical analysis

    PubMed Central

    2011-01-01

    This paper contains the results of a concise statistical review analysis of a large amount of publications regarding the anomalous heat transfer modes of nanofluids. The application of nanofluids as coolants is a novel practice with no established physical foundations explaining the observed anomalous heat transfer. As a consequence, traditional methods of performing a literature review may not be adequate in presenting objectively the results representing the bulk of the available literature. The current literature review analysis aims to resolve the problems faced by researchers in the past by employing an unbiased statistical analysis to present and reveal the current trends and general belief of the scientific community regarding the anomalous heat transfer modes of nanofluids. The thermal performance analysis indicated that statistically there exists a variable enhancement for conduction, convection/mixed heat transfer, pool boiling heat transfer and critical heat flux modes. The most popular proposed mechanisms in the literature to explain heat transfer in nanofluids are revealed, as well as possible trends between nanofluid properties and thermal performance. The review also suggests future experimentation to provide more conclusive answers to the control mechanisms and influential parameters of heat transfer in nanofluids. PMID:21711932

  19. Method for Identifying Probable Archaeological Sites from Remotely Sensed Data

    NASA Technical Reports Server (NTRS)

    Tilton, James C.; Comer, Douglas C.; Priebe, Carey E.; Sussman, Daniel

    2011-01-01

    Archaeological sites are being compromised or destroyed at a catastrophic rate in most regions of the world. The best solution to this problem is for archaeologists to find and study these sites before they are compromised or destroyed. One way to facilitate the necessary rapid, wide area surveys needed to find these archaeological sites is through the generation of maps of probable archaeological sites from remotely sensed data. We describe an approach for identifying probable locations of archaeological sites over a wide area based on detecting subtle anomalies in vegetative cover through a statistically based analysis of remotely sensed data from multiple sources. We further developed this approach under a recent NASA ROSES Space Archaeology Program project. Under this project we refined and elaborated this statistical analysis to compensate for potential slight mis-registrations between the remote sensing data sources and the archaeological site location data. We also explored data quantization approaches (required by the statistical analysis approach), and we identified a superior data quantization approach based on a unique image segmentation approach. In our presentation we will summarize our refined approach and demonstrate the effectiveness of the overall approach with test data from Santa Catalina Island off the southern California coast. Finally, we discuss our future plans for further improving our approach.

  20. Trial Sequential Methods for Meta-Analysis

    ERIC Educational Resources Information Center

    Kulinskaya, Elena; Wood, John

    2014-01-01

    Statistical methods for sequential meta-analysis also have applications in the design of new trials. Existing methods are based on group sequential methods developed for single trials and start with the calculation of a required information size. This works satisfactorily within the framework of fixed effects meta-analysis, but conceptual…

  1. Cluster Analysis of Minnesota School Districts. A Research Report.

    ERIC Educational Resources Information Center

    Cleary, James

    The term "cluster analysis" refers to a set of statistical methods that classify entities with similar profiles of scores on a number of measured dimensions, in order to create empirically based typologies. A 1980 Minnesota House Research Report employed cluster analysis to categorize school districts according to their relative mixtures…

  2. Using Statistics and Data Mining Approaches to Analyze Male Sexual Behaviors and Use of Erectile Dysfunction Drugs Based on Large Questionnaire Data.

    PubMed

    Qiao, Zhi; Li, Xiang; Liu, Haifeng; Zhang, Lei; Cao, Junyang; Xie, Guotong; Qin, Nan; Jiang, Hui; Lin, Haocheng

    2017-01-01

    The prevalence of erectile dysfunction (ED) has been extensively studied worldwide. Erectile dysfunction drugs have shown great efficacy in treating male erectile dysfunction. To help doctors understand patients' drug-taking preferences and prescribe more appropriately, it is crucial to analyze who actually takes erectile dysfunction drugs and the relation between sexual behaviors and drug use. Existing clinical studies usually used descriptive statistics and regression analysis based on small volumes of data. In this paper, based on a large volume of data (48,630 questionnaires), we use data mining approaches in addition to statistics and regression analysis to comprehensively analyze the relation between male sexual behaviors and the use of erectile dysfunction drugs, in order to characterize the patients who take these drugs. We first analyze the impact of multiple sexual behavior factors on whether erectile dysfunction drugs are used. Then, we mine decision rules for stratification to discover patients who are more likely to take the drugs. Based on the decision rules, the patients can be partitioned into four potential groups for use of erectile dysfunction drugs: high potential, intermediate potential-1, intermediate potential-2 and low potential. Experimental results show that 1) the sexual behavior factors erectile hardness and preparation time (how long patients prepare for sexual activity ahead of time) have larger impacts both in the correlation analysis and in discovering potential drug-taking patients; 2) the odds ratio between patients identified as low potential and high potential was 6.098 (95% confidence interval, 5.159-7.209), with statistically significant differences in drug-taking potential detected between all potential groups.
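
    For readers unfamiliar with the reported effect measure, the following minimal sketch shows how an odds ratio and its Wald 95% confidence interval are computed from a 2x2 table; the counts used here are invented for illustration and are not the study's data.

```python
# Minimal sketch: odds ratio and Wald confidence interval from a 2x2 table.
import numpy as np
from scipy.stats import norm

def odds_ratio_ci(a, b, c, d, alpha=0.05):
    """2x2 table [[a, b], [c, d]] = [[exposed cases, exposed non-cases],
                                     [unexposed cases, unexposed non-cases]]."""
    or_ = (a * d) / (b * c)
    se_log = np.sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # SE of log(OR)
    z = norm.ppf(1 - alpha / 2)
    lo, hi = np.exp(np.log(or_) + np.array([-z, z]) * se_log)
    return or_, lo, hi

# illustrative counts only, not taken from the questionnaire study
print(odds_ratio_ci(820, 1450, 310, 3340))
```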

  3. Statistical properties of the ice particle distribution in stratiform clouds

    NASA Astrophysics Data System (ADS)

    Delanoe, J.; Tinel, C.; Testud, J.

    2003-04-01

    This paper presents an extensive analysis of several microphysical databases (CEPEX, EUCREX, CLARE and CARL) to determine statistical properties of the Particle Size Distribution (PSD). The databases cover different types of stratiform clouds: tropical cirrus (CEPEX), mid-latitude cirrus (EUCREX) and mid-latitude cirrus and stratus (CARL, CLARE). The approach for analysis uses the concept of normalisation of the PSD developed by Testud et al. (2001). The normalisation aims at isolating three independent characteristics of the PSD: its "intrinsic" shape, the "average size" of the spectrum and the ice water content IWC; by "average size" is meant the mean mass-weighted diameter D_m. It is shown that concentration should be normalised by N_0^* proportional to IWC/D_m^4. The "intrinsic" shape is defined as F(Deq/D_m) = N(Deq)/N_0^*, where Deq is the equivalent melted diameter. The "intrinsic" shape is found to be very stable over most of the observed range of Deq/D_m; at the largest normalised sizes, more scatter is observed, but future analysis should decide whether it represents real physical variation or statistical "error" due to counting problems. Considering the overall statistics over the full database, a large scatter of the N_0^* against D_m plot is found. But in the case of a particular event or a particular leg of a flight, the N_0^* vs. D_m plot is much less scattered and shows a systematic trend of decaying N_0^* as D_m increases. This trend is interpreted as the manifestation of the predominance of the aggregation process. Finally, an important point for cloud remote sensing is investigated: the normalised relationship of IWC/N_0^* against Z/N_0^* is much less scattered than the classical relationship of IWC against Z, the radar reflectivity factor.
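
    A minimal sketch of the normalisation concept, assuming melted-equivalent diameters (so particle mass scales as Deq^3) and one common convention for the constant in N_0^*; the exact convention used in the paper may differ.

```python
# Sketch of the PSD normalisation concept (Testud et al. 2001 style).
import numpy as np

RHO_W = 1000.0  # kg m^-3, density of liquid water

def normalize_psd(deq, n_deq):
    """deq [m]: melted-equivalent diameters; n_deq [m^-4]: concentration density N(Deq).
    Returns D_m, N_0^*, the intrinsic shape F, and the normalised size Deq/D_m."""
    m3 = np.trapz(n_deq * deq ** 3, deq)
    m4 = np.trapz(n_deq * deq ** 4, deq)
    dm = m4 / m3                                   # mean mass-weighted diameter
    iwc = (np.pi / 6.0) * RHO_W * m3               # ice water content (kg m^-3)
    n0_star = 4.0 ** 4 * iwc / (np.pi * RHO_W * dm ** 4)   # assumed convention
    return dm, n0_star, n_deq / n0_star, deq / dm

# synthetic exponential spectrum just to exercise the function
deq = np.linspace(1e-5, 5e-3, 500)
n_deq = 1e6 * np.exp(-deq / 5e-4)
dm, n0_star, f, x = normalize_psd(deq, n_deq)
print(f"D_m = {dm*1e3:.2f} mm, N_0* = {n0_star:.3e} m^-4")
```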

  4. University and student segmentation: multilevel latent-class analysis of students' attitudes towards research methods and statistics.

    PubMed

    Mutz, Rüdiger; Daniel, Hans-Dieter

    2013-06-01

    It is often claimed that psychology students' attitudes towards research methods and statistics affect course enrollment, persistence, achievement, and course climate. However, inter-institutional variability has been widely neglected in the research on students' attitudes towards research methods and statistics, although it is important for didactic purposes (heterogeneity of the student population). The paper presents a scale based on findings of the social psychology of attitudes (a bipolar and emotion-based concept) in conjunction with a method for capturing beginning university students' attitudes towards research methods and statistics and identifying the proportion of students having positive attitudes at the institutional level. The study is based on a re-analysis of a nationwide survey in Germany in August 2000 of all psychology students who enrolled in fall 1999/2000 (N = 1,490) at N = 44 universities. Using multilevel latent-class analysis (MLLCA), the aim was to group students into different student attitude types and at the same time to obtain university segments based on the incidences of the different student attitude types. Four student latent clusters were found that can be ranked on a bipolar attitude dimension. Membership in a cluster was predicted by age, grade point average (GPA) on the school-leaving exam, and personality traits. In addition, two university segments were found: universities with an average proportion of students with positive attitudes and universities with a high proportion of students with positive attitudes (excellent segment). As psychology students make up a very heterogeneous group, the use of multiple learning activities as opposed to the classical lecture course is required. © 2011 The British Psychological Society.

  5. Meta-analysis of gene-level associations for rare variants based on single-variant statistics.

    PubMed

    Hu, Yi-Juan; Berndt, Sonja I; Gustafsson, Stefan; Ganna, Andrea; Hirschhorn, Joel; North, Kari E; Ingelsson, Erik; Lin, Dan-Yu

    2013-08-08

    Meta-analysis of genome-wide association studies (GWASs) has led to the discoveries of many common variants associated with complex human diseases. There is a growing recognition that identifying "causal" rare variants also requires large-scale meta-analysis. The fact that association tests with rare variants are performed at the gene level rather than at the variant level poses unprecedented challenges in the meta-analysis. First, different studies may adopt different gene-level tests, so the results are not compatible. Second, gene-level tests require multivariate statistics (i.e., components of the test statistic and their covariance matrix), which are difficult to obtain. To overcome these challenges, we propose to perform gene-level tests for rare variants by combining the results of single-variant analysis (i.e., p values of association tests and effect estimates) from participating studies. This simple strategy is possible because of an insight that multivariate statistics can be recovered from single-variant statistics, together with the correlation matrix of the single-variant test statistics, which can be estimated from one of the participating studies or from a publicly available database. We show both theoretically and numerically that the proposed meta-analysis approach provides accurate control of the type I error and is as powerful as joint analysis of individual participant data. This approach accommodates any disease phenotype and any study design and produces all commonly used gene-level tests. An application to the GWAS summary results of the Genetic Investigation of ANthropometric Traits (GIANT) consortium reveals rare and low-frequency variants associated with human height. The relevant software is freely available. Copyright © 2013 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
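
    The key insight, that a gene-level statistic can be reconstructed from single-variant statistics plus the correlation matrix of those statistics, can be illustrated with a simple weighted burden test. This is a generic, hedged sketch rather than the authors' software, and the numbers below are made up.

```python
# Sketch: reconstruct a gene-level burden test from per-variant z-scores and
# the correlation (LD-like) matrix of the single-variant test statistics.
import numpy as np
from scipy.stats import norm

def burden_from_single_variant(z, R, w=None):
    """z: per-variant score/Wald z-statistics (e.g. meta-analyzed across studies);
    R: correlation matrix of the single-variant statistics (e.g. from a
    reference panel); w: per-variant weights (default: equal weights)."""
    z, R = np.asarray(z, float), np.asarray(R, float)
    w = np.ones_like(z) if w is None else np.asarray(w, float)
    t = w @ z / np.sqrt(w @ R @ w)        # standard normal under the null
    return t, 2 * norm.sf(abs(t))

z = np.array([1.2, 0.8, 2.1, -0.3])       # illustrative per-variant z-scores
R = np.array([[1.0, 0.2, 0.1, 0.0],
              [0.2, 1.0, 0.3, 0.1],
              [0.1, 0.3, 1.0, 0.0],
              [0.0, 0.1, 0.0, 1.0]])
print(burden_from_single_variant(z, R))
```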

  6. Text grouping in patent analysis using adaptive K-means clustering algorithm

    NASA Astrophysics Data System (ADS)

    Shanie, Tiara; Suprijadi, Jadi; Zulhanif

    2017-03-01

    Patents are one form of intellectual property. Analyzing patents is a prerequisite for understanding the development of technology in each country and in the world. This study uses patent documents about green tea retrieved from the Espacenet server. Patent documents related to tea technology are widespread, which makes information retrieval (IR) difficult for users. Therefore, efforts are needed to categorize documents into groups according to the related terms they contain. This study uses patent-title text data on green tea in a statistical text mining workflow consisting of two phases: a data preparation stage and a data analysis stage. The data preparation phase uses text mining methods, and the data analysis stage is carried out statistically. The statistical analysis in this study uses a cluster analysis algorithm, the adaptive K-means clustering algorithm. Results show that, based on the maximum silhouette value, the procedure generates 87 clusters associated with fifteen terms that can be utilized for information retrieval.
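
    A hedged sketch of a comparable pipeline (TF-IDF features, k-means clustering, k chosen by maximizing the silhouette value); the titles, the small range of k, and the library choices are illustrative assumptions, not the authors' actual data or code.

```python
# Sketch: cluster patent titles with TF-IDF + k-means; pick k by silhouette.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

titles = ["green tea extraction apparatus", "tea polyphenol composition",
          "beverage packaging method", "catechin purification process",
          "tea leaf fermentation control", "bottled green tea production"]

X = TfidfVectorizer(stop_words="english").fit_transform(titles)

best_k, best_score, best_labels = None, -1.0, None
for k in range(2, 5):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    score = silhouette_score(X, labels)      # higher silhouette = better separation
    if score > best_score:
        best_k, best_score, best_labels = k, score, labels

print("chosen k:", best_k, "silhouette:", round(best_score, 3), best_labels)
```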

  7. Optical diagnosis of cervical cancer by higher order spectra and boosting

    NASA Astrophysics Data System (ADS)

    Pratiher, Sawon; Mukhopadhyay, Sabyasachi; Barman, Ritwik; Pratiher, Souvik; Pradhan, Asima; Ghosh, Nirmalya; Panigrahi, Prasanta K.

    2017-03-01

    In this contribution, we report the application of higher order statistical moments with decision tree and ensemble-based learning methodologies for the development of diagnostic algorithms for optical diagnosis of cancer. The classification results were compared to those obtained with independent feature extractors such as linear discriminant analysis (LDA). The methodology using higher order statistics with a boosted classifier achieves higher specificity and sensitivity while being much faster than other time-frequency domain based methods.
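
    As an illustration of the general approach (higher-order statistical moments as features, a boosted ensemble as the classifier, LDA as the baseline), the following sketch uses synthetic signals; it is not the authors' spectroscopic pipeline.

```python
# Sketch: higher-order moment features + boosting, compared against LDA.
import numpy as np
from scipy.stats import skew, kurtosis
from sklearn.ensemble import AdaBoostClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
signals = rng.normal(size=(200, 512))                 # synthetic "spectra"
labels = rng.integers(0, 2, 200)
signals[labels == 1] += rng.gamma(2.0, 0.1, size=(int((labels == 1).sum()), 512))

def moment_features(x):
    # mean, variance, skewness, kurtosis per signal
    return np.column_stack([x.mean(1), x.var(1), skew(x, 1), kurtosis(x, 1)])

X = moment_features(signals)
print("boosting:", cross_val_score(AdaBoostClassifier(), X, labels, cv=5).mean())
print("LDA     :", cross_val_score(LinearDiscriminantAnalysis(), X, labels, cv=5).mean())
```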

  8. Statistical Analysis of High-Cycle Fatigue Behavior of Friction Stir Welded AA5083-H321

    DTIC Science & Technology

    2011-01-01

    durable structures are: (a) FSW is being used in serial production of aluminum alloy-based ferryboat deck structures in Finland; (b) Al-Mg-Si-based... and strain-hardened/stabilized Al-Mg-Mn alloy) are characterized by a relatively large statistical scatter. This scatter is closely related to the... associated with friction stir-welded (FSW) joints of AA5083-H321 (a solid-solution-strengthened and strain-hardened/stabilized Al-Mg-Mn alloy) are

  9. Network meta-analysis: a technique to gather evidence from direct and indirect comparisons

    PubMed Central

    2017-01-01

    Systematic reviews and pairwise meta-analyses of randomized controlled trials, at the intersection of clinical medicine, epidemiology and statistics, are positioned at the top of the evidence-based practice hierarchy. They are important tools for drug approval, for formulating clinical protocols and guidelines, and for decision-making. However, this traditional technique yields only part of the information that clinicians, patients and policy-makers need to make informed decisions, since it usually compares only two interventions at a time. In the market, regardless of the clinical condition under evaluation, many interventions are usually available and few of them have been compared in head-to-head studies. This scenario prevents conclusions from being drawn about the full profile (e.g. efficacy and safety) of all interventions. The recent development and introduction of a new technique – usually referred to as network meta-analysis, indirect meta-analysis, or multiple or mixed treatment comparisons – has allowed the estimation of metrics for all possible comparisons in the same model, simultaneously gathering direct and indirect evidence. Over the last years this statistical tool has matured as a technique, with models available for all types of raw data, producing different pooled effect measures, using both frequentist and Bayesian frameworks, with different software packages. However, the conduct, reporting and interpretation of network meta-analysis still pose multiple challenges that should be carefully considered, especially because this technique inherits all assumptions from pairwise meta-analysis but with increased complexity. Thus, we aim to provide a basic explanation of how a network meta-analysis is conducted, highlighting its risks and benefits for evidence-based practice, including information on the evolution of the statistical methods, assumptions and steps for performing the analysis. PMID:28503228

  10. Examining the effectiveness of discriminant function analysis and cluster analysis in species identification of male field crickets based on their calling songs.

    PubMed

    Jaiswara, Ranjana; Nandi, Diptarup; Balakrishnan, Rohini

    2013-01-01

    Traditional taxonomy based on morphology has often failed in accurate species identification owing to the occurrence of cryptic species, which are reproductively isolated but morphologically identical. Molecular data have thus been used to complement morphology in species identification. The sexual advertisement calls in several groups of acoustically communicating animals are species-specific and can thus complement molecular data as non-invasive tools for identification. Several statistical tools and automated identifier algorithms have been used to investigate the efficiency of acoustic signals in species identification. Despite a plethora of such methods, there is a general lack of knowledge regarding the appropriate usage of these methods in specific taxa. In this study, we investigated the performance of two commonly used statistical methods, discriminant function analysis (DFA) and cluster analysis, in identification and classification based on acoustic signals of field cricket species belonging to the subfamily Gryllinae. Using a comparative approach we evaluated the optimal number of species and calling song characteristics for both the methods that lead to most accurate classification and identification. The accuracy of classification using DFA was high and was not affected by the number of taxa used. However, a constraint in using discriminant function analysis is the need for a priori classification of songs. Accuracy of classification using cluster analysis, which does not require a priori knowledge, was maximum for 6-7 taxa and decreased significantly when more than ten taxa were analysed together. We also investigated the efficacy of two novel derived acoustic features in improving the accuracy of identification. Our results show that DFA is a reliable statistical tool for species identification using acoustic signals. Our results also show that cluster analysis of acoustic signals in crickets works effectively for species classification and identification.

  11. Computational Analysis for Rocket-Based Combined-Cycle Systems During Rocket-Only Operation

    NASA Technical Reports Server (NTRS)

    Steffen, C. J., Jr.; Smith, T. D.; Yungster, S.; Keller, D. J.

    2000-01-01

    A series of Reynolds-averaged Navier-Stokes calculations was employed to study the performance of rocket-based combined-cycle systems operating in an all-rocket mode. This parametric series of calculations was executed within a statistical framework commonly known as design of experiments. The parametric design space included four geometric and two flowfield variables set at three levels each, for a total of 729 possible combinations. A D-optimal design strategy was selected. It required that only 36 separate computational fluid dynamics (CFD) solutions be performed to develop a full response surface model, which quantified the linear, bilinear, and curvilinear effects of the six experimental variables. The axisymmetric, Reynolds-averaged Navier-Stokes simulations were executed with the NPARC v3.0 code. The response used in the statistical analysis was created from Isp efficiency data integrated from the 36 CFD simulations. The influence of turbulence modeling was analyzed by using both one- and two-equation models. Careful attention was also given to quantifying the influence of mesh dependence, iterative convergence, and artificial viscosity upon the resulting statistical model. Thirteen statistically significant effects were observed to have an influence on rocket-based combined-cycle nozzle performance. It was apparent that the free-expansion process, directly downstream of the rocket nozzle, can influence the Isp efficiency. Numerical schlieren images and particle traces have been used to further understand the physical phenomena behind several of the statistically significant results.
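
    A minimal sketch of fitting a full quadratic response-surface model (linear, bilinear and curvilinear effects) over coded three-level factors, in the spirit of the design-of-experiments analysis described above; the design points and response values are synthetic, not the study's CFD results.

```python
# Sketch: quadratic response-surface fit over 6 coded factors at 3 levels.
import itertools
import numpy as np

rng = np.random.default_rng(0)
X = rng.choice([-1.0, 0.0, 1.0], size=(36, 6))      # 36 runs, 6 factors (synthetic)
y = 0.97 + 0.01 * X[:, 0] - 0.005 * X[:, 1] * X[:, 2] + rng.normal(0, 1e-3, 36)

def quadratic_design_matrix(X):
    cols = [np.ones(len(X))] + [X[:, i] for i in range(X.shape[1])]        # linear
    cols += [X[:, i] * X[:, j]
             for i, j in itertools.combinations(range(X.shape[1]), 2)]     # bilinear
    cols += [X[:, i] ** 2 for i in range(X.shape[1])]                      # curvilinear
    return np.column_stack(cols)

A = quadratic_design_matrix(X)
coef, *_ = np.linalg.lstsq(A, y, rcond=None)   # least-squares response surface
print(np.round(coef[:7], 4))                   # intercept and linear effects
```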

  12. Precipitate statistics in an Al-Mg-Si-Cu alloy from scanning precession electron diffraction data

    NASA Astrophysics Data System (ADS)

    Sunde, J. K.; Paulsen, Ø.; Wenner, S.; Holmestad, R.

    2017-09-01

    The key microstructural feature providing strength to age-hardenable Al alloys is nanoscale precipitates. Alloy development requires a reliable statistical assessment of these precipitates in order to link the microstructure with material properties. Here, it is demonstrated that scanning precession electron diffraction combined with computational analysis enables the semi-automated extraction of precipitate statistics in an Al-Mg-Si-Cu alloy. Among the main findings is the precipitate number density, which agrees well with a conventional method based on manual counting and measurements. By virtue of its data analysis objectivity, our methodology is therefore seen as an advantageous alternative to existing routines, offering reproducibility and efficiency in alloy statistics. Additional results include improved qualitative information on phase distributions. The developed procedure is generic and applicable to any material containing nanoscale precipitates.

  13. Launch commit criteria performance trending analysis, phase 1, revision A. SRM and QA mission services

    NASA Technical Reports Server (NTRS)

    1989-01-01

    An assessment is made of quantitative methods and measures for trending launch commit criteria (LCC) performance. A statistical performance trending analysis pilot study was processed and compared to STS-26 mission data. This study used four selected shuttle measurement types (solid rocket booster, external tank, space shuttle main engine, and range safety switch safe and arm device) from the five missions prior to mission 51-L. After obtaining raw data coordinates, each set of measurements was processed to obtain statistical confidence bounds and mean data profiles for each of the selected measurement types. STS-26 measurements were compared to the statistical database profiles to verify the statistical capability of assessing occurrences of data trend anomalies and abnormal time-varying operational conditions associated with data amplitude and phase shifts.

  14. Integrating Statistical Mechanics with Experimental Data from the Rotational-Vibrational Spectrum of HCl into the Physical Chemistry Laboratory

    ERIC Educational Resources Information Center

    Findley, Bret R.; Mylon, Steven E.

    2008-01-01

    We introduce a computer exercise that bridges spectroscopy and thermodynamics using statistical mechanics and the experimental data taken from the commonly used laboratory exercise involving the rotational-vibrational spectrum of HCl. Based on the results from the analysis of their HCl spectrum, students calculate bulk thermodynamic properties…
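
    A brief worked sketch of the statistical-mechanics step such an exercise typically involves: from spectroscopic constants fitted to the HCl ro-vibrational spectrum, compute rotational and vibrational partition functions and their contributions to the molar internal energy. The constants below are approximate literature values for H35Cl, assumed for illustration, not results taken from the article.

```python
# Sketch: partition functions and thermal energies from HCl spectroscopic constants.
import numpy as np

h, c, k = 6.62607015e-34, 2.99792458e10, 1.380649e-23   # SI units, c in cm/s
R = 8.314                                                # J mol^-1 K^-1
B_cm, omega_cm = 10.59, 2990.9    # assumed rotational constant and wavenumber (cm^-1)
T = 298.15

theta_rot = h * c * B_cm / k          # rotational temperature (~15 K)
theta_vib = h * c * omega_cm / k      # vibrational temperature (~4300 K)

q_rot = T / theta_rot                                   # high-temperature approximation
q_vib = 1.0 / (1.0 - np.exp(-theta_vib / T))
U_rot = R * T                                           # ~RT per mole for a linear rotor
U_vib = R * theta_vib / (np.exp(theta_vib / T) - 1.0)

print(f"q_rot={q_rot:.1f}, q_vib={q_vib:.5f}, U_vib={U_vib:.3f} J/mol")
```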

  15. Black Females in High School: A Statistical Educational Profile

    ERIC Educational Resources Information Center

    Muhammad, Crystal Gafford; Dixson, Adrienne D.

    2008-01-01

    In life as in literature, both the mainstream public and the Black community writ large overlook Black female experiences, both adolescent and adult. In order to contribute to the knowledge base regarding this population, we present through our study a statistical portrait of Black females in high school. To do so, we present an analysis of…

  16. Which Variables Associated with Data-Driven Instruction Are Believed to Best Predict Urban Student Achievement?

    ERIC Educational Resources Information Center

    Greer, Wil

    2013-01-01

    This study identified the variables associated with data-driven instruction (DDI) that are perceived to best predict student achievement. Of the DDI variables discussed in the literature, 51 had a sufficient research base to warrant statistical analysis. Of these, 26 were statistically significant. Multiple regression and an…

  17. BATMAN: Bayesian Technique for Multi-image Analysis

    NASA Astrophysics Data System (ADS)

    Casado, J.; Ascasibar, Y.; García-Benito, R.; Guidi, G.; Choudhury, O. S.; Bellocchi, E.; Sánchez, S. F.; Díaz, A. I.

    2017-04-01

    This paper describes the Bayesian Technique for Multi-image Analysis (BATMAN), a novel image-segmentation technique based on Bayesian statistics that characterizes any astronomical data set containing spatial information and performs a tessellation based on the measurements and errors provided as input. The algorithm iteratively merges spatial elements as long as they are statistically consistent with carrying the same information (i.e., identical signal within the errors). We illustrate its operation and performance with a set of test cases including both synthetic and real integral-field spectroscopic data. The output segmentations adapt to the underlying spatial structure, regardless of its morphology and/or the statistical properties of the noise. The quality of the recovered signal represents an improvement with respect to the input, especially in regions with low signal-to-noise ratio. However, the algorithm may be sensitive to small-scale random fluctuations, and its performance in the presence of spatial gradients is limited. Due to these effects, errors may be underestimated by as much as a factor of 2. Our analysis reveals that the algorithm prioritizes conservation of all the statistically significant information over noise reduction, and that the precise choice of the input data has a crucial impact on the results. Hence, the philosophy of BaTMAn is not to be used as a 'black box' to improve the signal-to-noise ratio, but as a new approach to characterize spatially resolved data prior to its analysis. The source code is publicly available at http://astro.ft.uam.es/SELGIFS/BaTMAn.

  18. Demonstration of Wavelet Techniques in the Spectral Analysis of Bypass Transition Data

    NASA Technical Reports Server (NTRS)

    Lewalle, Jacques; Ashpis, David E.; Sohn, Ki-Hyeon

    1997-01-01

    A number of wavelet-based techniques for the analysis of experimental data are developed and illustrated. A multiscale analysis based on the Mexican hat wavelet is demonstrated as a tool for acquiring physical and quantitative information not obtainable by standard signal analysis methods. Experimental data for the analysis came from simultaneous hot-wire velocity traces in a bypass transition of the boundary layer on a heated flat plate. A pair of traces (two components of velocity) at one location was excerpted. A number of ensemble and conditional statistics related to dominant time scales for energy and momentum transport were calculated. The analysis revealed a lack of energy-dominant time scales inside turbulent spots but identified transport-dominant scales inside spots that account for the largest part of the Reynolds stress. Momentum transport was much more intermittent than were energetic fluctuations. This work is the first step in a continuing study of the spatial evolution of these scale-related statistics, the goal being to apply the multiscale analysis results to improve the modeling of transitional and turbulent industrial flows.
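
    A minimal sketch of a Mexican-hat (Ricker) continuous wavelet transform of a velocity trace. The library choice (PyWavelets), the sampling rate, and the test signal are our own assumptions, not the authors' implementation.

```python
# Sketch: Mexican-hat CWT of a (synthetic) hot-wire velocity trace.
import numpy as np
import pywt

fs = 10_000.0                                    # assumed sampling rate (Hz)
t = np.arange(0, 0.5, 1 / fs)
u = np.sin(2 * np.pi * 80 * t) + 0.3 * np.random.default_rng(0).normal(size=t.size)

scales = np.arange(1, 128)
coeffs, freqs = pywt.cwt(u, scales, "mexh", sampling_period=1 / fs)

energy = (coeffs ** 2).mean(axis=1)              # scale-wise mean energy
print("dominant frequency ~", round(float(freqs[np.argmax(energy)]), 1), "Hz")
```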

  19. Daniel Goodman’s empirical approach to Bayesian statistics

    USGS Publications Warehouse

    Gerrodette, Tim; Ward, Eric; Taylor, Rebecca L.; Schwarz, Lisa K.; Eguchi, Tomoharu; Wade, Paul; Himes Boor, Gina

    2016-01-01

    Bayesian statistics, in contrast to classical statistics, uses probability to represent uncertainty about the state of knowledge. Bayesian statistics has often been associated with the idea that knowledge is subjective and that a probability distribution represents a personal degree of belief. Dr. Daniel Goodman considered this viewpoint problematic for issues of public policy. He sought to ground his Bayesian approach in data, and advocated the construction of a prior as an empirical histogram of “similar” cases. In this way, the posterior distribution that results from a Bayesian analysis combined comparable previous data with case-specific current data, using Bayes’ formula. Goodman championed such a data-based approach, but he acknowledged that it was difficult in practice. If based on a true representation of our knowledge and uncertainty, Goodman argued that risk assessment and decision-making could be an exact science, despite the uncertainties. In his view, Bayesian statistics is a critical component of this science because a Bayesian analysis produces the probabilities of future outcomes. Indeed, Goodman maintained that the Bayesian machinery, following the rules of conditional probability, offered the best legitimate inference from available data. We give an example of an informative prior in a recent study of Steller sea lion spatial use patterns in Alaska.

  20. Local sensitivity analysis for inverse problems solved by singular value decomposition

    USGS Publications Warehouse

    Hill, M.C.; Nolan, B.T.

    2010-01-01

    Local sensitivity analysis provides computationally frugal ways to evaluate models commonly used for resource management, risk assessment, and so on. This includes diagnosing inverse model convergence problems caused by parameter insensitivity and(or) parameter interdependence (correlation), understanding what aspects of the model and data contribute to measures of uncertainty, and identifying new data likely to reduce model uncertainty. Here, we consider sensitivity statistics relevant to models in which the process-model parameters are transformed using singular value decomposition (SVD) to create SVD parameters for model calibration. The statistics considered include the PEST identifiability statistic, and the combined use of the process-model parameter statistics composite scaled sensitivities and parameter correlation coefficients (CSS and PCC). The statistics are complementary in that the identifiability statistic integrates the effects of parameter sensitivity and interdependence, while CSS and PCC provide individual measures of sensitivity and interdependence. PCC quantifies correlations between pairs or larger sets of parameters; when a set of parameters is intercorrelated, the absolute value of PCC is close to 1.00 for all pairs in the set. The number of singular vectors to include in the calculation of the identifiability statistic is somewhat subjective and influences the statistic. To demonstrate the statistics, we use the USDA's Root Zone Water Quality Model to simulate nitrogen fate and transport in the unsaturated zone of the Merced River Basin, CA. There are 16 log-transformed process-model parameters, including water content at field capacity (WFC) and bulk density (BD) for each of five soil layers. Calibration data consisted of 1,670 observations comprising soil moisture, soil water tension, aqueous nitrate and bromide concentrations, soil nitrate concentration, and organic matter content. All 16 of the SVD parameters could be estimated by regression based on the range of singular values. Identifiability statistic results varied based on the number of SVD parameters included. Identifiability statistics calculated for four SVD parameters indicate the same three most important process-model parameters as CSS/PCC (WFC1, WFC2, and BD2), but the order differed. Additionally, the identifiability statistic showed that BD1 was almost as dominant as WFC1. The CSS/PCC analysis showed that this results from its high correlation with WFC1 (-0.94), and not from its individual sensitivity. Such distinctions, combined with analysis of how high correlations and(or) sensitivities result from the constructed model, can produce important insights into, for example, the use of sensitivity analysis to design monitoring networks. In conclusion, the statistics considered identified similar important parameters. They differ because (1) CSS/PCC can be more awkward because sensitivity and interdependence are considered separately and (2) identifiability requires consideration of how many SVD parameters to include. A continuing challenge is to understand how these computationally efficient methods compare with computationally demanding global methods like Markov chain Monte Carlo given common nonlinear processes and often even more nonlinear models.
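
    A hedged sketch of how composite scaled sensitivities (CSS) and parameter correlation coefficients (PCC) can be computed from a weighted sensitivity (Jacobian) matrix. The matrix here is synthetic and the formulas follow the usual textbook definitions rather than any specific code used in the study.

```python
# Sketch: CSS and PCC from a sensitivity (Jacobian) matrix.
import numpy as np

def css_pcc(J, b, w):
    """J: (n_obs, n_par) sensitivities dy/db; b: parameter values;
    w: observation weights. Returns the CSS vector and the PCC matrix."""
    dss = J * b[None, :] * np.sqrt(w)[:, None]           # dimensionless scaled sensitivities
    css = np.sqrt((dss ** 2).mean(axis=0))               # composite scaled sensitivity
    cov = np.linalg.inv(J.T @ (w[:, None] * J))          # ~ parameter covariance (up to error variance)
    d = np.sqrt(np.diag(cov))
    pcc = cov / np.outer(d, d)                           # parameter correlation coefficients
    return css, pcc

rng = np.random.default_rng(0)
J = rng.normal(size=(50, 4))
J[:, 3] = 0.95 * J[:, 2] + 0.05 * rng.normal(size=50)    # make two parameters interdependent
css, pcc = css_pcc(J, b=np.array([1.0, 2.0, 0.5, 0.3]), w=np.ones(50))
print(np.round(css, 2))
print(round(pcc[2, 3], 2))   # |PCC| near 1 flags parameter interdependence
```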

  1. The Social Construction of "Evidence-Based" Drug Prevention Programs: A Reanalysis of Data from the Drug Abuse Resistance Education (DARE) Program

    ERIC Educational Resources Information Center

    Gorman, Dennis M.; Huber, J. Charles, Jr.

    2009-01-01

    This study explores the possibility that any drug prevention program might be considered "evidence-based" given the use of data analysis procedures that optimize the chance of producing statistically significant results by reanalyzing data from a Drug Abuse Resistance Education (DARE) program evaluation. The analysis produced a number of…

  2. Automatic Generation of Algorithms for the Statistical Analysis of Planetary Nebulae Images

    NASA Technical Reports Server (NTRS)

    Fischer, Bernd

    2004-01-01

    Analyzing data sets collected in experiments or by observations is a core scientific activity. Typically, experimental and observational data are fraught with uncertainty, and the analysis is based on a statistical model of the conjectured underlying processes. The large data volumes collected by modern instruments make computer support indispensable for this. Consequently, scientists spend significant amounts of their time on the development and refinement of the data analysis programs. AutoBayes [GF+02, FS03] is a fully automatic synthesis system for generating statistical data analysis programs. Externally, it looks like a compiler: it takes an abstract problem specification and translates it into executable code. Its input is a concise description of a data analysis problem in the form of a statistical model as shown in Figure 1; its output is optimized and fully documented C/C++ code which can be linked dynamically into the Matlab and Octave environments. Internally, however, it is quite different: AutoBayes derives a customized algorithm implementing the given model using a schema-based process, and then further refines and optimizes the algorithm into code. A schema is a parameterized code template with associated semantic constraints which define and restrict the template's applicability. The schema parameters are instantiated in a problem-specific way during synthesis as AutoBayes checks the constraints against the original model or, recursively, against emerging sub-problems. AutoBayes's schema library contains problem decomposition operators (which are justified by theorems in a formal logic in the domain of Bayesian networks) as well as machine learning algorithms (e.g., EM, k-Means) and numeric optimization methods (e.g., Nelder-Mead simplex, conjugate gradient). AutoBayes augments this schema-based approach by symbolic computation to derive closed-form solutions whenever possible. This is a major advantage over other statistical data analysis systems which use numerical approximations even in cases where closed-form solutions exist. AutoBayes is implemented in Prolog and comprises approximately 75,000 lines of code. In this paper, we take one typical scientific data analysis problem, analyzing planetary nebulae images taken by the Hubble Space Telescope, and show how AutoBayes can be used to automate the implementation of the necessary analysis programs. We initially follow the analysis described by Knuth and Hajian [KH02] and use AutoBayes to derive code for the published models. We show the details of the code derivation process, including the symbolic computations and automatic integration of library procedures, and compare the results of the automatically generated and manually implemented code. We then go beyond the original analysis and use AutoBayes to derive code for a simple image segmentation procedure based on a mixture model which can be used to automate a manual preprocessing step. Finally, we combine the original approach with the simple segmentation, which yields a more detailed analysis. This also demonstrates that AutoBayes makes it easy to combine different aspects of data analysis.

  3. [Study on ecological suitability regionalization of Eucommia ulmoides in Guizhou].

    PubMed

    Kang, Chuan-Zhi; Wang, Qing-Qing; Zhou, Tao; Jiang, Wei-Ke; Xiao, Cheng-Hong; Xie, Yu

    2014-05-01

    To study the ecological suitability regionalization of Eucommia ulmoides, in order to select artificial planting bases and high-quality industrial raw material purchase areas for the herb in Guizhou. Based on an investigation of 14 Eucommia ulmoides producing areas, pinoresinol diglucoside content and ecological factors were obtained. A spatial analysis method was used to carry out the ecological suitability regionalization. Meanwhile, combining the pinoresinol diglucoside content, the correlation between major active components and environmental factors was analyzed statistically. The most suitable planting area of Eucommia ulmoides was the northwest of Guizhou. The distribution of Eucommia ulmoides was mainly affected by the type and pH value of the soil and by monthly precipitation. The spatial structure of the major active components in Eucommia ulmoides was randomly distributed in global space, but had only one aggregation point with a high positive correlation in local space. The major active components of Eucommia ulmoides had no correlation with altitude, longitude or latitude. Using spatial and statistical analysis methods, based on environmental factors and pinoresinol diglucoside content, the ecological suitability regionalization of Eucommia ulmoides can provide a reference for the selection of suitable planting areas, artificial planting bases and production layout.

  4. OPATs: Omnibus P-value association tests.

    PubMed

    Chen, Chia-Wei; Yang, Hsin-Chou

    2017-07-10

    Combining statistical significances (P-values) from a set of single-locus association tests in genome-wide association studies is a proof-of-principle method for identifying disease-associated genomic segments, functional genes and biological pathways. We review P-value combinations for genome-wide association studies and introduce an integrated analysis tool, Omnibus P-value Association Tests (OPATs), which provides popular P-value combination methods. The software OPATs, programmed in R with an R graphical user interface, features a user-friendly interface. In addition to analysis modules for data quality control and single-locus association tests, OPATs provides three types of set-based association test: window-, gene- and biopathway-based association tests. P-value combinations with or without threshold and rank truncation are provided. The significance of a set-based association test is evaluated by using resampling procedures. The performance of the set-based association tests in OPATs has been evaluated by simulation studies and real data analyses. These set-based association tests help boost statistical power, alleviate the multiple-testing problem, reduce the impact of genetic heterogeneity, increase the replication efficiency of association tests and facilitate the interpretation of association signals by streamlining the testing procedures and integrating the genetic effects of multiple variants in genomic regions of biological relevance. In summary, P-value combinations facilitate the identification of marker sets associated with disease susceptibility and uncover missing heritability in association studies, thereby establishing a foundation for the genetic dissection of complex diseases and traits. OPATs provides an easy-to-use and statistically powerful analysis tool for P-value combinations. OPATs, examples, and a user guide can be downloaded from http://www.stat.sinica.edu.tw/hsinchou/genetics/association/OPATs.htm. © The Author 2017. Published by Oxford University Press.
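
    A minimal sketch of the two flavours of P-value combination mentioned above, Fisher's method and a simple truncated-product variant, using generic SciPy calls; OPATs itself is an R tool and assesses truncated statistics by resampling rather than by a closed-form null distribution.

```python
# Sketch: combining single-locus P-values within a gene or window.
import numpy as np
from scipy.stats import combine_pvalues

p = np.array([0.002, 0.04, 0.31, 0.65, 0.012])     # single-locus P-values in one gene

stat, p_fisher = combine_pvalues(p, method="fisher")

tau = 0.05                                          # truncation threshold
kept = p[p <= tau]
stat_trunc = -2 * np.log(kept).sum()
# Note: with truncation the null distribution is no longer chi-square;
# significance would be assessed by resampling, as OPATs does.
print("Fisher P:", p_fisher, "| truncated product statistic:", round(stat_trunc, 2))
```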

  5. Capturing farm diversity with hypothesis-based typologies: An innovative methodological framework for farming system typology development

    PubMed Central

    Alvarez, Stéphanie; Timler, Carl J.; Michalscheck, Mirja; Paas, Wim; Descheemaeker, Katrien; Tittonell, Pablo; Andersson, Jens A.; Groot, Jeroen C. J.

    2018-01-01

    Creating typologies is a way to summarize the large heterogeneity of smallholder farming systems into a few farm types. Various methods exist, commonly using statistical analysis, to create these typologies. We demonstrate that the methodological decisions on data collection, variable selection, data-reduction and clustering techniques can bear a large impact on the typology results. We illustrate the effects of analysing the diversity from different angles, using different typology objectives and different hypotheses, on typology creation by using an example from Zambia’s Eastern Province. Five separate typologies were created with principal component analysis (PCA) and hierarchical clustering analysis (HCA), based on three different expert-informed hypotheses. The greatest overlap between typologies was observed for the larger, wealthier farm types but for the remainder of the farms there were no clear overlaps between typologies. Based on these results, we argue that the typology development should be guided by a hypothesis on the local agriculture features and the drivers and mechanisms of differentiation among farming systems, such as biophysical and socio-economic conditions. That hypothesis is based both on the typology objective and on prior expert knowledge and theories of the farm diversity in the study area. We present a methodological framework that aims to integrate participatory and statistical methods for hypothesis-based typology construction. This is an iterative process whereby the results of the statistical analysis are compared with the reality of the target population as hypothesized by the local experts. Using a well-defined hypothesis and the presented methodological framework, which consolidates the hypothesis through local expert knowledge for the creation of typologies, warrants development of less subjective and more contextualized quantitative farm typologies. PMID:29763422

  6. Capturing farm diversity with hypothesis-based typologies: An innovative methodological framework for farming system typology development.

    PubMed

    Alvarez, Stéphanie; Timler, Carl J; Michalscheck, Mirja; Paas, Wim; Descheemaeker, Katrien; Tittonell, Pablo; Andersson, Jens A; Groot, Jeroen C J

    2018-01-01

    Creating typologies is a way to summarize the large heterogeneity of smallholder farming systems into a few farm types. Various methods exist, commonly using statistical analysis, to create these typologies. We demonstrate that the methodological decisions on data collection, variable selection, data-reduction and clustering techniques can bear a large impact on the typology results. We illustrate the effects of analysing the diversity from different angles, using different typology objectives and different hypotheses, on typology creation by using an example from Zambia's Eastern Province. Five separate typologies were created with principal component analysis (PCA) and hierarchical clustering analysis (HCA), based on three different expert-informed hypotheses. The greatest overlap between typologies was observed for the larger, wealthier farm types but for the remainder of the farms there were no clear overlaps between typologies. Based on these results, we argue that the typology development should be guided by a hypothesis on the local agriculture features and the drivers and mechanisms of differentiation among farming systems, such as biophysical and socio-economic conditions. That hypothesis is based both on the typology objective and on prior expert knowledge and theories of the farm diversity in the study area. We present a methodological framework that aims to integrate participatory and statistical methods for hypothesis-based typology construction. This is an iterative process whereby the results of the statistical analysis are compared with the reality of the target population as hypothesized by the local experts. Using a well-defined hypothesis and the presented methodological framework, which consolidates the hypothesis through local expert knowledge for the creation of typologies, warrants development of less subjective and more contextualized quantitative farm typologies.
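
    A minimal sketch of the PCA-plus-hierarchical-clustering workflow used to build such typologies, run on synthetic farm data; in practice, the choice of variables and the number of farm types would follow the stated hypothesis and local expert knowledge.

```python
# Sketch: farm typology via standardization, PCA, and Ward hierarchical clustering.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
farms = rng.normal(size=(120, 8))          # e.g. land, livestock, labour, income, ...

scores = PCA(n_components=3).fit_transform(StandardScaler().fit_transform(farms))
Z = linkage(scores, method="ward")
farm_type = fcluster(Z, t=4, criterion="maxclust")   # cut the dendrogram into 4 types

print(np.bincount(farm_type)[1:])          # number of farms per type
```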

  7. Choice-Based Conjoint Analysis: Classification vs. Discrete Choice Models

    NASA Astrophysics Data System (ADS)

    Giesen, Joachim; Mueller, Klaus; Taneva, Bilyana; Zolliker, Peter

    Conjoint analysis is a family of techniques that originated in psychology and later became popular in market research. The main objective of conjoint analysis is to measure an individual's or a population's preferences on a class of options that can be described by parameters and their levels. We consider preference data obtained in choice-based conjoint analysis studies, where one observes test persons' choices on small subsets of the options. There are many ways to analyze choice-based conjoint analysis data. Here we discuss the intuition behind a classification based approach, and compare this approach to one based on statistical assumptions (discrete choice models) and to a regression approach. Our comparison on real and synthetic data indicates that the classification approach outperforms the discrete choice models.

  8. Quality control analysis : part II : soil and aggregate base course.

    DOT National Transportation Integrated Search

    1966-07-01

    This is the second of the three reports on the quality control analysis of highway construction materials. : It deals with the statistical evaluation of results from several construction projects to determine the basic pattern of variability with res...

  9. From fields to objects: A review of geographic boundary analysis

    NASA Astrophysics Data System (ADS)

    Jacquez, G. M.; Maruca, S.; Fortin, M.-J.

    Geographic boundary analysis is a relatively new approach unfamiliar to many spatial analysts. It is best viewed as a technique for defining objects - geographic boundaries - on spatial fields, and for evaluating the statistical significance of characteristics of those boundary objects. This is accomplished using null spatial models representative of the spatial processes expected in the absence of boundary-generating phenomena. Close ties to the object-field dialectic eminently suit boundary analysis to GIS data. The majority of existing spatial methods are field-based in that they describe, estimate, or predict how attributes (variables defining the field) vary through geographic space. Such methods are appropriate for field representations but not object representations. As the object-field paradigm gains currency in geographic information science, appropriate techniques for the statistical analysis of objects are required. The methods reviewed in this paper are a promising foundation. Geographic boundary analysis is clearly a valuable addition to the spatial statistical toolbox. This paper presents the philosophy of, and motivations for geographic boundary analysis. It defines commonly used statistics for quantifying boundaries and their characteristics, as well as simulation procedures for evaluating their significance. We review applications of these techniques, with the objective of making this promising approach accessible to the GIS-spatial analysis community. We also describe the implementation of these methods within geographic boundary analysis software: GEM.

  10. SOCR Analyses - an Instructional Java Web-based Statistical Analysis Toolkit.

    PubMed

    Chu, Annie; Cui, Jenny; Dinov, Ivo D

    2009-03-01

    The Statistical Online Computational Resource (SOCR) designs web-based tools for educational use in a variety of undergraduate courses (Dinov 2006). Several studies have demonstrated that these resources significantly improve students' motivation and learning experiences (Dinov et al. 2008). SOCR Analyses is a new component that concentrates on data modeling and analysis using parametric and non-parametric techniques supported with graphical model diagnostics. Currently implemented analyses include commonly used models in undergraduate statistics courses like linear models (Simple Linear Regression, Multiple Linear Regression, One-Way and Two-Way ANOVA). In addition, we implemented tests for sample comparisons, such as t-test in the parametric category; and Wilcoxon rank sum test, Kruskal-Wallis test, Friedman's test, in the non-parametric category. SOCR Analyses also include several hypothesis test models, such as Contingency tables, Friedman's test and Fisher's exact test. The code itself is open source (http://socr.googlecode.com/), hoping to contribute to the efforts of the statistical computing community. The code includes functionality for each specific analysis model and it has general utilities that can be applied in various statistical computing tasks. For example, concrete methods with API (Application Programming Interface) have been implemented in statistical summary, least square solutions of general linear models, rank calculations, etc. HTML interfaces, tutorials, source code, activities, and data are freely available via the web (www.SOCR.ucla.edu). Code examples for developers and demos for educators are provided on the SOCR Wiki website. In this article, the pedagogical utilization of the SOCR Analyses is discussed, as well as the underlying design framework. As the SOCR project is on-going and more functions and tools are being added to it, these resources are constantly improved. The reader is strongly encouraged to check the SOCR site for most updated information and newly added models.

  11. Landscape of Research Areas for Zeolites and Metal-Organic Frameworks Using Computational Classification Based on Citation Networks.

    PubMed

    Ogawa, Takaya; Iyoki, Kenta; Fukushima, Tomohiro; Kajikawa, Yuya

    2017-12-14

    The field of porous materials is widely spreading nowadays, and researchers need to read tremendous numbers of papers to obtain a "bird's eye" view of a given research area. However, it is difficult for researchers to obtain an objective database based on statistical data without any relation to subjective knowledge related to individual research interests. Here, citation network analysis was applied for a comparative analysis of the research areas for zeolites and metal-organic frameworks as examples for porous materials. The statistical and objective data contributed to the analysis of: (1) the computational screening of research areas; (2) classification of research stages to a certain domain; (3) "well-cited" research areas; and (4) research area preferences of specific countries. Moreover, we proposed a methodology to assist researchers to gain potential research ideas by reviewing related research areas, which is based on the detection of unfocused ideas in one area but focused in the other area by a bibliometric approach.

  12. Landscape of Research Areas for Zeolites and Metal-Organic Frameworks Using Computational Classification Based on Citation Networks

    PubMed Central

    Ogawa, Takaya; Fukushima, Tomohiro; Kajikawa, Yuya

    2017-01-01

    The field of porous materials is widely spreading nowadays, and researchers need to read tremendous numbers of papers to obtain a “bird’s eye” view of a given research area. However, it is difficult for researchers to obtain an objective database based on statistical data without any relation to subjective knowledge related to individual research interests. Here, citation network analysis was applied for a comparative analysis of the research areas for zeolites and metal-organic frameworks as examples for porous materials. The statistical and objective data contributed to the analysis of: (1) the computational screening of research areas; (2) classification of research stages to a certain domain; (3) “well-cited” research areas; and (4) research area preferences of specific countries. Moreover, we proposed a methodology to assist researchers to gain potential research ideas by reviewing related research areas, which is based on the detection of unfocused ideas in one area but focused in the other area by a bibliometric approach. PMID:29240708

  13. Studies of oceanic tectonics based on GEOS-3 satellite altimetry

    NASA Technical Reports Server (NTRS)

    Poehls, K. A.; Kaula, W. M.; Schubert, G.; Sandwell, D.

    1979-01-01

    Using statistical analysis, geoidal admittance (the relationship between the ocean geoid and seafloor topography) obtained from GEOS-3 altimetry was compared to various model admittances. Analysis of several altimetry tracks in the Pacific Ocean demonstrated a low coherence between altimetry and seafloor topography except where the track crosses active or recent tectonic features. However, global statistical studies using the much larger data base of all available gravimetry showed a positive correlation of oceanic gravity with topography. The oceanic lithosphere was modeled by simultaneously inverting surface wave dispersion, topography, and gravity data. Efforts to incorporate geoid data into the inversion showed that the base of the subchannel can be better resolved with geoid rather than gravity data. Thermomechanical models of seafloor spreading taking into account differing plate velocities, heat source distributions, and rock rheologies were discussed.

  14. 1H NMR-Based Metabolomic Analysis of Sub-Lethal Perfluorooctane Sulfonate Exposure to the Earthworm, Eisenia fetida, in Soil

    PubMed Central

    Lankadurai, Brian P.; Furdui, Vasile I.; Reiner, Eric J.; Simpson, André J.; Simpson, Myrna J.

    2013-01-01

    1H NMR-based metabolomics was used to measure the response of Eisenia fetida earthworms after exposure to sub-lethal concentrations of perfluorooctane sulfonate (PFOS) in soil. Earthworms were exposed to a range of PFOS concentrations (5, 10, 25, 50, 100 or 150 mg/kg) for two, seven and fourteen days. Earthworm tissues were extracted and analyzed by 1H NMR. Multivariate statistical analysis of the metabolic response of E. fetida to PFOS exposure identified time-dependent responses that were comprised of two separate modes of action: a non-polar narcosis type mechanism after two days of exposure and increased fatty acid oxidation after seven and fourteen days of exposure. Univariate statistical analysis revealed that 2-hexyl-5-ethyl-3-furansulfonate (HEFS), betaine, leucine, arginine, glutamate, maltose and ATP are potential indicators of PFOS exposure, as the concentrations of these metabolites fluctuated significantly. Overall, NMR-based metabolomic analysis suggests elevated fatty acid oxidation, disruption in energy metabolism and biological membrane structure and a possible interruption of ATP synthesis. These conclusions obtained from analysis of the metabolic profile in response to sub-lethal PFOS exposure indicates that NMR-based metabolomics is an excellent discovery tool when the mode of action (MOA) of contaminants is not clearly defined. PMID:24958147

  15. Development of a funding, cost, and spending model for satellite projects

    NASA Technical Reports Server (NTRS)

    Johnson, Jesse P.

    1989-01-01

    The need for a predictive budget/funding model is obvious. The current models used by the Resource Analysis Office (RAO) are used to predict the total costs of satellite projects. An effort was conducted to extend the modeling capabilities from total budget analysis to analysis of total budget and budget outlays over time. A statistically based and data-driven methodology was used to derive and develop the model. The budget data for the last 18 GSFC-sponsored satellite projects were analyzed and used to build a funding model that describes the historical spending patterns. This raw data consisted of dollars spent in each specific year and their 1989-dollar equivalent. The data were converted to the standard format used by the RAO group and placed in a database. A simple statistical analysis was performed to calculate the gross statistics associated with project length and project cost and the conditional statistics on project length and project cost. The modeling approach used is derived from the theory of embedded statistics, which states that properly analyzed data will produce the underlying generating function. The process of funding large-scale projects over extended periods of time is described by Life Cycle Cost Models (LCCM). The data were analyzed to find a model in the generic form of an LCCM. The model developed is based on a Weibull function whose parameters are found by both nonlinear optimization and nonlinear regression. In order to use this model it is necessary to transform the problem from a dollar/time space to a percentage-of-total-budget/time space. This transformation is equivalent to moving to a probability space. By using the basic rules of probability, the validity of both the optimization and the regression steps is ensured. This statistically significant model is then integrated and inverted. The resulting output represents a project schedule which relates the amount of money spent to the percentage of project completion.
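
    A hedged sketch of the core idea: fit a Weibull-shaped cumulative spending profile in (fraction of budget) versus (fraction of schedule) space by nonlinear regression. The data points and starting values below are invented for illustration and are not the GSFC project data.

```python
# Sketch: nonlinear fit of a Weibull CDF to a cumulative spending profile.
import numpy as np
from scipy.optimize import curve_fit

def weibull_cdf(t, shape, scale):
    return 1.0 - np.exp(-(t / scale) ** shape)

t_frac = np.linspace(0.1, 1.0, 10)                       # fraction of project schedule
spend_frac = np.array([.02, .07, .15, .27, .42, .58, .72, .84, .93, 1.0])  # illustrative

(shape, scale), _ = curve_fit(weibull_cdf, t_frac, spend_frac, p0=[2.0, 0.6])
print(f"shape={shape:.2f}, scale={scale:.2f}")
print("predicted spend fraction at 50% schedule:",
      round(float(weibull_cdf(0.5, shape, scale)), 2))
```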

  16. Biostatistics primer: part I.

    PubMed

    Overholser, Brian R; Sowinski, Kevin M

    2007-12-01

    Biostatistics is the application of statistics to biologic data. The field of statistics can be broken down into 2 fundamental parts: descriptive and inferential. Descriptive statistics are commonly used to categorize, display, and summarize data. Inferential statistics can be used to make predictions based on a sample obtained from a population or some large body of information. It is these inferences that are used to test specific research hypotheses. This 2-part review will outline important features of descriptive and inferential statistics as they apply to commonly conducted research studies in the biomedical literature. Part 1 in this issue will discuss fundamental topics of statistics and data analysis. Additionally, some of the most commonly used statistical tests found in the biomedical literature will be reviewed in Part 2 in the February 2008 issue.

  17. Statistical dependency in visual scanning

    NASA Technical Reports Server (NTRS)

    Ellis, Stephen R.; Stark, Lawrence

    1986-01-01

    A method to identify statistical dependencies in the positions of eye fixations is developed and applied to eye movement data from subjects who viewed dynamic displays of air traffic and judged future relative position of aircraft. Analysis of approximately 23,000 fixations on points of interest on the display identified statistical dependencies in scanning that were independent of the physical placement of the points of interest. Identification of these dependencies is inconsistent with random-sampling-based theories used to model visual search and information seeking.

  18. The l z ( p ) * Person-Fit Statistic in an Unfolding Model Context.

    PubMed

    Tendeiro, Jorge N

    2017-01-01

    Although person-fit analysis has a long-standing tradition within item response theory, it has been applied in combination with dominance response models almost exclusively. In this article, a popular log likelihood-based parametric person-fit statistic under the framework of the generalized graded unfolding model is used. Results from a simulation study indicate that the person-fit statistic performed relatively well in detecting midpoint response style patterns and not so well in detecting extreme response style patterns.

  19. Bayesian networks and statistical analysis application to analyze the diagnostic test accuracy

    NASA Astrophysics Data System (ADS)

    Orzechowski, P.; Makal, Jaroslaw; Onisko, A.

    2005-02-01

    The computer-aided BPH diagnosis system based on a Bayesian network is described in the paper. First results are compared with a given statistical method. Different statistical methods have been used successfully in medicine for years. However, the undoubted advantages of probabilistic methods make them useful in newly created systems, which are frequent in medicine but do not yet rest on full and competent knowledge. The article presents the advantages of the computer-aided BPH diagnosis system in clinical practice for urologists.

  20. Dangerous "spin": the probability myth of evidence-based prescribing - a Merleau-Pontyian approach.

    PubMed

    Morstyn, Ron

    2011-08-01

    The aim of this study was to examine logical positivist statistical probability statements used to support and justify "evidence-based" prescribing rules in psychiatry when viewed from the major philosophical theories of probability, and to propose "phenomenological probability" based on Maurice Merleau-Ponty's philosophy of "phenomenological positivism" as a better clinical and ethical basis for psychiatric prescribing. The logical positivist statistical probability statements which are currently used to support "evidence-based" prescribing rules in psychiatry have little clinical or ethical justification when subjected to critical analysis from any of the major theories of probability and represent dangerous "spin" because they necessarily exclude the individual, intersubjective and ambiguous meaning of mental illness. A concept of "phenomenological probability" founded on Merleau-Ponty's philosophy of "phenomenological positivism" overcomes the clinically destructive "objectivist" and "subjectivist" consequences of logical positivist statistical probability and allows psychopharmacological treatments to be appropriately integrated into psychiatric treatment.

  1. Do regional methods really help reduce uncertainties in flood frequency analyses?

    NASA Astrophysics Data System (ADS)

    Cong Nguyen, Chi; Payrastre, Olivier; Gaume, Eric

    2013-04-01

    Flood frequency analyses are often based on continuous measured series at gauge sites. However, the length of the available data sets is usually too short to provide reliable estimates of extreme design floods. To reduce the estimation uncertainties, the analyzed data sets have to be extended either in time, making use of historical and paleoflood data, or in space, merging data sets considered as statistically homogeneous to build large regional data samples. Nevertheless, the advantage of the regional analyses, the important increase of the size of the studied data sets, may be counterbalanced by possible heterogeneities of the merged sets. The application and comparison of four different flood frequency analysis methods to two regions affected by flash floods in the south of France (Ardèche and Var) illustrates how this balance between the number of records and possible heterogeneities plays out in real-world applications. The four tested methods are: (1) a local statistical analysis based on the existing series of measured discharges, (2) a local analysis incorporating the existing information on historical floods, (3) a standard regional flood frequency analysis based on existing measured series at gauged sites and (4) a modified regional analysis including estimated extreme peak discharges at ungauged sites. Monte Carlo simulations are conducted to simulate a large number of discharge series with characteristics similar to the observed ones (type of statistical distributions, number of sites and records) to evaluate to which extent the results obtained on these case studies can be generalized. These two case studies indicate that even small statistical heterogeneities, which are not detected by the standard homogeneity tests implemented in regional flood frequency studies, may drastically limit the usefulness of such approaches. On the other hand, these results show that incorporating information on extreme events, either historical flood events at gauged sites or estimated extremes at ungauged sites in the considered region, is an efficient way to reduce uncertainties in flood frequency studies.

  2. Can Propensity Score Analysis Approximate Randomized Experiments Using Pretest and Demographic Information in Pre-K Intervention Research?

    PubMed

    Dong, Nianbo; Lipsey, Mark W

    2017-01-01

    It is unclear whether propensity score analysis (PSA) based on pretest and demographic covariates will meet the ignorability assumption for replicating the results of randomized experiments. This study applies within-study comparisons to assess whether pre-Kindergarten (pre-K) treatment effects on achievement outcomes estimated using PSA based on a pretest and demographic covariates can approximate those found in a randomized experiment. Data: Four studies with samples of pre-K children each provided data on two math achievement outcome measures with baseline pretests and child demographic variables that included race, gender, age, language spoken at home, and mother's highest education. Research design and data analysis: A randomized study of a pre-K math curriculum provided benchmark estimates of effects on achievement measures. Comparison samples from other pre-K studies were then substituted for the original randomized control and the effects were reestimated using PSA. The correspondence was evaluated using multiple criteria. The effect estimates using PSA were in the same direction as the benchmark estimates, had similar but not identical statistical significance, and did not differ from the benchmarks at statistically significant levels. However, the magnitude of the effect sizes differed and displayed both absolute and relative bias larger than required to show statistical equivalence with formal tests, but those results were not definitive because of the limited statistical power. We conclude that treatment effect estimates based on a single pretest and demographic covariates in PSA correspond to those from a randomized experiment on the most general criteria for equivalence.
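
    The core step of the comparison above is estimating each child's propensity for treatment from the pretest and demographic covariates and then reweighting (or matching) the comparison sample. A minimal, hedged sketch on entirely synthetic data with illustrative variable names follows; it is not the authors' analysis pipeline.

```python
# Sketch only: propensity scores from pretest + demographics, then an
# inverse-probability-weighted treatment-effect estimate on synthetic data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(9)
n = 500
pretest = rng.normal(size=n)
female = rng.integers(0, 2, n)
treated = rng.random(n) < 1.0 / (1.0 + np.exp(-0.5 * pretest))  # selection on the pretest
outcome = 2.0 + 0.8 * pretest + 0.3 * treated + rng.normal(scale=0.5, size=n)

X = np.c_[pretest, female]
pscore = LogisticRegression().fit(X, treated).predict_proba(X)[:, 1]

# Inverse-probability weights, one common alternative to matching on the score.
weights = np.where(treated, 1.0 / pscore, 1.0 / (1.0 - pscore))
effect = (np.average(outcome[treated], weights=weights[treated])
          - np.average(outcome[~treated], weights=weights[~treated]))
print(effect)  # should recover roughly the simulated effect of 0.3
```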

  3. An integrated workflow for robust alignment and simplified quantitative analysis of NMR spectrometry data.

    PubMed

    Vu, Trung N; Valkenborg, Dirk; Smets, Koen; Verwaest, Kim A; Dommisse, Roger; Lemière, Filip; Verschoren, Alain; Goethals, Bart; Laukens, Kris

    2011-10-20

    Nuclear magnetic resonance spectroscopy (NMR) is a powerful technique to reveal and compare quantitative metabolic profiles of biological tissues. However, chemical and physical sample variations make the analysis of the data challenging, and typically require the application of a number of preprocessing steps prior to data interpretation. For example, noise reduction, normalization, baseline correction, peak picking, spectrum alignment and statistical analysis are indispensable components in any NMR analysis pipeline. We introduce a novel suite of informatics tools for the quantitative analysis of NMR metabolomic profile data. The core of the processing cascade is a novel peak alignment algorithm, called hierarchical Cluster-based Peak Alignment (CluPA). The algorithm aligns a target spectrum to the reference spectrum in a top-down fashion by building a hierarchical cluster tree from peak lists of reference and target spectra and then dividing the spectra into smaller segments based on the most distant clusters of the tree. To reduce the computational time to estimate the spectral misalignment, the method makes use of Fast Fourier Transformation (FFT) cross-correlation. Since the method returns a high-quality alignment, we can propose a simple methodology to study the variability of the NMR spectra. For each aligned NMR data point the ratio of the between-group and within-group sum of squares (BW-ratio) is calculated to quantify the difference in variability between and within predefined groups of NMR spectra. This differential analysis is related to the calculation of the F-statistic or a one-way ANOVA, but without distributional assumptions. Statistical inference based on the BW-ratio is achieved by bootstrapping the null distribution from the experimental data. The workflow performance was evaluated using a previously published dataset. Correlation maps, spectral and grey scale plots show clear improvements in comparison to other methods, and the down-to-earth quantitative analysis works well for the CluPA-aligned spectra. The whole workflow is embedded into a modular and statistically sound framework that is implemented as an R package called "speaq" ("spectrum alignment and quantitation"), which is freely available from http://code.google.com/p/speaq/.
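
    The BW-ratio described above is simple to compute once the spectra are aligned: at each aligned data point, the between-group sum of squares is divided by the within-group sum of squares, and significance is assessed by resampling. The sketch below is a generic illustration of that quantity (using label permutation as a stand-in for the bootstrap), not the speaq implementation.

```python
# Sketch only: BW-ratio at one aligned NMR data point, with a resampled null distribution.
import numpy as np

def bw_ratio(x, labels):
    """x: intensities at one aligned point across spectra; labels: group membership."""
    overall = x.mean()
    groups = [x[labels == g] for g in np.unique(labels)]
    between = sum(len(g) * (g.mean() - overall) ** 2 for g in groups)
    within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    return between / within

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 20), rng.normal(0.8, 1.0, 20)])
labels = np.repeat([0, 1], 20)

observed = bw_ratio(x, labels)
null = np.array([bw_ratio(x, rng.permutation(labels)) for _ in range(2000)])
print(observed, (null >= observed).mean())  # BW-ratio and its resampling p-value
```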

  4. Applications of "Integrated Data Viewer'' (IDV) in the classroom

    NASA Astrophysics Data System (ADS)

    Nogueira, R.; Cutrim, E. M.

    2006-06-01

    Conventionally, weather products utilized in synoptic meteorology reduce phenomena occurring in four dimensions to a 2-dimensional form. This constitutes a roadblock for non-atmospheric-science majors who need to take meteorology as a non-mathematical course complementary to their major programs. This research examines the use of the Integrated Data Viewer (IDV) as a teaching tool, as it allows a 4-dimensional representation of weather products. IDV was tested in the teaching of synoptic meteorology, weather analysis, and weather map interpretation to non-science students in the laboratory sessions of an introductory meteorology class at Western Michigan University. Student exam scores were compared according to the laboratory teaching technique, i.e., traditional lab manual versus IDV, for both short- and long-term learning. Results of the statistical analysis show that the Fall 2004 students in the IDV-based lab session retained learning. However, in Spring 2005 the exam scores did not reflect retention of learning when IDV-based and MANUAL-based lab scores were compared (short-term learning, i.e., exam taken one week after the lab exercise). Testing long-term learning, with seven weeks between the two exams in Spring 2005, showed no statistically significant difference between IDV-based group scores and MANUAL-based group scores, although the IDV group obtained an average exam score slightly higher than the MANUAL group. Statistical testing of the principal hypothesis in this study leads to the conclusion that the IDV-based method did not prove to be a better teaching tool than the traditional paper-based method. Future studies could potentially find significant differences in the effectiveness of the manual and IDV methods if the conditions were more controlled; that is, students in the control group should not be exposed to weather analysis using IDV during lecture.

  5. New robust statistical procedures for the polytomous logistic regression models.

    PubMed

    Castilla, Elena; Ghosh, Abhik; Martin, Nirian; Pardo, Leandro

    2018-05-17

    This article derives a new family of estimators, namely the minimum density power divergence estimators, as a robust generalization of the maximum likelihood estimator for the polytomous logistic regression model. Based on these estimators, a family of Wald-type test statistics for linear hypotheses is introduced. Robustness properties of both the proposed estimators and the test statistics are theoretically studied through the classical influence function analysis. Appropriate real-life examples are presented to justify the requirement of suitable robust statistical procedures in place of likelihood-based inference for the polytomous logistic regression model. The validity of the theoretical results established in the article is further confirmed empirically through suitable simulation studies. Finally, an approach for the data-driven selection of the robustness tuning parameter is proposed with empirical justifications. © 2018, The International Biometric Society.

  6. Planck 2015 results. XVI. Isotropy and statistics of the CMB

    NASA Astrophysics Data System (ADS)

    Planck Collaboration; Ade, P. A. R.; Aghanim, N.; Akrami, Y.; Aluri, P. K.; Arnaud, M.; Ashdown, M.; Aumont, J.; Baccigalupi, C.; Banday, A. J.; Barreiro, R. B.; Bartolo, N.; Basak, S.; Battaner, E.; Benabed, K.; Benoît, A.; Benoit-Lévy, A.; Bernard, J.-P.; Bersanelli, M.; Bielewicz, P.; Bock, J. J.; Bonaldi, A.; Bonavera, L.; Bond, J. R.; Borrill, J.; Bouchet, F. R.; Boulanger, F.; Bucher, M.; Burigana, C.; Butler, R. C.; Calabrese, E.; Cardoso, J.-F.; Casaponsa, B.; Catalano, A.; Challinor, A.; Chamballu, A.; Chiang, H. C.; Christensen, P. R.; Church, S.; Clements, D. L.; Colombi, S.; Colombo, L. P. L.; Combet, C.; Contreras, D.; Couchot, F.; Coulais, A.; Crill, B. P.; Cruz, M.; Curto, A.; Cuttaia, F.; Danese, L.; Davies, R. D.; Davis, R. J.; de Bernardis, P.; de Rosa, A.; de Zotti, G.; Delabrouille, J.; Désert, F.-X.; Diego, J. M.; Dole, H.; Donzelli, S.; Doré, O.; Douspis, M.; Ducout, A.; Dupac, X.; Efstathiou, G.; Elsner, F.; Enßlin, T. A.; Eriksen, H. K.; Fantaye, Y.; Fergusson, J.; Fernandez-Cobos, R.; Finelli, F.; Forni, O.; Frailis, M.; Fraisse, A. A.; Franceschi, E.; Frejsel, A.; Frolov, A.; Galeotta, S.; Galli, S.; Ganga, K.; Gauthier, C.; Ghosh, T.; Giard, M.; Giraud-Héraud, Y.; Gjerløw, E.; González-Nuevo, J.; Górski, K. M.; Gratton, S.; Gregorio, A.; Gruppuso, A.; Gudmundsson, J. E.; Hansen, F. K.; Hanson, D.; Harrison, D. L.; Henrot-Versillé, S.; Hernández-Monteagudo, C.; Herranz, D.; Hildebrandt, S. R.; Hivon, E.; Hobson, M.; Holmes, W. A.; Hornstrup, A.; Hovest, W.; Huang, Z.; Huffenberger, K. M.; Hurier, G.; Jaffe, A. H.; Jaffe, T. R.; Jones, W. C.; Juvela, M.; Keihänen, E.; Keskitalo, R.; Kim, J.; Kisner, T. S.; Knoche, J.; Kunz, M.; Kurki-Suonio, H.; Lagache, G.; Lähteenmäki, A.; Lamarre, J.-M.; Lasenby, A.; Lattanzi, M.; Lawrence, C. R.; Leonardi, R.; Lesgourgues, J.; Levrier, F.; Liguori, M.; Lilje, P. B.; Linden-Vørnle, M.; Liu, H.; López-Caniego, M.; Lubin, P. M.; Macías-Pérez, J. F.; Maggio, G.; Maino, D.; Mandolesi, N.; Mangilli, A.; Marinucci, D.; Maris, M.; Martin, P. G.; Martínez-González, E.; Masi, S.; Matarrese, S.; McGehee, P.; Meinhold, P. R.; Melchiorri, A.; Mendes, L.; Mennella, A.; Migliaccio, M.; Mikkelsen, K.; Mitra, S.; Miville-Deschênes, M.-A.; Molinari, D.; Moneti, A.; Montier, L.; Morgante, G.; Mortlock, D.; Moss, A.; Munshi, D.; Murphy, J. A.; Naselsky, P.; Nati, F.; Natoli, P.; Netterfield, C. B.; Nørgaard-Nielsen, H. U.; Noviello, F.; Novikov, D.; Novikov, I.; Oxborrow, C. A.; Paci, F.; Pagano, L.; Pajot, F.; Pant, N.; Paoletti, D.; Pasian, F.; Patanchon, G.; Pearson, T. J.; Perdereau, O.; Perotto, L.; Perrotta, F.; Pettorino, V.; Piacentini, F.; Piat, M.; Pierpaoli, E.; Pietrobon, D.; Plaszczynski, S.; Pointecouteau, E.; Polenta, G.; Popa, L.; Pratt, G. W.; Prézeau, G.; Prunet, S.; Puget, J.-L.; Rachen, J. P.; Rebolo, R.; Reinecke, M.; Remazeilles, M.; Renault, C.; Renzi, A.; Ristorcelli, I.; Rocha, G.; Rosset, C.; Rossetti, M.; Rotti, A.; Roudier, G.; Rubiño-Martín, J. A.; Rusholme, B.; Sandri, M.; Santos, D.; Savelainen, M.; Savini, G.; Scott, D.; Seiffert, M. D.; Shellard, E. P. S.; Souradeep, T.; Spencer, L. D.; Stolyarov, V.; Stompor, R.; Sudiwala, R.; Sunyaev, R.; Sutton, D.; Suur-Uski, A.-S.; Sygnet, J.-F.; Tauber, J. A.; Terenzi, L.; Toffolatti, L.; Tomasi, M.; Tristram, M.; Trombetti, T.; Tucci, M.; Tuovinen, J.; Valenziano, L.; Valiviita, J.; Van Tent, B.; Vielva, P.; Villa, F.; Wade, L. A.; Wandelt, B. D.; Wehus, I. K.; Yvon, D.; Zacchei, A.; Zibin, J. P.; Zonca, A.

    2016-09-01

    We test the statistical isotropy and Gaussianity of the cosmic microwave background (CMB) anisotropies using observations made by the Planck satellite. Our results are based mainly on the full Planck mission for temperature, but also include some polarization measurements. In particular, we consider the CMB anisotropy maps derived from the multi-frequency Planck data by several component-separation methods. For the temperature anisotropies, we find excellent agreement between results based on these sky maps over both a very large fraction of the sky and a broad range of angular scales, establishing that potential foreground residuals do not affect our studies. Tests of skewness, kurtosis, multi-normality, N-point functions, and Minkowski functionals indicate consistency with Gaussianity, while a power deficit at large angular scales is manifested in several ways, for example low map variance. The results of a peak statistics analysis are consistent with the expectations of a Gaussian random field. The "Cold Spot" is detected with several methods, including map kurtosis, peak statistics, and mean temperature profile. We thoroughly probe the large-scale dipolar power asymmetry, detecting it with several independent tests, and address the subject of a posteriori correction. Tests of directionality suggest the presence of angular clustering from large to small scales, but at a significance that is dependent on the details of the approach. We perform the first examination of polarization data, finding the morphology of stacked peaks to be consistent with the expectations of statistically isotropic simulations. Where they overlap, these results are consistent with the Planck 2013 analysis based on the nominal mission data and provide our most thorough view of the statistics of the CMB fluctuations to date.

  7. Planck 2015 results: XVI. Isotropy and statistics of the CMB

    DOE PAGES

    Ade, P. A. R.; Aghanim, N.; Akrami, Y.; ...

    2016-09-20

    In this paper, we test the statistical isotropy and Gaussianity of the cosmic microwave background (CMB) anisotropies using observations made by the Planck satellite. Our results are based mainly on the full Planck mission for temperature, but also include some polarization measurements. In particular, we consider the CMB anisotropy maps derived from the multi-frequency Planck data by several component-separation methods. For the temperature anisotropies, we find excellent agreement between results based on these sky maps over both a very large fraction of the sky and a broad range of angular scales, establishing that potential foreground residuals do not affect our studies. Tests of skewness, kurtosis, multi-normality, N-point functions, and Minkowski functionals indicate consistency with Gaussianity, while a power deficit at large angular scales is manifested in several ways, for example low map variance. The results of a peak statistics analysis are consistent with the expectations of a Gaussian random field. The “Cold Spot” is detected with several methods, including map kurtosis, peak statistics, and mean temperature profile. We thoroughly probe the large-scale dipolar power asymmetry, detecting it with several independent tests, and address the subject of a posteriori correction. Tests of directionality suggest the presence of angular clustering from large to small scales, but at a significance that is dependent on the details of the approach. We perform the first examination of polarization data, finding the morphology of stacked peaks to be consistent with the expectations of statistically isotropic simulations. Finally, where they overlap, these results are consistent with the Planck 2013 analysis based on the nominal mission data and provide our most thorough view of the statistics of the CMB fluctuations to date.

  8. 16 CFR 1000.26 - Directorate for Epidemiology.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... things, incidents associated with consumer products, based on news clips, medical examiner reports, hotline reports, Internet complaints, and referrals. The Hazard Analysis Division conducts statistical...

  9. [Cluster analysis applicability to fitness evaluation of cosmonauts on long-term missions of the International space station].

    PubMed

    Egorov, A D; Stepantsov, V I; Nosovskiĭ, A M; Shipov, A A

    2009-01-01

    Cluster analysis was applied to evaluate locomotion training (running and running intermingled with walking) of 13 cosmonauts on long-term ISS missions by the parameters of duration (min), distance (m) and intensity (km/h). Based on the results of analyses, the cosmonauts were distributed into three steady groups of 2, 5 and 6 persons. Distance and speed showed a statistical rise (p < 0.03) from group 1 to group 3. Duration of physical locomotion training was not statistically different in the groups (p = 0.125). Therefore, cluster analysis is an adequate method of evaluating fitness of cosmonauts on long-term missions.

  10. Spectral Analysis of B Stars: An Application of Bayesian Statistics

    NASA Astrophysics Data System (ADS)

    Mugnes, J.-M.; Robert, C.

    2012-12-01

    To better understand the processes involved in stellar physics, it is necessary to obtain accurate stellar parameters (effective temperature, surface gravity, abundances…). Spectral analysis is a powerful tool for investigating stars, but it is also vital to reduce uncertainties at a decent computational cost. Here we present a spectral analysis method based on a combination of Bayesian statistics and grids of synthetic spectra obtained with TLUSTY. This method simultaneously constrains the stellar parameters by using all the lines accessible in observed spectra and thus greatly reduces uncertainties and improves the overall spectrum fitting. Preliminary results are shown using spectra from the Observatoire du Mont-Mégantic.

  11. Statistical design of quantitative mass spectrometry-based proteomic experiments.

    PubMed

    Oberg, Ann L; Vitek, Olga

    2009-05-01

    We review the fundamental principles of statistical experimental design, and their application to quantitative mass spectrometry-based proteomics. We focus on class comparison using Analysis of Variance (ANOVA), and discuss how randomization, replication and blocking help avoid systematic biases due to the experimental procedure, and help optimize our ability to detect true quantitative changes between groups. We also discuss the issues of pooling multiple biological specimens for a single mass analysis, and calculation of the number of replicates in a future study. When applicable, we emphasize the parallels between designing quantitative proteomic experiments and experiments with gene expression microarrays, and give examples from that area of research. We illustrate the discussion using theoretical considerations, and using real-data examples of profiling of disease.

  12. Applying quantitative adiposity feature analysis models to predict benefit of bevacizumab-based chemotherapy in ovarian cancer patients

    NASA Astrophysics Data System (ADS)

    Wang, Yunzhi; Qiu, Yuchen; Thai, Theresa; More, Kathleen; Ding, Kai; Liu, Hong; Zheng, Bin

    2016-03-01

    How to rationally identify epithelial ovarian cancer (EOC) patients who will benefit from bevacizumab or other antiangiogenic therapies is a critical issue in EOC treatment. The motivation of this study is to quantitatively measure adiposity features from CT images and investigate the feasibility of predicting the potential benefit for EOC patients with or without bevacizumab-based chemotherapy using multivariate statistical models built on quantitative adiposity image features. A dataset involving CT images from 59 advanced EOC patients was included. Among them, 32 patients received maintenance bevacizumab after primary chemotherapy and the remaining 27 patients did not. We developed a computer-aided detection (CAD) scheme to automatically segment subcutaneous fat areas (SFA) and visceral fat areas (VFA) and then extracted 7 adiposity-related quantitative features. Three multivariate data analysis models (linear regression, logistic regression and Cox proportional hazards regression) were applied to investigate the potential association between the model-generated prediction results and the patients' progression-free survival (PFS) and overall survival (OS). The results show that with all 3 statistical models, a statistically significant association was detected between the model-generated results and both clinical outcomes in the group of patients receiving maintenance bevacizumab (p<0.01), while there was no significant association for either PFS or OS in the group of patients not receiving maintenance bevacizumab. Therefore, this study demonstrated the feasibility of using statistical prediction models based on quantitative adiposity-related CT image features to generate a new clinical marker and predict the clinical outcome of EOC patients receiving maintenance bevacizumab-based chemotherapy.

  13. The level crossing rates and associated statistical properties of a random frequency response function

    NASA Astrophysics Data System (ADS)

    Langley, Robin S.

    2018-03-01

    This work is concerned with the statistical properties of the frequency response function of the energy of a random system. Earlier studies have considered the statistical distribution of the function at a single frequency, or alternatively the statistics of a band-average of the function. In contrast the present analysis considers the statistical fluctuations over a frequency band, and results are obtained for the mean rate at which the function crosses a specified level (or equivalently, the average number of times the level is crossed within the band). Results are also obtained for the probability of crossing a specified level at least once, the mean rate of occurrence of peaks, and the mean trough-to-peak height. The analysis is based on the assumption that the natural frequencies and mode shapes of the system have statistical properties that are governed by the Gaussian Orthogonal Ensemble (GOE), and the validity of this assumption is demonstrated by comparison with numerical simulations for a random plate. The work has application to the assessment of the performance of dynamic systems that are sensitive to random imperfections.

  14. A PDF-based classification of gait cadence patterns in patients with amyotrophic lateral sclerosis.

    PubMed

    Wu, Yunfeng; Ng, Sin Chun

    2010-01-01

    Amyotrophic lateral sclerosis (ALS) is a type of neurological disease due to the degeneration of motor neurons. During the course of such a progressive disease, it becomes difficult for ALS patients to regulate normal locomotion, so that gait stability becomes perturbed. This paper presents a pilot statistical study of the gait cadence (or stride interval) in ALS. The probability density functions (PDFs) of the stride interval were first estimated with the nonparametric Parzen-window method. We computed the mean of the left-foot stride interval and the modified Kullback-Leibler divergence (MKLD) from the estimated PDFs. The analysis results suggested that both of these statistical parameters were significantly altered in ALS, and that a least-squares support vector machine (LS-SVM) can effectively distinguish the stride patterns of ALS patients from those of healthy controls, with an accuracy of 82.8% and an area of 0.87 under the receiver operating characteristic curve.
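
    As a rough illustration of the two statistical parameters mentioned above, the sketch below estimates stride-interval densities with a Gaussian Parzen window and compares them with a symmetrized Kullback-Leibler divergence. The paper's modified KL divergence (MKLD) may be defined differently; the data here are synthetic.

```python
# Sketch only: Parzen-window densities of stride intervals and a symmetrized KL divergence.
import numpy as np
from scipy.stats import gaussian_kde

def symmetric_kl(sample_a, sample_b, grid):
    p = gaussian_kde(sample_a)(grid)
    q = gaussian_kde(sample_b)(grid)
    p, q = p / p.sum(), q / q.sum()
    eps = 1e-12
    return 0.5 * (np.sum(p * np.log((p + eps) / (q + eps)))
                  + np.sum(q * np.log((q + eps) / (p + eps))))

rng = np.random.default_rng(1)
control = rng.normal(1.10, 0.03, 300)   # illustrative stride intervals in seconds
patient = rng.normal(1.20, 0.08, 300)
grid = np.linspace(0.9, 1.6, 500)
print(symmetric_kl(control, patient, grid))
```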

  15. Multi-scale statistical analysis of coronal solar activity

    DOE PAGES

    Gamborino, Diana; del-Castillo-Negrete, Diego; Martinell, Julio J.

    2016-07-08

    Multi-filter images from the solar corona are used to obtain temperature maps that are analyzed using techniques based on proper orthogonal decomposition (POD) in order to extract dynamical and structural information at various scales. Exploring active regions before and after a solar flare and comparing them with quiet regions, we show that the multi-scale behavior presents distinct statistical properties for each case that can be used to characterize the level of activity in a region. Information about the nature of heat transport can also be extracted from the analysis.
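
    In practice, proper orthogonal decomposition of a stack of temperature maps reduces to a singular value decomposition of the mean-subtracted snapshot matrix; the singular values indicate how much variance each spatial mode captures. A minimal sketch on synthetic maps follows; it is not the authors' pipeline.

```python
# Sketch only: POD of a stack of 2-D maps via SVD of the snapshot matrix.
import numpy as np

rng = np.random.default_rng(2)
n_snapshots, ny, nx = 50, 64, 64
maps = rng.normal(size=(n_snapshots, ny, nx))     # stand-in temperature maps

snapshots = maps.reshape(n_snapshots, -1)         # one row per snapshot
snapshots = snapshots - snapshots.mean(axis=0)    # remove the temporal mean

# Rows of vt are spatial POD modes; s**2 gives the energy captured by each mode.
u, s, vt = np.linalg.svd(snapshots, full_matrices=False)
energy_fraction = s**2 / np.sum(s**2)
print(energy_fraction[:5])
```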

  16. A statistical method for measuring activation of gene regulatory networks.

    PubMed

    Esteves, Gustavo H; Reis, Luiz F L

    2018-06-13

    Gene expression data analysis is of great importance for modern molecular biology, given our ability to measure the expression profiles of thousands of genes and to enable studies rooted in systems biology. In this work, we propose a simple statistical model for measuring the activation of gene regulatory networks, instead of the traditional gene co-expression networks. We present the mathematical construction of a statistical procedure for testing hypotheses regarding gene regulatory network activation. The real probability distribution of the test statistic is evaluated by a permutation-based study. To illustrate the functionality of the proposed methodology, we also present a simple example based on a small hypothetical network and the activation measurement of two KEGG networks, both based on gene expression data collected from gastric and esophageal samples. The two KEGG networks were also analyzed using a public database, available through NCBI-GEO, with results presented as Supplementary Material. This method was implemented in an R package that is available at the BioConductor project website under the name maigesPack.

  17. The Development of Statistics Textbook Supported with ICT and Portfolio-Based Assessment

    NASA Astrophysics Data System (ADS)

    Hendikawati, Putriaji; Yuni Arini, Florentina

    2016-02-01

    This research was development research that aimed to develop and produce a Statistics textbook model supported with information and communication technology (ICT) and Portfolio-Based Assessment. The book was designed for mathematics students at the college level to improve their ability in mathematical connection and communication. There were three stages in this research, i.e. define, design, and develop. The textbook consisted of 10 chapters, each of which contains an introduction, core materials, examples and exercises. The development phase began with the initial design of the book (draft 1), which was then validated by experts. Revision of draft 1 produced draft 2, which then underwent a limited readability test. Furthermore, revision of draft 2 produced draft 3, which was simulated on a small sample to produce a valid model textbook. The data were analysed with descriptive statistics. The analysis showed that the Statistics textbook model supported with ICT and Portfolio-Based Assessment is valid and fulfils the criteria of practicality.

  18. Application of survival analysis methodology to the quantitative analysis of LC-MS proteomics data.

    PubMed

    Tekwe, Carmen D; Carroll, Raymond J; Dabney, Alan R

    2012-08-01

    Protein abundance in quantitative proteomics is often based on observed spectral features derived from liquid chromatography mass spectrometry (LC-MS) or LC-MS/MS experiments. Peak intensities are largely non-normal in distribution. Furthermore, LC-MS-based proteomics data frequently have large proportions of missing peak intensities due to censoring mechanisms on low-abundance spectral features. Recognizing that the observed peak intensities detected with the LC-MS method are all positive, skewed and often left-censored, we propose using survival methodology to carry out differential expression analysis of proteins. Various standard statistical techniques, including non-parametric tests such as the Kolmogorov-Smirnov and Wilcoxon-Mann-Whitney rank sum tests, and parametric survival models and accelerated failure time (AFT) models with log-normal, log-logistic and Weibull distributions, were used to detect differentially expressed proteins. The statistical operating characteristics of each method are explored using both real and simulated datasets. Survival methods generally have greater statistical power than standard differential expression methods when the proportion of missing protein level data is 5% or more. In particular, the AFT models we consider consistently achieve greater statistical power than standard testing procedures, with the discrepancy widening as the proportion of missing data increases. The testing procedures discussed in this article can all be performed using readily available software such as R. The R codes are provided as supplemental materials. ctekwe@stat.tamu.edu.
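
    The key modeling point above is that a censored observation should contribute the probability of falling below the detection limit rather than a density value. The sketch below fits a log-normal intensity model with left-censoring by direct maximum likelihood on synthetic data; it is a generic illustration, not the authors' AFT code (which they provide in R).

```python
# Sketch only: maximum-likelihood fit of a log-normal intensity model with
# left-censoring at a detection limit; censored peaks contribute log CDF terms.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def censored_lognormal_nll(params, log_obs, n_censored, log_limit):
    mu, log_sigma = params
    sigma = np.exp(log_sigma)
    ll = norm.logpdf(log_obs, mu, sigma).sum()          # fully observed peaks
    ll += n_censored * norm.logcdf(log_limit, mu, sigma)  # peaks below the limit
    return -ll

rng = np.random.default_rng(8)
true_log = rng.normal(2.0, 0.7, 200)                 # latent log-intensities
log_limit = np.log(4.0)                              # detection limit
observed = true_log[true_log >= log_limit]
n_censored = int(np.sum(true_log < log_limit))       # peaks reported as missing

fit = minimize(censored_lognormal_nll, x0=[1.0, 0.0],
               args=(observed, n_censored, log_limit), method="Nelder-Mead")
print(fit.x[0], np.exp(fit.x[1]))                    # estimated mu and sigma
```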

  19. Multivariate Analysis, Mass Balance Techniques, and Statistical Tests as Tools in Igneous Petrology: Application to the Sierra de las Cruces Volcanic Range (Mexican Volcanic Belt)

    PubMed Central

    Velasco-Tapia, Fernando

    2014-01-01

    Magmatic processes have usually been identified and evaluated using qualitative or semiquantitative geochemical or isotopic tools based on a restricted number of variables. However, a more complete and quantitative view can be reached by applying multivariate analysis, mass balance techniques, and statistical tests. As an example, in this work a statistical and quantitative scheme is applied to analyze the geochemical features of the Sierra de las Cruces (SC) volcanic range (Mexican Volcanic Belt). In this locality, the volcanic activity (3.7 to 0.5 Ma) was dominantly dacitic, but the presence of spheroidal andesitic enclaves and/or diverse disequilibrium features in the majority of lavas confirms the operation of magma mixing/mingling. New discriminant-function-based multidimensional diagrams were used to discriminate tectonic setting. Statistical tests of discordancy and significance were applied to evaluate the influence of the subducting Cocos plate, which seems to be rather negligible for the SC magmas in relation to several major and trace elements. A cluster analysis following Ward's linkage rule was carried out to classify the SC volcanic rocks into geochemical groups. Finally, two mass-balance schemes were applied for the quantitative evaluation of the proportions of the end-member components (dacitic and andesitic magmas) in the comingled lavas (binary mixtures). PMID:24737994
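
    For readers unfamiliar with Ward's linkage rule mentioned above, the sketch below shows the generic clustering step on synthetic two-element compositions; the element choices and values are illustrative only and not taken from the SC data set.

```python
# Sketch only: Ward-linkage hierarchical clustering of standardized compositions.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(10)
dacites = rng.normal([65.0, 4.5], [1.0, 0.3], size=(20, 2))    # e.g. SiO2, K2O (wt%)
andesites = rng.normal([58.0, 3.5], [1.0, 0.3], size=(10, 2))
X = np.vstack([dacites, andesites])
X = (X - X.mean(axis=0)) / X.std(axis=0)                       # standardize columns

Z = linkage(X, method="ward")
groups = fcluster(Z, t=2, criterion="maxclust")                # cut the tree into two groups
print(groups)
```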

  20. The impact of mother's literacy on child dental caries: Individual data or aggregate data analysis?

    PubMed

    Haghdoost, Ali-Akbar; Hessari, Hossein; Baneshi, Mohammad Reza; Rad, Maryam; Shahravan, Arash

    2017-01-01

    To evaluate the impact of mother's literacy on child dental caries based on a national oral health survey in Iran and to investigate the possibility of ecological fallacy in aggregate data analysis. Existing data were from the second national oral health survey, carried out in 2004, which included 8725 six-year-old participants. The association of mother's literacy with caries occurrence (DMF (Decayed, Missing, Filled) total score >0) of her child was assessed using individual data with a logistic regression model. Then the association between the percentage of literate mothers and the percentage of decayed teeth in each of the 30 provinces of Iran was assessed using aggregated data, retrieved from the second national oral health survey of Iran and alternatively from the census of the "Statistical Center of Iran", with a linear regression model. The significance level was set at 0.05 for all analyses. Individual data analysis showed a statistically significant association between mother's literacy and decayed teeth of children (P = 0.02, odds ratio = 0.83). There was no statistically significant association between mother's literacy and child dental caries in the aggregate data analysis of the oral health survey (P = 0.79, B = 0.03) or of the census of the "Statistical Center of Iran" (P = 0.60, B = 0.14). Literate mothers have a preventive effect on the occurrence of dental caries in children. Given the high percentage of illiterate parents in Iran, it is logical to consider methods of oral health education that do not require reading or writing. Aggregate data analysis and individual data analysis had completely different results in this study.

  1. Dynamic association rules for gene expression data analysis.

    PubMed

    Chen, Shu-Chuan; Tsai, Tsung-Hsien; Chung, Cheng-Han; Li, Wen-Hsiung

    2015-10-14

    The purpose of gene expression analysis is to look for the association between regulation of gene expression levels and phenotypic variations. This association based on gene expression profiles has been used to determine whether the induction/repression of genes corresponds to phenotypic variations including cell regulation, clinical diagnoses and drug development. Statistical analyses on microarray data have been developed to resolve the gene selection issue. However, these methods do not inform us of causality between genes and phenotypes. In this paper, we propose the dynamic association rule algorithm (DAR algorithm), which helps one efficiently select a subset of significant genes for subsequent analysis. The DAR algorithm is based on association rules from market basket analysis in marketing. We first propose a statistical way, based on constructing a one-sided confidence interval and hypothesis testing, to determine if an association rule is meaningful. Based on the proposed statistical method, we then developed the DAR algorithm for gene expression data analysis. The method was applied to analyze four microarray datasets and one Next Generation Sequencing (NGS) dataset: the Mice Apo A1 dataset, the whole genome expression dataset of mouse embryonic stem cells, expression profiling of the bone marrow of leukemia patients, the Microarray Quality Control (MAQC) data set and the RNA-seq dataset of a mouse genomic imprinting study. A comparison of the proposed method with the t-test on the expression profiling of the bone marrow of leukemia patients was conducted. We developed a statistical way, based on the concept of confidence interval, to determine the minimum support and minimum confidence for mining association relationships among items. With the minimum support and minimum confidence, one can find significant rules in one single step. The DAR algorithm was then developed for gene expression data analysis. Four gene expression datasets showed that the proposed DAR algorithm not only was able to identify a set of differentially expressed genes that largely agreed with that of other methods, but also provided an efficient and accurate way to find influential genes of a disease. In the paper, the well-established association rule mining technique from marketing has been successfully modified to determine the minimum support and minimum confidence based on the concept of confidence interval and hypothesis testing. It can be applied to gene expression data to mine significant association rules between gene regulation and phenotype. The proposed DAR algorithm provides an efficient way to find influential genes that underlie the phenotypic variance.
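
    The basic quantities behind any association rule "A => B" are its support and confidence; the contribution of the DAR algorithm is choosing the minimum thresholds statistically. The sketch below computes support, confidence, and a one-sided normal-approximation lower bound on the confidence for a toy rule; it illustrates the ingredients, not the DAR algorithm itself.

```python
# Sketch only: support, confidence, and a one-sided lower confidence bound for "A => B".
import numpy as np
from scipy.stats import norm

def rule_stats(a, b, alpha=0.05):
    """a, b: boolean arrays marking which samples contain item A / item B."""
    support = np.mean(a & b)
    confidence = (a & b).sum() / a.sum()
    se = np.sqrt(confidence * (1.0 - confidence) / a.sum())   # Wald standard error
    lower_bound = confidence - norm.ppf(1.0 - alpha) * se     # one-sided lower bound
    return support, confidence, lower_bound

rng = np.random.default_rng(3)
gene_up = rng.random(200) < 0.4                 # e.g., gene X up-regulated in a sample
phenotype = (rng.random(200) < 0.2) | gene_up   # toy phenotype depending on gene_up
print(rule_stats(gene_up, phenotype))
```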

  2. Markov model plus k-word distributions: a synergy that produces novel statistical measures for sequence comparison.

    PubMed

    Dai, Qi; Yang, Yanchun; Wang, Tianming

    2008-10-15

    Many proposed statistical measures can efficiently compare biological sequences to further infer their structures, functions and evolutionary information. They are related in spirit because all the ideas for sequence comparison try to use the information in the k-word distributions, a Markov model or both. Motivated by adding k-word distributions to a Markov model directly, we investigated two novel statistical measures for sequence comparison, called wre.k.r and S2.k.r. The proposed measures were tested by similarity search, evaluation on functionally related regulatory sequences and phylogenetic analysis. This offers a systematic and quantitative experimental assessment of our measures. Moreover, we compared our achievements with those based on alignment or alignment-free methods. We grouped our experiments into two sets. The first one, performed via ROC (receiver operating characteristic) analysis, aims at assessing the intrinsic ability of our statistical measures to search for similar sequences in a database and discriminate functionally related regulatory sequences from unrelated sequences. The second one aims at assessing how well our statistical measures can be used for phylogenetic analysis. The experimental assessment demonstrates that our similarity measures, which incorporate k-word distributions into a Markov model, are more efficient.
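
    Both families of measures above start from k-word (k-mer) frequency vectors. A minimal sketch of those vectors and a plain Euclidean distance between two sequences is shown below; the paper's wre.k.r and S2.k.r statistics additionally correct the observed counts with a Markov model, which is omitted here.

```python
# Sketch only: k-word frequency vectors and a simple distance between two DNA sequences.
from collections import Counter
from itertools import product
import math

def kword_freqs(seq, k):
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    words = ["".join(w) for w in product("ACGT", repeat=k)]
    return [counts.get(w, 0) / total for w in words]

def kword_distance(seq1, seq2, k=3):
    f1, f2 = kword_freqs(seq1, k), kword_freqs(seq2, k)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f1, f2)))

print(kword_distance("ACGTACGTACGGTTAGC", "ACGTTTGCACGATTAGC"))
```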

  3. A large-scale perspective on stress-induced alterations in resting-state networks

    NASA Astrophysics Data System (ADS)

    Maron-Katz, Adi; Vaisvaser, Sharon; Lin, Tamar; Hendler, Talma; Shamir, Ron

    2016-02-01

    Stress is known to induce large-scale neural modulations. However, its neural effect once the stressor is removed, and how it relates to subjective experience, are not fully understood. Here we used a statistically sound data-driven approach to investigate alterations in large-scale resting-state functional connectivity (rsFC) induced by acute social stress. We compared rsfMRI profiles of 57 healthy male subjects before and after stress induction. Using a parcellation-based univariate statistical analysis, we identified a large-scale rsFC change, involving 490 parcel-pairs. Aiming to characterize this change, we employed statistical enrichment analysis, identifying anatomic structures that were significantly interconnected by these pairs. This analysis revealed strengthening of thalamo-cortical connectivity and weakening of cross-hemispheral parieto-temporal connectivity. These alterations were further found to be associated with changes in subjective stress reports. Integrating report-based information on stress sustainment 20 minutes post induction revealed a single significant rsFC change between the right amygdala and the precuneus, which inversely correlated with the level of subjective recovery. Our study demonstrates the value of enrichment analysis for exploring large-scale network reorganization patterns, and provides new insight into stress-induced neural modulations and their relation to subjective experience.

  4. A new statistical PCA-ICA algorithm for location of R-peaks in ECG.

    PubMed

    Chawla, M P S; Verma, H K; Kumar, Vinod

    2008-09-16

    The success of ICA in separating independent components from a mixture depends on the properties of the electrocardiogram (ECG) recordings. This paper discusses some of the conditions of independent component analysis (ICA) that could affect the reliability of the separation, and evaluates issues related to the properties of the signals and the number of sources. Principal component analysis (PCA) scatter plots are plotted to indicate the diagnostic features in the presence and absence of baseline wander when interpreting the ECG signals. In this analysis, a newly developed statistical algorithm by the authors, based on the use of combined PCA-ICA for two correlated channels of 12-channel ECG data, is proposed. The ICA technique has been successfully implemented in identifying and removing noise and artifacts from ECG signals. Cleaned ECG signals are obtained using statistical measures such as kurtosis and variance of variance after ICA processing. This paper also deals with the detection of QRS complexes in electrocardiograms using the combined PCA-ICA algorithm. The efficacy of the combined PCA-ICA algorithm lies in the fact that the location of the R-peaks is bounded from above and below by the location of the cross-over points, hence none of the peaks are ignored or missed.
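
    A generic PCA-then-ICA chain of the kind described above can be sketched with off-the-shelf components: whiten the two correlated channels with PCA, unmix them with FastICA, and pick the most impulsive (highest-kurtosis) component as the QRS source. This is an assumed, simplified stand-in for the authors' algorithm, applied to synthetic signals.

```python
# Sketch only: PCA whitening + FastICA on two ECG-like channels, then a crude R-peak pick.
import numpy as np
from sklearn.decomposition import PCA, FastICA

rng = np.random.default_rng(4)
t = np.linspace(0.0, 10.0, 2500)
qrs = np.sin(2 * np.pi * 1.2 * t) ** 63            # spiky stand-in for QRS complexes
wander = 0.5 * np.sin(2 * np.pi * 0.3 * t)         # baseline wander
channels = np.c_[qrs + wander + 0.05 * rng.normal(size=t.size),
                 0.8 * qrs + 0.6 * wander + 0.05 * rng.normal(size=t.size)]

whitened = PCA(whiten=True).fit_transform(channels)
sources = FastICA(n_components=2, random_state=0).fit_transform(whitened)

# Choose the most "peaky" component (highest kurtosis) as the QRS source.
kurtosis = ((sources - sources.mean(0)) ** 4).mean(0) / sources.var(0) ** 2
qrs_source = sources[:, np.argmax(kurtosis)]
r_peaks = np.where(np.abs(qrs_source) > 4.0 * np.abs(qrs_source).std())[0]
print(len(r_peaks))
```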

  5. Detection of Fatty Acids from Intact Microorganisms by Molecular Beam Static Secondary Ion Mass Spectrometry

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ingram, Jani Cheri; Lehman, Richard Michael; Bauer, William Francis

    We report the use of a surface analysis approach, static secondary ion mass spectrometry (SIMS) equipped with a molecular (ReO4-) ion primary beam, to analyze the surface of intact microbial cells. SIMS spectra of 28 microorganisms were compared to fatty acid profiles determined by gas chromatographic analysis of transesterified fatty acids extracted from the same organisms. The results indicate that surface bombardment using the molecular primary beam cleaved the ester linkage characteristic of bacteria at the glycerophosphate backbone of the phospholipid components of the cell membrane. This cleavage enables direct detection of the fatty acid conjugate base of intact microorganisms by static SIMS. The limit of detection for this approach is approximately 10^7 bacterial cells/cm^2. Multivariate statistical methods were applied in a graded approach to the SIMS microbial data. The results showed that the full data set could initially be statistically grouped based upon major differences in the biochemical composition of the cell wall. The gram-positive bacteria were further statistically analyzed, followed by final analysis of a specific bacterial genus that was successfully grouped by species. Additionally, the use of SIMS to detect microbes on mineral surfaces is demonstrated by an analysis of Shewanella oneidensis on crushed hematite. The results of this study provide evidence for the potential of static SIMS to rapidly detect bacterial species based on ion fragments originating from cell membrane lipids directly from sample surfaces.

  6. Planning representation for automated exploratory data analysis

    NASA Astrophysics Data System (ADS)

    St. Amant, Robert; Cohen, Paul R.

    1994-03-01

    Igor is a knowledge-based system for exploratory statistical analysis of complex systems and environments. Igor has two related goals: to help automate the search for interesting patterns in data sets, and to help develop models that capture significant relationships in the data. We outline a language for Igor, based on techniques of opportunistic planning, which balances control and opportunism. We describe the application of Igor to the analysis of the behavior of Phoenix, an artificial intelligence planning system.

  7. Fisher statistics for analysis of diffusion tensor directional information.

    PubMed

    Hutchinson, Elizabeth B; Rutecki, Paul A; Alexander, Andrew L; Sutula, Thomas P

    2012-04-30

    A statistical approach is presented for the quantitative analysis of diffusion tensor imaging (DTI) directional information using Fisher statistics, which were originally developed for the analysis of vectors in the field of paleomagnetism. In this framework, descriptive and inferential statistics have been formulated based on the Fisher probability density function, a spherical analogue of the normal distribution. The Fisher approach was evaluated for investigation of rat brain DTI maps to characterize tissue orientation in the corpus callosum, fornix, and hilus of the dorsal hippocampal dentate gyrus, and to compare directional properties in these regions following status epilepticus (SE) or traumatic brain injury (TBI) with values in healthy brains. Direction vectors were determined for each region of interest (ROI) for each brain sample and Fisher statistics were applied to calculate the mean direction vector and variance parameters in the corpus callosum, fornix, and dentate gyrus of normal rats and rats that experienced TBI or SE. Hypothesis testing was performed by calculation of Watson's F-statistic and associated p-value giving the likelihood that grouped observations were from the same directional distribution. In the fornix and midline corpus callosum, no directional differences were detected between groups, however in the hilus, significant (p<0.0005) differences were found that robustly confirmed observations that were suggested by visual inspection of directionally encoded color DTI maps. The Fisher approach is a potentially useful analysis tool that may extend the current capabilities of DTI investigation by providing a means of statistical comparison of tissue structural orientation. Copyright © 2012 Elsevier B.V. All rights reserved.
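
    The descriptive side of the Fisher framework reduces to a resultant-vector calculation: sum the unit direction vectors, normalize to obtain the mean direction, and use the resultant length to estimate the concentration parameter. A minimal sketch on synthetic unit vectors follows; it does not reproduce the Watson F-test used in the paper.

```python
# Sketch only: Fisher mean direction and concentration from unit direction vectors.
import numpy as np

def fisher_summary(vectors):
    """vectors: (n, 3) array of unit direction vectors (e.g., per-ROI principal directions)."""
    resultant = vectors.sum(axis=0)
    r = np.linalg.norm(resultant)              # resultant length R
    n = len(vectors)
    mean_direction = resultant / r
    kappa = (n - 1) / (n - r)                  # common estimate of the concentration parameter
    return mean_direction, r / n, kappa

rng = np.random.default_rng(5)
raw = rng.normal(loc=[0.8, 0.1, 0.6], scale=0.1, size=(100, 3))
unit = raw / np.linalg.norm(raw, axis=1, keepdims=True)
print(fisher_summary(unit))
```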

  8. Probabilistic Analysis for Comparing Fatigue Data Based on Johnson-Weibull Parameters

    NASA Technical Reports Server (NTRS)

    Hendricks, Robert C.; Zaretsky, Erwin V.; Vicek, Brian L.

    2007-01-01

    Probabilistic failure analysis is essential when analysis of stress-life (S-N) curves is inconclusive in determining the relative ranking of two or more materials. In 1964, L. Johnson published a methodology for establishing the confidence that two populations of data are different. Simplified algebraic equations for confidence numbers were derived based on the original work of L. Johnson. Using the ratios of mean life, the resultant values of confidence numbers deviated less than one percent from those of Johnson. It is possible to rank the fatigue lives of different materials with a reasonable degree of statistical certainty based on combined confidence numbers. These equations were applied to rotating beam fatigue tests that were conducted on three aluminum alloys at three stress levels each. These alloys were AL 2024, AL 6061, and AL 7075. The results were analyzed and compared using ASTM Standard E739-91 and the Johnson-Weibull analysis. The ASTM method did not statistically distinguish between AL 6061 and AL 7075. Based on Johnson-Weibull analysis confidence numbers greater than 99 percent, AL 2024 was found to have the longest fatigue life, followed by AL 7075, and then AL 6061. The ASTM Standard and the Johnson-Weibull analysis result in the same stress-life exponent p for each of the three aluminum alloys at the median or L(sub 50) lives.

  9. Statistical significance approximation in local trend analysis of high-throughput time-series data using the theory of Markov chains.

    PubMed

    Xia, Li C; Ai, Dongmei; Cram, Jacob A; Liang, Xiaoyi; Fuhrman, Jed A; Sun, Fengzhu

    2015-09-21

    Local trend (i.e. shape) analysis of time series data reveals co-changing patterns in the dynamics of biological systems. However, slow permutation procedures for evaluating the statistical significance of local trend scores have limited its application to high-throughput time series data, e.g., data from studies based on next-generation sequencing technology. By extending the theories for the tail probability of the range of a sum of Markovian random variables, we propose formulae for approximating the statistical significance of local trend scores. Using simulations and real data, we show that the approximate p-value is close to that obtained using a large number of permutations (starting at time points >20 with no delay and >30 with a delay of at most three time steps), in that the non-zero decimals of the p-values obtained by the approximation and the permutations are mostly the same when the approximate p-value is less than 0.05. In addition, the approximate p-value is slightly larger than that based on permutations, making hypothesis testing based on the approximate p-value conservative. The approximation enables efficient calculation of p-values for pairwise local trend analysis, making large-scale all-versus-all comparisons possible. We also propose a hybrid approach that integrates the approximation and permutations to obtain accurate p-values for significantly associated pairs. We further demonstrate its use with the analysis of the Plymouth Marine Laboratory (PML) microbial community time series from high-throughput sequencing data, where we found interesting organism co-occurrence dynamic patterns. The software tool is integrated into the eLSA software package that now provides accelerated local trend and similarity analysis pipelines for time series data. The package is freely available from the eLSA website: http://bitbucket.org/charade/elsa.
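
    A drastically simplified version of a local trend score conveys the idea: reduce each series to its sign-of-change sequence and find the best local run of agreement with a Smith-Waterman-like recursion. The sketch below is that simplification (no time delay, no comparison against the Markov-chain null), not the eLSA implementation.

```python
# Sketch only: a simplified local trend score between two time series.
import numpy as np

def local_trend_score(x, y):
    a, b = np.sign(np.diff(x)), np.sign(np.diff(y))
    best = running = 0.0
    for ai, bi in zip(a, b):
        running = max(0.0, running + ai * bi)   # +1 when trends agree, -1 when they oppose
        best = max(best, running)
    return best / len(a)                        # normalized by series length

rng = np.random.default_rng(6)
x = np.cumsum(rng.normal(size=50))
y = x + rng.normal(scale=0.5, size=50)
print(local_trend_score(x, y))
```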

  10. Biotrichotomy: The Neuroscientific and Neurobiological Systemology, Epistemology, and Methodology of the Tri-Squared Test and Tri-Center Analysis in Biostatistics

    ERIC Educational Resources Information Center

    Osler, James Edward

    2015-01-01

    This monograph provides a neuroscience-based systemological, epistemological, and methodological rationale for the design of an advanced and novel parametric statistical analytic framework for the biological sciences referred to as "Biotrichotomy". The aim of this new arena of statistics is to provide dual metrics designed to analyze the…

  11. Application of Turchin's method of statistical regularization

    NASA Astrophysics Data System (ADS)

    Zelenyi, Mikhail; Poliakova, Mariia; Nozik, Alexander; Khudyakov, Alexey

    2018-04-01

    During analysis of experimental data, one usually needs to restore a signal after it has been convoluted with some kind of apparatus function. According to Hadamard's definition this problem is ill-posed and requires regularization to provide sensible results. In this article we describe an implementation of the Turchin's method of statistical regularization based on the Bayesian approach to the regularization strategy.
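
    The ill-posedness mentioned above is easy to demonstrate with a toy unfolding problem. The sketch below applies plain Tikhonov regularization with a second-difference smoothing operator, a deterministic analogue of the described approach; Turchin's method additionally treats the regularization strength within a Bayesian framework, which is not shown here.

```python
# Sketch only: Tikhonov-regularized unfolding of a signal smeared by an apparatus function.
import numpy as np

n = 100
x_true = np.exp(-0.5 * ((np.arange(n) - 50) / 6.0) ** 2)       # unknown signal

idx = np.arange(n)
K = np.exp(-0.5 * ((idx[:, None] - idx[None, :]) / 4.0) ** 2)  # Gaussian apparatus function
K /= K.sum(axis=1, keepdims=True)

rng = np.random.default_rng(7)
f = K @ x_true + 0.01 * rng.normal(size=n)                     # measured (convolved) data

D = np.diff(np.eye(n), n=2, axis=0)                            # second-difference operator
alpha = 1e-3                                                   # regularization strength
x_reg = np.linalg.solve(K.T @ K + alpha * D.T @ D, K.T @ f)
print(np.linalg.norm(x_reg - x_true) / np.linalg.norm(x_true)) # relative reconstruction error
```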

  12. A statistical power analysis of woody carbon flux from forest inventory data

    Treesearch

    James A. Westfall; Christopher W. Woodall; Mark A. Hatfield

    2013-01-01

    At a national scale, the carbon (C) balance of numerous forest ecosystem C pools can be monitored using a stock change approach based on national forest inventory data. Given the potential influence of disturbance events and/or climate change processes, the statistical detection of changes in forest C stocks is paramount to maintaining the net sequestration status of...

  13. Person Fit Analysis in Computerized Adaptive Testing Using Tests for a Change Point

    ERIC Educational Resources Information Center

    Sinharay, Sandip

    2016-01-01

    Meijer and van Krimpen-Stoop noted that the number of person-fit statistics (PFSs) that have been designed for computerized adaptive tests (CATs) is relatively modest. This article partially addresses that concern by suggesting three new PFSs for CATs. The statistics are based on tests for a change point and can be used to detect an abrupt change…

  14. A Monte Carlo Analysis of the Thrust Imbalance for the RSRMV Booster During Both the Ignition Transient and Steady State Operation

    NASA Technical Reports Server (NTRS)

    Foster, Winfred A., Jr.; Crowder, Winston; Steadman, Todd E.

    2014-01-01

    This paper presents the results of statistical analyses performed to predict the thrust imbalance between two solid rocket motor boosters to be used on the Space Launch System (SLS) vehicle. Two legacy internal ballistics codes developed for the Space Shuttle program were coupled with a Monte Carlo analysis code to determine a thrust imbalance envelope for the SLS vehicle based on the performance of 1000 motor pairs. Thirty three variables which could impact the performance of the motors during the ignition transient and thirty eight variables which could impact the performance of the motors during steady state operation of the motor were identified and treated as statistical variables for the analyses. The effects of motor to motor variation as well as variations between motors of a single pair were included in the analyses. The statistical variations of the variables were defined based on data provided by NASA's Marshall Space Flight Center for the upgraded five segment booster and from the Space Shuttle booster when appropriate. The results obtained for the statistical envelope are compared with the design specification thrust imbalance limits for the SLS launch vehicle.

  15. A Monte Carlo Analysis of the Thrust Imbalance for the Space Launch System Booster During Both the Ignition Transient and Steady State Operation

    NASA Technical Reports Server (NTRS)

    Foster, Winfred A., Jr.; Crowder, Winston; Steadman, Todd E.

    2014-01-01

    This paper presents the results of statistical analyses performed to predict the thrust imbalance between two solid rocket motor boosters to be used on the Space Launch System (SLS) vehicle. Two legacy internal ballistics codes developed for the Space Shuttle program were coupled with a Monte Carlo analysis code to determine a thrust imbalance envelope for the SLS vehicle based on the performance of 1000 motor pairs. Thirty three variables which could impact the performance of the motors during the ignition transient and thirty eight variables which could impact the performance of the motors during steady state operation of the motor were identified and treated as statistical variables for the analyses. The effects of motor to motor variation as well as variations between motors of a single pair were included in the analyses. The statistical variations of the variables were defined based on data provided by NASA's Marshall Space Flight Center for the upgraded five segment booster and from the Space Shuttle booster when appropriate. The results obtained for the statistical envelope are compared with the design specification thrust imbalance limits for the SLS launch vehicle.

  16. A Statistics-based Platform for Quantitative N-terminome Analysis and Identification of Protease Cleavage Products*

    PubMed Central

    auf dem Keller, Ulrich; Prudova, Anna; Gioia, Magda; Butler, Georgina S.; Overall, Christopher M.

    2010-01-01

    Terminal amine isotopic labeling of substrates (TAILS), our recently introduced platform for quantitative N-terminome analysis, enables wide dynamic range identification of original mature protein N-termini and protease cleavage products. Modifying TAILS by use of isobaric tag for relative and absolute quantification (iTRAQ)-like labels for quantification together with a robust statistical classifier derived from experimental protease cleavage data, we report reliable and statistically valid identification of proteolytic events in complex biological systems in MS2 mode. The statistical classifier is supported by a novel parameter evaluating ion intensity-dependent quantification confidences of single peptide quantifications, the quantification confidence factor (QCF). Furthermore, the isoform assignment score (IAS) is introduced, a new scoring system for the evaluation of single peptide-to-protein assignments based on high confidence protein identifications in the same sample prior to negative selection enrichment of N-terminal peptides. By these approaches, we identified and validated, in addition to known substrates, low abundance novel bioactive MMP-2 targets including the plasminogen receptor S100A10 (p11) and the proinflammatory cytokine proEMAP/p43 that were previously undescribed. PMID:20305283

  17. Statistical trends of episiotomy around the world: Comparative systematic review of changing practices.

    PubMed

    Clesse, Christophe; Lighezzolo-Alnot, Joëlle; De Lavergne, Sylvie; Hamlin, Sandrine; Scheffler, Michèle

    2018-06-01

    The authors' purpose in this article is to identify, review and interpret all publications on episiotomy rates worldwide. Based on the criteria from the PRISMA guidelines, twenty databases were scrutinized. All studies that include national statistics related to episiotomy were selected, as well as studies presenting estimated data. Sixty-one papers were selected, with publication dates between 1995 and 2016. A static and dynamic analysis of all the results was carried out. The hypothesis of a decline in the number of episiotomies is discussed and confirmed, while noting that high rates of episiotomy persist in less industrialized countries and East Asia. Finally, the analysis investigates the potential determinants that influence the apparent statistical disparities.

  18. Statistical research on the bioactivity of new marine natural products discovered during the 28 years from 1985 to 2012.

    PubMed

    Hu, Yiwen; Chen, Jiahui; Hu, Guping; Yu, Jianchen; Zhu, Xun; Lin, Yongcheng; Chen, Shengping; Yuan, Jie

    2015-01-07

    Every year, hundreds of new compounds are discovered from the metabolites of marine organisms. Finding new and useful compounds is one of the crucial drivers for this field of research. Here we describe the statistics of bioactive compounds discovered from marine organisms from 1985 to 2012. This work is based on our database, which contains information on more than 15,000 chemical substances including 4196 bioactive marine natural products. We performed a comprehensive statistical analysis to understand the characteristics of the novel bioactive compounds and detail temporal trends, chemical structures, species distribution, and research progress. We hope this meta-analysis will provide useful information for research into the bioactivity of marine natural products and drug development.

  19. Fourier descriptor analysis and unification of voice range profile contours: method and applications.

    PubMed

    Pabon, Peter; Ternström, Sten; Lamarche, Anick

    2011-06-01

    To describe a method for unified description, statistical modeling, and comparison of voice range profile (VRP) contours, even from diverse sources. A morphologic modeling technique, which is based on Fourier descriptors (FDs), is applied to the VRP contour. The technique, which essentially involves resampling of the curve of the contour, is assessed and also is compared to density-based VRP averaging methods that use the overlap count. VRP contours can be usefully described and compared using FDs. The method also permits the visualization of the local covariation along the contour average. For example, the FD-based analysis shows that the population variance for ensembles of VRP contours is usually smallest at the upper left part of the VRP. To illustrate the method's advantages and possible further application, graphs are given that compare the averaged contours from different authors and recording devices--for normal, trained, and untrained male and female voices as well as for child voices. The proposed technique allows any VRP shape to be brought to the same uniform base. On this uniform base, VRP contours or contour elements coming from a variety of sources may be placed within the same graph for comparison and for statistical analysis.
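
    As a rough illustration of the contour-resampling and Fourier-descriptor idea, the sketch below computes descriptors for a synthetic closed contour (an ellipse standing in for a VRP contour); the resampling density and the number of harmonics retained are arbitrary choices, not values from the paper.

```python
# Fourier descriptors of a closed contour (sketch; a synthetic ellipse stands in
# for a real voice range profile contour).
import numpy as np

def fourier_descriptors(x, y, n_points=128, n_harmonics=10):
    """Resample a closed contour uniformly by arc length, then take the FFT of
    z = x + iy and keep the first harmonics as shape descriptors."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    # cumulative arc length, including the closing segment back to the start
    dx, dy = np.diff(np.r_[x, x[0]]), np.diff(np.r_[y, y[0]])
    s = np.r_[0.0, np.cumsum(np.hypot(dx, dy))]
    u = np.linspace(0.0, s[-1], n_points, endpoint=False)
    xs = np.interp(u, s, np.r_[x, x[0]])
    ys = np.interp(u, s, np.r_[y, y[0]])
    z = xs + 1j * ys
    Z = np.fft.fft(z) / n_points
    # drop the DC term (position) and normalize by |Z[1]| (scale) for invariance
    return Z[1:n_harmonics + 1] / abs(Z[1])

theta = np.linspace(0, 2 * np.pi, 200, endpoint=False)
fd = fourier_descriptors(40 + 30 * np.cos(theta), 80 + 20 * np.sin(theta))
print(np.round(abs(fd), 4))
```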

  20. Association between pathology and texture features of multi parametric MRI of the prostate

    NASA Astrophysics Data System (ADS)

    Kuess, Peter; Andrzejewski, Piotr; Nilsson, David; Georg, Petra; Knoth, Johannes; Susani, Martin; Trygg, Johan; Helbich, Thomas H.; Polanec, Stephan H.; Georg, Dietmar; Nyholm, Tufve

    2017-10-01

    The role of multi-parametric (mp)MRI in the diagnosis and treatment of prostate cancer has increased considerably. An alternative to visual inspection of mpMRI is the evaluation using histogram-based (first order statistics) parameters and textural features (second order statistics). The aims of the present work were to investigate the relationship between benign and malignant sub-volumes of the prostate and textures obtained from mpMR images. The performance of tumor prediction was investigated based on the combination of histogram-based and textural parameters. Subsequently, the relative importance of mpMR images was assessed and the benefit of additional imaging analyzed. Finally, sub-structures based on the PI-RADS classification were investigated as potential regions to automatically detect malignant lesions. Twenty-five patients who received mpMRI prior to radical prostatectomy were included in the study. The imaging protocol included T2, DWI, and DCE. Delineation of tumor regions was performed based on pathological information. First and second order statistics were derived from each structure and for all image modalities. The resulting data were processed with multivariate analysis, using PCA (principal component analysis) and OPLS-DA (orthogonal partial least squares discriminant analysis) for separation of malignant and healthy tissue. PCA showed a clear difference between tumor and healthy regions in the peripheral zone for all investigated images. The predictive ability of the OPLS-DA models increased for all image modalities when first and second order statistics were combined. The predictive value reached a plateau after adding ADC and T2, and did not increase further with the addition of other image information. The present study indicates a distinct difference in the signatures between malignant and benign prostate tissue. This is an absolute prerequisite for automatic tumor segmentation, but only the first step in that direction. For the specific identified signature, DCE did not add complementary information to T2 and ADC maps.

  1. Generalization of Entropy Based Divergence Measures for Symbolic Sequence Analysis

    PubMed Central

    Ré, Miguel A.; Azad, Rajeev K.

    2014-01-01

    Entropy based measures have been frequently used in symbolic sequence analysis. A symmetrized and smoothed form of Kullback-Leibler divergence or relative entropy, the Jensen-Shannon divergence (JSD), is of particular interest because of its sharing properties with families of other divergence measures and its interpretability in different domains including statistical physics, information theory and mathematical statistics. The uniqueness and versatility of this measure arise because of a number of attributes including generalization to any number of probability distributions and association of weights to the distributions. Furthermore, its entropic formulation allows its generalization in different statistical frameworks, such as, non-extensive Tsallis statistics and higher order Markovian statistics. We revisit these generalizations and propose a new generalization of JSD in the integrated Tsallis and Markovian statistical framework. We show that this generalization can be interpreted in terms of mutual information. We also investigate the performance of different JSD generalizations in deconstructing chimeric DNA sequences assembled from bacterial genomes including that of E. coli, S. enterica typhi, Y. pestis and H. influenzae. Our results show that the JSD generalizations bring in more pronounced improvements when the sequences being compared are from phylogenetically proximal organisms, which are often difficult to distinguish because of their compositional similarity. While small but noticeable improvements were observed with the Tsallis statistical JSD generalization, relatively large improvements were observed with the Markovian generalization. In contrast, the proposed Tsallis-Markovian generalization yielded more pronounced improvements relative to the Tsallis and Markovian generalizations, specifically when the sequences being compared arose from phylogenetically proximal organisms. PMID:24728338
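
    As a point of reference for the measures discussed above, the sketch below evaluates the standard (two-distribution, equal-weight) Jensen-Shannon divergence on nucleotide frequencies; the toy sequences are arbitrary, and the Tsallis and Markovian generalizations proposed in the paper are not reproduced here.

```python
# Jensen-Shannon divergence between two symbol distributions (standard,
# unweighted two-distribution form).
from collections import Counter
import numpy as np

def jsd(p, q, base=2):
    """JSD(p, q) = 0.5*KL(p||m) + 0.5*KL(q||m) with m = (p+q)/2."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)
    def kl(a, b):
        mask = a > 0
        return np.sum(a[mask] * np.log(a[mask] / b[mask])) / np.log(base)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def symbol_distribution(seq, alphabet="ACGT"):
    counts = Counter(seq)
    return [counts[s] for s in alphabet]

print(jsd(symbol_distribution("ACGTACGTAACC"), symbol_distribution("GGGTTTCCAAGT")))
```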

  2. Generalization of entropy based divergence measures for symbolic sequence analysis.

    PubMed

    Ré, Miguel A; Azad, Rajeev K

    2014-01-01

    Entropy based measures have been frequently used in symbolic sequence analysis. A symmetrized and smoothed form of Kullback-Leibler divergence or relative entropy, the Jensen-Shannon divergence (JSD), is of particular interest because of its sharing properties with families of other divergence measures and its interpretability in different domains including statistical physics, information theory and mathematical statistics. The uniqueness and versatility of this measure arise because of a number of attributes including generalization to any number of probability distributions and association of weights to the distributions. Furthermore, its entropic formulation allows its generalization in different statistical frameworks, such as, non-extensive Tsallis statistics and higher order Markovian statistics. We revisit these generalizations and propose a new generalization of JSD in the integrated Tsallis and Markovian statistical framework. We show that this generalization can be interpreted in terms of mutual information. We also investigate the performance of different JSD generalizations in deconstructing chimeric DNA sequences assembled from bacterial genomes including that of E. coli, S. enterica typhi, Y. pestis and H. influenzae. Our results show that the JSD generalizations bring in more pronounced improvements when the sequences being compared are from phylogenetically proximal organisms, which are often difficult to distinguish because of their compositional similarity. While small but noticeable improvements were observed with the Tsallis statistical JSD generalization, relatively large improvements were observed with the Markovian generalization. In contrast, the proposed Tsallis-Markovian generalization yielded more pronounced improvements relative to the Tsallis and Markovian generalizations, specifically when the sequences being compared arose from phylogenetically proximal organisms.

  3. Student Background, School Climate, School Disorder, and Student Achievement: An Empirical Study of New York City's Middle Schools

    ERIC Educational Resources Information Center

    Chen, Greg; Weikart, Lynne A.

    2008-01-01

    This study develops and tests a school disorder and student achievement model based upon the school climate framework. The model was fitted to 212 New York City middle schools using structural equation modeling. The analysis shows that the model fits the data well based upon test statistics and goodness-of-fit indices. The…

  4. A nonparametric analysis of plot basal area growth using tree based models

    Treesearch

    G. L. Gadbury; H. K. lyer; H. T. Schreuder; C. Y. Ueng

    1997-01-01

    Tree based statistical models can be used to investigate data structure and predict future observations. We used nonparametric and nonlinear models to reexamine the data sets on tree growth used by Bechtold et al. (1991) and Ruark et al. (1991). The growth data were collected by Forest Inventory and Analysis (FIA) teams from 1962 to 1972 (4th cycle) and 1972 to 1982 (...

  5. Estimating the Regional Economic Significance of Airports

    DTIC Science & Technology

    1992-09-01

    following three options for estimating induced impacts: the economic base model, an econometric model, and a regional input-output model. One approach to...limitations, however, the economic base model has been widely used for regional economic analysis. A second approach is to develop an econometric model of...analysis is the principal statistical tool used to estimate the economic relationships. Regional econometric models are capable of estimating a single

  6. Analysis of spontaneous MEG activity in mild cognitive impairment and Alzheimer's disease using spectral entropies and statistical complexity measures

    NASA Astrophysics Data System (ADS)

    Bruña, Ricardo; Poza, Jesús; Gómez, Carlos; García, María; Fernández, Alberto; Hornero, Roberto

    2012-06-01

    Alzheimer's disease (AD) is the most common cause of dementia. Over the last few years, a considerable effort has been devoted to exploring new biomarkers. Nevertheless, a better understanding of brain dynamics is still required to optimize therapeutic strategies. In this regard, the characterization of mild cognitive impairment (MCI) is crucial, due to the high conversion rate from MCI to AD. However, only a few studies have focused on the analysis of magnetoencephalographic (MEG) rhythms to characterize AD and MCI. In this study, we assess the ability of several parameters derived from information theory to describe spontaneous MEG activity from 36 AD patients, 18 MCI subjects and 26 controls. Three entropies (Shannon, Tsallis and Rényi entropies), one disequilibrium measure (based on Euclidean distance ED) and three statistical complexities (based on Lopez Ruiz-Mancini-Calbet complexity LMC) were used to estimate the irregularity and statistical complexity of MEG activity. Statistically significant differences between AD patients and controls were obtained with all parameters (p < 0.01). In addition, statistically significant differences between MCI subjects and controls were achieved by ED and LMC (p < 0.05). In order to assess the diagnostic ability of the parameters, a linear discriminant analysis with a leave-one-out cross-validation procedure was applied. The accuracies reached 83.9% and 65.9% to discriminate AD and MCI subjects from controls, respectively. Our findings suggest that MCI subjects exhibit an intermediate pattern of abnormalities between normal aging and AD. Furthermore, the proposed parameters provide a new description of brain dynamics in AD and MCI.
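
    A compact sketch of how such irregularity and complexity measures can be computed from a normalized power spectrum is given below; the synthetic test signal, the entropic index q = 2 and the exact normalizations are illustrative assumptions rather than the study's settings.

```python
# Spectral entropies and an LMC-style statistical complexity from a normalized
# power spectrum (sketch; MEG data replaced by a synthetic signal).
import numpy as np
from scipy import signal

rng = np.random.default_rng(1)
fs = 250.0
x = np.sin(2 * np.pi * 10 * np.arange(0, 20, 1 / fs)) + rng.normal(0, 1, int(20 * fs))

f, psd = signal.welch(x, fs=fs, nperseg=512)
p = psd / psd.sum()                       # normalized spectral distribution
n = p.size

shannon = -np.sum(p[p > 0] * np.log(p[p > 0]))
h_norm = shannon / np.log(n)              # normalized Shannon entropy in [0, 1]
q = 2.0                                   # entropic index (arbitrary choice here)
tsallis = (1.0 - np.sum(p ** q)) / (q - 1.0)
renyi = np.log(np.sum(p ** q)) / (1.0 - q)
diseq = np.sum((p - 1.0 / n) ** 2)        # Euclidean disequilibrium
lmc = h_norm * diseq                      # LMC-style statistical complexity

print(f"H={h_norm:.3f}  Tsallis={tsallis:.3f}  Renyi={renyi:.3f}  LMC={lmc:.4f}")
```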

  7. Global Sensitivity Analysis of Environmental Systems via Multiple Indices based on Statistical Moments of Model Outputs

    NASA Astrophysics Data System (ADS)

    Guadagnini, A.; Riva, M.; Dell'Oca, A.

    2017-12-01

    We propose to base the sensitivity analysis of uncertain parameters of environmental models on a set of indices derived from the main statistical moments, i.e., mean, variance, skewness and kurtosis, of the probability density function (pdf) of a target model output. This enables us to perform Global Sensitivity Analysis (GSA) of a model in terms of multiple statistical moments and yields a quantification of the impact of model parameters on features driving the shape of the pdf of model output. Our GSA approach includes the possibility of being coupled with the construction of a reduced complexity model that allows approximating the full model response at a reduced computational cost. We demonstrate our approach through a variety of test cases. These include a commonly used analytical benchmark, a simplified model representing pumping in a coastal aquifer, a laboratory-scale tracer experiment, and the migration of fracturing fluid through a naturally fractured reservoir (source) to reach an overlying formation (target). Our strategy allows discriminating the relative importance of model parameters to the four statistical moments considered. We also provide an appraisal of the error associated with the evaluation of our sensitivity metrics by replacing the original system model through the selected surrogate model. Our results suggest that one might need to construct a surrogate model with increasing level of accuracy depending on the statistical moment considered in the GSA. The methodological framework we propose can assist the development of analysis techniques targeted to model calibration, design of experiment, uncertainty quantification and risk assessment.
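
    A toy illustration of moment-based sensitivity is sketched below: a Monte Carlo sample of an invented three-parameter model is binned on each parameter, and the shift of the conditional mean and variance is used as a crude index. The model, the binning and the index normalization are assumptions made for illustration, not the paper's definitions.

```python
# Moment-based global sensitivity sketch: how much do the conditional mean and
# variance of the output move when one parameter is (approximately) fixed?
import numpy as np

rng = np.random.default_rng(2)
n, n_bins = 20_000, 20
x = rng.uniform(0, 1, size=(n, 3))                 # three uncertain parameters
y = np.sin(2 * np.pi * x[:, 0]) + 0.5 * x[:, 1] ** 2 + 0.05 * rng.normal(size=n)

mean_y, var_y = y.mean(), y.var()
for j in range(x.shape[1]):
    edges = np.quantile(x[:, j], np.linspace(0, 1, n_bins + 1))
    idx = np.clip(np.digitize(x[:, j], edges[1:-1]), 0, n_bins - 1)
    cond_mean = np.array([y[idx == b].mean() for b in range(n_bins)])
    cond_var = np.array([y[idx == b].var() for b in range(n_bins)])
    s_mean = np.mean(np.abs(cond_mean - mean_y)) / np.sqrt(var_y)
    s_var = np.mean(np.abs(cond_var - var_y)) / var_y
    print(f"x{j}: mean-shift index {s_mean:.3f}, variance-shift index {s_var:.3f}")
```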

  8. Atrial Electrogram Fractionation Distribution before and after Pulmonary Vein Isolation in Human Persistent Atrial Fibrillation-A Retrospective Multivariate Statistical Analysis.

    PubMed

    Almeida, Tiago P; Chu, Gavin S; Li, Xin; Dastagir, Nawshin; Tuan, Jiun H; Stafford, Peter J; Schlindwein, Fernando S; Ng, G André

    2017-01-01

    Purpose: Complex fractionated atrial electrograms (CFAE)-guided ablation after pulmonary vein isolation (PVI) has been used for persistent atrial fibrillation (persAF) therapy. This strategy has shown suboptimal outcomes due to, among other factors, undetected changes in the atrial tissue following PVI. In the present work, we investigate CFAE distribution before and after PVI in patients with persAF using a multivariate statistical model. Methods: 207 pairs of atrial electrograms (AEGs) were collected before and after PVI, respectively, from corresponding LA regions in 18 persAF patients. Twelve attributes were measured from the AEGs, before and after PVI. Statistical models based on multivariate analysis of variance (MANOVA) and linear discriminant analysis (LDA) have been used to characterize the atrial regions and AEGs. Results: PVI significantly reduced CFAEs in the LA (70 vs. 40%; P < 0.0001). Four types of LA regions were identified, based on the AEG characteristics: (i) fractionated before PVI that remained fractionated after PVI (31% of the collected points); (ii) fractionated that converted to normal (39%); (iii) normal prior to PVI that became fractionated (9%); and (iv) normal that remained normal (21%). Individually, the attributes failed to distinguish these LA regions, but multivariate statistical models were effective in their discrimination (P < 0.0001). Conclusion: Our results reveal that some LA regions are resistant to PVI, while others are affected by it. Although traditional methods were unable to identify these different regions, the proposed multivariate statistical model discriminated LA regions resistant to PVI from those affected by it without prior ablation information.

  9. Analyzing Large Gene Expression and Methylation Data Profiles Using StatBicRM: Statistical Biclustering-Based Rule Mining

    PubMed Central

    Maulik, Ujjwal; Mallik, Saurav; Mukhopadhyay, Anirban; Bandyopadhyay, Sanghamitra

    2015-01-01

    Microarray and beadchip are two of the most efficient techniques for measuring gene expression and methylation data in bioinformatics. Biclustering deals with the simultaneous clustering of genes and samples. In this article, we propose a computational rule mining framework, StatBicRM (i.e., statistical biclustering-based rule mining), to identify special types of rules and potential biomarkers using integrated statistical and binary inclusion-maximal biclustering techniques applied to biological datasets. First, a novel statistical strategy is used to eliminate insignificant, low-significance and redundant genes in such a way that the significance level satisfies the data distribution property (viz., either normal or non-normal distribution). The data are then discretized and post-discretized, consecutively. Thereafter, the biclustering technique is applied to identify maximal frequent closed homogeneous itemsets. Corresponding special types of rules are then extracted from the selected itemsets. Our proposed rule mining method performs better than other rule mining algorithms because it generates maximal frequent closed homogeneous itemsets instead of frequent itemsets. Thus, it saves elapsed time and can work on big datasets. Pathway and Gene Ontology analyses are conducted on the genes of the evolved rules using the DAVID database. Frequency analysis of the genes appearing in the evolved rules is performed to determine potential biomarkers. Furthermore, we also classify the data to assess how accurately the evolved rules describe the remaining test (unknown) data. Subsequently, we compare the average classification accuracy and other related factors with other rule-based classifiers. Statistical significance tests are also performed to verify the statistical relevance of the comparative results. Here, each of the other rule mining methods or rule-based classifiers also starts with the same post-discretized data matrix. Finally, we include an integrated analysis of gene expression and methylation to determine the epigenetic effect (viz., the effect of methylation) on gene expression levels. PMID:25830807

  10. Analyzing large gene expression and methylation data profiles using StatBicRM: statistical biclustering-based rule mining.

    PubMed

    Maulik, Ujjwal; Mallik, Saurav; Mukhopadhyay, Anirban; Bandyopadhyay, Sanghamitra

    2015-01-01

    Microarray and beadchip are two of the most efficient techniques for measuring gene expression and methylation data in bioinformatics. Biclustering deals with the simultaneous clustering of genes and samples. In this article, we propose a computational rule mining framework, StatBicRM (i.e., statistical biclustering-based rule mining), to identify special types of rules and potential biomarkers using integrated statistical and binary inclusion-maximal biclustering techniques applied to biological datasets. First, a novel statistical strategy is used to eliminate insignificant, low-significance and redundant genes in such a way that the significance level satisfies the data distribution property (viz., either normal or non-normal distribution). The data are then discretized and post-discretized, consecutively. Thereafter, the biclustering technique is applied to identify maximal frequent closed homogeneous itemsets. Corresponding special types of rules are then extracted from the selected itemsets. Our proposed rule mining method performs better than other rule mining algorithms because it generates maximal frequent closed homogeneous itemsets instead of frequent itemsets. Thus, it saves elapsed time and can work on big datasets. Pathway and Gene Ontology analyses are conducted on the genes of the evolved rules using the DAVID database. Frequency analysis of the genes appearing in the evolved rules is performed to determine potential biomarkers. Furthermore, we also classify the data to assess how accurately the evolved rules describe the remaining test (unknown) data. Subsequently, we compare the average classification accuracy and other related factors with other rule-based classifiers. Statistical significance tests are also performed to verify the statistical relevance of the comparative results. Here, each of the other rule mining methods or rule-based classifiers also starts with the same post-discretized data matrix. Finally, we include an integrated analysis of gene expression and methylation to determine the epigenetic effect (viz., the effect of methylation) on gene expression levels.

  11. Performance evaluation of tile-based Fisher Ratio analysis using a benchmark yeast metabolome dataset.

    PubMed

    Watson, Nathanial E; Parsons, Brendon A; Synovec, Robert E

    2016-08-12

    Performance of tile-based Fisher Ratio (F-ratio) data analysis, recently developed for discovery-based studies using comprehensive two-dimensional gas chromatography coupled with time-of-flight mass spectrometry (GC×GC-TOFMS), is evaluated with a metabolomics dataset that had been previously analyzed in great detail, but using a brute-force approach. The previously analyzed data (referred to herein as the benchmark dataset) were intracellular extracts from Saccharomyces cerevisiae (yeast), either metabolizing glucose (repressed) or ethanol (derepressed), which define the two classes in the discovery-based analysis to find metabolites that are statistically different in concentration between the two classes. Beneficially, this previously analyzed dataset provides a concrete means to validate the tile-based F-ratio software. Herein, we demonstrate and validate the significant benefits of applying tile-based F-ratio analysis. The yeast metabolomics data are analyzed far more rapidly, in about one week versus one year for the prior studies with this dataset. Furthermore, a null distribution analysis is implemented to statistically determine an adequate F-ratio threshold, whereby the variables with F-ratio values below the threshold can be ignored as not class-distinguishing, which provides the analyst with confidence when analyzing the hit table. Forty-six of the fifty-four benchmarked changing metabolites were discovered by the new methodology while consistently excluding all but one of the nineteen benchmarked false-positive metabolites previously identified. Copyright © 2016 Elsevier B.V. All rights reserved.
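
    The core calculation, a per-variable Fisher ratio with a permutation-based null threshold, can be sketched in a few lines; the synthetic two-class data and the number of permutations below are arbitrary stand-ins for the GC×GC-TOFMS tiles.

```python
# Per-variable Fisher ratio (between-class over within-class variance) with a
# permutation-based null threshold -- a simplified stand-in for the tile-based
# workflow described above.
import numpy as np

rng = np.random.default_rng(3)
n_per_class, n_vars = 12, 500
a = rng.normal(0.0, 1.0, size=(n_per_class, n_vars))
b = rng.normal(0.0, 1.0, size=(n_per_class, n_vars))
b[:, :10] += 2.0                              # ten truly class-distinguishing variables

def fisher_ratio(a, b):
    grand = np.concatenate([a, b]).mean(axis=0)
    between = len(a) * (a.mean(0) - grand) ** 2 + len(b) * (b.mean(0) - grand) ** 2
    within = ((a - a.mean(0)) ** 2).sum(0) + ((b - b.mean(0)) ** 2).sum(0)
    return between / (within / (len(a) + len(b) - 2))

f_obs = fisher_ratio(a, b)

# null distribution: shuffle class labels and recompute the F-ratios
pooled = np.concatenate([a, b])
null = []
for _ in range(200):
    perm = rng.permutation(len(pooled))
    null.append(fisher_ratio(pooled[perm[:n_per_class]], pooled[perm[n_per_class:]]))
threshold = np.percentile(np.concatenate(null), 99)
print("variables above the null threshold:", np.flatnonzero(f_obs > threshold)[:20])
```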

  12. Differentiation of commercial vermiculite based on statistical analysis of bulk chemical data: Fingerprinting vermiculite from Libby, Montana U.S.A

    USGS Publications Warehouse

    Gunter, M.E.; Singleton, E.; Bandli, B.R.; Lowers, H.A.; Meeker, G.P.

    2005-01-01

    Major-, minor-, and trace-element compositions, as determined by X-ray fluorescence (XRF) analysis, were obtained on 34 samples of vermiculite to ascertain whether chemical differences are large enough to determine the source of commercial products. The sample set included ores from four deposits, seven commercially available garden products, and insulation from four attics. The trace-element distributions of Ba, Cr, and V can be used to distinguish the Libby vermiculite samples from the garden products. In general, the overall composition of the Libby and South Carolina deposits appeared similar, but differed from the South Africa and China deposits based on simple statistical methods. Cluster analysis provided a good distinction of the four ore types, grouped the four attic samples with the Libby ore, and, with less certainty, grouped the garden samples with the South Africa ore.

  13. One Hundred Ways to be Non-Fickian - A Rigorous Multi-Variate Statistical Analysis of Pore-Scale Transport

    NASA Astrophysics Data System (ADS)

    Most, Sebastian; Nowak, Wolfgang; Bijeljic, Branko

    2015-04-01

    Fickian transport in groundwater flow is the exception rather than the rule. Transport in porous media is frequently simulated via particle methods (i.e. particle tracking random walk (PTRW) or continuous time random walk (CTRW)). These methods formulate transport as a stochastic process of particle position increments. At the pore scale, geometry and micro-heterogeneities prohibit the commonly made assumption of independent and normally distributed increments to represent dispersion. Many recent particle methods seek to loosen this assumption. Hence, it is important to get a better understanding of the processes at the pore scale. For our analysis we track the positions of 10,000 particles migrating through the pore space over time. The data we use come from micro CT scans of a homogeneous sandstone and encompass about 10 grain sizes. Based on those images we discretize the pore structure and simulate flow at the pore scale based on the Navier-Stokes equation. This flow field realistically describes flow inside the pore space and we do not need to add artificial dispersion during the transport simulation. Next, we use particle tracking random walk and simulate pore-scale transport. Finally, we use the obtained particle trajectories to do a multivariate statistical analysis of the particle motion at the pore scale. Our analysis is based on copulas. Every multivariate joint distribution is a combination of its univariate marginal distributions. The copula represents the dependence structure of those univariate marginals and is therefore useful to observe correlation and non-Gaussian interactions (i.e. non-Fickian transport). The first goal of this analysis is to better understand the validity regions of commonly made assumptions. We are investigating three different transport distances: 1) The distance where the statistical dependence between particle increments can be modelled as an order-one Markov process. This would be the Markovian distance for the process, where the validity of yet-unexplored non-Gaussian-but-Markovian random walks starts. 2) The distance where bivariate statistical dependence simplifies to a multi-Gaussian dependence based on simple linear correlation (validity of correlated PTRW/CTRW). 3) The distance of complete statistical independence (validity of classical PTRW/CTRW). The second objective is to reveal characteristic dependencies influencing transport the most. Those dependencies can be very complex. Copulas are highly capable of representing linear dependence as well as non-linear dependence. With that tool we are able to detect persistent characteristics dominating transport even across different scales. The results derived from our experimental data set suggest that there are many more non-Fickian aspects of pore-scale transport than are captured by the univariate statistics of longitudinal displacements. Non-Fickianity can also be found in transverse displacements, and in the relations between increments at different time steps. Also, the dependence found is non-linear (i.e. beyond simple correlation) and persists over long distances. Thus, our results strongly support the further refinement of techniques like correlated PTRW or correlated CTRW towards non-linear statistical relations.
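
    The rank-based (copula) view of increment dependence can be illustrated compactly: transform successive displacement increments to pseudo-observations and quantify their dependence. The AR(1)-style synthetic increments below stand in for the pore-scale trajectories, and Kendall's tau is used as a simple summary rather than a full copula fit.

```python
# Rank (copula) view of the dependence between successive particle displacement
# increments -- a sketch with synthetic correlated increments standing in for
# the pore-scale trajectories described above.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n = 5000
# synthetic longitudinal increments with some persistence (AR(1)-like)
eps = rng.standard_normal(n)
incr = np.empty(n)
incr[0] = eps[0]
for k in range(1, n):
    incr[k] = 0.6 * incr[k - 1] + eps[k]

u = stats.rankdata(incr[:-1]) / n          # pseudo-observations (empirical copula
v = stats.rankdata(incr[1:]) / n           # margins of successive increment pairs)

tau, _ = stats.kendalltau(u, v)            # rank dependence, insensitive to marginals
print(f"Kendall tau between successive increments: {tau:.3f}")
```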

  14. Statistical wind analysis for near-space applications

    NASA Astrophysics Data System (ADS)

    Roney, Jason A.

    2007-09-01

    Statistical wind models were developed based on the existing observational wind data for near-space altitudes between 60 000 and 100 000 ft (18 to 30 km) above ground level (AGL) at two locations, Akron, OH, USA, and White Sands, NM, USA. These two sites are envisioned as playing a crucial role in the first flights of high-altitude airships. The analysis shown in this paper has not been previously applied to this region of the stratosphere for such an application. Standard statistics were compiled for these data such as mean, median, maximum wind speed, and standard deviation, and the data were modeled with Weibull distributions. These statistics indicated that, on a yearly average, there is a lull or a “knee” in the wind between 65 000 and 72 000 ft AGL (20 to 22 km). From the standard statistics, trends at both locations indicated substantial seasonal variation in the mean wind speed at these heights. The yearly and monthly statistical modeling indicated that Weibull distributions were a reasonable model for the data. Forecasts and hindcasts were done by using a Weibull model based on 2004 data and comparing the model with the 2003 and 2005 data. The 2004 distribution was also a reasonable model for these years. Lastly, the Weibull distribution and cumulative function were used to predict the 50%, 95%, and 99% winds, which are directly related to the expected power requirements of a near-space station-keeping airship. These values indicated that using only the standard deviation of the mean may underestimate the operational conditions.
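
    A minimal sketch of the Weibull workflow, fitting the distribution to wind-speed data and reading off the 50%, 95% and 99% winds, is given below; the data are synthetic and the fixed zero location parameter is an assumption.

```python
# Fit a Weibull distribution to wind-speed data and report percentile winds
# (synthetic data standing in for the near-space observations above).
import numpy as np
from scipy import stats

wind = stats.weibull_min.rvs(c=2.2, scale=18.0, size=2000, random_state=0)  # m/s, synthetic

shape, loc, scale = stats.weibull_min.fit(wind, floc=0)     # fix location at zero
for p in (0.50, 0.95, 0.99):
    print(f"{int(p * 100)}% wind: {stats.weibull_min.ppf(p, shape, loc, scale):.1f} m/s")
print(f"mean {wind.mean():.1f} m/s, std {wind.std():.1f} m/s")
```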

  15. Fast and accurate imputation of summary statistics enhances evidence of functional enrichment

    PubMed Central

    Pasaniuc, Bogdan; Zaitlen, Noah; Shi, Huwenbo; Bhatia, Gaurav; Gusev, Alexander; Pickrell, Joseph; Hirschhorn, Joel; Strachan, David P.; Patterson, Nick; Price, Alkes L.

    2014-01-01

    Motivation: Imputation using external reference panels (e.g. 1000 Genomes) is a widely used approach for increasing power in genome-wide association studies and meta-analysis. Existing hidden Markov models (HMM)-based imputation approaches require individual-level genotypes. Here, we develop a new method for Gaussian imputation from summary association statistics, a type of data that is becoming widely available. Results: In simulations using 1000 Genomes (1000G) data, this method recovers 84% (54%) of the effective sample size for common (>5%) and low-frequency (1–5%) variants [increasing to 87% (60%) when summary linkage disequilibrium information is available from target samples] versus the gold standard of 89% (67%) for HMM-based imputation, which cannot be applied to summary statistics. Our approach accounts for the limited sample size of the reference panel, a crucial step to eliminate false-positive associations, and it is computationally very fast. As an empirical demonstration, we apply our method to seven case–control phenotypes from the Wellcome Trust Case Control Consortium (WTCCC) data and a study of height in the British 1958 birth cohort (1958BC). Gaussian imputation from summary statistics recovers 95% (105%) of the effective sample size (as quantified by the ratio of χ2 association statistics) compared with HMM-based imputation from individual-level genotypes at the 227 (176) published single nucleotide polymorphisms (SNPs) in the WTCCC (1958BC height) data. In addition, for publicly available summary statistics from large meta-analyses of four lipid traits, we publicly release imputed summary statistics at 1000G SNPs, which could not have been obtained using previously published methods, and demonstrate their accuracy by masking subsets of the data. We show that 1000G imputation using our approach increases the magnitude and statistical evidence of enrichment at genic versus non-genic loci for these traits, as compared with an analysis without 1000G imputation. Thus, imputation of summary statistics will be a valuable tool in future functional enrichment analyses. Availability and implementation: Publicly available software package available at http://bogdan.bioinformatics.ucla.edu/software/. Contact: bpasaniuc@mednet.ucla.edu or aprice@hsph.harvard.edu Supplementary information: Supplementary materials are available at Bioinformatics online. PMID:24990607
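
    The heart of Gaussian summary-statistic imputation is a conditional-mean calculation using reference-panel LD. The sketch below shows that calculation on invented toy numbers; the ridge term and the simple information metric are simplified stand-ins for the reference-panel noise adjustment described above.

```python
# Gaussian imputation of an association z-score at an untyped SNP from typed
# SNPs, using reference-panel LD (conditional-mean formula with a ridge term).
import numpy as np

def impute_z(z_typed, ld_tt, ld_it, lam=0.1):
    """z_typed: z-scores at typed SNPs; ld_tt: LD among typed SNPs;
    ld_it: LD between the untyped SNP and the typed SNPs."""
    w = np.linalg.solve(ld_tt + lam * np.eye(len(z_typed)), z_typed)
    z_hat = ld_it @ w
    # rough imputation quality (analogue of an r2-style metric)
    info = ld_it @ np.linalg.solve(ld_tt + lam * np.eye(len(z_typed)), ld_it)
    return z_hat, info

ld_tt = np.array([[1.0, 0.6, 0.3],
                  [0.6, 1.0, 0.5],
                  [0.3, 0.5, 1.0]])        # toy reference-panel LD among typed SNPs
ld_it = np.array([0.7, 0.5, 0.2])          # LD of the untyped SNP with the typed ones
z_typed = np.array([3.1, 2.4, 1.2])
print(impute_z(z_typed, ld_tt, ld_it))
```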

  16. Interpreting support vector machine models for multivariate group wise analysis in neuroimaging

    PubMed Central

    Gaonkar, Bilwaj; Shinohara, Russell T; Davatzikos, Christos

    2015-01-01

    Machine learning based classification algorithms like support vector machines (SVMs) have shown great promise for turning high-dimensional neuroimaging data into clinically useful decision criteria. However, tracing imaging-based patterns that contribute significantly to classifier decisions remains an open problem. This is an issue of critical importance in imaging studies seeking to determine which anatomical or physiological imaging features contribute to the classifier’s decision, thereby allowing users to critically evaluate the findings of such machine learning methods and to understand disease mechanisms. The majority of published work addresses the question of statistical inference for support vector classification using permutation tests based on SVM weight vectors. Such permutation testing ignores the SVM margin, which is critical in SVM theory. In this work we emphasize the use of a statistic that explicitly accounts for the SVM margin and show that the null distributions associated with this statistic are asymptotically normal. Further, our experiments show that this statistic is far less conservative than weight-based permutation tests, yet specific enough to tease out multivariate patterns in the data. Thus, we can better understand the multivariate patterns that the SVM uses for neuroimaging-based classification. PMID:26210913
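
    For orientation, the sketch below implements the conventional weight-based permutation test that the abstract contrasts with its margin-aware statistic (the latter is not reproduced here); the synthetic data and scikit-learn classifier settings are assumptions.

```python
# Permutation test on linear-SVM weights for group-wise analysis (the baseline
# approach; synthetic data, scikit-learn assumed available).
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(6)
n, d = 60, 200
X = rng.standard_normal((n, d))
y = np.r_[np.zeros(n // 2), np.ones(n // 2)]
X[y == 1, :5] += 0.8                        # five informative "voxels"

def svm_weights(X, y):
    return LinearSVC(C=1.0, dual=False, max_iter=5000).fit(X, y).coef_.ravel()

w_obs = np.abs(svm_weights(X, y))
# null distribution: refit the SVM on label permutations
null = np.array([np.abs(svm_weights(X, rng.permutation(y))) for _ in range(200)])
p_vals = (null >= w_obs).mean(axis=0)       # per-feature permutation p-values
print("features with p < 0.05:", np.flatnonzero(p_vals < 0.05)[:20])
```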

  17. Data management in large-scale collaborative toxicity studies: how to file experimental data for automated statistical analysis.

    PubMed

    Stanzel, Sven; Weimer, Marc; Kopp-Schneider, Annette

    2013-06-01

    High-throughput screening approaches are carried out for the toxicity assessment of a large number of chemical compounds. In such large-scale in vitro toxicity studies several hundred or thousand concentration-response experiments are conducted. The automated evaluation of concentration-response data using statistical analysis scripts saves time and yields more consistent results in comparison to data analysis performed by the use of menu-driven statistical software. Automated statistical analysis requires that concentration-response data are available in a standardised data format across all compounds. To obtain consistent data formats, a standardised data management workflow must be established, including guidelines for data storage, data handling and data extraction. In this paper two procedures for data management within large-scale toxicological projects are proposed. Both procedures are based on Microsoft Excel files as the researcher's primary data format and use a computer programme to automate the handling of data files. The first procedure assumes that data collection has not yet started whereas the second procedure can be used when data files already exist. Successful implementation of the two approaches into the European project ACuteTox is illustrated. Copyright © 2012 Elsevier Ltd. All rights reserved.

  18. Fast mean and variance computation of the diffuse sound transmission through finite-sized thick and layered wall and floor systems

    NASA Astrophysics Data System (ADS)

    Decraene, Carolina; Dijckmans, Arne; Reynders, Edwin P. B.

    2018-05-01

    A method is developed for computing the mean and variance of the diffuse field sound transmission loss of finite-sized layered wall and floor systems that consist of solid, fluid and/or poroelastic layers. This is achieved by coupling a transfer matrix model of the wall or floor to statistical energy analysis subsystem models of the adjacent room volumes. The modal behavior of the wall is approximately accounted for by projecting the wall displacement onto a set of sinusoidal lateral basis functions. This hybrid modal transfer matrix-statistical energy analysis method is validated on multiple wall systems: a thin steel plate, a polymethyl methacrylate panel, a thick brick wall, a sandwich panel, a double-leaf wall with poro-elastic material in the cavity, and a double glazing. The predictions are compared with experimental data and with results obtained using alternative prediction methods such as the transfer matrix method with spatial windowing, the hybrid wave based-transfer matrix method, and the hybrid finite element-statistical energy analysis method. These comparisons confirm the prediction accuracy of the proposed method and the computational efficiency against the conventional hybrid finite element-statistical energy analysis method.

  19. Variable Screening for Cluster Analysis.

    ERIC Educational Resources Information Center

    Donoghue, John R.

    Inclusion of irrelevant variables in a cluster analysis adversely affects subgroup recovery. This paper examines using moment-based statistics to screen variables; only variables that pass the screening are then used in clustering. Normal mixtures are analytically shown often to possess negative kurtosis. Two related measures, "m" and…

  20. Alignment-free genetic sequence comparisons: a review of recent approaches by word analysis.

    PubMed

    Bonham-Carter, Oliver; Steele, Joe; Bastola, Dhundy

    2014-11-01

    Modern sequencing and genome assembly technologies have provided a wealth of data, which will soon require an analysis by comparison for discovery. Sequence alignment, a fundamental task in bioinformatics research, may be used but with some caveats. Seminal techniques and methods from dynamic programming are proving ineffective for this work owing to their inherent computational expense when processing large amounts of sequence data. These methods are prone to giving misleading information because of genetic recombination, genetic shuffling and other inherent biological events. New approaches from information theory, frequency analysis and data compression are available and provide powerful alternatives to dynamic programming. These new methods are often preferred, as their algorithms are simpler and are not affected by synteny-related problems. In this review, we provide a detailed discussion of computational tools, which stem from alignment-free methods based on statistical analysis from word frequencies. We provide several clear examples to demonstrate applications and the interpretations over several different areas of alignment-free analysis such as base-base correlations, feature frequency profiles, compositional vectors, an improved string composition and the D2 statistic metric. Additionally, we provide detailed discussion and an example of analysis by Lempel-Ziv techniques from data compression. © The Author 2013. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.
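
    To make the word-analysis idea concrete, the sketch below counts k-mers in two toy sequences and evaluates the (uncentered) D2 statistic as the inner product of their word-count vectors; k = 3 and the sequences are arbitrary choices.

```python
# Alignment-free comparison of two sequences via k-mer (word) counts and the
# D2 statistic (inner product of word-count vectors).
from collections import Counter
from itertools import product

def kmer_counts(seq, k=3, alphabet="ACGT"):
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts[''.join(w)] for w in product(alphabet, repeat=k)]

def d2(seq_a, seq_b, k=3):
    a, b = kmer_counts(seq_a, k), kmer_counts(seq_b, k)
    return sum(x * y for x, y in zip(a, b))

print(d2("ACGTACGTGGCATTACG", "ACGTTGCATGCATTACG"))
```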

  1. Performance of an Axisymmetric Rocket Based Combined Cycle Engine During Rocket Only Operation Using Linear Regression Analysis

    NASA Technical Reports Server (NTRS)

    Smith, Timothy D.; Steffen, Christopher J., Jr.; Yungster, Shaye; Keller, Dennis J.

    1998-01-01

    The all-rocket mode of operation is shown to be a critical factor in the overall performance of a rocket-based combined cycle (RBCC) vehicle. An axisymmetric RBCC engine was used to determine specific impulse efficiency values based upon both full flow and gas generator configurations. Design of experiments methodology was used to construct a test matrix and multiple linear regression analysis was used to build parametric models. The main parameters investigated in this study were: rocket chamber pressure, rocket exit area ratio, injected secondary flow, mixer-ejector inlet area, mixer-ejector area ratio, and mixer-ejector length-to-inlet diameter ratio. A perfect-gas computational fluid dynamics analysis, using both the Spalart-Allmaras and k-omega turbulence models, was performed with the NPARC code to obtain values of vacuum specific impulse. Results from the multiple linear regression analysis showed that for both the full flow and gas generator configurations, increasing mixer-ejector area ratio and rocket area ratio increase performance, while increasing mixer-ejector inlet area ratio and mixer-ejector length-to-diameter ratio decrease performance. Increasing injected secondary flow increased performance for the gas generator analysis, but was not statistically significant for the full flow analysis. Chamber pressure was found not to be statistically significant.

  2. Gauging Skills of Hospital Security Personnel: a Statistically-driven, Questionnaire-based Approach.

    PubMed

    Rinkoo, Arvind Vashishta; Mishra, Shubhra; Rahesuddin; Nabi, Tauqeer; Chandra, Vidha; Chandra, Hem

    2013-01-01

    This study aims to gauge the technical and soft skills of the hospital security personnel so as to enable prioritization of their training needs. A cross sectional questionnaire based study was conducted in December 2011. Two separate predesigned and pretested questionnaires were used for gauging soft skills and technical skills of the security personnel. Extensive statistical analysis, including Multivariate Analysis (Pillai-Bartlett trace along with Multi-factorial ANOVA) and Post-hoc Tests (Bonferroni Test) was applied. The 143 participants performed better on the soft skills front with an average score of 6.43 and standard deviation of 1.40. The average technical skills score was 5.09 with a standard deviation of 1.44. The study avowed a need for formal hands on training with greater emphasis on technical skills. Multivariate analysis of the available data further helped in identifying 20 security personnel who should be prioritized for soft skills training and a group of 36 security personnel who should receive maximum attention during technical skills training. This statistically driven approach can be used as a prototype by healthcare delivery institutions worldwide, after situation specific customizations, to identify the training needs of any category of healthcare staff.

  3. Gauging Skills of Hospital Security Personnel: a Statistically-driven, Questionnaire-based Approach

    PubMed Central

    Rinkoo, Arvind Vashishta; Mishra, Shubhra; Rahesuddin; Nabi, Tauqeer; Chandra, Vidha; Chandra, Hem

    2013-01-01

    Objectives This study aims to gauge the technical and soft skills of the hospital security personnel so as to enable prioritization of their training needs. Methodology A cross sectional questionnaire based study was conducted in December 2011. Two separate predesigned and pretested questionnaires were used for gauging soft skills and technical skills of the security personnel. Extensive statistical analysis, including Multivariate Analysis (Pillai-Bartlett trace along with Multi-factorial ANOVA) and Post-hoc Tests (Bonferroni Test) was applied. Results The 143 participants performed better on the soft skills front with an average score of 6.43 and standard deviation of 1.40. The average technical skills score was 5.09 with a standard deviation of 1.44. The study avowed a need for formal hands on training with greater emphasis on technical skills. Multivariate analysis of the available data further helped in identifying 20 security personnel who should be prioritized for soft skills training and a group of 36 security personnel who should receive maximum attention during technical skills training. Conclusion This statistically driven approach can be used as a prototype by healthcare delivery institutions worldwide, after situation specific customizations, to identify the training needs of any category of healthcare staff. PMID:23559904

  4. Compositional data analysis for physical activity, sedentary time and sleep research.

    PubMed

    Dumuid, Dorothea; Stanford, Tyman E; Martin-Fernández, Josep-Antoni; Pedišić, Željko; Maher, Carol A; Lewis, Lucy K; Hron, Karel; Katzmarzyk, Peter T; Chaput, Jean-Philippe; Fogelholm, Mikael; Hu, Gang; Lambert, Estelle V; Maia, José; Sarmiento, Olga L; Standage, Martyn; Barreira, Tiago V; Broyles, Stephanie T; Tudor-Locke, Catrine; Tremblay, Mark S; Olds, Timothy

    2017-01-01

    The health effects of daily activity behaviours (physical activity, sedentary time and sleep) are widely studied. While previous research has largely examined activity behaviours in isolation, recent studies have adjusted for multiple behaviours. However, the inclusion of all activity behaviours in traditional multivariate analyses has not been possible due to the perfect multicollinearity of 24-h time budget data. The ensuing lack of adjustment for known effects on the outcome undermines the validity of study findings. We describe a statistical approach that enables the inclusion of all daily activity behaviours, based on the principles of compositional data analysis. Using data from the International Study of Childhood Obesity, Lifestyle and the Environment, we demonstrate the application of compositional multiple linear regression to estimate adiposity from children's daily activity behaviours expressed as isometric log-ratio coordinates. We present a novel method for predicting change in a continuous outcome based on relative changes within a composition, and for calculating associated confidence intervals to allow for statistical inference. The compositional data analysis presented overcomes the lack of adjustment that has plagued traditional statistical methods in the field, and provides robust and reliable insights into the health effects of daily activity behaviours.
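
    A minimal sketch of the compositional workflow, mapping a three-part activity composition (sleep, sedentary time, physical activity) to isometric log-ratio coordinates and regressing an outcome on them, is given below; the chosen partition, the synthetic compositions and the toy outcome model are illustrative assumptions, not the ISCOLE analysis.

```python
# Isometric log-ratio (ilr) coordinates for a 3-part composition, followed by an
# ordinary least-squares regression on the coordinates.
import numpy as np

def ilr(composition):
    """Sequential-binary-partition ilr for a 3-part composition (rows sum to 1):
    first balance contrasts parts 1 and 2, second contrasts {1, 2} against 3."""
    x = np.asarray(composition, float)
    z1 = np.sqrt(1 / 2) * np.log(x[:, 0] / x[:, 1])
    z2 = np.sqrt(2 / 3) * np.log(np.sqrt(x[:, 0] * x[:, 1]) / x[:, 2])
    return np.column_stack([z1, z2])

rng = np.random.default_rng(7)
n = 300
comp = rng.dirichlet([8, 7, 3], size=n)           # synthetic sleep/sedentary/PA shares
z = ilr(comp)
adiposity = 22 + 1.5 * z[:, 1] - 0.8 * z[:, 0] + rng.normal(0, 1, n)   # toy outcome

# least-squares fit of the outcome on the ilr coordinates
A = np.column_stack([np.ones(n), z])
coef, *_ = np.linalg.lstsq(A, adiposity, rcond=None)
print("intercept and ilr coefficients:", np.round(coef, 3))
```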

  5. Statistical framework for detection of genetically modified organisms based on Next Generation Sequencing.

    PubMed

    Willems, Sander; Fraiture, Marie-Alice; Deforce, Dieter; De Keersmaecker, Sigrid C J; De Loose, Marc; Ruttink, Tom; Herman, Philippe; Van Nieuwerburgh, Filip; Roosens, Nancy

    2016-02-01

    Because the number and diversity of genetically modified (GM) crops have significantly increased, their analysis based on real-time PCR (qPCR) methods is becoming increasingly complex and laborious. While several pioneers have already investigated Next Generation Sequencing (NGS) as an alternative to qPCR, its practical use has not been assessed for routine analysis. In this study, a statistical framework was developed to predict the number of NGS reads needed to detect transgene sequences, to prove their integration into the host genome and to identify the specific transgene event in a sample with known composition. This framework was validated by applying it to experimental data from food matrices composed of pure GM rice, processed GM rice (noodles) or a 10% GM/non-GM rice mixture, revealing some influential factors. Finally, the feasibility of NGS for routine analysis of GM crops was investigated by applying the framework to samples commonly encountered in routine analysis of GM crops. Copyright © 2015 The Authors. Published by Elsevier Ltd. All rights reserved.
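
    One ingredient of such a framework, the number of reads needed to observe a low-abundance target sequence a given number of times, reduces to a binomial calculation. The sketch below illustrates it; the target fraction, hit count and confidence level are invented, and the paper's additional modelling of integration and event-specific evidence is omitted.

```python
# How many NGS reads are needed to see a target sequence at least r times?
# A simple binomial calculation (illustrative only).
from math import comb

def prob_detect(n_reads, target_fraction, min_hits=1):
    """P(at least min_hits reads come from the target sequence)."""
    p = target_fraction
    return 1.0 - sum(comb(n_reads, k) * p**k * (1 - p)**(n_reads - k)
                     for k in range(min_hits))

def reads_needed(target_fraction, min_hits=1, confidence=0.95):
    """Double the read count until the detection probability reaches the target."""
    n = 1
    while prob_detect(n, target_fraction, min_hits) < confidence:
        n *= 2
    return n

# e.g. a transgene making up 1 in 100,000 of the sequenced bases, 3 supporting reads
print(reads_needed(1e-5, min_hits=3, confidence=0.95))
```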

  6. Rare-Variant Association Analysis: Study Designs and Statistical Tests

    PubMed Central

    Lee, Seunggeung; Abecasis, Gonçalo R.; Boehnke, Michael; Lin, Xihong

    2014-01-01

    Despite the extensive discovery of trait- and disease-associated common variants, much of the genetic contribution to complex traits remains unexplained. Rare variants can explain additional disease risk or trait variability. An increasing number of studies are underway to identify trait- and disease-associated rare variants. In this review, we provide an overview of statistical issues in rare-variant association studies with a focus on study designs and statistical tests. We present the design and analysis pipeline of rare-variant studies and review cost-effective sequencing designs and genotyping platforms. We compare various gene- or region-based association tests, including burden tests, variance-component tests, and combined omnibus tests, in terms of their assumptions and performance. Also discussed are the related topics of meta-analysis, population-stratification adjustment, genotype imputation, follow-up studies, and heritability due to rare variants. We provide guidelines for analysis and discuss some of the challenges inherent in these studies and future research directions. PMID:24995866

  7. Research Update: Spatially resolved mapping of electronic structure on atomic level by multivariate statistical analysis

    DOE PAGES

    Belianinov, Alex; Panchapakesan, G.; Lin, Wenzhi; ...

    2014-12-02

    Atomic level spatial variability of electronic structure in Fe-based superconductor FeTe0.55Se0.45 (Tc = 15 K) is explored using current-imaging tunneling-spectroscopy. Multivariate statistical analysis of the data differentiates regions of dissimilar electronic behavior that can be identified with the segregation of chalcogen atoms, as well as boundaries between terminations and near neighbor interactions. Subsequent clustering analysis allows identification of the spatial localization of these dissimilar regions. Similar statistical analysis of modeled calculated density of states of chemically inhomogeneous FeTe1-xSex structures further confirms that the two types of chalcogens, i.e., Te and Se, can be identified by their electronic signature and differentiated by their local chemical environment. This approach allows detailed chemical discrimination of the scanning tunneling microscopy data including separation of atomic identities, proximity, and local configuration effects and can be universally applicable to chemically and electronically inhomogeneous surfaces.

  8. Research Update: Spatially resolved mapping of electronic structure on atomic level by multivariate statistical analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Belianinov, Alex, E-mail: belianinova@ornl.gov; Ganesh, Panchapakesan; Lin, Wenzhi

    2014-12-01

    Atomic level spatial variability of electronic structure in Fe-based superconductor FeTe0.55Se0.45 (Tc = 15 K) is explored using current-imaging tunneling-spectroscopy. Multivariate statistical analysis of the data differentiates regions of dissimilar electronic behavior that can be identified with the segregation of chalcogen atoms, as well as boundaries between terminations and near neighbor interactions. Subsequent clustering analysis allows identification of the spatial localization of these dissimilar regions. Similar statistical analysis of modeled calculated density of states of chemically inhomogeneous FeTe1-xSex structures further confirms that the two types of chalcogens, i.e., Te and Se, can be identified by their electronic signature and differentiated by their local chemical environment. This approach allows detailed chemical discrimination of the scanning tunneling microscopy data including separation of atomic identities, proximity, and local configuration effects and can be universally applicable to chemically and electronically inhomogeneous surfaces.

  9. Significance testing of rules in rule-based models of human problem solving

    NASA Technical Reports Server (NTRS)

    Lewis, C. M.; Hammer, J. M.

    1986-01-01

    Rule-based models of human problem solving have typically not been tested for statistical significance. Three methods of testing rules - analysis of variance, randomization, and contingency tables - are presented. Advantages and disadvantages of the methods are also described.
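
    Of the three methods, the contingency-table test is the easiest to illustrate: cross-tabulate whether a rule fired against whether the solution step was correct and test for association. The counts below are invented, and scipy's chi-square test stands in for the specific procedures discussed in the paper.

```python
# Contingency-table significance test for a single production rule: does firing
# the rule co-occur with correct solutions more often than chance?
import numpy as np
from scipy.stats import chi2_contingency

#                 correct  incorrect
table = np.array([[30,      5],      # rule fired
                  [12,     18]])     # rule did not fire

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2={chi2:.2f}, dof={dof}, p={p:.4f}")
```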

  10. An evidence-based systematic review of gymnema (Gymnema sylvestre R. Br.) by the Natural Standard Research Collaboration.

    PubMed

    Ulbricht, Catherine; Abrams, Tracee Rae; Basch, Ethan; Davies-Heerema, Theresa; Foppa, Ivo; Hammerness, Paul; Rusie, Erica; Tanguay-Colucci, Shaina; Taylor, Sarah; Ulbricht, Catherine; Varghese, Minney; Weissner, Wendy; Woods, Jen

    2011-09-01

    An evidence-based systematic review of gymnema (Gymnema sylvestre R. Br.), including written and statistical analysis of scientific literature, expert opinion, folkloric precedent, history, pharmacology, kinetics/dynamics, interactions, adverse effects, toxicology, and dosing.

  11. The statistics of identifying differentially expressed genes in Expresso and TM4: a comparison

    PubMed Central

    Sioson, Allan A; Mane, Shrinivasrao P; Li, Pinghua; Sha, Wei; Heath, Lenwood S; Bohnert, Hans J; Grene, Ruth

    2006-01-01

    Background Analysis of DNA microarray data takes as input spot intensity measurements from scanner software and returns differential expression of genes between two conditions, together with a statistical significance assessment. This process typically consists of two steps: data normalization and identification of differentially expressed genes through statistical analysis. The Expresso microarray experiment management system implements these steps with a two-stage, log-linear ANOVA mixed model technique, tailored to individual experimental designs. The complement of tools in TM4, on the other hand, is based on a number of preset design choices that limit its flexibility. In the TM4 microarray analysis suite, normalization, filter, and analysis methods form an analysis pipeline. TM4 computes integrated intensity values (IIV) from the average intensities and spot pixel counts returned by the scanner software as input to its normalization steps. By contrast, Expresso can use either IIV data or median intensity values (MIV). Here, we compare Expresso and TM4 analysis of two experiments and assess the results against qRT-PCR data. Results The Expresso analysis using MIV data consistently identifies more genes as differentially expressed, when compared to Expresso analysis with IIV data. The typical TM4 normalization and filtering pipeline corrects systematic intensity-specific bias on a per microarray basis. Subsequent statistical analysis with Expresso or a TM4 t-test can effectively identify differentially expressed genes. The best agreement with qRT-PCR data is obtained through the use of Expresso analysis and MIV data. Conclusion The results of this research are of practical value to biologists who analyze microarray data sets. The TM4 normalization and filtering pipeline corrects microarray-specific systematic bias and complements the normalization stage in Expresso analysis. The results of Expresso using MIV data have the best agreement with qRT-PCR results. In one experiment, MIV is a better choice than IIV as input to data normalization and statistical analysis methods, as it yields a greater number of statistically significant differentially expressed genes; TM4 does not support the choice of MIV input data. Overall, the more flexible and extensive statistical models of Expresso achieve more accurate analytical results, when judged by the yardstick of qRT-PCR data, in the context of an experimental design of modest complexity. PMID:16626497

  12. Experimental analysis of computer system dependability

    NASA Technical Reports Server (NTRS)

    Iyer, Ravishankar K.; Tang, Dong

    1993-01-01

    This paper reviews an area which has evolved over the past 15 years: experimental analysis of computer system dependability. Methodologies and advances are discussed for three basic approaches used in the area: simulated fault injection, physical fault injection, and measurement-based analysis. The three approaches are suited, respectively, to dependability evaluation in the three phases of a system's life: design phase, prototype phase, and operational phase. Before the discussion of these phases, several statistical techniques used in the area are introduced. For each phase, a classification of research methods or study topics is outlined, followed by discussion of these methods or topics as well as representative studies. The statistical techniques introduced include the estimation of parameters and confidence intervals, probability distribution characterization, and several multivariate analysis methods. Importance sampling, a statistical technique used to accelerate Monte Carlo simulation, is also introduced. The discussion of simulated fault injection covers electrical-level, logic-level, and function-level fault injection methods as well as representative simulation environments such as FOCUS and DEPEND. The discussion of physical fault injection covers hardware, software, and radiation fault injection methods as well as several software and hybrid tools including FIAT, FERARI, HYBRID, and FINE. The discussion of measurement-based analysis covers measurement and data processing techniques, basic error characterization, dependency analysis, Markov reward modeling, software dependability, and fault diagnosis. The discussion involves several important issues studied in the area, including fault models, fast simulation techniques, workload/failure dependency, correlated failures, and software fault tolerance.
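
    The review names importance sampling only in passing; the following minimal sketch (not taken from the paper, with an arbitrary rare-event setup) illustrates how sampling from a shifted proposal and reweighting can estimate a rare failure probability far more efficiently than plain Monte Carlo.

    ```python
    # Importance-sampling sketch: estimate the rare-event probability P(Z > 4)
    # for Z ~ N(0, 1) by sampling from a proposal shifted toward the rare region
    # and reweighting by the likelihood ratio. Illustrative only.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    threshold, n = 4.0, 100_000

    # Plain Monte Carlo: almost no samples land in the rare region.
    z = rng.standard_normal(n)
    p_mc = np.mean(z > threshold)

    # Importance sampling from N(threshold, 1), weighted by f(y)/g(y).
    y = rng.normal(loc=threshold, scale=1.0, size=n)
    weights = stats.norm.pdf(y) / stats.norm.pdf(y, loc=threshold, scale=1.0)
    p_is = np.mean((y > threshold) * weights)

    print(f"exact={stats.norm.sf(threshold):.3e}  plain MC={p_mc:.3e}  IS={p_is:.3e}")
    ```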

  13. Cognitive and Socio-Affective Outcomes of Project-Based Learning: Perceptions of Greek Second Chance School Students

    ERIC Educational Resources Information Center

    Koutrouba, Konstantina; Karageorgou, Elissavet

    2013-01-01

    The present questionnaire-based study was conducted in 2010 in order to examine 677 Greek Second Chance School (SCS) students' perceptions about the cognitive and socio-affective outcomes of project-based learning. Data elaboration, statistical and factor analysis showed that the participants found that project-based learning offered a second…

  14. Imperial College near infrared spectroscopy neuroimaging analysis framework.

    PubMed

    Orihuela-Espina, Felipe; Leff, Daniel R; James, David R C; Darzi, Ara W; Yang, Guang-Zhong

    2018-01-01

    This paper describes the Imperial College near infrared spectroscopy neuroimaging analysis (ICNNA) software tool for functional near infrared spectroscopy neuroimaging data. ICNNA is a MATLAB-based object-oriented framework encompassing an application programming interface and a graphical user interface. ICNNA incorporates reconstruction based on the modified Beer-Lambert law and basic processing and data validation capabilities. Emphasis is placed on the full experiment rather than individual neuroimages as the central element of analysis. The software offers three types of analyses including classical statistical methods based on comparison of changes in relative concentrations of hemoglobin between the task and baseline periods, graph theory-based metrics of connectivity and, distinctively, an analysis approach based on manifold embedding. This paper presents the different capabilities of ICNNA in its current version.
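
    Since the record only names the modified Beer-Lambert law, a hedged two-wavelength sketch (not ICNNA code; the extinction coefficients, pathlength factors and source-detector distance below are illustrative placeholders) shows the basic conversion from optical density changes to hemoglobin concentration changes.

    ```python
    # Modified Beer-Lambert law sketch: convert delta-OD at two wavelengths to
    # [dHbO2, dHbR] by inverting the extinction-coefficient matrix.
    # All numeric constants are placeholders, not calibrated values.
    import numpy as np

    extinction = np.array([[0.59, 1.55],   # rows: wavelengths; cols: [HbO2, HbR]
                           [1.15, 0.78]])
    dpf = np.array([6.0, 6.0])             # differential pathlength factors
    distance = 3.0                         # source-detector separation (cm)

    def mbll(delta_od):
        """Return [dHbO2, dHbR] for a (2,) vector of optical density changes."""
        scaled = delta_od / (dpf * distance)
        return np.linalg.solve(extinction, scaled)

    print(mbll(np.array([0.012, 0.018])))
    ```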

  15. Source apportionment of groundwater pollution around landfill site in Nagpur, India.

    PubMed

    Pujari, Paras R; Deshpande, Vijaya

    2005-12-01

    The present work attempts a statistical analysis of groundwater quality near a landfill site in Nagpur, India. The objective of the present work is to determine the impact of different factors on the quality of groundwater in the study area. Statistical analysis of the data has been attempted by applying the factor analysis concept. The analysis brings out the effect of five different factors governing the groundwater quality in the study area. Based on the contribution of the different parameters present in the extracted factors, the latter are linked to the geological setting, the leaching from the host rock, the leachate of heavy metals from the landfill, as well as the bacterial contamination from the landfill site and other anthropogenic activities. The analysis brings out the vulnerability of the unconfined aquifer to contamination.
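
    As a generic illustration of the factor analysis concept applied here (hypothetical data, not the Nagpur measurements; the parameter names are assumptions), the extracted loadings indicate which water quality parameters dominate each latent factor.

    ```python
    # Factor-analysis sketch on simulated groundwater-quality data.
    import numpy as np
    import pandas as pd
    from sklearn.decomposition import FactorAnalysis
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(1)
    params = ["pH", "TDS", "hardness", "chloride", "nitrate", "coliform"]
    data = pd.DataFrame(rng.normal(size=(60, len(params))), columns=params)

    scaled = StandardScaler().fit_transform(data)
    fa = FactorAnalysis(n_components=3, random_state=0).fit(scaled)

    loadings = pd.DataFrame(fa.components_.T, index=params,
                            columns=[f"Factor{i + 1}" for i in range(3)])
    print(loadings.round(2))   # large loadings show which parameters drive each factor
    ```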

  16. Statistical analysis of target acquisition sensor modeling experiments

    NASA Astrophysics Data System (ADS)

    Deaver, Dawne M.; Moyer, Steve

    2015-05-01

    The U.S. Army RDECOM CERDEC NVESD Modeling and Simulation Division is charged with the development and advancement of military target acquisition models to estimate expected soldier performance when using all types of imaging sensors. Two elements of sensor modeling are (1) laboratory-based psychophysical experiments used to measure task performance and calibrate the various models and (2) field-based experiments used to verify the model estimates for specific sensors. In both types of experiments, it is common practice to control or measure environmental, sensor, and target physical parameters in order to minimize uncertainty of the physics based modeling. Predicting the minimum number of test subjects required to calibrate or validate the model should be, but is not always, done during test planning. The objective of this analysis is to develop guidelines for test planners which recommend the number and types of test samples required to yield a statistically significant result.
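
    As a hedged illustration of the test-planning question raised above (the effect size, significance level and power are assumed values, not NVESD figures), a standard power calculation gives the number of subjects needed per group for a two-sample comparison.

    ```python
    # Sample-size sketch for a two-sample t-test using statsmodels' power tools.
    from statsmodels.stats.power import TTestIndPower

    n_per_group = TTestIndPower().solve_power(effect_size=0.5,  # assumed medium effect
                                              alpha=0.05,
                                              power=0.8,
                                              alternative='two-sided')
    print(f"subjects required per group: {n_per_group:.1f}")    # roughly 64
    ```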

  17. Literature review of some selected types of results and statistical analyses of total-ozone data. [for the ozonosphere

    NASA Technical Reports Server (NTRS)

    Myers, R. H.

    1976-01-01

    The depletion of ozone in the stratosphere is examined, and causes for the depletion are cited. Ground station and satellite measurements of ozone, which are taken on a worldwide basis, are discussed. Instruments used in ozone measurement are discussed, such as the Dobson spectrophotometer, which is credited with providing the longest and most extensive series of observations for ground based observation of stratospheric ozone. Other ground based instruments used to measure ozone are also discussed. The statistical differences of ground based measurements of ozone from these different instruments are compared to each other, and to satellite measurements. Mathematical methods (i.e., trend analysis or linear regression analysis) of analyzing the variability of ozone concentration with respect to time and latitude are described. Various time series models which can be employed in accounting for ozone concentration variability are examined.

  18. Specialized data analysis of SSME and advanced propulsion system vibration measurements

    NASA Technical Reports Server (NTRS)

    Coffin, Thomas; Swanson, Wayne L.; Jong, Yen-Yi

    1993-01-01

    The basic objectives of this contract were to perform detailed analysis and evaluation of dynamic data obtained during Space Shuttle Main Engine (SSME) test and flight operations, including analytical/statistical assessment of component dynamic performance, and to continue the development and implementation of analytical/statistical models to effectively define nominal component dynamic characteristics, detect anomalous behavior, and assess machinery operational conditions. This study was to provide timely assessment of engine component operational status, identify probable causes of malfunction, and define feasible engineering solutions. The work was performed under three broad tasks: (1) Analysis, Evaluation, and Documentation of SSME Dynamic Test Results; (2) Data Base and Analytical Model Development and Application; and (3) Development and Application of Vibration Signature Analysis Techniques.

  19. Cross-correlation detection and analysis for California's electricity market based on analogous multifractal analysis

    NASA Astrophysics Data System (ADS)

    Wang, Fang; Liao, Gui-ping; Li, Jian-hui; Zou, Rui-biao; Shi, Wen

    2013-03-01

    A novel method, which we call the analogous multifractal cross-correlation analysis, is proposed in this paper to study the multifractal behavior in the power-law cross-correlation between price and load in the California electricity market. In addition, a statistic ρAMF-XA, which we call the analogous multifractal cross-correlation coefficient, is defined to test whether the cross-correlation between two given signals is genuine or not. Our analysis finds that both the price and load time series in the California electricity market exhibit a multifractal nature. However, as indicated by the ρAMF-XA statistical test, there is a large difference in the cross-correlation behavior between the years 1999 and 2000 in the California electricity market.

  20. Cross-correlation detection and analysis for California's electricity market based on analogous multifractal analysis.

    PubMed

    Wang, Fang; Liao, Gui-ping; Li, Jian-hui; Zou, Rui-biao; Shi, Wen

    2013-03-01

    A novel method, which we call the analogous multifractal cross-correlation analysis, is proposed in this paper to study the multifractal behavior in the power-law cross-correlation between price and load in the California electricity market. In addition, a statistic ρAMF-XA, which we call the analogous multifractal cross-correlation coefficient, is defined to test whether the cross-correlation between two given signals is genuine or not. Our analysis finds that both the price and load time series in the California electricity market exhibit a multifractal nature. However, as indicated by the ρAMF-XA statistical test, there is a large difference in the cross-correlation behavior between the years 1999 and 2000 in the California electricity market.

  1. Capturing rogue waves by multi-point statistics

    NASA Astrophysics Data System (ADS)

    Hadjihosseini, A.; Wächter, Matthias; Hoffmann, N. P.; Peinke, J.

    2016-01-01

    As an example of a complex system with extreme events, we investigate ocean wave states exhibiting rogue waves. We present a statistical method of data analysis based on multi-point statistics which, for the first time, allows extreme rogue wave events to be captured in a highly satisfactory statistical manner. The key to the success of the approach is mapping the complexity of multi-point data onto the statistics of hierarchically ordered height increments for different time scales, for which we can show that a stochastic cascade process with Markov properties is governed by a Fokker-Planck equation. Conditional probabilities as well as the Fokker-Planck equation itself can be estimated directly from the available observational data. With this stochastic description, surrogate data sets can in turn be generated, which makes it possible to work out arbitrary statistical features of the complex sea state in general, and extreme rogue wave events in particular. The results also open up new perspectives for forecasting the occurrence probability of extreme rogue wave events, and even for forecasting the occurrence of individual rogue waves based on precursory dynamics.

  2. Knowledge level of effect size statistics, confidence intervals and meta-analysis in Spanish academic psychologists.

    PubMed

    Badenes-Ribera, Laura; Frias-Navarro, Dolores; Pascual-Soler, Marcos; Monterde-I-Bort, Héctor

    2016-11-01

    The statistical reform movement and the American Psychological Association (APA) defend the use of estimators of the effect size and its confidence intervals, as well as the interpretation of the clinical significance of the findings. A survey was conducted in which academic psychologists were asked about their behavior in designing and carrying out their studies. The sample was composed of 472 participants (45.8% men). The mean number of years as a university professor was 13.56 years (SD = 9.27). The use of effect-size estimators is becoming generalized, as is the consideration of meta-analytic studies. However, several inadequate practices still persist. A traditional model of methodological behavior based on statistical significance tests is maintained, marked by the predominance of Cohen's d and the unadjusted R2/η2, which are not immune to outliers, departures from normality and violations of statistical assumptions, and by the under-reporting of confidence intervals of effect-size statistics. The paper concludes with recommendations for improving statistical practice.
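
    To make the survey's central quantities concrete, a small sketch (simulated data, not the survey sample) computes Cohen's d for two independent groups together with a bootstrap confidence interval, the kind of interval estimate the reform movement recommends reporting.

    ```python
    # Cohen's d with a bootstrap 95% confidence interval on simulated groups.
    import numpy as np

    rng = np.random.default_rng(2)
    group_a = rng.normal(0.0, 1.0, 40)
    group_b = rng.normal(0.5, 1.0, 40)

    def cohens_d(a, b):
        pooled_sd = np.sqrt(((len(a) - 1) * a.var(ddof=1) +
                             (len(b) - 1) * b.var(ddof=1)) / (len(a) + len(b) - 2))
        return (b.mean() - a.mean()) / pooled_sd

    boot = [cohens_d(rng.choice(group_a, len(group_a)),
                     rng.choice(group_b, len(group_b))) for _ in range(2000)]
    lo, hi = np.percentile(boot, [2.5, 97.5])
    print(f"d = {cohens_d(group_a, group_b):.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
    ```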

  3. White Matter Fiber-based Analysis of T1w/T2w Ratio Map.

    PubMed

    Chen, Haiwei; Budin, Francois; Noel, Jean; Prieto, Juan Carlos; Gilmore, John; Rasmussen, Jerod; Wadhwa, Pathik D; Entringer, Sonja; Buss, Claudia; Styner, Martin

    2017-02-01

    To develop, test, evaluate and apply a novel tool for the white matter fiber-based analysis of T1w/T2w ratio maps quantifying myelin content. The cerebral white matter in the human brain develops from a mostly non-myelinated state to a nearly fully mature white matter myelination within the first few years of life. High resolution T1w/T2w ratio maps are believed to be effective in quantitatively estimating myelin content on a voxel-wise basis. We propose the use of a fiber-tract-based analysis of such T1w/T2w ratio data, as it allows us to separate fiber bundles that a common regional analysis imprecisely groups together, and to associate effects to specific tracts rather than large, broad regions. We developed an intuitive, open source tool to facilitate such fiber-based studies of T1w/T2w ratio maps. Via its Graphical User Interface (GUI) the tool is accessible to non-technical users. The framework uses calibrated T1w/T2w ratio maps and a prior fiber atlas as an input to generate profiles of T1w/T2w values. The resulting fiber profiles are used in an along-tract functional statistical analysis. We applied this approach to a preliminary study of early brain development in neonates. We developed an open-source tool for the fiber-based analysis of T1w/T2w ratio maps and tested it in a study of brain development.

  4. White matter fiber-based analysis of T1w/T2w ratio map

    NASA Astrophysics Data System (ADS)

    Chen, Haiwei; Budin, Francois; Noel, Jean; Prieto, Juan Carlos; Gilmore, John; Rasmussen, Jerod; Wadhwa, Pathik D.; Entringer, Sonja; Buss, Claudia; Styner, Martin

    2017-02-01

    Purpose: To develop, test, evaluate and apply a novel tool for the white matter fiber-based analysis of T1w/T2w ratio maps quantifying myelin content. Background: The cerebral white matter in the human brain develops from a mostly non-myelinated state to a nearly fully mature white matter myelination within the first few years of life. High resolution T1w/T2w ratio maps are believed to be effective in quantitatively estimating myelin content on a voxel-wise basis. We propose the use of a fiber-tract-based analysis of such T1w/T2w ratio data, as it allows us to separate fiber bundles that a common regional analysis imprecisely groups together, and to associate effects to specific tracts rather than large, broad regions. Methods: We developed an intuitive, open source tool to facilitate such fiber-based studies of T1w/T2w ratio maps. Via its Graphical User Interface (GUI) the tool is accessible to non-technical users. The framework uses calibrated T1w/T2w ratio maps and a prior fiber atlas as an input to generate profiles of T1w/T2w values. The resulting fiber profiles are used in an along-tract functional statistical analysis. We applied this approach to a preliminary study of early brain development in neonates. Results: We developed an open-source tool for the fiber-based analysis of T1w/T2w ratio maps and tested it in a study of brain development.
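
    The two records above describe the same tool; as a simplified, hedged stand-in (random arrays replace the calibrated volumes, which in practice would be loaded from NIfTI files with a reader such as nibabel), the sketch below forms a voxel-wise T1w/T2w ratio map and samples a toy along-tract profile.

    ```python
    # Voxel-wise T1w/T2w ratio map and a toy along-tract profile (illustrative only).
    import numpy as np

    rng = np.random.default_rng(3)
    t1w = rng.uniform(100, 200, size=(64, 64, 64))   # stand-in for a calibrated T1w volume
    t2w = rng.uniform(80, 160, size=(64, 64, 64))    # stand-in for a calibrated T2w volume
    mask = np.ones_like(t1w, dtype=bool)             # white-matter mask placeholder

    ratio = np.zeros_like(t1w)
    np.divide(t1w, t2w, out=ratio, where=(t2w > 0) & mask)   # avoid division by zero

    # An along-tract profile samples the ratio map at points along a fiber tract;
    # a straight line through the volume stands in for a tract here.
    profile = np.array([ratio[i, 32, 32] for i in range(64)])
    print(profile.mean(), profile.std())
    ```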

  5. Trends in study design and the statistical methods employed in a leading general medicine journal.

    PubMed

    Gosho, M; Sato, Y; Nagashima, K; Takahashi, S

    2018-02-01

    Study design and statistical methods have become core components of medical research, and the methodology has become more multifaceted and complicated over time. The study of the comprehensive details and current trends of study design and statistical methods is required to support the future implementation of well-planned clinical studies providing information about evidence-based medicine. Our purpose was to illustrate study design and statistical methods employed in recent medical literature. This was an extension study of Sato et al. (N Engl J Med 2017; 376: 1086-1087), which reviewed 238 articles published in 2015 in the New England Journal of Medicine (NEJM) and briefly summarized the statistical methods employed in NEJM. Using the same database, we performed a new investigation of the detailed trends in study design and individual statistical methods that were not reported in the Sato study. Due to the CONSORT statement, prespecification and justification of sample size are obligatory in planning intervention studies. Although standard survival methods (eg Kaplan-Meier estimator and Cox regression model) were most frequently applied, the Gray test and Fine-Gray proportional hazard model for considering competing risks were sometimes used for a more valid statistical inference. With respect to handling missing data, model-based methods, which are valid for missing-at-random data, were more frequently used than single imputation methods. Single imputation methods are not recommended as a primary analysis, but they have been applied in many clinical trials. Group sequential design with interim analyses was one of the standard designs, and novel designs, such as adaptive dose selection and sample size re-estimation, were sometimes employed in NEJM. Model-based approaches for handling missing data should replace single imputation methods for primary analysis in the light of the information found in some publications. Use of adaptive design with interim analyses is increasing after the presentation of the FDA guidance for adaptive design. © 2017 John Wiley & Sons Ltd.
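
    For the standard survival methods the review found most common, a brief sketch on simulated data (the lifelines package and the variable names are assumptions, not the NEJM data) fits a Kaplan-Meier curve and a Cox regression.

    ```python
    # Kaplan-Meier estimate and Cox regression on simulated time-to-event data.
    import numpy as np
    import pandas as pd
    from lifelines import KaplanMeierFitter, CoxPHFitter

    rng = np.random.default_rng(4)
    df = pd.DataFrame({
        "time": rng.exponential(12, 200),      # follow-up time
        "event": rng.integers(0, 2, 200),      # 1 = event observed, 0 = censored
        "treatment": rng.integers(0, 2, 200),
    })

    km = KaplanMeierFitter().fit(df["time"], event_observed=df["event"])
    print(km.median_survival_time_)

    cox = CoxPHFitter().fit(df, duration_col="time", event_col="event")
    cox.print_summary()
    ```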

  6. Enhanced secondary analysis of survival data: reconstructing the data from published Kaplan-Meier survival curves.

    PubMed

    Guyot, Patricia; Ades, A E; Ouwens, Mario J N M; Welton, Nicky J

    2012-02-01

    The results of Randomized Controlled Trials (RCTs) on time-to-event outcomes that are usually reported are median time to events and the Cox hazard ratio. These do not constitute the sufficient statistics required for meta-analysis or cost-effectiveness analysis, and their use in secondary analyses requires strong assumptions that may not have been adequately tested. In order to enhance the quality of secondary data analyses, we propose a method which derives from the published Kaplan-Meier survival curves a close approximation to the original individual patient time-to-event data from which they were generated. We develop an algorithm that maps from digitised curves back to KM data by finding numerical solutions to the inverted KM equations, using, where available, information on number of events and numbers at risk. The reproducibility and accuracy of survival probabilities, median survival times and hazard ratios based on reconstructed KM data were assessed by comparing published statistics (survival probabilities, medians and hazard ratios) with statistics based on repeated reconstructions by multiple observers. The validation exercise established there was no material systematic error and that there was a high degree of reproducibility for all statistics. Accuracy was excellent for survival probabilities and medians; for hazard ratios, reasonable accuracy can only be obtained if at least numbers at risk or total number of events are reported. The algorithm is a reliable tool for meta-analysis and cost-effectiveness analyses of RCTs reporting time-to-event data. It is recommended that all RCTs should report information on numbers at risk and total number of events alongside KM curves.
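
    A drastically simplified sketch of the core idea (not the published algorithm, which additionally uses the reported numbers at risk to allocate censoring within intervals) inverts the Kaplan-Meier product-limit step to recover approximate event counts from digitised survival probabilities.

    ```python
    # Invert S_k = S_{k-1} * (1 - d_k / n_k) to estimate event counts d_k,
    # assuming no censoring within intervals (a strong simplification).
    import numpy as np

    times = np.array([0.0, 3.0, 6.0, 9.0, 12.0])      # digitised time points
    surv = np.array([1.0, 0.90, 0.78, 0.70, 0.61])    # digitised S(t)
    n_at_risk = 100                                   # number at risk at t = 0

    events = []
    for k in range(1, len(times)):
        d_k = int(round(n_at_risk * (1.0 - surv[k] / surv[k - 1])))
        events.append(d_k)
        n_at_risk -= d_k                              # no within-interval censoring assumed

    print(list(zip(times[1:], events)))
    ```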

  7. Challenges in Biomarker Discovery: Combining Expert Insights with Statistical Analysis of Complex Omics Data

    PubMed Central

    McDermott, Jason E.; Wang, Jing; Mitchell, Hugh; Webb-Robertson, Bobbie-Jo; Hafen, Ryan; Ramey, John; Rodland, Karin D.

    2012-01-01

    Introduction The advent of high throughput technologies capable of comprehensive analysis of genes, transcripts, proteins and other significant biological molecules has provided an unprecedented opportunity for the identification of molecular markers of disease processes. However, it has simultaneously complicated the problem of extracting meaningful molecular signatures of biological processes from these complex datasets. The process of biomarker discovery and characterization provides opportunities for more sophisticated approaches to integrating purely statistical and expert knowledge-based approaches. Areas covered In this review we will present examples of current practices for biomarker discovery from complex omic datasets and the challenges that have been encountered in deriving valid and useful signatures of disease. We will then present a high-level review of data-driven (statistical) and knowledge-based methods applied to biomarker discovery, highlighting some current efforts to combine the two distinct approaches. Expert opinion Effective, reproducible and objective tools for combining data-driven and knowledge-based approaches to identify predictive signatures of disease are key to future success in the biomarker field. We will describe our recommendations for possible approaches to this problem including metrics for the evaluation of biomarkers. PMID:23335946

  8. Challenges in Biomarker Discovery: Combining Expert Insights with Statistical Analysis of Complex Omics Data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    McDermott, Jason E.; Wang, Jing; Mitchell, Hugh D.

    2013-01-01

    The advent of high throughput technologies capable of comprehensive analysis of genes, transcripts, proteins and other significant biological molecules has provided an unprecedented opportunity for the identification of molecular markers of disease processes. However, it has simultaneously complicated the problem of extracting meaningful signatures of biological processes from these complex datasets. The process of biomarker discovery and characterization provides opportunities both for purely statistical and expert knowledge-based approaches and would benefit from improved integration of the two. Areas covered In this review we will present examples of current practices for biomarker discovery from complex omic datasets and the challenges that have been encountered. We will then present a high-level review of data-driven (statistical) and knowledge-based methods applied to biomarker discovery, highlighting some current efforts to combine the two distinct approaches. Expert opinion Effective, reproducible and objective tools for combining data-driven and knowledge-based approaches to biomarker discovery and characterization are key to future success in the biomarker field. We will describe our recommendations of possible approaches to this problem including metrics for the evaluation of biomarkers.

  9. EBprot: Statistical analysis of labeling-based quantitative proteomics data.

    PubMed

    Koh, Hiromi W L; Swa, Hannah L F; Fermin, Damian; Ler, Siok Ghee; Gunaratne, Jayantha; Choi, Hyungwon

    2015-08-01

    Labeling-based proteomics is a powerful method for detection of differentially expressed proteins (DEPs). The current data analysis platform typically relies on protein-level ratios, which are obtained by summarizing peptide-level ratios for each protein. In shotgun proteomics, however, some proteins are quantified with more peptides than others, and this reproducibility information is not incorporated into the differential expression (DE) analysis. Here, we propose a novel probabilistic framework EBprot that directly models the peptide-protein hierarchy and rewards the proteins with reproducible evidence of DE over multiple peptides. To evaluate its performance with known DE states, we conducted a simulation study to show that the peptide-level analysis of EBprot provides better receiver-operating characteristic and more accurate estimation of the false discovery rates than the methods based on protein-level ratios. We also demonstrate superior classification performance of peptide-level EBprot analysis in a spike-in dataset. To illustrate the wide applicability of EBprot in different experimental designs, we applied EBprot to a dataset for lung cancer subtype analysis with biological replicates and another dataset for time course phosphoproteome analysis of EGF-stimulated HeLa cells with multiplexed labeling. Through these examples, we show that the peptide-level analysis of EBprot is a robust alternative to the existing statistical methods for the DE analysis of labeling-based quantitative datasets. The software suite is freely available on the Sourceforge website http://ebprot.sourceforge.net/. All MS data have been deposited in the ProteomeXchange with identifier PXD001426 (http://proteomecentral.proteomexchange.org/dataset/PXD001426/). © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  10. Empirical performance of interpolation techniques in risk-neutral density (RND) estimation

    NASA Astrophysics Data System (ADS)

    Bahaludin, H.; Abdullah, M. H.

    2017-03-01

    The objective of this study is to evaluate the empirical performance of interpolation techniques in risk-neutral density (RND) estimation. Firstly, the empirical performance is evaluated by using statistical analysis based on the implied mean and the implied variance of the RND. Secondly, the interpolation performance is measured based on pricing error. We propose using the leave-one-out cross-validation (LOOCV) pricing error for interpolation selection purposes. The statistical analyses indicate that there are statistical differences between the interpolation techniques: second-order polynomial, fourth-order polynomial and smoothing spline. The LOOCV pricing error results show that interpolation using a fourth-order polynomial provides the best fit to option prices, as it has the lowest error.
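
    As a generic illustration of using LOOCV error for interpolation selection (synthetic prices against strike, not the study's option chain), each candidate polynomial order is scored by the squared error on the held-out point.

    ```python
    # Leave-one-out cross-validation for choosing a polynomial fitting order.
    import numpy as np

    rng = np.random.default_rng(5)
    strike = np.linspace(80, 120, 21)
    price = 0.002 * (strike - 100) ** 2 + 1.0 + rng.normal(0, 0.05, strike.size)

    def loocv_error(degree):
        errors = []
        for i in range(strike.size):
            train = np.delete(np.arange(strike.size), i)
            coefs = np.polyfit(strike[train], price[train], degree)
            errors.append((np.polyval(coefs, strike[i]) - price[i]) ** 2)
        return np.mean(errors)

    for deg in (2, 4, 6):
        print(f"degree {deg}: LOOCV pricing error = {loocv_error(deg):.5f}")
    ```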

  11. A Case-Based Curriculum for Introductory Geology

    ERIC Educational Resources Information Center

    Goldsmith, David W.

    2011-01-01

    For the past 5 years I have been teaching my introductory geology class using a case-based method that promotes student engagement and inquiry. This article presents an explanation of how a case-based curriculum differs from a more traditional approach to the material. It also presents a statistical analysis of several years' worth of student…

  12. A discriminant function approach to ecological site classification in northern New England

    Treesearch

    James M. Fincher; Marie-Louise Smith

    1994-01-01

    Describes one approach to ecologically based classification of upland forest community types of the White and Green Mountain physiographic regions. The classification approach is based on an intensive statistical analysis of the relationship between the communities and soil-site factors. Discriminant functions useful in distinguishing between types based on soil-site...
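
    As a generic sketch of the classification approach described (hypothetical soil-site variables, not the White/Green Mountain data), a linear discriminant analysis separates community types from soil-site factors and reports cross-validated accuracy.

    ```python
    # Linear discriminant analysis on simulated soil-site factors.
    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(6)
    X = rng.normal(size=(120, 4))        # e.g. depth to bedrock, drainage, elevation, aspect
    y = rng.integers(0, 3, 120)          # three hypothetical community types

    scores = cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=5)
    print(f"mean cross-validated accuracy: {scores.mean():.2f}")
    ```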

  13. Statistical Characteristics of Cloud over Beijing, China Obtained From Ka band Doppler Radar Observation

    NASA Astrophysics Data System (ADS)

    LIU, J.; Bi, Y.; Duan, S.; Lu, D.

    2017-12-01

    It is well known that cloud characteristics, such as top and base heights, the layering structure of micro-physical parameters, spatial coverage and temporal duration, are very important factors influencing both the radiation budget and its vertical partitioning, as well as the hydrological cycle through precipitation. Cloud structure, its statistical distribution and its typical values also vary geographically and seasonally. Ka band radar is a powerful tool for obtaining the above parameters, as demonstrated by the ARM cloud radar in Oklahoma, US. Since 2006, CloudSat, one of NASA's A-Train satellite constellation, has continuously observed cloud structure with global coverage, but it monitors clouds over a given site only twice a day, at the same local times. Using the IAP Ka band Doppler radar, which has been operating continuously since early 2013 on the roof of the IAP building in Beijing, we obtained the statistical characteristics of clouds, including cloud layering, cloud top and base heights, and the thickness of each cloud layer and its distribution, and analyzed their monthly, seasonal and diurnal variation; a statistical analysis of cloud reflectivity profiles is also made. The analysis covers both non-precipitating and precipitating clouds. Some preliminary comparisons of the results with CloudSat/CALIPSO products for the same period and area are also made.

  14. Characterization of Surface Water and Groundwater Quality in the Lower Tano River Basin Using Statistical and Isotopic Approach.

    NASA Astrophysics Data System (ADS)

    Edjah, Adwoba; Stenni, Barbara; Cozzi, Giulio; Turetta, Clara; Dreossi, Giuliano; Tetteh Akiti, Thomas; Yidana, Sandow

    2017-04-01

    This research is part of a PhD research work, "Hydrogeological Assessment of the Lower Tano river basin for sustainable economic usage, Ghana, West Africa". In this study, the researcher investigated surface water and groundwater quality in the Lower Tano river basin. This assessment was based on selected sampling sites associated with mining activities and the development of oil and gas. A statistical approach was applied to characterize the quality of surface water and groundwater. Also, water stable isotopes, which are natural tracers of the hydrological cycle, were used to investigate the origin of groundwater recharge in the basin. The study revealed that Pb and Ni values of the surface water and groundwater samples exceeded the WHO standards for drinking water. In addition, the water quality index (WQI), based on physicochemical parameters (EC, TDS, pH) and major ions (Ca2+, Na+, Mg2+, HCO3-, NO3-, Cl-, SO42-, K+), indicated good quality water for 60% of the sampled surface water and groundwater. Other statistical indices, such as the heavy metal pollution index (HPI), degree of contamination (Cd), and heavy metal evaluation index (HEI), based on trace element parameters in the water samples, reveal that 90% of the surface water and groundwater samples belong to a high level of pollution. Principal component analysis (PCA) also suggests that the water quality in the basin is likely affected by rock-water interaction and anthropogenic activities (sea water intrusion). This was confirmed by further statistical analysis (cluster analysis and correlation matrix) of the water quality parameters. The spatial distribution of water quality parameters, trace elements and the results obtained from the statistical analysis was determined using a geographical information system (GIS). In addition, the isotopic analysis of the sampled surface water and groundwater revealed that most of the surface water and groundwater were of meteoric origin with little or no isotopic variation. It is expected that the outcomes of this research will form a baseline for making appropriate decisions on water quality management by decision makers in the Lower Tano river basin. Keywords: Water stable isotopes, Trace elements, Multivariate statistics, Evaluation indices, Lower Tano river basin.

  15. Toward statistical modeling of saccadic eye-movement and visual saliency.

    PubMed

    Sun, Xiaoshuai; Yao, Hongxun; Ji, Rongrong; Liu, Xian-Ming

    2014-11-01

    In this paper, we present a unified statistical framework for modeling both saccadic eye movements and visual saliency. By analyzing the statistical properties of human eye fixations on natural images, we found that human attention is sparsely distributed and usually deployed to locations with abundant structural information. These observations inspired us to model saccadic behavior and visual saliency based on super-Gaussian component (SGC) analysis. Our model sequentially obtains SGCs using projection pursuit, and generates eye movements by selecting the location with the maximum SGC response. Besides simulating human saccadic behavior, we also demonstrate superior effectiveness and robustness over state-of-the-art methods by carrying out dense experiments on synthetic patterns and human eye fixation benchmarks. Multiple key issues in saliency modeling research, such as individual differences and the effects of scale and blur, are explored in this paper. Based on extensive qualitative and quantitative experimental results, we show the promising potential of statistical approaches for human behavior research.

  16. An analysis of I/O efficient order-statistic-based techniques for noise power estimation in the HRMS sky survey's operational system

    NASA Technical Reports Server (NTRS)

    Zimmerman, G. A.; Olsen, E. T.

    1992-01-01

    Noise power estimation in the High-Resolution Microwave Survey (HRMS) sky survey element is considered as an example of a constant false alarm rate (CFAR) signal detection problem. Order-statistic-based noise power estimators for CFAR detection are considered in terms of required estimator accuracy and estimator dynamic range. By limiting the dynamic range of the value to be estimated, the performance of an order-statistic estimator can be achieved by simpler techniques requiring only a single pass of the data. Simple threshold-and-count techniques are examined, and it is shown how several parallel threshold-and-count estimation devices can be used to expand the dynamic range to meet HRMS system requirements with minimal hardware complexity. An input/output (I/O) efficient limited-precision order-statistic estimator with wide but limited dynamic range is also examined.

  17. Bayesian models based on test statistics for multiple hypothesis testing problems.

    PubMed

    Ji, Yuan; Lu, Yiling; Mills, Gordon B

    2008-04-01

    We propose a Bayesian method for the problem of multiple hypothesis testing that is routinely encountered in bioinformatics research, such as the differential gene expression analysis. Our algorithm is based on modeling the distributions of test statistics under both null and alternative hypotheses. We substantially reduce the complexity of the process of defining posterior model probabilities by modeling the test statistics directly instead of modeling the full data. Computationally, we apply a Bayesian FDR approach to control the number of rejections of null hypotheses. To check if our model assumptions for the test statistics are valid for various bioinformatics experiments, we also propose a simple graphical model-assessment tool. Using extensive simulations, we demonstrate the performance of our models and the utility of the model-assessment tool. In the end, we apply the proposed methodology to an siRNA screening and a gene expression experiment.
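
    A hedged stand-in for the general idea (a plain two-component Gaussian mixture rather than the authors' Bayesian model) fits the distribution of observed z-statistics and reports, per statistic, the posterior probability of the non-null component.

    ```python
    # Two-component mixture on test statistics as a simplified local-FDR-style analysis.
    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(7)
    z = np.concatenate([rng.normal(0, 1, 900),    # null statistics
                        rng.normal(3, 1, 100)])   # differentially expressed
    z = z.reshape(-1, 1)

    gm = GaussianMixture(n_components=2, random_state=0).fit(z)
    alt = int(np.argmax(gm.means_.ravel()))           # component with the larger mean
    posterior_alt = gm.predict_proba(z)[:, alt]
    print("statistics with posterior > 0.9:", int((posterior_alt > 0.9).sum()))
    ```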

  18. Quantitative investigation of inappropriate regression model construction and the importance of medical statistics experts in observational medical research: a cross-sectional study.

    PubMed

    Nojima, Masanori; Tokunaga, Mutsumi; Nagamura, Fumitaka

    2018-05-05

    To investigate under what circumstances inappropriate use of 'multivariate analysis' is likely to occur and to identify the population that needs more support with medical statistics. The frequency of inappropriate regression model construction in multivariate analysis and related factors were investigated in observational medical research publications. The inappropriate algorithm of using only variables that were significant in univariate analysis was estimated to occur at 6.4% (95% CI 4.8% to 8.5%). This was observed in 1.1% of the publications with a medical statistics expert (hereinafter 'expert') as the first author, in 3.5% if an expert was included as a coauthor and in 12.2% if experts were not involved. In the publications where the number of cases was 50 or less and the study did not include experts, inappropriate algorithm usage was observed at a high proportion of 20.2%. The OR of the involvement of experts for this outcome was 0.28 (95% CI 0.15 to 0.53). A further, nation-level analysis showed that the involvement of experts and the implementation of unfavourable multivariate analysis are associated (R = -0.652). Based on the results of this study, the benefit of participation of medical statistics experts is obvious. Experts should be involved for proper confounding adjustment and interpretation of statistical models. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.

  19. Partial Least Squares Regression Can Aid in Detecting Differential Abundance of Multiple Features in Sets of Metagenomic Samples

    PubMed Central

    Libiger, Ondrej; Schork, Nicholas J.

    2015-01-01

    It is now feasible to examine the composition and diversity of microbial communities (i.e., “microbiomes”) that populate different human organs and orifices using DNA sequencing and related technologies. To explore the potential links between changes in microbial communities and various diseases in the human body, it is essential to test associations involving different species within and across microbiomes, environmental settings and disease states. Although a number of statistical techniques exist for carrying out relevant analyses, it is unclear which of these techniques exhibit the greatest statistical power to detect associations given the complexity of most microbiome datasets. We compared the statistical power of principal component regression, partial least squares regression, regularized regression, distance-based regression, Hill's diversity measures, and a modified test implemented in the popular and widely used microbiome analysis methodology “Metastats” across a wide range of simulated scenarios involving changes in feature abundance between two sets of metagenomic samples. For this purpose, simulation studies were used to change the abundance of microbial species in a real dataset from a published study examining human hands. Each technique was applied to the same data, and its ability to detect the simulated change in abundance was assessed. We hypothesized that a small subset of methods would outperform the rest in terms of the statistical power. Indeed, we found that the Metastats technique modified to accommodate multivariate analysis and partial least squares regression yielded high power under the models and data sets we studied. The statistical power of diversity measure-based tests, distance-based regression and regularized regression was significantly lower. Our results provide insight into powerful analysis strategies that utilize information on species counts from large microbiome data sets exhibiting skewed frequency distributions obtained on a small to moderate number of samples. PMID:26734061
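
    A minimal partial least squares sketch (simulated count-like features, not the hand microbiome data) regresses group membership on many feature abundances and inspects which features carry the largest weights on the first component.

    ```python
    # PLS regression on simulated metagenomic-style abundance data.
    import numpy as np
    from sklearn.cross_decomposition import PLSRegression

    rng = np.random.default_rng(8)
    n_samples, n_features = 40, 200
    X = rng.poisson(5, size=(n_samples, n_features)).astype(float)
    y = np.repeat([0.0, 1.0], n_samples // 2)
    X[y == 1, :5] += 4                   # simulated abundance shift in 5 features

    pls = PLSRegression(n_components=2).fit(X, y)
    weights = np.abs(pls.x_weights_[:, 0])
    print("top features by |weight|:", np.argsort(weights)[::-1][:5])
    ```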

  20. A wind proxy based on migrating dunes at the Baltic coast: statistical analysis of the link between wind conditions and sand movement

    NASA Astrophysics Data System (ADS)

    Bierstedt, Svenja E.; Hünicke, Birgit; Zorita, Eduardo; Ludwig, Juliane

    2017-07-01

    We statistically analyse the relationship between the structure of migrating dunes in the southern Baltic and the driving wind conditions over the past 26 years, with the long-term aim of using migrating dunes as a proxy for past wind conditions at an interannual resolution. The present analysis is based on the dune record derived from geo-radar measurements by Ludwig et al. (2017). The dune system is located at the Baltic Sea coast of Poland and is migrating from west to east along the coast. The dunes present layers with different thicknesses that can be assigned to absolute dates at interannual timescales and put in relation to seasonal wind conditions. To statistically analyse this record and calibrate it as a wind proxy, we used a gridded regional meteorological reanalysis data set (coastDat2) covering recent decades. The identified link between the dune annual layers and wind conditions was additionally supported by the co-variability between dune layers and observed sea level variations in the southern Baltic Sea. We include precipitation and temperature in our analysis, in addition to wind, to learn more about the dependency between these three atmospheric factors and their common influence on the dune system. We set up a statistical linear model based on the correlation between the frequency of days with specific wind conditions in a given season and dune migration velocities derived for that season. To some extent, the dune records can be seen as analogous to tree-ring width records, and hence we use a proxy validation method usually applied in dendrochronology, cross-validation with the leave-one-out method, when the observational record is short. The revealed correlations between the wind record from the reanalysis and the wind record derived from the dune structure are in the range between 0.28 and 0.63, yielding similar statistical validation skill as dendroclimatological records.

  1. Using statistical process control to make data-based clinical decisions.

    PubMed

    Pfadt, A; Wheeler, D J

    1995-01-01

    Applied behavior analysis is based on an investigation of variability due to interrelationships among antecedents, behavior, and consequences. This permits testable hypotheses about the causes of behavior to be formulated, as well as the course of treatment to be evaluated empirically. Such information provides corrective feedback for making data-based clinical decisions. This paper considers how a different approach to the analysis of variability, based on the writings of Walter Shewhart and W. Edwards Deming in the area of industrial quality control, helps to achieve similar objectives. Statistical process control (SPC) was developed to implement a process of continual product improvement while achieving compliance with production standards and other requirements for promoting customer satisfaction. SPC involves the use of simple statistical tools, such as histograms and control charts, as well as problem-solving techniques, such as flow charts, cause-and-effect diagrams, and Pareto charts, to implement Deming's management philosophy. These data-analytic procedures can be incorporated into a human service organization to help achieve its stated objectives in a manner that leads to continuous improvement in the functioning of the clients who are its customers. Examples are provided to illustrate how SPC procedures can be used to analyze behavioral data. Issues related to the application of these tools for making data-based clinical decisions and for creating an organizational climate that promotes their routine use in applied settings are also considered.
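
    To make the control chart idea concrete, a minimal individuals-chart sketch (simulated behavioral counts, not clinical data) derives 3-sigma limits from the average moving range and flags out-of-control points.

    ```python
    # Individuals control chart: 3-sigma limits from the average moving range.
    import numpy as np

    rng = np.random.default_rng(9)
    observations = rng.normal(10, 2, 30)
    observations[25] = 20                         # an out-of-control point

    center = observations.mean()
    sigma_est = np.abs(np.diff(observations)).mean() / 1.128   # d2 constant for n = 2
    ucl, lcl = center + 3 * sigma_est, center - 3 * sigma_est

    flagged = np.where((observations > ucl) | (observations < lcl))[0]
    print(f"UCL={ucl:.2f}, LCL={lcl:.2f}, out-of-control points: {flagged}")
    ```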

  2. Organization and Visualization for Initial Analysis of Forced-Choice Ipsative Data

    ERIC Educational Resources Information Center

    Cochran, Jill A.

    2015-01-01

    Forced-choice ipsative data are common in personality, philosophy and other preference-based studies. However, this type of data inherently contains dependencies that are challenging for usual statistical analysis. In order to utilize the structure of the data as a guide for analysis rather than as a challenge to manage, a visualisation tool was…

  3. The vulnerability of electric equipment to carbon fibers of mixed lengths: An analysis

    NASA Technical Reports Server (NTRS)

    Elber, W.

    1980-01-01

    The susceptibility of a stereo amplifier to damage from a spectrum of lengths of graphite fibers was calculated. A simple analysis was developed by which such calculations can be based on test results with fibers of uniform lengths. A statistical analysis was applied for the conversion of data for various logical failure criteria.

  4. Rasch Based Analysis of Oral Proficiency Test Data.

    ERIC Educational Resources Information Center

    Nakamura, Yuji

    2001-01-01

    This paper examines the rating scale data of oral proficiency tests analyzed by a Rasch Analysis focusing on an item map and factor analysis. In discussing the item map, the difficulty order of six items and students' answering patterns are analyzed using descriptive statistics and measures of central tendency of test scores. The data ranks the…

  5. The Social Profile of Students in Basic General Education in Ecuador: A Data Analysis

    ERIC Educational Resources Information Center

    Buri, Olga Elizabeth Minchala; Stefos, Efstathios

    2017-01-01

    The objective of this study is to examine the social profile of students who are enrolled in Basic General Education in Ecuador. Both a descriptive and multidimensional statistical analysis was carried out based on the data provided by the National Survey of Employment, Unemployment and Underemployment in 2015. The descriptive analysis shows the…

  6. ANALYSIS TO ACCOUNT FOR SMALL AGE RANGE CATEGORIES IN DISTRIBUTIONS OF WATER CONSUMPTION AND BODY WEIGHT IN THE U.S. USING CSFII DATA

    EPA Science Inventory

    Statistical population based estimates of water ingestion play a vital role in many types of exposure and risk analysis. A significant large scale analysis of water ingestion by the population of the United States was recently completed and is documented in the report titled ...

  7. Incorporating Multi-criteria Optimization and Uncertainty Analysis in the Model-Based Systems Engineering of an Autonomous Surface Craft

    DTIC Science & Technology

    2009-09-01

    SAS Statistical Analysis Software SE Systems Engineering SEP Systems Engineering Process SHP Shaft Horsepower SIGINT Signals Intelligence ... management occurs (OSD 2002). The Systems Engineering Process (SEP), displayed in Figure 2, is a comprehensive, iterative and recursive problem

  8. The skeletal maturation status estimated by statistical shape analysis: axial images of Japanese cervical vertebra.

    PubMed

    Shin, S M; Kim, Y-I; Choi, Y-S; Yamaguchi, T; Maki, K; Cho, B-H; Park, S-B

    2015-01-01

    To evaluate axial cervical vertebral (ACV) shape quantitatively and to build a prediction model for skeletal maturation level using statistical shape analysis for Japanese individuals. The sample included 24 female and 19 male patients with hand-wrist radiographs and CBCT images. Through generalized Procrustes analysis and principal components (PCs) analysis, the meaningful PCs were extracted from each ACV shape and analysed for the estimation regression model. Each ACV shape had meaningful PCs, except for the second axial cervical vertebra. Based on these models, the smallest prediction intervals (PIs) were from the combination of the shape space PCs, age and gender. Overall, the PIs of the male group were smaller than those of the female group. There was no significant correlation between centroid size as a size factor and skeletal maturation level. Our findings suggest that the ACV maturation method, which was applied by statistical shape analysis, could confirm information about skeletal maturation in Japanese individuals as an available quantifier of skeletal maturation and could be as useful a quantitative method as the skeletal maturation index.

  9. The skeletal maturation status estimated by statistical shape analysis: axial images of Japanese cervical vertebra

    PubMed Central

    Shin, S M; Choi, Y-S; Yamaguchi, T; Maki, K; Cho, B-H; Park, S-B

    2015-01-01

    Objectives: To evaluate axial cervical vertebral (ACV) shape quantitatively and to build a prediction model for skeletal maturation level using statistical shape analysis for Japanese individuals. Methods: The sample included 24 female and 19 male patients with hand–wrist radiographs and CBCT images. Through generalized Procrustes analysis and principal components (PCs) analysis, the meaningful PCs were extracted from each ACV shape and analysed for the estimation regression model. Results: Each ACV shape had meaningful PCs, except for the second axial cervical vertebra. Based on these models, the smallest prediction intervals (PIs) were from the combination of the shape space PCs, age and gender. Overall, the PIs of the male group were smaller than those of the female group. There was no significant correlation between centroid size as a size factor and skeletal maturation level. Conclusions: Our findings suggest that the ACV maturation method, which was applied by statistical shape analysis, could confirm information about skeletal maturation in Japanese individuals as an available quantifier of skeletal maturation and could be as useful a quantitative method as the skeletal maturation index. PMID:25411713
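
    As a simplified stand-in for the pipeline both records describe (random 2-D landmarks rather than cervical vertebra outlines, and a single alignment pass to the first shape instead of full generalized Procrustes analysis), the sketch below superimposes shapes and extracts principal components of the aligned configurations.

    ```python
    # Procrustes superimposition followed by PCA on simulated landmark shapes.
    import numpy as np
    from scipy.spatial import procrustes
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(10)
    n_shapes, n_landmarks = 43, 12
    shapes = (rng.normal(size=(n_shapes, n_landmarks, 2))
              + np.linspace(0, 5, n_landmarks)[None, :, None])

    reference = shapes[0]
    aligned = []
    for s in shapes:
        _, s_aligned, _ = procrustes(reference, s)   # removes translation, scale, rotation
        aligned.append(s_aligned.ravel())

    pca = PCA(n_components=3).fit(np.array(aligned))
    print("variance explained by the first PCs:", pca.explained_variance_ratio_.round(2))
    ```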

  10. Combined data preprocessing and multivariate statistical analysis characterizes fed-batch culture of mouse hybridoma cells for rational medium design.

    PubMed

    Selvarasu, Suresh; Kim, Do Yun; Karimi, Iftekhar A; Lee, Dong-Yup

    2010-10-01

    We present an integrated framework for characterizing fed-batch cultures of mouse hybridoma cells producing monoclonal antibody (mAb). This framework systematically combines data preprocessing, elemental balancing and statistical analysis techniques. Initially, specific rates of cell growth, glucose/amino acid consumption and mAb/metabolite production were calculated via curve fitting using logistic equations, with subsequent elemental balancing of the preprocessed data indicating the presence of experimental measurement errors. Multivariate statistical analysis was then employed to understand physiological characteristics of the cellular system. The results from principal component analysis (PCA) revealed three major clusters of amino acids with similar trends in their consumption profiles: (i) arginine, threonine and serine, (ii) glycine, tyrosine, phenylalanine, methionine, histidine and asparagine, and (iii) lysine, valine and isoleucine. Further analysis using partial least squares (PLS) regression identified key amino acids which were positively or negatively correlated with cell growth, mAb production and the generation of lactate and ammonia. Based on these results, the optimal concentrations of key amino acids in the feed medium can be inferred, potentially leading to an increase in cell viability and productivity, as well as a decrease in toxic waste production. The study demonstrated how the current methodological framework using multivariate statistical analysis techniques can serve as a potential tool for deriving rational medium design strategies. Copyright © 2010 Elsevier B.V. All rights reserved.
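
    As a small sketch of the preprocessing step described above (simulated viable cell density, not the hybridoma measurements), a logistic equation is fitted with curve_fit and a specific growth rate profile is derived from the fitted curve.

    ```python
    # Logistic curve fitting of a simulated cell density profile.
    import numpy as np
    from scipy.optimize import curve_fit

    def logistic(t, vmax, k, t0):
        return vmax / (1.0 + np.exp(-k * (t - t0)))

    rng = np.random.default_rng(11)
    t = np.linspace(0, 10, 25)                              # culture time (days)
    cells = logistic(t, 6.0, 1.2, 4.0) + rng.normal(0, 0.1, t.size)

    popt, _ = curve_fit(logistic, t, cells, p0=[5.0, 1.0, 5.0])
    fitted = logistic(t, *popt)
    specific_growth_rate = np.gradient(fitted, t) / np.clip(fitted, 1e-9, None)
    print("fitted parameters:", np.round(popt, 2))
    ```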

  11. Toward improved analysis of concentration data: Embracing nondetects.

    PubMed

    Shoari, Niloofar; Dubé, Jean-Sébastien

    2018-03-01

    Various statistical tests on concentration data serve to support decision-making regarding characterization and monitoring of contaminated media, assessing exposure to a chemical, and quantifying the associated risks. However, the routine statistical protocols cannot be directly applied because of challenges arising from nondetects or left-censored observations, which are concentration measurements below the detection limit of measuring instruments. Despite the existence of techniques based on survival analysis that can adjust for nondetects, these are seldom taken into account properly. A comprehensive review of the literature showed that managing policies regarding analysis of censored data do not always agree and that guidance from regulatory agencies may be outdated. Therefore, researchers and practitioners commonly resort to the most convenient way of tackling the censored data problem by substituting nondetects with arbitrary constants prior to data analysis, although this is generally regarded as a bias-prone approach. Hoping to improve the interpretation of concentration data, the present article aims to familiarize researchers in different disciplines with the significance of left-censored observations and provides theoretical and computational recommendations (under both frequentist and Bayesian frameworks) for adequate analysis of censored data. In particular, the present article synthesizes key findings from previous research with respect to 3 noteworthy aspects of inferential statistics: estimation of descriptive statistics, hypothesis testing, and regression analysis. Environ Toxicol Chem 2018;37:643-656. © 2017 SETAC. © 2017 SETAC.
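
    A hedged sketch of the survival-analysis-style treatment the authors advocate (simulated lognormal concentrations and a single detection limit; real data sets often have multiple limits) estimates the distribution parameters by maximum likelihood, letting nondetects contribute through the CDF rather than being substituted with a constant.

    ```python
    # Censored maximum likelihood for left-censored (nondetect) concentration data.
    import numpy as np
    from scipy import stats
    from scipy.optimize import minimize

    rng = np.random.default_rng(12)
    true = rng.lognormal(mean=1.0, sigma=0.8, size=200)
    dl = 2.0                                    # detection limit
    detected = true >= dl
    values = np.where(detected, true, dl)       # nondetects flagged, not guessed

    def neg_log_likelihood(params):
        mu, log_sigma = params
        sigma = np.exp(log_sigma)
        ll_det = stats.norm.logpdf(np.log(values[detected]), mu, sigma).sum()
        ll_cen = stats.norm.logcdf(np.log(dl), mu, sigma) * (~detected).sum()
        return -(ll_det + ll_cen)

    fit = minimize(neg_log_likelihood, x0=[0.0, 0.0])
    print("estimated log-scale mean and sd:", fit.x[0], np.exp(fit.x[1]))
    ```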

  12. Assessment of chamber pressure oscillations in the Shuttle SRB

    NASA Technical Reports Server (NTRS)

    Mathes, H. B.

    1980-01-01

    Combustion stability evaluations of the Shuttle solid propellant booster motor are reviewed. Measurement of the amplitude and frequency of low level chamber pressure oscillations which have been detected in motor firings, are discussed and a statistical analysis of the data is presented. Oscillatory data from three recent motor firings are shown and the results are compared with statistical predictions which are based on earlier motor firings.

  13. Two-Bin Kanban: Ordering Impact at Navy Medical Center San Diego

    DTIC Science & Technology

    2016-06-17

    pretest (2013 data set) and posttest (2015 data set) analysis to avoid having the findings influenced by price changes. DMLSS does not track shipping...statistics based on those observations (Kabacoff, 2011, p. 112). Replacing the groups of observations with summary statistics allows the analyst...listed on the Acquisition Research Program website (www.acquisitionresearch.net). Acquisition Research Program Graduate School of Business & Public

  14. Marketing of Personalized Cancer Care on the Web: An Analysis of Internet Websites

    PubMed Central

    Cronin, Angel; Bair, Elizabeth; Lindeman, Neal; Viswanath, Vish; Janeway, Katherine A.

    2015-01-01

    Internet marketing may accelerate the use of care based on genomic or tumor-derived data. However, online marketing may be detrimental if it endorses products of unproven benefit. We conducted an analysis of Internet websites to identify personalized cancer medicine (PCM) products and claims. A Delphi Panel categorized PCM as standard or nonstandard based on evidence of clinical utility. Fifty-five websites, sponsored by commercial entities, academic institutions, physicians, research institutes, and organizations, that marketed PCM included somatic (58%) and germline (20%) analysis, interpretive services (15%), and physicians/institutions offering personalized care (44%). Of 32 sites offering somatic analysis, 56% included specific test information (range 1–152 tests). All statistical tests were two-sided, and comparisons of website content were conducted using McNemar’s test. More websites contained information about the benefits than limitations of PCM (85% vs 27%, P < .001). Websites specifying somatic analysis were statistically significantly more likely to market one or more nonstandard tests as compared with standard tests (88% vs 44%, P = .04). PMID:25745021

  15. Disorganization of white matter architecture in major depressive disorder: a meta-analysis of diffusion tensor imaging with tract-based spatial statistics.

    PubMed

    Chen, Guangxiang; Hu, Xinyu; Li, Lei; Huang, Xiaoqi; Lui, Su; Kuang, Weihong; Ai, Hua; Bi, Feng; Gu, Zhongwei; Gong, Qiyong

    2016-02-24

    White matter (WM) abnormalities have long been suspected in major depressive disorder (MDD). Tract-based spatial statistics (TBSS) studies have detected abnormalities in fractional anisotropy (FA) in MDD, but the available evidence has been inconsistent. We performed a quantitative meta-analysis of TBSS studies contrasting MDD patients with healthy control subjects (HCS). A total of 17 studies with 18 datasets that included 641 MDD patients and 581 HCS were identified. Anisotropic effect size-signed differential mapping (AES-SDM) meta-analysis was performed to assess FA alterations in MDD patients compared to HCS. FA reductions were identified in the genu of the corpus callosum (CC) extending to the body of the CC and left anterior limb of the internal capsule (ALIC) in MDD patients relative to HCS. Descriptive analysis of quartiles, sensitivity analysis and subgroup analysis further confirmed these findings. Meta-regression analysis revealed that individuals with more severe MDD were significantly more likely to have FA reductions in the genu of the CC. This study provides a thorough profile of WM abnormalities in MDD and evidence that interhemispheric connections and frontal-striatal-thalamic pathways are the most convergent circuits affected in MDD.

  16. Unit of Analysis: Impact of Silverman and Solmon's Article on Field-Based Intervention Research in Physical Education in the U.S.A.

    ERIC Educational Resources Information Center

    Li, Weidong; Chen, Yung-Ju; Xiang, Ping; Xie, Xiuge; Li, Yilin

    2017-01-01

    Purpose: The purposes of this study were to: (a) examine the impact of the Silverman and Solmon article (1998) on how researchers handle the unit of analysis issue in their field-based intervention research in physical education in the United States and summarize statistical approaches that have been used to analyze the data, and (b) provide…

  17. Tips and Tricks for Successful Application of Statistical Methods to Biological Data.

    PubMed

    Schlenker, Evelyn

    2016-01-01

    This chapter discusses experimental design and use of statistics to describe characteristics of data (descriptive statistics) and inferential statistics that test the hypothesis posed by the investigator. Inferential statistics, based on probability distributions, depend upon the type and distribution of the data. For data that are continuous, randomly and independently selected, as well as normally distributed more powerful parametric tests such as Student's t test and analysis of variance (ANOVA) can be used. For non-normally distributed or skewed data, transformation of the data (using logarithms) may normalize the data allowing use of parametric tests. Alternatively, with skewed data nonparametric tests can be utilized, some of which rely on data that are ranked prior to statistical analysis. Experimental designs and analyses need to balance between committing type 1 errors (false positives) and type 2 errors (false negatives). For a variety of clinical studies that determine risk or benefit, relative risk ratios (random clinical trials and cohort studies) or odds ratios (case-control studies) are utilized. Although both use 2 × 2 tables, their premise and calculations differ. Finally, special statistical methods are applied to microarray and proteomics data, since the large number of genes or proteins evaluated increase the likelihood of false discoveries. Additional studies in separate samples are used to verify microarray and proteomic data. Examples in this chapter and references are available to help continued investigation of experimental designs and appropriate data analysis.
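
    The distinction drawn above between relative risk (cohort studies and randomized trials) and odds ratio (case-control studies) can be made concrete with a small worked example. The sketch below computes both from the same hypothetical 2 × 2 table and adds a Woolf-type confidence interval for the odds ratio; all counts are invented for illustration.

    ```python
    # Minimal sketch: relative risk vs. odds ratio from the same 2x2 table.
    # Counts are hypothetical; a, b, c, d follow the usual layout:
    #                exposed   unexposed
    #   outcome yes     a          b
    #   outcome no      c          d
    import math

    a, b, c, d = 30, 10, 70, 90

    risk_exposed   = a / (a + c)
    risk_unexposed = b / (b + d)
    relative_risk  = risk_exposed / risk_unexposed      # cohort studies / RCTs
    odds_ratio     = (a * d) / (b * c)                  # case-control studies

    # Approximate 95% CI for the odds ratio on the log scale (Woolf method)
    se_log_or = math.sqrt(1/a + 1/b + 1/c + 1/d)
    ci = (math.exp(math.log(odds_ratio) - 1.96 * se_log_or),
          math.exp(math.log(odds_ratio) + 1.96 * se_log_or))

    print(f"RR = {relative_risk:.2f}, OR = {odds_ratio:.2f}, "
          f"OR 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
    ```

    Note how the two measures diverge when the outcome is common, which is exactly why the chapter stresses that their premise and calculations differ even though both come from a 2 × 2 table.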

  18. Statistics and Discoveries at the LHC (1/4)

    ScienceCinema

    Cowan, Glen

    2018-02-09

    The lectures will give an introduction to statistics as applied in particle physics and will provide all the necessary basics for data analysis at the LHC. Special emphasis will be placed on the problems and questions that arise when searching for new phenomena, including p-values, discovery significance, limit setting procedures, and treatment of small signals in the presence of large backgrounds. Specific issues that will be addressed include the advantages and drawbacks of different statistical test procedures (cut-based, likelihood-ratio, etc.), the look-elsewhere effect and treatment of systematic uncertainties.
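
    One recurring quantity in these lectures is the conversion between a one-sided p-value and a discovery significance expressed in Gaussian sigmas (the familiar "5 sigma" threshold). A minimal sketch of that conversion is given below; it assumes SciPy is available and only illustrates the convention, it is not material from the lectures themselves.

    ```python
    # Minimal sketch: converting between a one-sided p-value and a
    # discovery significance Z (number of Gaussian sigmas), as used in
    # particle physics. Assumes SciPy is available.
    from scipy.stats import norm

    def p_to_z(p):
        return norm.isf(p)        # inverse survival function of the standard normal

    def z_to_p(z):
        return norm.sf(z)         # survival function (upper-tail probability)

    print(p_to_z(2.87e-7))        # ~5.0 -> the conventional discovery threshold
    print(z_to_p(5.0))            # ~2.87e-7
    ```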

  19. Statistics and Discoveries at the LHC (3/4)

    ScienceCinema

    Cowan, Glen

    2018-02-19

    The lectures will give an introduction to statistics as applied in particle physics and will provide all the necessary basics for data analysis at the LHC. Special emphasis will be placed on the problems and questions that arise when searching for new phenomena, including p-values, discovery significance, limit setting procedures, and treatment of small signals in the presence of large backgrounds. Specific issues that will be addressed include the advantages and drawbacks of different statistical test procedures (cut-based, likelihood-ratio, etc.), the look-elsewhere effect and treatment of systematic uncertainties.

  20. Statistics and Discoveries at the LHC (4/4)

    ScienceCinema

    Cowan, Glen

    2018-05-22

    The lectures will give an introduction to statistics as applied in particle physics and will provide all the necessary basics for data analysis at the LHC. Special emphasis will be placed on the problems and questions that arise when searching for new phenomena, including p-values, discovery significance, limit setting procedures, and treatment of small signals in the presence of large backgrounds. Specific issues that will be addressed include the advantages and drawbacks of different statistical test procedures (cut-based, likelihood-ratio, etc.), the look-elsewhere effect and treatment of systematic uncertainties.

  1. Statistics and Discoveries at the LHC (2/4)

    ScienceCinema

    Cowan, Glen

    2018-04-26

    The lectures will give an introduction to statistics as applied in particle physics and will provide all the necessary basics for data analysis at the LHC. Special emphasis will be placed on the problems and questions that arise when searching for new phenomena, including p-values, discovery significance, limit setting procedures, and treatment of small signals in the presence of large backgrounds. Specific issues that will be addressed include the advantages and drawbacks of different statistical test procedures (cut-based, likelihood-ratio, etc.), the look-elsewhere effect and treatment of systematic uncertainties.

  2. Zonation in the deep benthic megafauna : Application of a general test.

    PubMed

    Gardiner, Frederick P; Haedrich, Richard L

    1978-01-01

    A test based on Maxwell-Boltzmann statistics, instead of the formerly suggested but inappropriate Bose-Einstein statistics (Pielou and Routledge, 1976), examines the distribution of the boundaries of species' ranges distributed along a gradient, and indicates whether they are random or clustered (zoned). The test is most useful as a preliminary to the application of more instructive but less statistically rigorous methods such as cluster analysis. The test indicates zonation is marked in the deep benthic megafauna living between 200 and 3000 m, but below 3000 m little zonation may be found.

  3. Remote Sensing/gis Integration for Site Planning and Resource Management

    NASA Technical Reports Server (NTRS)

    Fellows, J. D.

    1982-01-01

    The development of an interactive/batch gridded information system (array of cells georeferenced to USGS quad sheets) and interfacing application programs (e.g., hydrologic models) is discussed. This system allows non-programmer users to request any data set(s) stored in the data base by inputting any random polygon's (watershed, political zone) boundary points. The data base information contained within this polygon can be used to produce maps, statistics, and define model parameters for the area. Present/proposed conditions for the area may be compared by inputting future usage (land cover, soils, slope, etc.). This system, known as the Hydrologic Analysis Program (HAP), is especially effective in the real-time analysis of proposed land cover changes on runoff hydrographs and graphics/statistics resource inventories of random study areas/watersheds.

  4. A neural network model of metaphor understanding with dynamic interaction based on a statistical language analysis: targeting a human-like model.

    PubMed

    Terai, Asuka; Nakagawa, Masanori

    2007-08-01

    The purpose of this paper is to construct a model that represents the human process of understanding metaphors, focusing specifically on similes of the form "an A like B". Generally speaking, human beings are able to generate and understand many sorts of metaphors. This study constructs the model based on a probabilistic knowledge structure for concepts which is computed from a statistical analysis of a large-scale corpus. Consequently, this model is able to cover the many kinds of metaphors that human beings can generate. Moreover, the model implements the dynamic process of metaphor understanding by using a neural network with dynamic interactions. Finally, the validity of the model is confirmed by comparing model simulations with the results from a psychological experiment.

  5. A Cyber-Attack Detection Model Based on Multivariate Analyses

    NASA Astrophysics Data System (ADS)

    Sakai, Yuto; Rinsaka, Koichiro; Dohi, Tadashi

    In the present paper, we propose a novel cyber-attack detection model based on applying two multivariate-analysis methods to the audit data observed on a host machine. The statistical techniques used here are Hayashi's well-known quantification method IV and the cluster analysis method. We quantify the observed qualitative audit event sequence via the quantification method IV, and collect similar audit event sequences into the same groups based on the cluster analysis. It is shown in simulation experiments that our model can improve the cyber-attack detection accuracy in some realistic cases where both normal and attack activities are intermingled.

  6. Quantitative Analysis of the Interdisciplinarity of Applied Mathematics.

    PubMed

    Xie, Zheng; Duan, Xiaojun; Ouyang, Zhenzheng; Zhang, Pengyuan

    2015-01-01

    The increasing use of mathematical techniques in scientific research leads to the interdisciplinarity of applied mathematics. This viewpoint is validated quantitatively here by statistical and network analysis on the corpus PNAS 1999-2013. A network describing the interdisciplinary relationships between disciplines in a panoramic view is built based on the corpus. Specific network indicators show the hub role of applied mathematics in interdisciplinary research. The statistical analysis on the corpus content finds that algorithms, a primary topic of applied mathematics, positively correlates, increasingly co-occurs, and has an equilibrium relationship in the long-run with certain typical research paradigms and methodologies. The finding can be understood as an intrinsic cause of the interdisciplinarity of applied mathematics.

  7. Robust Strategy for Rocket Engine Health Monitoring

    NASA Technical Reports Server (NTRS)

    Santi, L. Michael

    2001-01-01

    Monitoring the health of rocket engine systems is essentially a two-phase process. The acquisition phase involves sensing physical conditions at selected locations, converting physical inputs to electrical signals, conditioning the signals as appropriate to establish scale or filter interference, and recording results in a form that is easy to interpret. The inference phase involves analysis of results from the acquisition phase, comparison of analysis results to established health measures, and assessment of health indications. A variety of analytical tools may be employed in the inference phase of health monitoring. These tools can be separated into three broad categories: statistical, rule based, and model based. Statistical methods can provide excellent comparative measures of engine operating health. They require well-characterized data from an ensemble of "typical" engines, or "golden" data from a specific test assumed to define the operating norm in order to establish reliable comparative measures. Statistical methods are generally suitable for real-time health monitoring because they do not deal with the physical complexities of engine operation. The utility of statistical methods in rocket engine health monitoring is hindered by practical limits on the quantity and quality of available data. This is due to the difficulty and high cost of data acquisition, the limited number of available test engines, and the problem of simulating flight conditions in ground test facilities. In addition, statistical methods incur a penalty for disregarding flow complexity and are therefore limited in their ability to define performance shift causality. Rule based methods infer the health state of the engine system based on comparison of individual measurements or combinations of measurements with defined health norms or rules. This does not mean that rule based methods are necessarily simple. Although binary yes-no health assessment can sometimes be established by relatively simple rules, the causality assignment needed for refined health monitoring often requires an exceptionally complex rule base involving complicated logical maps. Structuring the rule system to be clear and unambiguous can be difficult, and the expert input required to maintain a large logic network and associated rule base can be prohibitive.

  8. A standards-based method for compositional analysis by energy dispersive X-ray spectrometry using multivariate statistical analysis: application to multicomponent alloys.

    PubMed

    Rathi, Monika; Ahrenkiel, S P; Carapella, J J; Wanlass, M W

    2013-02-01

    Given an unknown multicomponent alloy, and a set of standard compounds or alloys of known composition, can one improve upon popular standards-based methods for energy dispersive X-ray (EDX) spectrometry to quantify the elemental composition of the unknown specimen? A method is presented here for determining elemental composition of alloys using transmission electron microscopy-based EDX with appropriate standards. The method begins with a discrete set of related reference standards of known composition, applies multivariate statistical analysis to those spectra, and evaluates the compositions with a linear matrix algebra method to relate the spectra to elemental composition. By using associated standards, only limited assumptions about the physical origins of the EDX spectra are needed. Spectral absorption corrections can be performed by providing an estimate of the foil thickness of one or more reference standards. The technique was applied to III-V multicomponent alloy thin films: composition and foil thickness were determined for various III-V alloys. The results were then validated by comparing with X-ray diffraction and photoluminescence analysis, demonstrating accuracy of approximately 1% in atomic fraction.
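
    The key idea above is that spectra of reference standards with known compositions can be related to an unknown spectrum through linear matrix algebra. The sketch below is a much-simplified illustration of that idea: it treats the unknown EDX spectrum as an approximately linear mixture of the reference spectra, solves for the mixing weights by least squares, and maps the weights to an estimated composition. The spectra, weights, and reference compositions are simulated placeholders, and the code is not the paper's actual algorithm (which also involves multivariate statistical analysis and absorption corrections).

    ```python
    # Much-simplified sketch of a standards-based idea: treat the unknown EDX
    # spectrum as an approximately linear combination of reference-standard
    # spectra of known composition, solve for the mixing weights by least
    # squares, and map the weights back to an elemental composition.
    # This is an illustration, not the paper's actual algorithm.
    import numpy as np

    rng = np.random.default_rng(0)
    n_channels = 200

    # Hypothetical reference spectra (rows) and their known atomic fractions.
    ref_spectra = rng.random((3, n_channels))
    ref_compositions = np.array([[0.50, 0.50, 0.00],   # e.g. a GaAs-like standard
                                 [0.50, 0.00, 0.50],   # e.g. a GaP-like standard
                                 [0.00, 0.50, 0.50]])  # a third reference

    # Unknown spectrum: a noisy mixture of the references.
    true_weights = np.array([0.2, 0.5, 0.3])
    unknown = true_weights @ ref_spectra + 0.01 * rng.standard_normal(n_channels)

    # Least-squares mixing weights, clipped to be non-negative and normalised.
    weights, *_ = np.linalg.lstsq(ref_spectra.T, unknown, rcond=None)
    weights = np.clip(weights, 0, None)
    weights /= weights.sum()

    estimated_composition = weights @ ref_compositions
    print("estimated atomic fractions:", np.round(estimated_composition, 3))
    ```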

  9. Development of computer-assisted instruction application for statistical data analysis android platform as learning resource

    NASA Astrophysics Data System (ADS)

    Hendikawati, P.; Arifudin, R.; Zahid, M. Z.

    2018-03-01

    This study aims to design an Android Statistics Data Analysis application that can be accessed through mobile devices, making it easier for users to access. The Statistics Data Analysis application covers various basic statistics topics along with parametric statistical data analysis. The output of the system is parametric statistical data analysis that can be used by students, lecturers, and other users who need statistical results quickly and in an easily understood form. The Android application is developed in the Java programming language. The server programming language is PHP with the Code Igniter framework, and the database uses MySQL. The system development methodology is the Waterfall methodology, with stages of analysis, design, coding, testing, implementation, and system maintenance. This statistical data analysis application is expected to support statistics lecturing activities and to help students understand statistical analysis more easily on mobile devices.

  10. Compositional differences among Chinese soy sauce types studied by (13)C NMR spectroscopy coupled with multivariate statistical analysis.

    PubMed

    Kamal, Ghulam Mustafa; Wang, Xiaohua; Bin Yuan; Wang, Jie; Sun, Peng; Zhang, Xu; Liu, Maili

    2016-09-01

    Soy sauce, a well-known seasoning all over the world, especially in Asia, is available on the global market in a wide range of types based on its purpose and processing methods. Its composition varies with respect to the fermentation processes and the addition of additives, preservatives and flavor enhancers. A comprehensive (1)H NMR based study regarding the metabonomic variations of soy sauce to differentiate among different types of soy sauce available on the global market has been limited due to the complexity of the mixture. In the present study, (13)C NMR spectroscopy coupled with multivariate statistical data analysis, such as principal component analysis (PCA) and orthogonal partial least square-discriminant analysis (OPLS-DA), was applied to investigate metabonomic variations among different types of soy sauce, namely super light, super dark, red cooking and mushroom soy sauce. The main additives in soy sauce, such as glutamate, sucrose and glucose, were easily distinguished and quantified using (13)C NMR spectroscopy, whereas they are otherwise difficult to assign and quantify due to serious signal overlaps in (1)H NMR spectra. The significantly higher concentration of sucrose in dark, red cooking and mushroom flavored soy sauce can directly be linked to the addition of caramel in soy sauce. Similarly, the significantly higher level of glutamate in super light as compared to super dark and mushroom flavored soy sauce may come from the addition of monosodium glutamate. The study highlights the potential of (13)C NMR based metabonomics coupled with multivariate statistical data analysis in differentiating between the types of soy sauce on the basis of the level of additives, raw materials and fermentation procedures. Copyright © 2016 Elsevier B.V. All rights reserved.
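
    The multivariate step described above, PCA on a matrix of spectra, can be sketched in a few lines: rows are samples, columns are spectral variables, the matrix is mean-centred, and the principal components are obtained from a singular value decomposition. The data below are simulated placeholders standing in for (13)C NMR spectra, not the study's measurements.

    ```python
    # Minimal PCA sketch of the kind used to separate sample classes from
    # spectra: rows are samples (e.g., soy sauces), columns are spectral
    # variables. Data here are simulated placeholders.
    import numpy as np

    rng = np.random.default_rng(1)
    n_samples, n_vars = 40, 300
    X = rng.standard_normal((n_samples, n_vars))
    X[:20, :50] += 1.5          # pretend the first 20 samples share a chemical signature

    Xc = X - X.mean(axis=0)                       # mean-centre each variable
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U * S                                # sample scores (PC coordinates)
    explained = S**2 / np.sum(S**2)

    print("variance explained by PC1, PC2:", np.round(explained[:2], 3))
    print("PC1 scores of first 5 samples:", np.round(scores[:5, 0], 2))
    ```

    In practice a score plot of PC1 against PC2 would show the different soy sauce types separating into clusters, which is the pattern the study interprets chemically.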

  11. Noise removing in encrypted color images by statistical analysis

    NASA Astrophysics Data System (ADS)

    Islam, N.; Puech, W.

    2012-03-01

    Cryptographic techniques are used to secure confidential data from unauthorized access but these techniques are very sensitive to noise. A single bit change in encrypted data can have catastrophic impact over the decrypted data. This paper addresses the problem of removing bit error in visual data which are encrypted using AES algorithm in the CBC mode. In order to remove the noise, a method is proposed which is based on the statistical analysis of each block during the decryption. The proposed method exploits local statistics of the visual data and confusion/diffusion properties of the encryption algorithm to remove the errors. Experimental results show that the proposed method can be used at the receiving end for the possible solution for noise removing in visual data in encrypted domain.

  12. Statistical Research on the Bioactivity of New Marine Natural Products Discovered during the 28 Years from 1985 to 2012

    PubMed Central

    Hu, Yiwen; Chen, Jiahui; Hu, Guping; Yu, Jianchen; Zhu, Xun; Lin, Yongcheng; Chen, Shengping; Yuan, Jie

    2015-01-01

    Every year, hundreds of new compounds are discovered from the metabolites of marine organisms. Finding new and useful compounds is one of the crucial drivers for this field of research. Here we describe the statistics of bioactive compounds discovered from marine organisms from 1985 to 2012. This work is based on our database, which contains information on more than 15,000 chemical substances including 4196 bioactive marine natural products. We performed a comprehensive statistical analysis to understand the characteristics of the novel bioactive compounds and detail temporal trends, chemical structures, species distribution, and research progress. We hope this meta-analysis will provide useful information for research into the bioactivity of marine natural products and drug development. PMID:25574736

  13. Cluster size statistic and cluster mass statistic: two novel methods for identifying changes in functional connectivity between groups or conditions.

    PubMed

    Ing, Alex; Schwarzbauer, Christian

    2014-01-01

    Functional connectivity has become an increasingly important area of research in recent years. At a typical spatial resolution, approximately 300 million connections link each voxel in the brain with every other. This pattern of connectivity is known as the functional connectome. Connectivity is often compared between experimental groups and conditions. Standard methods used to control the type 1 error rate are likely to be insensitive when comparisons are carried out across the whole connectome, due to the huge number of statistical tests involved. To address this problem, two new cluster based methods--the cluster size statistic (CSS) and cluster mass statistic (CMS)--are introduced to control the family wise error rate across all connectivity values. These methods operate within a statistical framework similar to the cluster based methods used in conventional task based fMRI. Both methods are data driven, permutation based and require minimal statistical assumptions. Here, the performance of each procedure is evaluated in a receiver operator characteristic (ROC) analysis, utilising a simulated dataset. The relative sensitivity of each method is also tested on real data: BOLD (blood oxygen level dependent) fMRI scans were carried out on twelve subjects under normal conditions and during the hypercapnic state (induced through the inhalation of 6% CO2 in 21% O2 and 73%N2). Both CSS and CMS detected significant changes in connectivity between normal and hypercapnic states. A family wise error correction carried out at the individual connection level exhibited no significant changes in connectivity.
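
    The statistical core shared by CSS and CMS is a permutation scheme that controls the family-wise error rate by building a null distribution of an extreme statistic across the whole connectome. The sketch below shows the simpler max-statistic variant of that idea (permute group labels, keep the largest absolute t-value each time, and threshold the observed statistics against the 95th percentile of that null); CSS and CMS additionally form clusters of supra-threshold connections before taking the extreme value. Group sizes, effect sizes, and the number of connections are illustrative assumptions.

    ```python
    # Simplified sketch of permutation-based family-wise error control over
    # many connectivity values: permute group labels, record the maximum
    # absolute t-statistic each time, and compare observed statistics against
    # that null distribution. CSS/CMS add a cluster-forming step on top of
    # this idea; the version below is the plain max-statistic variant.
    import numpy as np
    from scipy.stats import ttest_ind

    rng = np.random.default_rng(2)
    n_a, n_b, n_conn = 12, 12, 1000
    group_a = rng.standard_normal((n_a, n_conn))
    group_b = rng.standard_normal((n_b, n_conn))
    group_b[:, :20] += 1.0                        # a small set of truly altered connections

    t_obs, _ = ttest_ind(group_a, group_b, axis=0)

    data = np.vstack([group_a, group_b])
    labels = np.array([0] * n_a + [1] * n_b)
    max_null = []
    for _ in range(1000):
        perm = rng.permutation(labels)
        t_perm, _ = ttest_ind(data[perm == 0], data[perm == 1], axis=0)
        max_null.append(np.max(np.abs(t_perm)))

    threshold = np.quantile(max_null, 0.95)       # FWE-corrected threshold
    print("connections surviving correction:", np.sum(np.abs(t_obs) > threshold))
    ```

    Because the null distribution is built from the maximum over all connections, any observed statistic exceeding its 95th percentile is significant at a family-wise error rate of 5%, with no further correction needed.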

  14. Cluster Size Statistic and Cluster Mass Statistic: Two Novel Methods for Identifying Changes in Functional Connectivity Between Groups or Conditions

    PubMed Central

    Ing, Alex; Schwarzbauer, Christian

    2014-01-01

    Functional connectivity has become an increasingly important area of research in recent years. At a typical spatial resolution, approximately 300 million connections link each voxel in the brain with every other. This pattern of connectivity is known as the functional connectome. Connectivity is often compared between experimental groups and conditions. Standard methods used to control the type 1 error rate are likely to be insensitive when comparisons are carried out across the whole connectome, due to the huge number of statistical tests involved. To address this problem, two new cluster based methods – the cluster size statistic (CSS) and cluster mass statistic (CMS) – are introduced to control the family wise error rate across all connectivity values. These methods operate within a statistical framework similar to the cluster based methods used in conventional task based fMRI. Both methods are data driven, permutation based and require minimal statistical assumptions. Here, the performance of each procedure is evaluated in a receiver operator characteristic (ROC) analysis, utilising a simulated dataset. The relative sensitivity of each method is also tested on real data: BOLD (blood oxygen level dependent) fMRI scans were carried out on twelve subjects under normal conditions and during the hypercapnic state (induced through the inhalation of 6% CO2 in 21% O2 and 73%N2). Both CSS and CMS detected significant changes in connectivity between normal and hypercapnic states. A family wise error correction carried out at the individual connection level exhibited no significant changes in connectivity. PMID:24906136

  15. Detecting the contagion effect in mass killings; a constructive example of the statistical advantages of unbinned likelihood methods.

    PubMed

    Towers, Sherry; Mubayi, Anuj; Castillo-Chavez, Carlos

    2018-01-01

    When attempting to statistically distinguish between a null and an alternative hypothesis, many researchers in the life and social sciences turn to binned statistical analysis methods, or methods that are simply based on the moments of a distribution (such as the mean, and variance). These methods have the advantage of simplicity of implementation, and simplicity of explanation. However, when null and alternative hypotheses manifest themselves in subtle differences in patterns in the data, binned analysis methods may be insensitive to these differences, and researchers may erroneously fail to reject the null hypothesis when in fact more sensitive statistical analysis methods might produce a different result when the null hypothesis is actually false. Here, with a focus on two recent conflicting studies of contagion in mass killings as instructive examples, we discuss how the use of unbinned likelihood methods makes optimal use of the information in the data; a fact that has been long known in statistical theory, but perhaps is not as widely appreciated amongst general researchers in the life and social sciences. In 2015, Towers et al published a paper that quantified the long-suspected contagion effect in mass killings. However, in 2017, Lankford & Tomek subsequently published a paper, based upon the same data, that claimed to contradict the results of the earlier study. The former used unbinned likelihood methods, and the latter used binned methods, and comparison of distribution moments. Using these analyses, we also discuss how visualization of the data can aid in determination of the most appropriate statistical analysis methods to distinguish between a null and alternate hypothesis. We also discuss the importance of assessment of the robustness of analysis results to methodological assumptions made (for example, arbitrary choices of number of bins and bin widths when using binned methods); an issue that is widely overlooked in the literature, but is critical to analysis reproducibility and robustness. When an analysis cannot distinguish between a null and alternate hypothesis, care must be taken to ensure that the analysis methodology itself maximizes the use of information in the data that can distinguish between the two hypotheses. The use of binned methods by Lankford & Tomek (2017), that examined how many mass killings fell within a 14 day window from a previous mass killing, substantially reduced the sensitivity of their analysis to contagion effects. The unbinned likelihood methods used by Towers et al (2015) did not suffer from this problem. While a binned analysis might be favorable for simplicity and clarity of presentation, unbinned likelihood methods are preferable when effects might be somewhat subtle.
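
    The binned-versus-unbinned contrast can be illustrated with a toy waiting-time example: estimating the rate of an exponential distribution either from the raw observations (unbinned maximum likelihood) or from a single coarse dichotomy at 14 days, loosely echoing the 14-day window discussed above. The data are simulated, and the example is only a sketch of the general point, not a re-analysis of either study.

    ```python
    # Toy illustration of the binned-versus-unbinned contrast discussed above:
    # estimate the rate of an exponential waiting-time distribution either from
    # the raw (unbinned) observations or after collapsing them into two coarse
    # bins (<= 14 days vs. > 14 days). Data are simulated, not the study's.
    import numpy as np

    rng = np.random.default_rng(3)
    true_rate = 1 / 20.0                    # one event per 20 days on average
    waits = rng.exponential(1 / true_rate, size=200)

    # Unbinned maximum-likelihood estimate (closed form for the exponential).
    rate_unbinned = 1.0 / waits.mean()

    # Binned estimate: only the fraction of waits <= 14 days is used,
    # and the rate is recovered from P(wait <= 14) = 1 - exp(-rate * 14).
    frac_short = np.mean(waits <= 14.0)
    rate_binned = -np.log(1.0 - frac_short) / 14.0

    print(f"true rate {true_rate:.4f}, unbinned MLE {rate_unbinned:.4f}, "
          f"binned estimate {rate_binned:.4f}")
    ```

    Repeating the simulation many times shows the binned estimate has a noticeably larger variance, which is the information loss the authors describe.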

  16. Detecting the contagion effect in mass killings; a constructive example of the statistical advantages of unbinned likelihood methods

    PubMed Central

    Mubayi, Anuj; Castillo-Chavez, Carlos

    2018-01-01

    Background When attempting to statistically distinguish between a null and an alternative hypothesis, many researchers in the life and social sciences turn to binned statistical analysis methods, or methods that are simply based on the moments of a distribution (such as the mean, and variance). These methods have the advantage of simplicity of implementation, and simplicity of explanation. However, when null and alternative hypotheses manifest themselves in subtle differences in patterns in the data, binned analysis methods may be insensitive to these differences, and researchers may erroneously fail to reject the null hypothesis when in fact more sensitive statistical analysis methods might produce a different result when the null hypothesis is actually false. Here, with a focus on two recent conflicting studies of contagion in mass killings as instructive examples, we discuss how the use of unbinned likelihood methods makes optimal use of the information in the data; a fact that has been long known in statistical theory, but perhaps is not as widely appreciated amongst general researchers in the life and social sciences. Methods In 2015, Towers et al published a paper that quantified the long-suspected contagion effect in mass killings. However, in 2017, Lankford & Tomek subsequently published a paper, based upon the same data, that claimed to contradict the results of the earlier study. The former used unbinned likelihood methods, and the latter used binned methods, and comparison of distribution moments. Using these analyses, we also discuss how visualization of the data can aid in determination of the most appropriate statistical analysis methods to distinguish between a null and alternate hypothesis. We also discuss the importance of assessment of the robustness of analysis results to methodological assumptions made (for example, arbitrary choices of number of bins and bin widths when using binned methods); an issue that is widely overlooked in the literature, but is critical to analysis reproducibility and robustness. Conclusions When an analysis cannot distinguish between a null and alternate hypothesis, care must be taken to ensure that the analysis methodology itself maximizes the use of information in the data that can distinguish between the two hypotheses. The use of binned methods by Lankford & Tomek (2017), that examined how many mass killings fell within a 14 day window from a previous mass killing, substantially reduced the sensitivity of their analysis to contagion effects. The unbinned likelihood methods used by Towers et al (2015) did not suffer from this problem. While a binned analysis might be favorable for simplicity and clarity of presentation, unbinned likelihood methods are preferable when effects might be somewhat subtle. PMID:29742115

  17. An Exploratory Data Analysis System for Support in Medical Decision-Making

    PubMed Central

    Copeland, J. A.; Hamel, B.; Bourne, J. R.

    1979-01-01

    An experimental system was developed to allow retrieval and analysis of data collected during a study of neurobehavioral correlates of renal disease. After retrieving data organized in a relational data base, simple bivariate statistics of parametric and nonparametric nature could be conducted. An “exploratory” mode in which the system provided guidance in selection of appropriate statistical analyses was also available to the user. The system traversed a decision tree using the inherent qualities of the data (e.g., the identity and number of patients, tests, and time epochs) to search for the appropriate analyses to employ.

  18. An Analysis Methodology for the Gamma-ray Large Area Space Telescope

    NASA Technical Reports Server (NTRS)

    Morris, Robin D.; Cohen-Tanugi, Johann

    2004-01-01

    The Large Area Telescope (LAT) instrument on the Gamma Ray Large Area Space Telescope (GLAST) has been designed to detect high-energy gamma rays and determine their direction of incidence and energy. We propose a reconstruction algorithm based on recent advances in statistical methodology. This method, alternative to the standard event analysis inherited from high energy collider physics experiments, incorporates more accurately the physical processes occurring in the detector, and makes full use of the statistical information available. It could thus provide a better estimate of the direction and energy of the primary photon.

  19. BOOTSTRAPPING AND MONTE CARLO METHODS OF POWER ANALYSIS USED TO ESTABLISH CONDITION CATEGORIES FOR BIOTIC INDICES

    EPA Science Inventory

    Biotic indices have been used to assess biological condition by dividing index scores into condition categories. Historically, the number of categories has been based on professional judgement. Alternatively, statistical methods such as power analysis can be used to determine the ...
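
    A Monte Carlo power analysis of the kind suggested above can be sketched directly: simulate reference and degraded index scores under an assumed shift, apply the intended test, and record how often the shift is detected. The effect sizes, spread, sample size, and use of a two-sample t-test below are all assumptions for illustration.

    ```python
    # Hedged sketch of Monte Carlo power estimation: how often would a
    # two-sample t-test detect a given shift in a biotic index score at a
    # given sample size? The effect size, spread, and n are illustrative.
    import numpy as np
    from scipy.stats import ttest_ind

    rng = np.random.default_rng(4)

    def power(n_sites, shift, sd=10.0, alpha=0.05, n_sim=2000):
        hits = 0
        for _ in range(n_sim):
            reference = rng.normal(50.0, sd, n_sites)       # reference-condition sites
            degraded  = rng.normal(50.0 - shift, sd, n_sites)
            _, p = ttest_ind(reference, degraded)
            hits += p < alpha
        return hits / n_sim

    for shift in (5, 10, 15):
        print(f"shift of {shift} index points, n=20 per group: power ~ {power(20, shift):.2f}")
    ```

    The smallest shift that achieves acceptable power (e.g., 0.80) is one defensible way to set the width of a condition category.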

  20. Cellular Consequences of Telomere Shortening in Histologically Normal Breast Tissues

    DTIC Science & Technology

    2013-09-01

    using the open source, Java-based image analysis software package ImageJ (http://rsb.info.nih.gov/ij/) and a custom designed plugin (“Telometer...Tabulated data were stored in a MySQL (http://www.mysql.com) database and viewed through Microsoft Access (Microsoft Corp.). Statistical Analysis For

  1. Analysis of Nursing Curriculum and Course Competencies.

    ERIC Educational Resources Information Center

    Trani, G. M.

    The objectives of this study were to relate the competencies of the Nursing Program at Delaware County Community College to national morbidity statistics and to recommend curriculum changes based on this analysis. Existing terminal objectives of the program and each nursing module were compared with college-wide terminal objectives, overlap was…

  2. Opportunities for Applied Behavior Analysis in the Total Quality Movement.

    ERIC Educational Resources Information Center

    Redmon, William K.

    1992-01-01

    This paper identifies critical components of recent organizational quality improvement programs and specifies how applied behavior analysis can contribute to quality technology. Statistical Process Control and Total Quality Management approaches are compared, and behavior analysts are urged to build their research base and market behavior change…

  3. Prediction of Recidivism in Juvenile Offenders Based on Discriminant Analysis.

    ERIC Educational Resources Information Center

    Proefrock, David W.

    The recent development of strong statistical techniques has made accurate predictions of recidivism possible. To investigate the utility of discriminant analysis methodology in making predictions of recidivism in juvenile offenders, the court records of 271 male and female juvenile offenders, aged 12-16, were reviewed. A cross validation group…

  4. An online sleep apnea detection method based on recurrence quantification analysis.

    PubMed

    Nguyen, Hoa Dinh; Wilkins, Brek A; Cheng, Qi; Benjamin, Bruce Allen

    2014-07-01

    This paper introduces an online sleep apnea detection method based on heart rate complexity as measured by recurrence quantification analysis (RQA) statistics of heart rate variability (HRV) data. RQA statistics can capture nonlinear dynamics of a complex cardiorespiratory system during obstructive sleep apnea. In order to obtain a more robust measurement of the nonstationarity of the cardiorespiratory system, we use several fixed-amount-of-neighbors thresholds for the recurrence plot calculation. We integrate a feature selection algorithm based on conditional mutual information to select the most informative RQA features for classification, and hence, to speed up the real-time classification process without degrading the performance of the system. Two types of binary classifiers, i.e., support vector machine and neural network, are used to differentiate apnea from normal sleep. A soft decision fusion rule is developed to combine the results of these classifiers in order to improve the classification performance of the whole system. Experimental results show that our proposed method achieves better classification results compared with the previous recurrence analysis-based approach. We also show that our method is flexible and a strong candidate for a real-world, efficient sleep apnea detection system.
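
    A recurrence plot with a fixed amount of neighbours, and a simple RQA statistic derived from it, can be sketched as follows: delay-embed an RR-interval series, mark each point as recurrent with its k nearest neighbours, and compute the recurrence rate and a determinism-like measure from the diagonal line structure. The embedding parameters, k, and the synthetic signal below are illustrative assumptions, not the paper's settings.

    ```python
    # Much-simplified RQA sketch: delay-embed an RR-interval series, build a
    # recurrence matrix in which each point recurs with its k nearest
    # neighbours (a fixed-amount-of-neighbours threshold), and compute a
    # determinism-like statistic from the diagonal line structure.
    # The signal below is synthetic; parameters are illustrative.
    import numpy as np

    rng = np.random.default_rng(5)
    rr = 0.8 + 0.05 * np.sin(np.arange(300) * 0.3) + 0.01 * rng.standard_normal(300)

    # Delay embedding.
    dim, tau = 3, 2
    n_vec = len(rr) - (dim - 1) * tau
    emb = np.column_stack([rr[i * tau:i * tau + n_vec] for i in range(dim)])

    # Recurrence matrix: each point recurs with its k nearest neighbours.
    k = 10
    dists = np.linalg.norm(emb[:, None, :] - emb[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)
    neighbours = np.argsort(dists, axis=1)[:, :k]
    R = np.zeros((n_vec, n_vec), dtype=bool)
    rows = np.repeat(np.arange(n_vec), k)
    R[rows, neighbours.ravel()] = True

    # Determinism: fraction of recurrence points on diagonal lines >= 2 long.
    def determinism(R, lmin=2):
        total = R.sum()
        on_lines = 0
        for off in range(-(R.shape[0] - 1), R.shape[0]):
            diag = np.diagonal(R, offset=off).astype(int)
            run = 0
            for v in np.append(diag, 0):   # trailing 0 flushes the final run
                if v:
                    run += 1
                else:
                    if run >= lmin:
                        on_lines += run
                    run = 0
        return on_lines / total if total else 0.0

    print(f"recurrence rate: {R.mean():.3f}, determinism: {determinism(R):.3f}")
    ```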

  5. Statistical Analyses of Femur Parameters for Designing Anatomical Plates.

    PubMed

    Wang, Lin; He, Kunjin; Chen, Zhengming

    2016-01-01

    Femur parameters are key prerequisites for scientifically designing anatomical plates. Meanwhile, individual differences in femurs present a challenge to designing well-fitting anatomical plates. Therefore, to design anatomical plates more scientifically, analyses of femur parameters with statistical methods were performed in this study. The specific steps were as follows. First, taking eight anatomical femur parameters as variables, 100 femur samples were classified into three classes with factor analysis and Q-type cluster analysis. Second, based on the mean parameter values of the three classes of femurs, three sizes of average anatomical plates corresponding to the three classes of femurs were designed. Finally, based on Bayes discriminant analysis, a new femur could be assigned to the proper class. Thereafter, the average anatomical plate suitable for that new femur was selected from the three available sizes of plates. Experimental results showed that the classification of femurs was quite reasonable based on the anatomical aspects of the femurs. For instance, three sizes of condylar buttress plates were designed. Meanwhile, 20 new femurs were assigned to their proper classes. Thereafter, suitable condylar buttress plates were determined and selected.

  6. [Construction and application of special analysis database of geoherbs based on 3S technology].

    PubMed

    Guo, Lan-ping; Huang, Lu-qi; Lv, Dong-mei; Shao, Ai-juan; Wang, Jian

    2007-09-01

    In this paper, the structure, data sources, and data codes of "the spatial analysis database of geoherbs" based on 3S technology are introduced, and the essential functions of the database, such as data management, remote sensing, spatial interpolation, spatial statistics, spatial analysis and development, are described. Finally, two examples of database usage are given: one is the classification and calculation of the NDVI index from remote sensing images in the geoherbal area of Atractylodes lancea; the other is an adaptation analysis of A. lancea. These examples indicate that "the spatial analysis database of geoherbs" has bright prospects in the spatial analysis of geoherbs.

  7. 10 CFR 431.17 - Determination of efficiency.

    Code of Federal Regulations, 2014 CFR

    2014-01-01

    ... characteristics of that basic model, and (ii) Based on engineering or statistical analysis, computer simulation or... simulation or modeling, and other analytic evaluation of performance data on which the AEDM is based... applied. (iii) If requested by the Department, the manufacturer shall conduct simulations to predict the...

  8. 10 CFR 431.17 - Determination of efficiency.

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ... characteristics of that basic model, and (ii) Based on engineering or statistical analysis, computer simulation or... simulation or modeling, and other analytic evaluation of performance data on which the AEDM is based... applied. (iii) If requested by the Department, the manufacturer shall conduct simulations to predict the...

  9. Thinking big

    NASA Astrophysics Data System (ADS)

    Collins, Harry

    2008-02-01

    Physicists are often quick to discount social research based on qualitative techniques such as ethnography and "deep case studies" - where a researcher draws conclusions about a community based on immersion in the field - thinking that only quantitative research backed up by statistical analysis is sound. The balance is not so clear, however.

  10. The Statistical Analysis Techniques to Support the NGNP Fuel Performance Experiments

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bihn T. Pham; Jeffrey J. Einerson

    2010-06-01

    This paper describes the development and application of statistical analysis techniques to support the AGR experimental program on NGNP fuel performance. The experiments conducted in the Idaho National Laboratory’s Advanced Test Reactor employ fuel compacts placed in a graphite cylinder shrouded by a steel capsule. The tests are instrumented with thermocouples embedded in graphite blocks and the target quantity (fuel/graphite temperature) is regulated by the He-Ne gas mixture that fills the gap volume. Three techniques for statistical analysis, namely control charting, correlation analysis, and regression analysis, are implemented in the SAS-based NGNP Data Management and Analysis System (NDMAS) for automated processing and qualification of the AGR measured data. The NDMAS also stores daily neutronic (power) and thermal (heat transfer) code simulation results along with the measurement data, allowing for their combined use and comparative scrutiny. The ultimate objective of this work includes (a) a multi-faceted system for data monitoring and data accuracy testing, (b) identification of possible modes of diagnostics deterioration and changes in experimental conditions, (c) qualification of data for use in code validation, and (d) identification and use of data trends to support effective control of test conditions with respect to the test target. Analysis results and examples given in the paper show the three statistical analysis techniques providing a complementary capability to warn of thermocouple failures. It also suggests that the regression analysis models relating calculated fuel temperatures and thermocouple readings can enable online regulation of experimental parameters (i.e. gas mixture content), to effectively maintain the target quantity (fuel temperature) within a given range.
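
    Of the three techniques mentioned, control charting is the most compact to illustrate. The sketch below estimates a baseline mean and standard deviation for a single thermocouple channel, sets Shewhart-style 3-sigma control limits, and flags later readings that fall outside them; the temperatures, limits, and failure pattern are simulated placeholders rather than NDMAS data.

    ```python
    # Hedged sketch of the control-charting idea: estimate a baseline mean and
    # standard deviation for a thermocouple channel, set Shewhart-style
    # 3-sigma control limits, and flag readings outside them. Data and limits
    # here are simulated placeholders, not NDMAS values.
    import numpy as np

    rng = np.random.default_rng(6)
    baseline = rng.normal(1000.0, 5.0, 200)          # in-control temperature readings (degC)
    mu, sigma = baseline.mean(), baseline.std(ddof=1)
    ucl, lcl = mu + 3 * sigma, mu - 3 * sigma

    new_readings = rng.normal(1000.0, 5.0, 50)
    new_readings[30:] -= 40.0                        # simulate a drifting/failed thermocouple

    out_of_control = np.where((new_readings > ucl) | (new_readings < lcl))[0]
    print(f"control limits: ({lcl:.1f}, {ucl:.1f}) degC")
    print("first flagged reading index:", out_of_control[0] if out_of_control.size else None)
    ```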

  11. Simulation-based estimation of mean and standard deviation for meta-analysis via Approximate Bayesian Computation (ABC).

    PubMed

    Kwon, Deukwoo; Reis, Isildinha M

    2015-08-12

    When conducting a meta-analysis of a continuous outcome, estimated means and standard deviations from the selected studies are required in order to obtain an overall estimate of the mean effect and its confidence interval. If these quantities are not directly reported in the publications, they must be estimated from other reported summary statistics, such as the median, the minimum, the maximum, and quartiles. We propose a simulation-based estimation approach using the Approximate Bayesian Computation (ABC) technique for estimating mean and standard deviation based on various sets of summary statistics found in published studies. We conduct a simulation study to compare the proposed ABC method with the existing methods of Hozo et al. (2005), Bland (2015), and Wan et al. (2014). In the estimation of the standard deviation, our ABC method performs better than the other methods when data are generated from skewed or heavy-tailed distributions. The corresponding average relative error (ARE) approaches zero as sample size increases. In data generated from the normal distribution, our ABC performs well. However, the Wan et al. method is best for estimating standard deviation under normal distribution. In the estimation of the mean, our ABC method is best regardless of assumed distribution. ABC is a flexible method for estimating the study-specific mean and standard deviation for meta-analysis, especially with underlying skewed or heavy-tailed distributions. The ABC method can be applied using other reported summary statistics such as the posterior mean and 95 % credible interval when Bayesian analysis has been employed.
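
    The ABC idea described above can be sketched with a simple rejection sampler: draw candidate (mean, SD) pairs from broad priors, simulate normal samples of the reported size, and keep the draws whose simulated median, minimum, and maximum are closest to the reported summaries. The reported values, priors, and acceptance rule below are illustrative assumptions, not the paper's implementation.

    ```python
    # Minimal rejection-ABC sketch in the spirit described above: recover a
    # study's mean and SD from its reported median, minimum, maximum, and
    # sample size by simulating candidate normal samples and keeping the
    # parameter draws whose simulated summaries fall closest to the reported
    # ones. This is an illustration, not the paper's exact algorithm.
    import numpy as np

    rng = np.random.default_rng(7)
    reported = {"n": 50, "median": 12.0, "min": 4.0, "max": 21.0}

    def summaries(x):
        return np.array([np.median(x), x.min(), x.max()])

    target = np.array([reported["median"], reported["min"], reported["max"]])

    n_draws = 20000
    mu_draws = rng.uniform(target[1], target[2], n_draws)        # flat priors over a plausible range
    sd_draws = rng.uniform(0.1, target[2] - target[1], n_draws)

    dist = np.empty(n_draws)
    for i in range(n_draws):
        sim = rng.normal(mu_draws[i], sd_draws[i], reported["n"])
        dist[i] = np.linalg.norm(summaries(sim) - target)

    keep = dist <= np.quantile(dist, 0.01)                       # accept the closest 1%
    print(f"ABC estimate: mean ~ {mu_draws[keep].mean():.2f}, SD ~ {sd_draws[keep].mean():.2f}")
    ```

    Using a quantile-based acceptance rule avoids having to choose an absolute tolerance; the retained draws approximate the posterior of the mean and SD given the reported summaries.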

  12. Spatial analyses for nonoverlapping objects with size variations and their application to coral communities.

    PubMed

    Muko, Soyoka; Shimatani, Ichiro K; Nozawa, Yoko

    2014-07-01

    Spatial distributions of individuals are conventionally analysed by representing objects as dimensionless points, in which spatial statistics are based on centre-to-centre distances. However, if organisms expand without overlapping and show size variations, such as is the case for encrusting corals, interobject spacing is crucial for spatial associations where interactions occur. We introduced new pairwise statistics using minimum distances between objects and demonstrated their utility when examining encrusting coral community data. We also calculated the conventional point process statistics and the grid-based statistics to clarify the advantages and limitations of each spatial statistical method. For simplicity, coral colonies were approximated by disks in these demonstrations. Focusing on short-distance effects, the use of minimum distances revealed that almost all coral genera were aggregated at a scale of 1-25 cm. However, when fragmented colonies (ramets) were treated as a genet, a genet-level analysis indicated weak or no aggregation, suggesting that most corals were randomly distributed and that fragmentation was the primary cause of colony aggregations. In contrast, point process statistics showed larger aggregation scales, presumably because centre-to-centre distances included both intercolony spacing and colony sizes (radius). The grid-based statistics were able to quantify the patch (aggregation) scale of colonies, but the scale was strongly affected by the colony size. Our approach quantitatively showed repulsive effects between an aggressive genus and a competitively weak genus, while the grid-based statistics (covariance function) also showed repulsion although the spatial scale indicated from the statistics was not directly interpretable in terms of ecological meaning. The use of minimum distances together with previously proposed spatial statistics helped us to extend our understanding of the spatial patterns of nonoverlapping objects that vary in size and the associated specific scales. © 2013 The Authors. Journal of Animal Ecology © 2013 British Ecological Society.

  13. Jllumina - A comprehensive Java-based API for statistical Illumina Infinium HumanMethylation450 and Infinium MethylationEPIC BeadChip data processing.

    PubMed

    Almeida, Diogo; Skov, Ida; Lund, Jesper; Mohammadnejad, Afsaneh; Silva, Artur; Vandin, Fabio; Tan, Qihua; Baumbach, Jan; Röttger, Richard

    2016-10-01

    Measuring differential methylation of the DNA is nowadays the most common approach to linking epigenetic modifications to diseases (called epigenome-wide association studies, EWAS). Owing to their low cost, efficiency, and easy handling, the Illumina HumanMethylation450 BeadChip and its successor, the Infinium MethylationEPIC BeadChip, are by far the most popular techniques for conducting EWAS in large patient cohorts. Despite the popularity of this chip technology, raw data processing and statistical analysis of the array data remain far from trivial, and dedicated software libraries enabling high quality and statistically sound downstream analyses are still lacking. As of yet, only R-based solutions are freely available for low-level processing of the Illumina chip data. However, the lack of alternative libraries poses a hurdle for the development of new bioinformatic tools, in particular when it comes to web services or applications where run time and memory consumption matter, or where EWAS data analysis is an integral part of a bigger framework or data analysis pipeline. We have therefore developed and implemented Jllumina, an open-source Java library for raw data manipulation of Illumina Infinium HumanMethylation450 and Infinium MethylationEPIC BeadChip data, supporting the developer with Java functions covering reading and preprocessing the raw data, down to statistical assessment, permutation tests, and identification of differentially methylated loci. Jllumina is fully parallelizable and publicly available at http://dimmer.compbio.sdu.dk/download.html.

  14. Jllumina - A comprehensive Java-based API for statistical Illumina Infinium HumanMethylation450 and MethylationEPIC data processing.

    PubMed

    Almeida, Diogo; Skov, Ida; Lund, Jesper; Mohammadnejad, Afsaneh; Silva, Artur; Vandin, Fabio; Tan, Qihua; Baumbach, Jan; Röttger, Richard

    2016-12-18

    Measuring differential methylation of the DNA is nowadays the most common approach to linking epigenetic modifications to diseases (called epigenome-wide association studies, EWAS). Owing to their low cost, efficiency, and easy handling, the Illumina HumanMethylation450 BeadChip and its successor, the Infinium MethylationEPIC BeadChip, are by far the most popular techniques for conducting EWAS in large patient cohorts. Despite the popularity of this chip technology, raw data processing and statistical analysis of the array data remain far from trivial, and dedicated software libraries enabling high quality and statistically sound downstream analyses are still lacking. As of yet, only R-based solutions are freely available for low-level processing of the Illumina chip data. However, the lack of alternative libraries poses a hurdle for the development of new bioinformatic tools, in particular when it comes to web services or applications where run time and memory consumption matter, or where EWAS data analysis is an integral part of a bigger framework or data analysis pipeline. We have therefore developed and implemented Jllumina, an open-source Java library for raw data manipulation of Illumina Infinium HumanMethylation450 and Infinium MethylationEPIC BeadChip data, supporting the developer with Java functions covering reading and preprocessing the raw data, down to statistical assessment, permutation tests, and identification of differentially methylated loci. Jllumina is fully parallelizable and publicly available at http://dimmer.compbio.sdu.dk/download.html.

  15. Assessing Attitudes towards Statistics among Medical Students: Psychometric Properties of the Serbian Version of the Survey of Attitudes Towards Statistics (SATS)

    PubMed Central

    Stanisavljevic, Dejana; Trajkovic, Goran; Marinkovic, Jelena; Bukumiric, Zoran; Cirkovic, Andja; Milic, Natasa

    2014-01-01

    Background Medical statistics has become important and relevant for future doctors, enabling them to practice evidence based medicine. Recent studies report that students’ attitudes towards statistics play an important role in their statistics achievements. The aim of the study was to test the psychometric properties of the Serbian version of the Survey of Attitudes Towards Statistics (SATS) in order to acquire a valid instrument to measure attitudes inside the Serbian educational context. Methods The validation study was performed on a cohort of 417 medical students who were enrolled in an obligatory introductory statistics course. The SATS adaptation was based on an internationally accepted methodology for translation and cultural adaptation. Psychometric properties of the Serbian version of the SATS were analyzed through the examination of factorial structure and internal consistency. Results Most medical students held positive attitudes towards statistics. The average total SATS score was above neutral (4.3±0.8), and varied from 1.9 to 6.2. Confirmatory factor analysis validated the six-factor structure of the questionnaire (Affect, Cognitive Competence, Value, Difficulty, Interest and Effort). Values for fit indices TLI (0.940) and CFI (0.961) were above the cut-off of ≥0.90. The RMSEA value of 0.064 (0.051–0.078) was below the suggested value of ≤0.08. Cronbach’s alpha of the entire scale was 0.90, indicating scale reliability. In a multivariate regression model, self-rating of ability in mathematics and current grade point average were significantly associated with the total SATS score after adjusting for age and gender. Conclusion The present study provided evidence of the appropriate metric properties of the Serbian version of the SATS. Confirmatory factor analysis validated the six-factor structure of the scale. The SATS may be a reliable and valid instrument for identifying medical students’ attitudes towards statistics in the Serbian educational context. PMID:25405489

  16. Assessing attitudes towards statistics among medical students: psychometric properties of the Serbian version of the Survey of Attitudes Towards Statistics (SATS).

    PubMed

    Stanisavljevic, Dejana; Trajkovic, Goran; Marinkovic, Jelena; Bukumiric, Zoran; Cirkovic, Andja; Milic, Natasa

    2014-01-01

    Medical statistics has become important and relevant for future doctors, enabling them to practice evidence based medicine. Recent studies report that students' attitudes towards statistics play an important role in their statistics achievements. The aim of the study was to test the psychometric properties of the Serbian version of the Survey of Attitudes Towards Statistics (SATS) in order to acquire a valid instrument to measure attitudes inside the Serbian educational context. The validation study was performed on a cohort of 417 medical students who were enrolled in an obligatory introductory statistics course. The SATS adaptation was based on an internationally accepted methodology for translation and cultural adaptation. Psychometric properties of the Serbian version of the SATS were analyzed through the examination of factorial structure and internal consistency. Most medical students held positive attitudes towards statistics. The average total SATS score was above neutral (4.3±0.8), and varied from 1.9 to 6.2. Confirmatory factor analysis validated the six-factor structure of the questionnaire (Affect, Cognitive Competence, Value, Difficulty, Interest and Effort). Values for fit indices TLI (0.940) and CFI (0.961) were above the cut-off of ≥0.90. The RMSEA value of 0.064 (0.051-0.078) was below the suggested value of ≤0.08. Cronbach's alpha of the entire scale was 0.90, indicating scale reliability. In a multivariate regression model, self-rating of ability in mathematics and current grade point average were significantly associated with the total SATS score after adjusting for age and gender. The present study provided evidence of the appropriate metric properties of the Serbian version of the SATS. Confirmatory factor analysis validated the six-factor structure of the scale. The SATS may be a reliable and valid instrument for identifying medical students' attitudes towards statistics in the Serbian educational context.

  17. Identifiability of PBPK Models with Applications to ...

    EPA Pesticide Factsheets

    Any statistical model should be identifiable in order for estimates and tests using it to be meaningful. We consider statistical analysis of physiologically-based pharmacokinetic (PBPK) models in which parameters cannot be estimated precisely from available data, and discuss different types of identifiability that occur in PBPK models and give reasons why they occur. We particularly focus on how the mathematical structure of a PBPK model and lack of appropriate data can lead to statistical models in which it is impossible to estimate at least some parameters precisely. Methods are reviewed which can determine whether a purely linear PBPK model is globally identifiable. We propose a theorem which determines when identifiability at a set of finite and specific values of the mathematical PBPK model (global discrete identifiability) implies identifiability of the statistical model. However, we are unable to establish conditions that imply global discrete identifiability, and conclude that the only safe approach to analysis of PBPK models involves Bayesian analysis with truncated priors. Finally, computational issues regarding posterior simulations of PBPK models are discussed. The methodology is very general and can be applied to numerous PBPK models which can be expressed as linear time-invariant systems. A real data set of a PBPK model for exposure to dimethyl arsinic acid (DMA(V)) is presented to illustrate the proposed methodology.

  18. Effects of Heterogeneity on Spatial Pattern Analysis of Wild Pistachio Trees in Zagros Woodlands, Iran

    NASA Astrophysics Data System (ADS)

    Erfanifard, Y.; Rezayan, F.

    2014-10-01

    Vegetation heterogeneity biases second-order summary statistics, e.g., Ripley's K-function, applied for spatial pattern analysis in ecology. Second-order investigation based on Ripley's K-function and related statistics (i.e., L- and pair correlation function g) is widely used in ecology to develop hypothesis on underlying processes by characterizing spatial patterns of vegetation. The aim of this study was to demonstrate effects of underlying heterogeneity of wild pistachio (Pistacia atlantica Desf.) trees on the second-order summary statistics of point pattern analysis in a part of Zagros woodlands, Iran. The spatial distribution of 431 wild pistachio trees was accurately mapped in a 40 ha stand in the Wild Pistachio & Almond Research Site, Fars province, Iran. Three commonly used second-order summary statistics (i.e., K-, L-, and g-functions) were applied to analyse their spatial pattern. The two-sample Kolmogorov-Smirnov goodness-of-fit test showed that the observed pattern significantly followed an inhomogeneous Poisson process null model in the study region. The results also showed that heterogeneous pattern of wild pistachio trees biased the homogeneous form of K-, L-, and g-functions, demonstrating a stronger aggregation of the trees at the scales of 0-50 m than actually existed and an aggregation at scales of 150-200 m, while regularly distributed. Consequently, we showed that heterogeneity of point patterns may bias the results of homogeneous second-order summary statistics and we also suggested applying inhomogeneous summary statistics with related null models for spatial pattern analysis of heterogeneous vegetations.
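
    For readers unfamiliar with the second-order statistics discussed above, a naive homogeneous Ripley's K (and centred L) estimator is sketched below, without edge correction. The inhomogeneous versions used in the study additionally replace the constant intensity with an estimated intensity function; the tree coordinates here are simulated under complete spatial randomness, not the mapped stand.

    ```python
    # Minimal sketch of a homogeneous Ripley's K (and L) estimator without
    # edge correction. Real analyses, including the inhomogeneous versions
    # discussed above, add an estimated intensity function and edge
    # corrections; tree coordinates here are simulated, not the mapped stand.
    import numpy as np

    rng = np.random.default_rng(8)
    width, height, n = 500.0, 800.0, 431             # plot dimensions (m) and point count
    pts = np.column_stack([rng.uniform(0, width, n), rng.uniform(0, height, n)])

    area = width * height
    lam = n / area                                   # intensity (points per unit area)
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)

    for r in (10, 25, 50):
        K = d[d <= r].size / (lam * n)               # naive K(r): pair counts scaled by intensity
        L = np.sqrt(K / np.pi) - r                   # centred L(r); ~0 under complete spatial randomness
        print(f"r = {r:3d} m: K = {K:9.1f}, L - r = {L:6.2f}")
    ```

    Positive values of L(r) - r indicate aggregation at scale r, negative values regularity, which is the reading used when interpreting the tree pattern above.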

  19. Meta-analysis of neutropenia or leukopenia as a prognostic factor in patients with malignant disease undergoing chemotherapy.

    PubMed

    Shitara, Kohei; Matsuo, Keitaro; Oze, Isao; Mizota, Ayako; Kondo, Chihiro; Nomura, Motoo; Yokota, Tomoya; Takahari, Daisuke; Ura, Takashi; Muro, Kei

    2011-08-01

    We performed a systematic review and meta-analysis to determine the impact of neutropenia or leukopenia experienced during chemotherapy on survival. Eligible studies included prospective or retrospective analyses that evaluated neutropenia or leukopenia as a prognostic factor for overall survival or disease-free survival. Statistical analyses were conducted to calculate a summary hazard ratio and 95% confidence interval (CI) using random-effects or fixed-effects models based on the heterogeneity of the included studies. Thirteen trials were selected for the meta-analysis, with a total of 9,528 patients. The hazard ratio of death was 0.69 (95% CI, 0.64-0.75) for patients with higher-grade neutropenia or leukopenia compared to patients with lower-grade or lack of cytopenia. Our analysis was also stratified by statistical method (any statistical method to decrease lead-time bias; time-varying analysis or landmark analysis), but no differences were observed. Our results indicate that neutropenia or leukopenia experienced during chemotherapy is associated with improved survival in patients with advanced cancer or hematological malignancies undergoing chemotherapy. Future prospective analyses designed to investigate the potential impact of chemotherapy dose adjustment coupled with monitoring of neutropenia or leukopenia on survival are warranted.
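
    The pooling step of such a meta-analysis can be sketched with inverse-variance weighting of study-level log hazard ratios (the fixed-effect form; a random-effects version would add a between-study variance term). The hazard ratios and confidence intervals below are hypothetical placeholders, not the thirteen included trials.

    ```python
    # Minimal inverse-variance (fixed-effect) pooling of study-level hazard
    # ratios, working on the log scale. Study HRs and CIs below are
    # hypothetical placeholders, not the 13 trials in the meta-analysis.
    import math

    studies = [  # (hazard ratio, lower 95% CI, upper 95% CI)
        (0.72, 0.60, 0.86),
        (0.65, 0.48, 0.88),
        (0.80, 0.66, 0.97),
    ]

    log_hrs = [math.log(hr) for hr, lo, hi in studies]
    # Standard errors recovered from the CI width on the log scale.
    ses = [(math.log(hi) - math.log(lo)) / (2 * 1.96) for hr, lo, hi in studies]
    weights = [1 / se**2 for se in ses]

    pooled_log_hr = sum(w * lhr for w, lhr in zip(weights, log_hrs)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    lo, hi = (math.exp(pooled_log_hr - 1.96 * pooled_se),
              math.exp(pooled_log_hr + 1.96 * pooled_se))
    print(f"pooled HR = {math.exp(pooled_log_hr):.2f} (95% CI {lo:.2f}-{hi:.2f})")
    ```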

  20. Application of statistical shape analysis for the estimation of bone and forensic age using the shapes of the 2nd, 3rd, and 4th cervical vertebrae in a young Japanese population.

    PubMed

    Rhee, Chang-Hoon; Shin, Sang Min; Choi, Yong-Seok; Yamaguchi, Tetsutaro; Maki, Koutaro; Kim, Yong-Il; Kim, Seong-Sik; Park, Soo-Byung; Son, Woo-Sung

    2015-12-01

    From computed tomographic images, the dentocentral synchondrosis can be identified in the second cervical vertebra. This can demarcate the border between the odontoid process and the body of the 2nd cervical vertebra and serve as a good model for the prediction of bone and forensic age. Nevertheless, until now, there has been no application of the 2nd cervical vertebra based on the dentocentral synchondrosis. In this study, statistical shape analysis was used to build bone and forensic age estimation regression models. Following the principles of statistical shape analysis and principal components analysis, we used cone-beam computed tomography (CBCT) to evaluate a Japanese population (35 males and 45 females, from 5 to 19 years old). The narrowest prediction intervals among the multivariate regression models were 19.63 for bone age and 2.99 for forensic age. There was no significant difference between form space and shape space in the bone and forensic age estimation models. However, for gender comparison, the bone and forensic age estimation models for males had the higher explanatory power. This study derived an improved objective and quantitative method for bone and forensic age estimation based on only the 2nd, 3rd and 4th cervical vertebral shapes. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
