Sample records for analysis sampling statistics

  1. Time Series Analysis Based on Running Mann Whitney Z Statistics

    USDA-ARS's Scientific Manuscript database

    A sensitive and objective time series analysis method based on the calculation of Mann-Whitney U statistics is described. This method samples data rankings over moving time windows, converts those samples to Mann-Whitney U statistics, and then normalizes the U statistics to Z statistics using Monte-...
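    A minimal sketch of the running statistic, assuming each window is split into two halves and using the standard normal approximation in place of the Monte Carlo normalization that the truncated abstract refers to:

      # Running Mann-Whitney Z over a sliding window (sketch; the paper's
      # Monte Carlo normalization is replaced by the normal approximation).
      import numpy as np
      from scipy.stats import mannwhitneyu

      def running_mw_z(series, window=24):
          """Split each window in half and convert the Mann-Whitney U for
          the two halves into an approximate Z statistic."""
          half = window // 2
          z_scores = []
          for start in range(len(series) - window + 1):
              a = series[start:start + half]
              b = series[start + half:start + window]
              u, _ = mannwhitneyu(a, b, alternative="two-sided")
              mu = half * half / 2.0                                # E[U] under H0
              sigma = np.sqrt(half * half * (2 * half + 1) / 12.0)  # SD[U] under H0
              z_scores.append((u - mu) / sigma)
          return np.array(z_scores)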

  2. Statistical Symbolic Execution with Informed Sampling

    NASA Technical Reports Server (NTRS)

    Filieri, Antonio; Pasareanu, Corina S.; Visser, Willem; Geldenhuys, Jaco

    2014-01-01

    Symbolic execution techniques have been proposed recently for the probabilistic analysis of programs. These techniques seek to quantify the likelihood of reaching program events of interest, e.g., assert violations. They have many promising applications but have scalability issues due to high computational demand. To address this challenge, we propose a statistical symbolic execution technique that performs Monte Carlo sampling of the symbolic program paths and uses the obtained information for Bayesian estimation and hypothesis testing with respect to the probability of reaching the target events. To speed up the convergence of the statistical analysis, we propose Informed Sampling, an iterative symbolic execution that first explores the paths that have high statistical significance, prunes them from the state space, and guides the execution towards less likely paths. The technique combines Bayesian estimation with a partial exact analysis for the pruned paths, leading to provably improved convergence of the statistical analysis. We have implemented statistical symbolic execution with informed sampling in the Symbolic PathFinder tool. We show experimentally that informed sampling obtains more precise results and converges faster than a purely statistical analysis and may also be more efficient than an exact symbolic analysis. When the latter does not terminate, symbolic execution with informed sampling can give meaningful results under the same time and memory limits.
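    A hedged toy sketch of the statistical core described above: Monte Carlo sampling of program paths combined with Bayesian estimation of the probability of reaching a target event. The function sample_path_hits_target is a hypothetical stand-in for executing one sampled symbolic path in a tool such as Symbolic PathFinder:

      import random
      from scipy.stats import beta

      def sample_path_hits_target():
          # Hypothetical placeholder: one sampled path, checked for the
          # target event (e.g., an assert violation).
          return random.random() < 0.03

      n = 10_000
      hits = sum(sample_path_hits_target() for _ in range(n))
      posterior = beta(1 + hits, 1 + n - hits)   # Beta(1, 1) uniform prior
      lo, hi = posterior.ppf([0.025, 0.975])
      print(f"P(target) ~ {posterior.mean():.4f}, 95% CI [{lo:.4f}, {hi:.4f}]")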

  3. Quantitative Methods for Analysing Joint Questionnaire Data: Exploring the Role of Joint in Force Design

    DTIC Science & Technology

    2015-08-01

    the nine questions. The Statistical Package for the Social Sciences (SPSS) [11] was used to conduct statistical analysis on the sample. Two types...constructs. SPSS was again used to conduct statistical analysis on the sample. This time factor analysis was conducted. Factor analysis attempts to...Business Research Methods and Statistics using SPSS. P432. 11 IBM SPSS Statistics. (2012) 12 Burns, R.B., Burns, R.A. (2008) ‘Business Research

  4. Statistical Analysis Techniques for Small Sample Sizes

    NASA Technical Reports Server (NTRS)

    Navard, S. E.

    1984-01-01

    The small-sample-size problem encountered in the analysis of space-flight data is examined. Because of the small amount of data available, careful analyses are essential to extract the maximum amount of information with acceptable accuracy. Statistical analysis of small samples is described. The background material necessary for understanding statistical hypothesis testing is outlined, and the various tests which can be done on small samples are explained. Emphasis is on the underlying assumptions of each test and on considerations needed to choose the most appropriate test for a given type of analysis.

  5. STATISTICAL SAMPLING AND DATA ANALYSIS

    EPA Science Inventory

    Research is being conducted to develop approaches to improve soil and sediment sampling techniques, measurement design and geostatistics, and data analysis via chemometric, environmetric, and robust statistical methods. Improvements in sampling contaminated soil and other hetero...

  6. An audit of the statistics and the comparison with the parameter in the population

    NASA Astrophysics Data System (ADS)

    Bujang, Mohamad Adam; Sa'at, Nadiah; Joys, A. Reena; Ali, Mariana Mohamad

    2015-10-01

    The sample size needed to closely estimate the statistics for particular parameters has long been an issue. Although sample size may be calculated with reference to the objective of the study, it is difficult to confirm whether the resulting statistics are close to the parameters for a particular population. Meanwhile, a p-value of less than 0.05 is widely used as inferential evidence. This study therefore audited results analyzed from various sub-samples and statistical analyses and compared those results with the parameters in three different populations. Eight types of statistical analysis, each with eight sub-samples, were analyzed. Results showed that the statistics were consistent and close to the parameters when the study sample covered at least 15% to 35% of the population. Larger sample sizes are needed to estimate parameters involving categorical variables than those involving numerical variables. Sample sizes of 300 to 500 are sufficient to estimate the parameters for a medium-sized population.
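    A minimal sketch of the audit idea on synthetic data (the population, sizes, and coverage fractions below are illustrative assumptions, not the study's): draw sub-samples covering increasing fractions of a known population and check how close the sample statistic comes to the parameter.

      import numpy as np

      rng = np.random.default_rng(0)
      population = rng.normal(loc=50, scale=10, size=5_000)
      parameter = population.mean()

      for frac in (0.05, 0.15, 0.25, 0.35):
          n = int(frac * len(population))
          sample = rng.choice(population, size=n, replace=False)
          print(f"{frac:4.0%} of population (n={n:4d}): "
                f"sample mean = {sample.mean():.2f} vs parameter = {parameter:.2f}")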

  7. Effect of the absolute statistic on gene-sampling gene-set analysis methods.

    PubMed

    Nam, Dougu

    2017-06-01

    Gene-set enrichment analysis and its modified versions have commonly been used for identifying altered functions or pathways in disease from microarray data. In particular, the simple gene-sampling gene-set analysis methods have been heavily used for datasets with only a few sample replicates. The biggest problem with this approach is the highly inflated false-positive rate. In this paper, the effect of absolute gene statistic on gene-sampling gene-set analysis methods is systematically investigated. Thus far, the absolute gene statistic has merely been regarded as a supplementary method for capturing the bidirectional changes in each gene set. Here, it is shown that incorporating the absolute gene statistic in gene-sampling gene-set analysis substantially reduces the false-positive rate and improves the overall discriminatory ability. Its effect was investigated by power, false-positive rate, and receiver operating curve for a number of simulated and real datasets. The performances of gene-set analysis methods in one-tailed (genome-wide association study) and two-tailed (gene expression data) tests were also compared and discussed.
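    A hedged sketch of a gene-sampling gene-set test built on the absolute gene statistic (an illustration, not the paper's exact procedure): the set score is the mean |t| over member genes, and the null distribution is formed by resampling random gene sets of the same size.

      import numpy as np
      from scipy.stats import ttest_ind

      def gene_set_pvalue(expr_a, expr_b, set_idx, n_perm=5_000, seed=0):
          """expr_a, expr_b: arrays of shape (n_genes, n_samples)."""
          rng = np.random.default_rng(seed)
          t, _ = ttest_ind(expr_a, expr_b, axis=1)       # one t per gene
          abs_t = np.abs(t)                              # the absolute statistic
          observed = abs_t[set_idx].mean()
          k, n_genes = len(set_idx), len(abs_t)
          null = np.array([abs_t[rng.choice(n_genes, k, replace=False)].mean()
                           for _ in range(n_perm)])
          return (1 + np.sum(null >= observed)) / (1 + n_perm)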

  8. Item Analysis Appropriate for Domain-Referenced Classroom Testing. (Project Technical Report Number 1).

    ERIC Educational Resources Information Center

    Nitko, Anthony J.; Hsu, Tse-chi

    Item analysis procedures appropriate for domain-referenced classroom testing are described. A conceptual framework within which item statistics can be considered and promising statistics in light of this framework are presented. The sampling fluctuations of the more promising item statistics for sample sizes comparable to the typical classroom…

  9. Developing Sampling Frame for Case Study: Challenges and Conditions

    ERIC Educational Resources Information Center

    Ishak, Noriah Mohd; Abu Bakar, Abu Yazid

    2014-01-01

    Because of its reliance on inferential statistical analysis, the issue of random sampling is pertinent to any quantitative study. In qualitative studies, by contrast, the absence of inferential statistical analysis allows researchers to be more creative in dealing with sampling issues. Since results from a qualitative study cannot be generalized to the bigger population,…

  10. Statistical power and optimal design in experiments in which samples of participants respond to samples of stimuli.

    PubMed

    Westfall, Jacob; Kenny, David A; Judd, Charles M

    2014-10-01

    Researchers designing experiments in which a sample of participants responds to a sample of stimuli are faced with difficult questions about optimal study design. The conventional procedures of statistical power analysis fail to provide appropriate answers to these questions because they are based on statistical models in which stimuli are not assumed to be a source of random variation in the data, models that are inappropriate for experiments involving crossed random factors of participants and stimuli. In this article, we present new methods of power analysis for designs with crossed random factors, and we give detailed, practical guidance to psychology researchers planning experiments in which a sample of participants responds to a sample of stimuli. We extensively examine 5 commonly used experimental designs, describe how to estimate statistical power in each, and provide power analysis results based on a reasonable set of default parameter values. We then develop general conclusions and formulate rules of thumb concerning the optimal design of experiments in which a sample of participants responds to a sample of stimuli. We show that in crossed designs, statistical power typically does not approach unity as the number of participants goes to infinity but instead approaches a maximum attainable power value that is possibly small, depending on the stimulus sample. We also consider the statistical merits of designs involving multiple stimulus blocks. Finally, we provide a simple and flexible Web-based power application to aid researchers in planning studies with samples of stimuli.
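    A simulation sketch of the paper's central point (not its closed-form method; the stimuli-within-condition design and the variance components below are assumed for illustration): as participants increase, residual noise averages out but stimulus-sampling variance does not, so power plateaus below 1.

      import numpy as np
      from scipy.stats import ttest_ind

      def simulated_power(n_participants, n_stimuli=20, effect=0.4,
                          sd_stim=0.5, sd_resid=1.0, n_sim=2_000, alpha=0.05):
          rng = np.random.default_rng(1)
          cond = np.repeat([0, 1], n_stimuli // 2)   # half the stimuli per condition
          hits = 0
          for _ in range(n_sim):
              stim_fx = rng.normal(0, sd_stim, n_stimuli)
              # residual noise shrinks when averaged over participants;
              # the stimulus effects do not
              resid = rng.normal(0, sd_resid, (n_participants, n_stimuli)).mean(0)
              stim_means = effect * cond + stim_fx + resid
              _, p = ttest_ind(stim_means[cond == 1], stim_means[cond == 0])
              hits += p < alpha
          return hits / n_sim

      for n in (10, 50, 500):
          print(n, simulated_power(n))   # power rises with n, then plateaus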

  11. Statistical analysis of hydrological response in urbanising catchments based on adaptive sampling using inter-amount times

    NASA Astrophysics Data System (ADS)

    ten Veldhuis, Marie-Claire; Schleiss, Marc

    2017-04-01

    In this study, we introduced an alternative approach for analysis of hydrological flow time series, using an adaptive sampling framework based on inter-amount times (IATs). The main difference with conventional flow time series is the rate at which low and high flows are sampled: the unit of analysis for IATs is a fixed flow amount, instead of a fixed time window. We analysed statistical distributions of flows and IATs across a wide range of sampling scales to investigate the sensitivity of statistical properties such as quantiles, variance, skewness, scaling parameters and flashiness indicators to the sampling scale. We did this based on streamflow time series for 17 (semi)urbanised basins in North Carolina, US, ranging from 13 km2 to 238 km2 in size. Results showed that adaptive sampling of flow time series based on inter-amounts leads to a more balanced representation of low flow and peak flow values in the statistical distribution. While conventional sampling gives a lot of weight to low flows, as these are most ubiquitous in flow time series, IAT sampling gives relatively more weight to high flow values, since a given flow amount is accumulated in a shorter time. As a consequence, IAT sampling gives more information about the tail of the distribution associated with high flows, while conventional sampling gives relatively more information about low flow periods. We will present results of statistical analyses across a range of subdaily to seasonal scales and will highlight some interesting insights that can be derived from IAT statistics with respect to basin flashiness and the impact of urbanisation on hydrological response.
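    A minimal sketch of IAT sampling under assumed inputs (a flow-rate series at a fixed time step): rather than sampling flow over fixed time windows, record the time needed to accumulate each fixed flow amount.

      import numpy as np

      def inter_amount_times(flow, dt, amount):
          """flow: flow rates per time step; dt: step length; amount: the
          fixed flow volume defining one sample. Returns one inter-amount
          time per accumulated unit (short IATs correspond to high flow)."""
          cum_volume = np.cumsum(flow) * dt
          n_units = int(cum_volume[-1] // amount)
          thresholds = amount * np.arange(1, n_units + 1)
          crossing_steps = np.searchsorted(cum_volume, thresholds)
          crossing_times = dt * (crossing_steps + 1)
          return np.diff(np.concatenate(([0.0], crossing_times)))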

  12. Statistical Analysis of Research Data | Center for Cancer Research

    Cancer.gov

    Recent advances in cancer biology have resulted in the need for increased statistical analysis of research data. The Statistical Analysis of Research Data (SARD) course will be held on April 5-6, 2018 from 9 a.m.-5 p.m. at the National Institutes of Health's Natcher Conference Center, Balcony C on the Bethesda Campus. SARD is designed to provide an overview on the general principles of statistical analysis of research data.  The first day will feature univariate data analysis, including descriptive statistics, probability distributions, one- and two-sample inferential statistics.

  13. [The research protocol VI: How to choose the appropriate statistical test. Inferential statistics].

    PubMed

    Flores-Ruiz, Eric; Miranda-Novales, María Guadalupe; Villasís-Keever, Miguel Ángel

    2017-01-01

    Statistical analysis can be divided into two main components: descriptive analysis and inferential analysis. Inference means drawing conclusions from tests performed on data obtained from a sample of a population. Statistical tests are used in order to establish the probability that a conclusion obtained from a sample is applicable to the population from which it was obtained. However, choosing the appropriate statistical test generally poses a challenge for novice researchers. To choose the statistical test it is necessary to take into account three aspects: the research design, the number of measurements, and the scale of measurement of the variables. Statistical tests are divided into two sets, parametric and nonparametric. Parametric tests can only be used if the data show a normal distribution. Choosing the right statistical test will make it easier for readers to understand and apply the results.
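    A minimal sketch of the decision the authors describe, assuming two independent samples: check the normality assumption first, then fall back to a nonparametric test if it fails.

      from scipy.stats import shapiro, ttest_ind, mannwhitneyu

      def compare_groups(a, b, alpha=0.05):
          normal = shapiro(a).pvalue > alpha and shapiro(b).pvalue > alpha
          if normal:      # parametric test is justified
              return "Welch t-test", ttest_ind(a, b, equal_var=False).pvalue
          return "Mann-Whitney U", mannwhitneyu(a, b, alternative="two-sided").pvalue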

  14. Examples of Data Analysis with SPSS-X.

    ERIC Educational Resources Information Center

    MacFarland, Thomas W.

    Intended for classroom use only, these unpublished notes contain computer lessons on descriptive statistics using SPSS-X Release 3.0 for VAX/UNIX. Statistical measures covered include Chi-square analysis; Spearman's rank correlation coefficient; Student's t-test with two independent samples; Student's t-test with a paired sample; One-way analysis…

  15. Qualitative Meta-Analysis on the Hospital Task: Implications for Research

    ERIC Educational Resources Information Center

    Noll, Jennifer; Sharma, Sashi

    2014-01-01

    The "law of large numbers" indicates that as sample size increases, sample statistics become less variable and more closely estimate their corresponding population parameters. Different research studies investigating how people consider sample size when evaluating the reliability of a sample statistic have found a wide range of…

  16. Statistical Power in Meta-Analysis

    ERIC Educational Resources Information Center

    Liu, Jin

    2015-01-01

    Statistical power is important in a meta-analysis study, although few studies have examined the performance of simulated power in meta-analysis. The purpose of this study is to inform researchers about statistical power estimation on two sample mean difference test under different situations: (1) the discrepancy between the analytical power and…

  17. The potential of statistical shape modelling for geometric morphometric analysis of human teeth in archaeological research

    PubMed Central

    Fernee, Christianne; Browne, Martin; Zakrzewski, Sonia

    2017-01-01

    This paper introduces statistical shape modelling (SSM) for use in osteoarchaeology research. SSM is a full field, multi-material analytical technique, and is presented as a supplementary geometric morphometric (GM) tool. Lower mandibular canines from two archaeological populations and one modern population were sampled, digitised using micro-CT, aligned, registered to a baseline and statistically modelled using principal component analysis (PCA). Sample material properties were incorporated as a binary enamel/dentin parameter. Results were assessed qualitatively and quantitatively using anatomical landmarks. Finally, the technique’s application was demonstrated for inter-sample comparison through analysis of the principal component (PC) weights. It was found that SSM could provide high detail qualitative and quantitative insight with respect to archaeological inter- and intra-sample variability. This technique has value for archaeological, biomechanical and forensic applications including identification, finite element analysis (FEA) and reconstruction from partial datasets. PMID:29216199

  18. Modified Distribution-Free Goodness-of-Fit Test Statistic.

    PubMed

    Chun, So Yeon; Browne, Michael W; Shapiro, Alexander

    2018-03-01

    Covariance structure analysis and its structural equation modeling extensions have become one of the most widely used methodologies in social sciences such as psychology, education, and economics. An important issue in such analysis is to assess the goodness of fit of a model under analysis. One of the most popular test statistics used in covariance structure analysis is the asymptotically distribution-free (ADF) test statistic introduced by Browne (Br J Math Stat Psychol 37:62-83, 1984). The ADF statistic can be used to test models without any specific distribution assumption (e.g., multivariate normal distribution) of the observed data. Despite its advantage, it has been shown in various empirical studies that unless sample sizes are extremely large, this ADF statistic could perform very poorly in practice. In this paper, we provide a theoretical explanation for this phenomenon and further propose a modified test statistic that improves the performance in samples of realistic size. The proposed statistic deals with the possible ill-conditioning of the involved large-scale covariance matrices.

  19. FUNSTAT and statistical image representations

    NASA Technical Reports Server (NTRS)

    Parzen, E.

    1983-01-01

    General ideas of functional statistical inference for the analysis of one and two samples, univariate and bivariate, are outlined. The ONESAM program is applied to analyze the univariate probability distributions of multi-spectral image data.

  20. A Third Moment Adjusted Test Statistic for Small Sample Factor Analysis.

    PubMed

    Lin, Johnny; Bentler, Peter M

    2012-01-01

    Goodness of fit testing in factor analysis is based on the assumption that the test statistic is asymptotically chi-square; but this property may not hold in small samples even when the factors and errors are normally distributed in the population. Robust methods such as Browne's asymptotically distribution-free method and Satorra-Bentler's mean scaling statistic were developed under the presumption of non-normality in the factors and errors. This paper finds new application to the case where factors and errors are normally distributed in the population but the skewness of the obtained test statistic is still high due to sampling error in the observed indicators. An extension of Satorra-Bentler's statistic is proposed that not only scales the mean but also adjusts the degrees of freedom based on the skewness of the obtained test statistic in order to improve its robustness under small samples. A simple simulation study shows that this third moment adjusted statistic asymptotically performs on par with previously proposed methods, and at a very small sample size offers superior Type I error rates under a properly specified model. Data from Mardia, Kent and Bibby's study of students tested for their ability in five content areas that were either open or closed book were used to illustrate the real-world performance of this statistic.

  1. Evidence for a Global Sampling Process in Extraction of Summary Statistics of Item Sizes in a Set.

    PubMed

    Tokita, Midori; Ueda, Sachiyo; Ishiguchi, Akira

    2016-01-01

    Several studies have shown that our visual system may construct a "summary statistical representation" over groups of visual objects. Although there is a general understanding that human observers can accurately represent sets of a variety of features, many questions on how summary statistics, such as an average, are computed remain unanswered. This study investigated the sampling properties of visual information used by human observers to extract two types of summary statistics of item sets, average and variance. We presented three models of ideal observers to extract the summary statistics: a global sampling model without sampling noise, a global sampling model with sampling noise, and a limited sampling model. We compared the performance of an ideal observer of each model with that of human observers using statistical efficiency analysis. Results suggest that summary statistics of items in a set may be computed without representing individual items, which makes it possible to discard the limited sampling account. Moreover, the extraction of summary statistics may not necessarily require the representation of individual objects with focused attention when the sets of items are larger than 4.

  2. Statistical Inference for Data Adaptive Target Parameters.

    PubMed

    Hubbard, Alan E; Kherad-Pajouh, Sara; van der Laan, Mark J

    2016-05-01

    Consider one observes n i.i.d. copies of a random variable with a probability distribution that is known to be an element of a particular statistical model. In order to define our statistical target we partition the sample into V equal-size sub-samples, and use this partitioning to define V splits into an estimation sample (one of the V sub-samples) and a corresponding complementary parameter-generating sample. For each of the V parameter-generating samples, we apply an algorithm that maps the sample to a statistical target parameter. We define our sample-split data adaptive statistical target parameter as the average of these V sample-specific target parameters. We present an estimator (and corresponding central limit theorem) of this type of data adaptive target parameter. This general methodology for generating data adaptive target parameters is demonstrated with a number of practical examples that highlight new opportunities for statistical learning from data. This new framework provides a rigorous statistical methodology for both exploratory and confirmatory analysis within the same data. Given that more research is becoming "data-driven", the theory developed within this paper provides a new impetus for a greater involvement of statistical inference in problems that are being increasingly addressed by clever, yet ad hoc pattern finding methods. To suggest such potential, and to verify the predictions of the theory, extensive simulation studies, along with a data analysis based on adaptively determined intervention rules, are shown and give insight into how to structure such an approach. The results show that the data adaptive target parameter approach provides a general framework and resulting methodology for data-driven science.
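    A hedged sketch of the sample-splitting scheme (the data-adaptive target below, "mean of the most variable column", is a hypothetical example): each parameter-generating sample defines the target, which is then estimated on the complementary estimation sample, and the V estimates are averaged.

      import numpy as np

      def data_adaptive_estimate(X, V=5, seed=0):
          rng = np.random.default_rng(seed)
          folds = np.array_split(rng.permutation(len(X)), V)
          estimates = []
          for v in range(V):
              est_idx = folds[v]                                   # estimation sample
              gen_idx = np.concatenate([folds[u] for u in range(V) if u != v])
              target_col = X[gen_idx].var(axis=0).argmax()         # define the target
              estimates.append(X[est_idx, target_col].mean())      # estimate it held-out
          return float(np.mean(estimates))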

  3. STATISTICAL ANALYSIS OF TANK 18F FLOOR SAMPLE RESULTS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Harris, S.

    2010-09-02

    Representative sampling has been completed for characterization of the residual material on the floor of Tank 18F as per the statistical sampling plan developed by Shine [1]. Samples from eight locations have been obtained from the tank floor, and two of the samples were archived as a contingency. Six samples, referred to in this report as the current scrape samples, have been submitted to and analyzed by SRNL [2]. This report contains the statistical analysis of the floor sample analytical results to determine if further data are needed to reduce uncertainty. Included are comparisons with the prior Mantis sample results [3] to determine if they can be pooled with the current scrape samples to estimate the upper 95% confidence limits (UCL95%) for concentration. Statistical analysis revealed that the Mantis and current scrape sample results are not compatible. Therefore, the Mantis sample results were not used to support the quantification of analytes in the residual material. Significant spatial variability among the current sample results was not found. Constituent concentrations were similar between the North and South hemispheres as well as between the inner and outer regions of the tank floor. The current scrape sample results from all six samples fall within their 3-sigma limits. In view of the results from numerous statistical tests, the data were pooled from all six current scrape samples. As such, an adequate sample size was provided for quantification of the residual material on the floor of Tank 18F. The uncertainty is quantified in this report by an upper 95% confidence limit (UCL95%) on each analyte concentration. The uncertainty in analyte concentration was calculated as a function of the number of samples, the average, and the standard deviation of the analytical results. The UCL95% was based entirely on the six current scrape sample results (each averaged across three analytical determinations).
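    A minimal sketch of the report's uncertainty measure, assuming standard normal-theory intervals: a one-sided upper 95% confidence limit computed from the n per-sample averages for one analyte (the concentrations below are hypothetical).

      import numpy as np
      from scipy.stats import t

      def ucl95(sample_means):
          x = np.asarray(sample_means, dtype=float)
          n = len(x)
          return x.mean() + t.ppf(0.95, n - 1) * x.std(ddof=1) / np.sqrt(n)

      # six scrape-sample results, each already averaged over three
      # analytical determinations (hypothetical units):
      print(ucl95([1.2, 1.5, 1.1, 1.4, 1.3, 1.6]))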

  4. STATISTICAL ANALYSIS OF TANK 19F FLOOR SAMPLE RESULTS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Harris, S.

    2010-09-02

    Representative sampling has been completed for characterization of the residual material on the floor of Tank 19F as per the statistical sampling plan developed by Harris and Shine. Samples from eight locations have been obtained from the tank floor, and two of the samples were archived as a contingency. Six samples, referred to in this report as the current scrape samples, have been submitted to and analyzed by SRNL. This report contains the statistical analysis of the floor sample analytical results to determine if further data are needed to reduce uncertainty. Included are comparisons with the prior Mantis sample results to determine if they can be pooled with the current scrape samples to estimate the upper 95% confidence limits (UCL95%) for concentration. Statistical analysis revealed that the Mantis and current scrape sample results are not compatible. Therefore, the Mantis sample results were not used to support the quantification of analytes in the residual material. Significant spatial variability among the current scrape sample results was not found. Constituent concentrations were similar between the North and South hemispheres as well as between the inner and outer regions of the tank floor. The current scrape sample results from all six samples fall within their 3-sigma limits. In view of the results from numerous statistical tests, the data were pooled from all six current scrape samples. As such, an adequate sample size was provided for quantification of the residual material on the floor of Tank 19F. The uncertainty is quantified in this report by a UCL95% on each analyte concentration. The uncertainty in analyte concentration was calculated as a function of the number of samples, the average, and the standard deviation of the analytical results. The UCL95% was based entirely on the six current scrape sample results (each averaged across three analytical determinations).

  5. Examples of Data Analysis with SPSS/PC+ Studentware.

    ERIC Educational Resources Information Center

    MacFarland, Thomas W.

    Intended for classroom use only, these unpublished notes contain computer lessons on descriptive statistics with files previously created in WordPerfect 4.2 and Lotus 1-2-3 Version 1.A for the IBM PC+. The statistical measures covered include Student's t-test with two independent samples; Student's t-test with a paired sample; Chi-square analysis;…

  6. Quantification of integrated HIV DNA by repetitive-sampling Alu-HIV PCR on the basis of Poisson statistics.

    PubMed

    De Spiegelaere, Ward; Malatinkova, Eva; Lynch, Lindsay; Van Nieuwerburgh, Filip; Messiaen, Peter; O'Doherty, Una; Vandekerckhove, Linos

    2014-06-01

    Quantification of integrated proviral HIV DNA by repetitive-sampling Alu-HIV PCR is a candidate virological tool to monitor the HIV reservoir in patients. However, the experimental procedures and data analysis of the assay are complex and hinder its widespread use. Here, we provide an improved and simplified data analysis method by adopting binomial and Poisson statistics. A modified analysis method on the basis of Poisson statistics was used to analyze the binomial data of positive and negative reactions from a 42-replicate Alu-HIV PCR by use of dilutions of an integration standard and on samples of 57 HIV-infected patients. Results were compared with the quantitative output of the previously described Alu-HIV PCR method. Poisson-based quantification of the Alu-HIV PCR was linearly correlated with the standard dilution series, indicating that absolute quantification with the Poisson method is a valid alternative for data analysis of repetitive-sampling Alu-HIV PCR data. Quantitative outputs of patient samples assessed by the Poisson method correlated with the previously described Alu-HIV PCR analysis, indicating that this method is a valid alternative for quantifying integrated HIV DNA. Poisson-based analysis of the Alu-HIV PCR data enables absolute quantification without the need of a standard dilution curve. Implementation of the CI estimation permits improved qualitative analysis of the data and provides a statistical basis for the required minimal number of technical replicates. © 2014 The American Association for Clinical Chemistry.
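    A hedged sketch of the Poisson principle behind the assay: if integration events distribute into replicate reactions with mean lambda per reaction, the fraction of negative reactions is exp(-lambda), so lambda can be recovered from the counts alone (the 28-of-42 example is hypothetical).

      import numpy as np

      def poisson_events_per_reaction(n_positive, n_total):
          n_negative = n_total - n_positive
          if n_negative == 0:
              raise ValueError("all reactions positive; dilute and repeat")
          return -np.log(n_negative / n_total)   # lambda = -ln(P(negative))

      lam = poisson_events_per_reaction(28, 42)  # e.g. 28 of 42 replicates positive
      print(f"~{lam:.2f} integration events per reaction")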

  7. Sample size and power considerations in network meta-analysis

    PubMed Central

    2012-01-01

    Background: Network meta-analysis is becoming increasingly popular for establishing comparative effectiveness among multiple interventions for the same disease. Network meta-analysis inherits all methodological challenges of standard pairwise meta-analysis, but with increased complexity due to the multitude of intervention comparisons. One issue that is now widely recognized in pairwise meta-analysis is the issue of sample size and statistical power. This issue, however, has so far only received little attention in network meta-analysis. To date, no approaches have been proposed for evaluating the adequacy of the sample size, and thus power, in a treatment network. Findings: In this article, we develop easy-to-use flexible methods for estimating the ‘effective sample size’ in indirect comparison meta-analysis and network meta-analysis. The effective sample size for a particular treatment comparison can be interpreted as the number of patients in a pairwise meta-analysis that would provide the same degree and strength of evidence as that which is provided in the indirect comparison or network meta-analysis. We further develop methods for retrospectively estimating the statistical power for each comparison in a network meta-analysis. We illustrate the performance of the proposed methods for estimating effective sample size and statistical power using data from a network meta-analysis on interventions for smoking cessation including over 100 trials. Conclusion: The proposed methods are easy to use and will be of high value to regulatory agencies and decision makers who must assess the strength of the evidence supporting comparative effectiveness estimates. PMID:22992327
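    A hedged sketch of the effective-sample-size idea for the simplest case, an indirect comparison of A versus C through a common comparator B (assuming the harmonic-style formula; the paper develops more general methods):

      def effective_sample_size(n_ab, n_bc):
          """Patients in a pairwise meta-analysis providing evidence
          equivalent to the indirect A-vs-C comparison via B."""
          return n_ab * n_bc / (n_ab + n_bc)

      print(effective_sample_size(1_000, 1_500))  # -> 600.0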

  8. A Third Moment Adjusted Test Statistic for Small Sample Factor Analysis

    PubMed Central

    Lin, Johnny; Bentler, Peter M.

    2012-01-01

    Goodness of fit testing in factor analysis is based on the assumption that the test statistic is asymptotically chi-square; but this property may not hold in small samples even when the factors and errors are normally distributed in the population. Robust methods such as Browne’s asymptotically distribution-free method and Satorra-Bentler’s mean scaling statistic were developed under the presumption of non-normality in the factors and errors. This paper finds new application to the case where factors and errors are normally distributed in the population but the skewness of the obtained test statistic is still high due to sampling error in the observed indicators. An extension of Satorra-Bentler’s statistic is proposed that not only scales the mean but also adjusts the degrees of freedom based on the skewness of the obtained test statistic in order to improve its robustness under small samples. A simple simulation study shows that this third moment adjusted statistic asymptotically performs on par with previously proposed methods, and at a very small sample size offers superior Type I error rates under a properly specified model. Data from Mardia, Kent and Bibby’s study of students tested for their ability in five content areas that were either open or closed book were used to illustrate the real-world performance of this statistic. PMID:23144511

  9. A Guideline to Univariate Statistical Analysis for LC/MS-Based Untargeted Metabolomics-Derived Data

    PubMed Central

    Vinaixa, Maria; Samino, Sara; Saez, Isabel; Duran, Jordi; Guinovart, Joan J.; Yanes, Oscar

    2012-01-01

    Several metabolomic software programs provide methods for peak picking, retention time alignment and quantification of metabolite features in LC/MS-based metabolomics. Statistical analysis, however, is needed in order to discover those features significantly altered between samples. By comparing the retention time and MS/MS data of a model compound to that from the altered feature of interest in the research sample, metabolites can be then unequivocally identified. This paper reports on a comprehensive overview of a workflow for statistical analysis to rank relevant metabolite features that will be selected for further MS/MS experiments. We focus on univariate data analysis applied in parallel on all detected features. Characteristics and challenges of this analysis are discussed and illustrated using four different real LC/MS untargeted metabolomic datasets. We demonstrate the influence of considering or violating mathematical assumptions on which univariate statistical test rely, using high-dimensional LC/MS datasets. Issues in data analysis such as determination of sample size, analytical variation, assumption of normality and homocedasticity, or correction for multiple testing are discussed and illustrated in the context of our four untargeted LC/MS working examples. PMID:24957762

  10. A Guideline to Univariate Statistical Analysis for LC/MS-Based Untargeted Metabolomics-Derived Data.

    PubMed

    Vinaixa, Maria; Samino, Sara; Saez, Isabel; Duran, Jordi; Guinovart, Joan J; Yanes, Oscar

    2012-10-18

    Several metabolomic software programs provide methods for peak picking, retention time alignment and quantification of metabolite features in LC/MS-based metabolomics. Statistical analysis, however, is needed in order to discover those features significantly altered between samples. By comparing the retention time and MS/MS data of a model compound to that from the altered feature of interest in the research sample, metabolites can be then unequivocally identified. This paper reports on a comprehensive overview of a workflow for statistical analysis to rank relevant metabolite features that will be selected for further MS/MS experiments. We focus on univariate data analysis applied in parallel on all detected features. Characteristics and challenges of this analysis are discussed and illustrated using four different real LC/MS untargeted metabolomic datasets. We demonstrate the influence of considering or violating mathematical assumptions on which univariate statistical test rely, using high-dimensional LC/MS datasets. Issues in data analysis such as determination of sample size, analytical variation, assumption of normality and homocedasticity, or correction for multiple testing are discussed and illustrated in the context of our four untargeted LC/MS working examples.
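    A minimal sketch of the univariate workflow the guideline describes, assuming a feature-by-sample intensity matrix per group: a Welch t-test per metabolite feature followed by Benjamini-Hochberg correction for multiple testing.

      import numpy as np
      from scipy.stats import ttest_ind

      def significant_features(group_a, group_b, fdr=0.05):
          """group_a, group_b: arrays of shape (n_features, n_samples)."""
          _, p = ttest_ind(group_a, group_b, axis=1, equal_var=False)
          n = len(p)
          order = np.argsort(p)
          adjusted = np.empty(n)
          running_min = 1.0
          for rank in range(n - 1, -1, -1):        # Benjamini-Hochberg step-up
              i = order[rank]
              running_min = min(running_min, p[i] * n / (rank + 1))
              adjusted[i] = running_min
          return np.flatnonzero(adjusted < fdr)    # indices of altered features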

  11. Statistical analysis of arsenic contamination in drinking water in a city of Iran and its modeling using GIS.

    PubMed

    Sadeghi, Fatemeh; Nasseri, Simin; Mosaferi, Mohammad; Nabizadeh, Ramin; Yunesian, Masud; Mesdaghinia, Alireza

    2017-05-01

    In this research, probable arsenic contamination in drinking water in the city of Ardabil was studied in 163 samples during four seasons. In each season, sampling was carried out randomly in the study area. Results were analyzed statistically using SPSS 19 software, and the data were also modeled with ArcGIS 10.1 software. The maximum permissible arsenic concentration in drinking water defined by the World Health Organization and the Iranian national standard is 10 μg/L. Statistical analysis showed 75, 88, 47, and 69% of samples in autumn, winter, spring, and summer, respectively, had concentrations higher than the national standard. The mean concentrations of arsenic in autumn, winter, spring, and summer were 19.89, 15.9, 10.87, and 14.6 μg/L, respectively, and the overall average in all samples through the year was 15.32 μg/L. Although GIS outputs indicated that the concentration distribution profiles changed in four consecutive seasons, variance analysis of the results showed that statistically there is no significant difference in arsenic levels between the four seasons.

  12. Comparative forensic soil analysis of New Jersey state parks using a combination of simple techniques with multivariate statistics.

    PubMed

    Bonetti, Jennifer; Quarino, Lawrence

    2014-05-01

    This study has shown that the combination of simple techniques with the use of multivariate statistics offers the potential for the comparative analysis of soil samples. Five samples were obtained from each of twelve state parks across New Jersey in both the summer and fall seasons. Each sample was examined using particle-size distribution, pH analysis in both water and 1 M CaCl2 , and a loss on ignition technique. Data from each of the techniques were combined, and principal component analysis (PCA) and canonical discriminant analysis (CDA) were used for multivariate data transformation. Samples from different locations could be visually differentiated from one another using these multivariate plots. Hold-one-out cross-validation analysis showed error rates as low as 3.33%. Ten blind study samples were analyzed resulting in no misclassifications using Mahalanobis distance calculations and visual examinations of multivariate plots. Seasonal variation was minimal between corresponding samples, suggesting potential success in forensic applications. © 2014 American Academy of Forensic Sciences.
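    A hedged sketch of the study's statistical pipeline on a hypothetical feature matrix (rows: soil samples; columns: particle-size bins, pH in water and CaCl2, loss on ignition): dimensionality reduction followed by discriminant classification of collection sites, scored hold-one-out.

      import numpy as np
      from sklearn.decomposition import PCA
      from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
      from sklearn.model_selection import LeaveOneOut, cross_val_score
      from sklearn.pipeline import make_pipeline

      rng = np.random.default_rng(0)
      X = rng.normal(size=(60, 8))              # stand-in for the measured features
      sites = np.repeat(np.arange(12), 5)       # 12 parks x 5 samples each

      model = make_pipeline(PCA(n_components=5), LinearDiscriminantAnalysis())
      accuracy = cross_val_score(model, X, sites, cv=LeaveOneOut()).mean()
      print(f"hold-one-out error rate: {1 - accuracy:.2%}")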

  13. [A comparison of convenience sampling and purposive sampling].

    PubMed

    Suen, Lee-Jen Wu; Huang, Hui-Man; Lee, Hao-Hsien

    2014-06-01

    Convenience sampling and purposive sampling are two different sampling methods. This article first explains sampling terms such as target population, accessible population, simple random sampling, intended sample, actual sample, and statistical power analysis. These terms are then used to explain the difference between "convenience sampling" and "purposive sampling." Convenience sampling is a non-probabilistic sampling technique applicable to qualitative or quantitative studies, although it is most frequently used in quantitative studies. In convenience samples, subjects more readily accessible to the researcher are more likely to be included. Thus, in quantitative studies, opportunity to participate is not equal for all qualified individuals in the target population and study results are not necessarily generalizable to this population. As in all quantitative studies, increasing the sample size increases the statistical power of the convenience sample. In contrast, purposive sampling is typically used in qualitative studies. Researchers who use this technique carefully select subjects based on study purpose with the expectation that each participant will provide unique and rich information of value to the study. As a result, members of the accessible population are not interchangeable and sample size is determined by data saturation, not by statistical power analysis.

  14. Genome-wide association analysis of secondary imaging phenotypes from the Alzheimer's disease neuroimaging initiative study.

    PubMed

    Zhu, Wensheng; Yuan, Ying; Zhang, Jingwen; Zhou, Fan; Knickmeyer, Rebecca C; Zhu, Hongtu

    2017-02-01

    The aim of this paper is to systematically evaluate a biased sampling issue associated with genome-wide association analysis (GWAS) of imaging phenotypes for most imaging genetic studies, including the Alzheimer's Disease Neuroimaging Initiative (ADNI). Specifically, the original sampling scheme of these imaging genetic studies is primarily the retrospective case-control design, whereas most existing statistical analyses of these studies ignore such sampling scheme by directly correlating imaging phenotypes (called the secondary traits) with genotype. Although it has been well documented in genetic epidemiology that ignoring the case-control sampling scheme can produce highly biased estimates, and subsequently lead to misleading results and suspicious associations, such findings are not well documented in imaging genetics. We use extensive simulations and a large-scale imaging genetic data analysis of the Alzheimer's Disease Neuroimaging Initiative (ADNI) data to evaluate the effects of the case-control sampling scheme on GWAS results based on some standard statistical methods, such as linear regression methods, while comparing it with several advanced statistical methods that appropriately adjust for the case-control sampling scheme. Copyright © 2016 Elsevier Inc. All rights reserved.

  15. Assessing Statistically Significant Heavy-Metal Concentrations in Abandoned Mine Areas via Hot Spot Analysis of Portable XRF Data

    PubMed Central

    Kim, Sung-Min; Choi, Yosoon

    2017-01-01

    To develop appropriate measures to prevent soil contamination in abandoned mining areas, an understanding of the spatial variation of the potentially toxic trace elements (PTEs) in the soil is necessary. For the purpose of effective soil sampling, this study uses hot spot analysis, which calculates a z-score based on the Getis-Ord Gi* statistic to identify a statistically significant hot spot sample. To constitute a statistically significant hot spot, a feature with a high value should also be surrounded by other features with high values. Using relatively cost- and time-effective portable X-ray fluorescence (PXRF) analysis, sufficient input data are acquired from the Busan abandoned mine and used for hot spot analysis. To calibrate the PXRF data, which have a relatively low accuracy, the PXRF analysis data are transformed using the inductively coupled plasma atomic emission spectrometry (ICP-AES) data. The transformed PXRF data of the Busan abandoned mine are classified into four groups according to their normalized content and z-scores: high content with a high z-score (HH), high content with a low z-score (HL), low content with a high z-score (LH), and low content with a low z-score (LL). The HL and LH cases may be due to measurement errors. Additional or complementary surveys are required for the areas surrounding these suspect samples or for significant hot spot areas. The soil sampling is conducted according to a four-phase procedure in which the hot spot analysis and proposed group classification method are employed to support the development of a sampling plan for the following phase. Overall, 30, 50, 80, and 100 samples are investigated and analyzed in phases 1–4, respectively. The method implemented in this case study may be utilized in the field for the assessment of statistically significant soil contamination and the identification of areas for which an additional survey is required. PMID:28629168

  16. Assessing Statistically Significant Heavy-Metal Concentrations in Abandoned Mine Areas via Hot Spot Analysis of Portable XRF Data.

    PubMed

    Kim, Sung-Min; Choi, Yosoon

    2017-06-18

    To develop appropriate measures to prevent soil contamination in abandoned mining areas, an understanding of the spatial variation of the potentially toxic trace elements (PTEs) in the soil is necessary. For the purpose of effective soil sampling, this study uses hot spot analysis, which calculates a z-score based on the Getis-Ord Gi* statistic to identify a statistically significant hot spot sample. To constitute a statistically significant hot spot, a feature with a high value should also be surrounded by other features with high values. Using relatively cost- and time-effective portable X-ray fluorescence (PXRF) analysis, sufficient input data are acquired from the Busan abandoned mine and used for hot spot analysis. To calibrate the PXRF data, which have a relatively low accuracy, the PXRF analysis data are transformed using the inductively coupled plasma atomic emission spectrometry (ICP-AES) data. The transformed PXRF data of the Busan abandoned mine are classified into four groups according to their normalized content and z-scores: high content with a high z-score (HH), high content with a low z-score (HL), low content with a high z-score (LH), and low content with a low z-score (LL). The HL and LH cases may be due to measurement errors. Additional or complementary surveys are required for the areas surrounding these suspect samples or for significant hot spot areas. The soil sampling is conducted according to a four-phase procedure in which the hot spot analysis and proposed group classification method are employed to support the development of a sampling plan for the following phase. Overall, 30, 50, 80, and 100 samples are investigated and analyzed in phases 1-4, respectively. The method implemented in this case study may be utilized in the field for the assessment of statistically significant soil contamination and the identification of areas for which an additional survey is required.
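    A minimal sketch of the Getis-Ord Gi* z-score that drives the hot spot analysis, assuming a precomputed spatial weights matrix that includes the focal location itself (the "star" in Gi*):

      import numpy as np

      def gi_star(x, w):
          """x: values at n locations; w: (n, n) spatial weights with
          w[i, i] > 0. Returns one z-score per location; large positive
          values mark statistically significant hot spots."""
          x = np.asarray(x, dtype=float)
          w = np.asarray(w, dtype=float)
          n = len(x)
          xbar, s = x.mean(), x.std()            # global mean and (population) SD
          w_sum = w.sum(axis=1)
          w_sq = (w ** 2).sum(axis=1)
          denom = s * np.sqrt((n * w_sq - w_sum ** 2) / (n - 1))
          return (w @ x - xbar * w_sum) / denom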

  17. Analysis of select Dalbergia and trade timber using direct analysis in real time and time-of-flight mass spectrometry for CITES enforcement.

    PubMed

    Lancaster, Cady; Espinoza, Edgard

    2012-05-15

    International trade of several Dalbergia wood species is regulated by The Convention on International Trade in Endangered Species of Wild Fauna and Flora (CITES). In order to supplement morphological identification of these species, a rapid chemical method of analysis was developed. Using Direct Analysis in Real Time (DART) ionization coupled with Time-of-Flight (TOF) Mass Spectrometry (MS), selected Dalbergia and common trade species were analyzed. Each of the 13 wood species was classified using principal component analysis and linear discriminant analysis (LDA). These statistical data clusters served as reliable anchors for species identification of unknowns. Analysis of 20 or more samples from the 13 species studied in this research indicates that the DART-TOFMS results are reproducible. Statistical analysis of the most abundant ions gave good classifications that were useful for identifying unknown wood samples. DART-TOFMS and LDA analysis of 13 species of selected timber samples and the statistical classification allowed for the correct assignment of unknown wood samples. This method is rapid and can be useful when anatomical identification is difficult but needed in order to support CITES enforcement. Published 2012. This article is a US Government work and is in the public domain in the USA.

  18. The Math Problem: Advertising Students' Attitudes toward Statistics

    ERIC Educational Resources Information Center

    Fullerton, Jami A.; Kendrick, Alice

    2013-01-01

    This study used the Students' Attitudes toward Statistics Scale (STATS) to measure attitude toward statistics among a national sample of advertising students. A factor analysis revealed four underlying factors make up the attitude toward statistics construct--"Interest & Future Applicability," "Confidence," "Statistical Tools," and "Initiative."…

  19. Statistics 101 for Radiologists.

    PubMed

    Anvari, Arash; Halpern, Elkan F; Samir, Anthony E

    2015-10-01

    Diagnostic tests have wide clinical applications, including screening, diagnosis, measuring treatment effect, and determining prognosis. Interpreting diagnostic test results requires an understanding of key statistical concepts used to evaluate test efficacy. This review explains descriptive statistics and discusses probability, including mutually exclusive and independent events and conditional probability. In the inferential statistics section, a statistical perspective on study design is provided, together with an explanation of how to select appropriate statistical tests. Key concepts in recruiting study samples are discussed, including representativeness and random sampling. Variable types are defined, including predictor, outcome, and covariate variables, and the relationship of these variables to one another. In the hypothesis testing section, we explain how to determine if observed differences between groups are likely to be due to chance. We explain type I and II errors, statistical significance, and study power, followed by an explanation of effect sizes and how confidence intervals can be used to generalize observed effect sizes to the larger population. Statistical tests are explained in four categories: t tests and analysis of variance, proportion analysis tests, nonparametric tests, and regression techniques. We discuss sensitivity, specificity, accuracy, receiver operating characteristic analysis, and likelihood ratios. Measures of reliability and agreement, including κ statistics, intraclass correlation coefficients, and Bland-Altman graphs and analysis, are introduced. © RSNA, 2015.

  20. Methods for evaluating temporal groundwater quality data and results of decadal-scale changes in chloride, dissolved solids, and nitrate concentrations in groundwater in the United States, 1988-2010

    USGS Publications Warehouse

    Lindsey, Bruce D.; Rupert, Michael G.

    2012-01-01

    Decadal-scale changes in groundwater quality were evaluated by the U.S. Geological Survey National Water-Quality Assessment (NAWQA) Program. Samples of groundwater collected from wells during 1988-2000 - a first sampling event representing the decade ending the 20th century - were compared on a pair-wise basis to samples from the same wells collected during 2001-2010 - a second sampling event representing the decade beginning the 21st century. The data set consists of samples from 1,236 wells in 56 well networks, representing major aquifers and urban and agricultural land-use areas, with analytical results for chloride, dissolved solids, and nitrate. Statistical analysis was done on a network basis rather than by individual wells. Although spanning slightly more or less than a 10-year period, the two-sample comparison between the first and second sampling events is referred to as an analysis of decadal-scale change based on a step-trend analysis. The 22 principal aquifers represented by these 56 networks account for nearly 80 percent of the estimated withdrawals of groundwater used for drinking-water supply in the Nation. Well networks where decadal-scale changes in concentrations were statistically significant were identified using the Wilcoxon-Pratt signed-rank test. For the statistical analysis of chloride, dissolved solids, and nitrate concentrations at the network level, more than half revealed no statistically significant change over the decadal period. However, for networks that had statistically significant changes, increased concentrations outnumbered decreased concentrations by a large margin. Statistically significant increases of chloride concentrations were identified for 43 percent of 56 networks. Dissolved solids concentrations increased significantly in 41 percent of the 54 networks with dissolved solids data, and nitrate concentrations increased significantly in 23 percent of 56 networks. At least one of the three - chloride, dissolved solids, or nitrate - had a statistically significant increase in concentration in 66 percent of the networks. Statistically significant decreases in concentrations were identified in 4 percent of the networks for chloride, 2 percent of the networks for dissolved solids, and 9 percent of the networks for nitrate. A larger percentage of urban land-use networks had statistically significant increases in chloride, dissolved solids, and nitrate concentrations than agricultural land-use networks. In order to assess the magnitude of statistically significant changes, the median of the differences between constituent concentrations from the first full-network sampling event and those from the second full-network sampling event was calculated using the Turnbull method. The largest median decadal increases in chloride concentrations were in networks in the Upper Illinois River Basin (67 mg/L) and in the New England Coastal Basins (34 mg/L), whereas the largest median decadal decrease in chloride concentrations was in the Upper Snake River Basin (1 mg/L). The largest median decadal increases in dissolved solids concentrations were in networks in the Rio Grande Valley (260 mg/L) and the Upper Illinois River Basin (160 mg/L). The largest median decadal decrease in dissolved solids concentrations was in the Apalachicola-Chattahoochee-Flint River Basin (6.0 mg/L). 
The largest median decadal increases in nitrate as nitrogen (N) concentrations were in networks in the South Platte River Basin (2.0 mg/L as N) and the San Joaquin-Tulare Basins (1.0 mg/L as N). The largest median decadal decrease in nitrate concentrations was in the Santee River Basin and Coastal Drainages (0.63 mg/L). The magnitude of change in networks with statistically significant increases typically was much larger than the magnitude of change in networks with statistically significant decreases. The magnitude of change was greatest for chloride in the urban land-use networks and greatest for dissolved solids and nitrate in the agricultural land-use networks. Analysis of data from all networks combined indicated statistically significant increases for chloride, dissolved solids, and nitrate. Although chloride, dissolved solids, and nitrate concentrations were typically less than the drinking-water standards and guidelines, a statistical test was used to determine whether or not the proportion of samples exceeding the drinking-water standard or guideline changed significantly between the first and second full-network sampling events. The proportion of samples exceeding the U.S. Environmental Protection Agency (USEPA) Secondary Maximum Contaminant Level for dissolved solids (500 milligrams per liter) increased significantly between the first and second full-network sampling events when evaluating all networks combined at the national level. Also, for all networks combined, the proportion of samples exceeding the USEPA Maximum Contaminant Level (MCL) of 10 mg/L as N for nitrate increased significantly. One network in the Delmarva Peninsula had a significant increase in the proportion of samples exceeding the MCL for nitrate. A subset of 261 wells was sampled every other year (biennially) to evaluate decadal-scale changes using a time-series analysis. The analysis of the biennial data set showed that changes were generally similar to the findings from the analysis of decadal-scale change that was based on a step-trend analysis. Because of the small number of wells in a network with biennial data (typically 4-5 wells), the time-series analysis is more useful for understanding water-quality responses to changes in site-specific conditions rather than as an indicator of the change for the entire network.
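    A minimal sketch of the report's network-level step-trend test, assuming paired concentrations from the same wells in the two sampling events (the values below are synthetic): the Wilcoxon signed-rank test with Pratt's treatment of zero differences, as named in the text.

      import numpy as np
      from scipy.stats import wilcoxon

      rng = np.random.default_rng(0)
      first_event = rng.lognormal(mean=3.0, sigma=0.5, size=25)   # 1988-2000 samples
      second_event = first_event * rng.lognormal(0.1, 0.2, 25)    # 2001-2010 resamples

      stat, p = wilcoxon(second_event, first_event, zero_method="pratt")
      print(f"decadal change significant at 0.05: {p < 0.05} (p = {p:.3f})")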

  1. Integrated GIS and multivariate statistical analysis for regional scale assessment of heavy metal soil contamination: A critical review.

    PubMed

    Hou, Deyi; O'Connor, David; Nathanail, Paul; Tian, Li; Ma, Yan

    2017-12-01

    Heavy metal soil contamination is associated with potential toxicity to humans or ecotoxicity. Scholars have increasingly used a combination of geographical information science (GIS) with geostatistical and multivariate statistical analysis techniques to examine the spatial distribution of heavy metals in soils at a regional scale. A review of such studies showed that most soil sampling programs were based on grid patterns and composite sampling methodologies. Many programs intended to characterize various soil types and land use types. The most often used sampling depth intervals were 0-0.10 m, or 0-0.20 m, below surface; and the sampling densities used ranged from 0.0004 to 6.1 samples per km2, with a median of 0.4 samples per km2. The most widely used spatial interpolators were inverse distance weighted interpolation and ordinary kriging; and the most often used multivariate statistical analysis techniques were principal component analysis and cluster analysis. The review also identified several determining and correlating factors in heavy metal distribution in soils, including soil type, soil pH, soil organic matter, land use type, Fe, Al, and heavy metal concentrations. The major natural and anthropogenic sources of heavy metals were found to derive from lithogenic origin, roadway and transportation, atmospheric deposition, wastewater and runoff from industrial and mining facilities, fertilizer application, livestock manure, and sewage sludge. This review argues that the full potential of integrated GIS and multivariate statistical analysis for assessing heavy metal distribution in soils on a regional scale has not yet been fully realized. It is proposed that future research be conducted to map multivariate results in GIS to pinpoint specific anthropogenic sources, to analyze temporal trends in addition to spatial patterns, to optimize modeling parameters, and to expand the use of different multivariate analysis tools beyond principal component analysis (PCA) and cluster analysis (CA). Copyright © 2017 Elsevier Ltd. All rights reserved.

  2. mapDIA: Preprocessing and statistical analysis of quantitative proteomics data from data independent acquisition mass spectrometry.

    PubMed

    Teo, Guoshou; Kim, Sinae; Tsou, Chih-Chiang; Collins, Ben; Gingras, Anne-Claude; Nesvizhskii, Alexey I; Choi, Hyungwon

    2015-11-03

    Data independent acquisition (DIA) mass spectrometry is an emerging technique that offers more complete detection and quantification of peptides and proteins across multiple samples. DIA allows fragment-level quantification, which can be considered as repeated measurements of the abundance of the corresponding peptides and proteins in the downstream statistical analysis. However, few statistical approaches are available for aggregating these complex fragment-level data into peptide- or protein-level statistical summaries. In this work, we describe a software package, mapDIA, for statistical analysis of differential protein expression using DIA fragment-level intensities. The workflow consists of three major steps: intensity normalization, peptide/fragment selection, and statistical analysis. First, mapDIA offers normalization of fragment-level intensities by total intensity sums as well as a novel alternative normalization by local intensity sums in retention time space. Second, mapDIA removes outlier observations and selects peptides/fragments that preserve the major quantitative patterns across all samples for each protein. Last, using the selected fragments and peptides, mapDIA performs model-based statistical significance analysis of protein-level differential expression between specified groups of samples. Using a comprehensive set of simulation datasets, we show that mapDIA detects differentially expressed proteins with accurate control of the false discovery rates. We also describe the analysis procedure in detail using two recently published DIA datasets generated for the 14-3-3β dynamic interaction network and the prostate cancer glycoproteome. The software was written in C++ and the source code is available for free through the SourceForge website http://sourceforge.net/projects/mapdia/. This article is part of a Special Issue entitled: Computational Proteomics. Copyright © 2015 Elsevier B.V. All rights reserved.
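    The first workflow step, normalization by total intensity sums, can be sketched in a few lines; this is an illustration of the idea only, not mapDIA's actual implementation, and the intensity matrix is invented.

    ```python
    # Total-intensity-sum normalization of fragment-level DIA intensities
    # (rows = fragments, columns = samples). Values are hypothetical.
    import numpy as np

    intensities = np.array([
        [1.2e6, 0.9e6, 1.1e6],
        [3.4e5, 2.8e5, 3.0e5],
        [7.8e4, 8.1e4, 6.9e4],
    ])

    # Scale each sample so its total matches the across-sample mean,
    # removing global loading differences before fragment selection.
    col_sums = intensities.sum(axis=0)
    normalized = intensities * (col_sums.mean() / col_sums)
    print(normalized.sum(axis=0))  # all columns now share one total
    ```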

  3. On the analysis of very small samples of Gaussian repeated measurements: an alternative approach.

    PubMed

    Westgate, Philip M; Burchett, Woodrow W

    2017-03-15

    The analysis of very small samples of Gaussian repeated measurements can be challenging. First, due to a very small number of independent subjects contributing outcomes over time, statistical power can be quite small. Second, nuisance covariance parameters must be appropriately accounted for in the analysis in order to maintain the nominal test size. However, available statistical strategies that ensure valid statistical inference may lack power, whereas more powerful methods may have the potential for inflated test sizes. Therefore, we explore an alternative approach to the analysis of very small samples of Gaussian repeated measurements, with the goal of maintaining valid inference while also improving statistical power relative to other valid methods. This approach uses generalized estimating equations with a bias-corrected empirical covariance matrix that accounts for all small-sample aspects of nuisance correlation parameter estimation in order to maintain valid inference. Furthermore, the approach utilizes correlation selection strategies with the goal of choosing the working structure that will result in the greatest power. In our study, we show that when accurate modeling of the nuisance correlation structure impacts the efficiency of regression parameter estimation, this method can improve power relative to existing methods that yield valid inference. Copyright © 2017 John Wiley & Sons, Ltd.

  4. Analysis of Variance with Summary Statistics in Microsoft® Excel®

    ERIC Educational Resources Information Center

    Larson, David A.; Hsu, Ko-Cheng

    2010-01-01

    Students regularly are asked to solve Single Factor Analysis of Variance problems given only the sample summary statistics (number of observations per category, category means, and corresponding category standard deviations). Most undergraduate students today use Excel for data analysis of this type. However, Excel, like all other statistical…

  5. A Meta-analysis of Gender Differences in Applied Statistics Achievement.

    ERIC Educational Resources Information Center

    Schram, Christine M.

    1996-01-01

    A meta-analysis of gender differences examined statistics achievement in postsecondary level psychology, education, and business courses. Analysis of 13 articles (18 samples) found that undergraduate males had an advantage, outscoring females when the outcome was a series of examinations. Females outscored males when the outcome was total course…

  6. A First Assignment to Create Student Buy-In in an Introductory Business Statistics Course

    ERIC Educational Resources Information Center

    Newfeld, Daria

    2016-01-01

    This paper presents a sample assignment to be administered after the first two weeks of an introductory business focused statistics course in order to promote student buy-in. This assignment integrates graphical displays of data, descriptive statistics and cross-tabulation analysis through the lens of a marketing analysis study. A marketing sample…

  7. EVALUATION OF A NEW MEAN SCALED AND MOMENT ADJUSTED TEST STATISTIC FOR SEM.

    PubMed

    Tong, Xiaoxiao; Bentler, Peter M

    2013-01-01

    Recently a new mean scaled and skewness adjusted test statistic was developed for evaluating structural equation models in small samples and with potentially nonnormal data, but this statistic has received only limited evaluation. The performance of this statistic is compared to normal theory maximum likelihood and two well-known robust test statistics. A modification to the Satorra-Bentler scaled statistic is developed for the condition that sample size is smaller than degrees of freedom. The behavior of the four test statistics is evaluated with a Monte Carlo confirmatory factor analysis study that varies seven sample sizes and three distributional conditions obtained using Headrick's fifth-order transformation to nonnormality. The new statistic performs badly in most conditions except under the normal distribution. The goodness-of-fit χ² test based on maximum-likelihood estimation performed well under normal distributions as well as under a condition of asymptotic robustness. The Satorra-Bentler scaled test statistic performed best overall, while the mean scaled and variance adjusted test statistic outperformed the others at small and moderate sample sizes under certain distributional conditions.

  8. The classification of secondary colorectal liver cancer in human biopsy samples using angular dispersive x-ray diffraction and multivariate analysis

    NASA Astrophysics Data System (ADS)

    Theodorakou, Chrysoula; Farquharson, Michael J.

    2009-08-01

    The motivation behind this study is to assess whether angular dispersive x-ray diffraction (ADXRD) data, processed using multivariate analysis techniques, can be used for classifying secondary colorectal liver cancer tissue and normal surrounding liver tissue in human liver biopsy samples. The ADXRD profiles from a total of 60 samples of normal liver tissue and colorectal liver metastases were measured using a synchrotron radiation source. The data were analysed for 56 samples using nonlinear peak-fitting software. Four peaks were fitted to all of the ADXRD profiles, and the amplitude, area, and amplitude and area ratios for three of the four peaks were calculated and used for the statistical and multivariate analysis. The statistical analysis showed that there are significant differences between all the peak-fitting parameters and ratios between the normal and the diseased tissue groups. The technique of soft independent modelling of class analogy (SIMCA) was used to classify normal liver tissue and colorectal liver metastases, resulting in 67% of the normal tissue samples and 60% of the secondary colorectal liver tissue samples being classified correctly. This study has shown that the ADXRD data of normal and secondary colorectal liver cancer are statistically different, and that x-ray diffraction data analysed using multivariate analysis have the potential to be used as a method of tissue classification.

  9. Assessment of statistical uncertainty in the quantitative analysis of solid samples in motion using laser-induced breakdown spectroscopy

    NASA Astrophysics Data System (ADS)

    Cabalín, L. M.; González, A.; Ruiz, J.; Laserna, J. J.

    2010-08-01

    Statistical uncertainty in the quantitative analysis of solid samples in motion by laser-induced breakdown spectroscopy (LIBS) has been assessed. For this purpose, a LIBS demonstrator was designed and constructed in our laboratory. The LIBS system consisted of a laboratory-scale conveyor belt, a compact optical module and a Nd:YAG laser operating at 532 nm. The speed of the conveyor belt was variable and could be adjusted up to a maximum speed of 2 m s⁻¹. Statistical uncertainty in the analytical measurements was estimated in terms of precision (reproducibility and repeatability) and accuracy. The results obtained by LIBS on shredded scrap samples under real conditions have demonstrated that the analytical precision and accuracy of LIBS are dependent on the sample geometry, position on the conveyor belt and surface cleanliness. Flat, relatively clean scrap samples exhibited acceptable reproducibility and repeatability; by contrast, samples with an irregular shape or a dirty surface exhibited a poor relative standard deviation.

  10. Computer program uses Monte Carlo techniques for statistical system performance analysis

    NASA Technical Reports Server (NTRS)

    Wohl, D. P.

    1967-01-01

    Computer program with Monte Carlo sampling techniques determines the effect of a component part of a unit upon the overall system performance. It utilizes the full statistics of the disturbances and misalignments of each component to provide unbiased results through simulated random sampling.

  11. A review of empirical research related to the use of small quantitative samples in clinical outcome scale development.

    PubMed

    Houts, Carrie R; Edwards, Michael C; Wirth, R J; Deal, Linda S

    2016-11-01

    There has been a notable increase in the advocacy of using small-sample designs as an initial quantitative assessment of item and scale performance during the scale development process. This is particularly true in the development of clinical outcome assessments (COAs), where Rasch analysis has been advanced as an appropriate statistical tool for evaluating the developing COAs using a small sample. We review the benefits such methods are purported to offer from both a practical and statistical standpoint and detail several problematic areas, including both practical and statistical theory concerns, with respect to the use of quantitative methods, including Rasch-consistent methods, with small samples. The feasibility of obtaining accurate information and the potential negative impacts of misusing large-sample statistical methods with small samples during COA development are discussed.

  12. Differential gene expression detection and sample classification using penalized linear regression models.

    PubMed

    Wu, Baolin

    2006-02-15

    Differential gene expression detection and sample classification using microarray data have received much research interest recently. Owing to the large number of genes p and small number of samples n (p ≫ n), microarray data analysis poses big challenges for statistical analysis. An obvious problem owing to the 'large p small n' is over-fitting. Just by chance, we are likely to find some non-differentially expressed genes that can classify the samples very well. The idea of shrinkage is to regularize the model parameters to reduce the effects of noise and produce reliable inferences. Shrinkage has been successfully applied in microarray data analysis. The SAM statistics proposed by Tusher et al. and the 'nearest shrunken centroid' proposed by Tibshirani et al. are ad hoc shrinkage methods. Both methods are simple, intuitive and prove to be useful in empirical studies. Recently Wu proposed the penalized t/F-statistics with shrinkage by formally using L1-penalized linear regression models for two-class microarray data, showing good performance. In this paper we systematically discuss the use of penalized regression models for analyzing microarray data. We generalize the two-class penalized t/F-statistics proposed by Wu to multi-class microarray data. We formally derive the ad hoc shrunken centroid used by Tibshirani et al. using L1-penalized regression models. And we show that the penalized linear regression models provide a rigorous and unified statistical framework for sample classification and differential gene expression detection.
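    The shrinkage idea is easy to demonstrate with an off-the-shelf lasso fit; the sketch below uses simulated data with many genes and few samples, and is only in the spirit of the penalized-regression framework discussed above, not the paper's own code.

    ```python
    # L1-penalized (lasso) regression with p >> n: the penalty shrinks
    # most coefficients exactly to zero, selecting a few "genes".
    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(0)
    n, p = 20, 500                       # few samples, many genes
    X = rng.standard_normal((n, p))
    beta = np.zeros(p)
    beta[:5] = 2.0                       # only 5 genes truly matter
    y = X @ beta + rng.standard_normal(n)

    model = Lasso(alpha=0.5).fit(X, y)
    print("nonzero coefficients:", np.flatnonzero(model.coef_))
    ```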

  13. Statistical Analysis for Collision-free Boson Sampling.

    PubMed

    Huang, He-Liang; Zhong, Han-Sen; Li, Tan; Li, Feng-Guang; Fu, Xiang-Qun; Zhang, Shuo; Wang, Xiang; Bao, Wan-Su

    2017-11-10

    Boson sampling is strongly believed to be intractable for classical computers but solvable with photons in linear optics, which has attracted widespread interest as a rapid way to demonstrate quantum supremacy. However, because its solution is mathematically unverifiable, certifying the experimental results is a major difficulty in a boson sampling experiment. Here, we develop a statistical analysis scheme to experimentally certify collision-free boson sampling. Numerical simulations are performed to show the feasibility and practicability of our scheme, and the effects of realistic experimental conditions are also considered, demonstrating that our proposed scheme is experimentally friendly. Moreover, our approach is expected to apply generally to the investigation of multi-particle coherent dynamics beyond boson sampling.

  14. Statistical description of non-Gaussian samples in the F2 layer of the ionosphere during heliogeophysical disturbances

    NASA Astrophysics Data System (ADS)

    Sergeenko, N. P.

    2017-11-01

    An adequate statistical method should be developed in order to predict probabilistically the range of ionospheric parameters. This problem is solved in this paper. The time series of the critical frequency of the F2 layer, foF2(t), were subjected to statistical processing. For the obtained samples {δfoF2}, statistical distributions and invariants up to the fourth order are calculated. The analysis shows that the distributions differ from the Gaussian law during disturbances. At sufficiently small probability levels, there are arbitrarily large deviations from the model of the normal process. Therefore, an attempt is made to describe the statistical samples {δfoF2} with a Poisson model. For the studied samples, an exponential characteristic function is selected under the assumption that the time series are a superposition of deterministic and random processes. Using the Fourier transform, the characteristic function is transformed into a nonholomorphic, excessive-asymmetric probability-density function. The statistical distributions of the samples {δfoF2} calculated for the disturbed periods are compared with the obtained model distribution function. According to the Kolmogorov criterion, the probabilities of the coincidence of a posteriori distributions with the theoretical ones are P ≈ 0.7-0.9. The analysis supports the applicability of a model based on the Poisson random process for the statistical description of the variations {δfoF2} and for probabilistic estimates during heliogeophysical disturbances.

  15. Sampling and Data Analysis for Environmental Microbiology

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Murray, Christopher J.

    2001-06-01

    A brief review of the literature indicates the importance of statistical analysis in applied and environmental microbiology. Sampling designs are particularly important for successful studies, and it is highly recommended that researchers review their sampling design before heading to the laboratory or the field. Most statisticians have numerous stories of scientists who approached them after their study was complete only to have to tell them that the data they gathered could not be used to test the hypothesis they wanted to address. Once the data are gathered, a large and complex body of statistical techniques are available for analysis of the data. Those methods include both numerical and graphical techniques for exploratory characterization of the data. Hypothesis testing and analysis of variance (ANOVA) are techniques that can be used to compare the mean and variance of two or more groups of samples. Regression can be used to examine the relationships between sets of variables and is often used to examine the dependence of microbiological populations on microbiological parameters. Multivariate statistics provides several methods that can be used for interpretation of datasets with a large number of variables and to partition samples into similar groups, a task that is very common in taxonomy, but also has applications in other fields of microbiology. Geostatistics and other techniques have been used to examine the spatial distribution of microorganisms. The objectives of this chapter are to provide a brief survey of some of the statistical techniques that can be used for sample design and data analysis of microbiological data in environmental studies, and to provide some examples of their use from the literature.
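    For instance, the hypothesis-testing step mentioned above often reduces to a one-way ANOVA across groups of samples; a minimal sketch with invented counts:

    ```python
    # One-way ANOVA comparing mean log10 CFU counts across three
    # hypothetical groups of environmental samples.
    import numpy as np
    from scipy import stats

    site_a = np.array([5.1, 5.3, 4.9, 5.0])
    site_b = np.array([5.8, 6.1, 5.9, 6.0])
    site_c = np.array([5.2, 5.4, 5.1, 5.3])

    f_stat, p_value = stats.f_oneway(site_a, site_b, site_c)
    print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
    ```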

  16. Aspects of First Year Statistics Students' Reasoning When Performing Intuitive Analysis of Variance: Effects of Within- and Between-Group Variability

    ERIC Educational Resources Information Center

    Trumpower, David L.

    2015-01-01

    Making inferences about population differences based on samples of data, that is, performing intuitive analysis of variance (IANOVA), is common in everyday life. However, the intuitive reasoning of individuals when making such inferences (even following statistics instruction), often differs from the normative logic of formal statistics. The…

  17. Public and patient involvement in quantitative health research: A statistical perspective.

    PubMed

    Hannigan, Ailish

    2018-06-19

    The majority of studies included in recent reviews of impact for public and patient involvement (PPI) in health research had a qualitative design. PPI in solely quantitative designs is underexplored, particularly its impact on statistical analysis. Statisticians in practice have a long history of working in both consultative (indirect) and collaborative (direct) roles in health research, yet their perspective on PPI in quantitative health research has never been explicitly examined. To explore the potential and challenges of PPI from a statistical perspective at distinct stages of quantitative research, that is sampling, measurement and statistical analysis, distinguishing between indirect and direct PPI. Statistical analysis is underpinned by having a representative sample, and a collaborative or direct approach to PPI may help achieve that by supporting access to and increasing participation of under-represented groups in the population. Acknowledging and valuing the role of lay knowledge of the context in statistical analysis and in deciding what variables to measure may support collective learning and advance scientific understanding, as evidenced by the use of participatory modelling in other disciplines. A recurring issue for quantitative researchers, which reflects quantitative sampling methods, is the selection and required number of PPI contributors, and this requires further methodological development. Direct approaches to PPI in quantitative health research may potentially increase its impact, but the facilitation and partnership skills required may require further training for all stakeholders, including statisticians. © 2018 The Authors Health Expectations published by John Wiley & Sons Ltd.

  18. Analysis of Longitudinal Outcome Data with Missing Values in Total Knee Arthroplasty.

    PubMed

    Kang, Yeon Gwi; Lee, Jang Taek; Kang, Jong Yeal; Kim, Ga Hye; Kim, Tae Kyun

    2016-01-01

    We sought to determine the influence of missing data on the statistical results, and to determine which statistical method is most appropriate for the analysis of longitudinal outcome data of TKA with missing values among repeated measures ANOVA, generalized estimating equations (GEE) and mixed effects model repeated measures (MMRM). Data sets with missing values were generated with different proportions of missing data, sample sizes and missing-data generation mechanisms. Each data set was analyzed with the three statistical methods. The influence of missing data was greater with a higher proportion of missing data and a smaller sample size. MMRM tended to show the least change in the statistics. When missing values were generated by a 'missing not at random' mechanism, no statistical method could fully avoid deviations in the results. Copyright © 2016 Elsevier Inc. All rights reserved.

  19. [A Review on the Use of Effect Size in Nursing Research].

    PubMed

    Kang, Hyuncheol; Yeon, Kyupil; Han, Sang Tae

    2015-10-01

    The purpose of this study was to introduce the main concepts of statistical testing and effect size and to provide researchers in nursing science with guidance on how to calculate the effect size for the statistical analysis methods mainly used in nursing. For the t-test, analysis of variance, correlation analysis, and regression analysis, which are used frequently in nursing research, the generally accepted definitions of effect size are explained. Some formulae for calculating the effect size are described with several examples from nursing research. Furthermore, the authors present the required minimum sample size for each example using G*Power 3, the most widely used program for calculating sample size. It is noted that statistical significance testing and effect size measurement serve different purposes, and reliance on only one of them may be misleading. Some practical guidelines are recommended for combining statistical significance testing and effect size measures in order to make more balanced decisions in quantitative analyses.
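    A hedged sketch of the workflow the review recommends: compute Cohen's d alongside the test, then feed it into a power-based sample-size calculation (mirroring the G*Power examples); all numbers are illustrative.

    ```python
    # Effect size (Cohen's d) plus a power-based sample-size estimate.
    import numpy as np
    from scipy import stats
    from statsmodels.stats.power import TTestIndPower

    group1 = np.array([72.0, 75.0, 78.0, 71.0, 74.0])
    group2 = np.array([68.0, 70.0, 66.0, 69.0, 71.0])

    t, p = stats.ttest_ind(group1, group2)

    # Cohen's d with a pooled standard deviation.
    n1, n2 = len(group1), len(group2)
    sp = np.sqrt(((n1 - 1) * group1.var(ddof=1)
                  + (n2 - 1) * group2.var(ddof=1)) / (n1 + n2 - 2))
    d = (group1.mean() - group2.mean()) / sp
    print(f"t = {t:.2f}, p = {p:.4f}, d = {d:.2f}")

    # Minimum n per group for 80% power at alpha = 0.05, given this d.
    n_req = TTestIndPower().solve_power(effect_size=d, power=0.8, alpha=0.05)
    print(f"required n per group: {int(np.ceil(n_req))}")
    ```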

  20. Advanced statistical methods for improved data analysis of NASA astrophysics missions

    NASA Technical Reports Server (NTRS)

    Feigelson, Eric D.

    1992-01-01

    The investigators under this grant studied ways to improve the statistical analysis of astronomical data. They looked at existing techniques, the development of new techniques, and the production and distribution of specialized software to the astronomical community. Abstracts of nine papers that were produced are included, as well as brief descriptions of four software packages. The articles that are abstracted discuss analytical and Monte Carlo comparisons of six different linear least squares fits, a (second) paper on linear regression in astronomy, two reviews of public domain software for the astronomer, subsample and half-sample methods for estimating sampling distributions, a nonparametric estimation of survival functions under dependent competing risks, censoring in astronomical data due to nondetections, an astronomy survival analysis computer package called ASURV, and improving the statistical methodology of astronomical data analysis.

  1. Oregon ground-water quality and its relation to hydrogeological factors; a statistical approach

    USGS Publications Warehouse

    Miller, T.L.; Gonthier, J.B.

    1984-01-01

    An appraisal of Oregon ground-water quality was made using existing data accessible through the U.S. Geological Survey computer system. The data available for about 1,000 sites were separated by aquifer units and hydrologic units. Selected statistical moments were described for 19 constituents including major ions. About 96 percent of all sites in the data base were sampled only once. The sample data were classified by aquifer unit and hydrologic unit and analysis of variance was run to determine if significant differences exist between the units within each of these two classifications for the same 19 constituents on which statistical moments were determined. Results of the analysis of variance indicated both classification variables performed about the same, but aquifer unit did provide more separation for some constituents. Samples from the Rogue River basin were classified by location within the flow system and type of flow system. The samples were then analyzed using analysis of variance on 14 constituents to determine if there were significant differences between subsets classified by flow path. Results of this analysis were not definitive, but classification as to the type of flow system did indicate potential for segregating water-quality data into distinct subsets. (USGS)

  2. Propensity-score matching in the cardiovascular surgery literature from 2004 to 2006: a systematic review and suggestions for improvement.

    PubMed

    Austin, Peter C

    2007-11-01

    I conducted a systematic review of the use of propensity score matching in the cardiovascular surgery literature. I examined the adequacy of reporting and whether appropriate statistical methods were used. I examined 60 articles published in the Annals of Thoracic Surgery, European Journal of Cardio-thoracic Surgery, Journal of Cardiovascular Surgery, and the Journal of Thoracic and Cardiovascular Surgery between January 1, 2004, and December 31, 2006. Thirty-one of the 60 studies did not provide adequate information on how the propensity score-matched pairs were formed. Eleven (18%) of the studies did not report on whether matching on the propensity score balanced baseline characteristics between treated and untreated subjects in the matched sample. No studies used appropriate methods to compare baseline characteristics between treated and untreated subjects in the propensity score-matched sample. Eight (13%) of the 60 studies explicitly used statistical methods appropriate for the analysis of matched data when estimating the effect of treatment on the outcomes. Two studies used appropriate methods for some outcomes, but not for all outcomes. Thirty-nine (65%) studies explicitly used statistical methods that were inappropriate for matched-pairs data when estimating the effect of treatment on outcomes. Eleven studies did not report the statistical tests that were used to assess the statistical significance of the treatment effect. Analysis of propensity score-matched samples tended to be poor in the cardiovascular surgery literature. Most statistical analyses ignored the matched nature of the sample. I provide suggestions for improving the reporting and analysis of studies that use propensity score matching.
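    The review's central complaint, ignoring the matched structure, is easy to make concrete; the sketch below contrasts a paired test with an (inappropriate) independent-samples test on hypothetical matched outcomes.

    ```python
    # Matched-pairs vs. independent analysis of propensity-matched data.
    # Outcomes are invented; one control is matched to each treated subject.
    import numpy as np
    from scipy import stats

    treated = np.array([3.1, 2.8, 3.5, 2.9, 3.3, 3.0])
    control = np.array([2.7, 2.9, 3.0, 2.6, 3.1, 2.8])

    # Appropriate: the paired t-test respects the matching.
    t_paired, p_paired = stats.ttest_rel(treated, control)

    # Inappropriate for matched data: independent-samples t-test.
    t_ind, p_ind = stats.ttest_ind(treated, control)

    print(f"paired:      t = {t_paired:.2f}, p = {p_paired:.4f}")
    print(f"independent: t = {t_ind:.2f}, p = {p_ind:.4f}")
    ```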

  3. Multi-reader ROC studies with split-plot designs: a comparison of statistical methods.

    PubMed

    Obuchowski, Nancy A; Gallas, Brandon D; Hillis, Stephen L

    2012-12-01

    Multireader imaging trials often use a factorial design, in which study patients undergo testing with all imaging modalities and readers interpret the results of all tests for all patients. A drawback of this design is the large number of interpretations required of each reader. Split-plot designs have been proposed as an alternative, in which one or a subset of readers interprets all images of a sample of patients, while other readers interpret the images of other samples of patients. In this paper, the authors compare three methods of analysis for the split-plot design. Three statistical methods are presented: the Obuchowski-Rockette method modified for the split-plot design, a newly proposed marginal-mean analysis-of-variance approach, and an extension of the three-sample U-statistic method. A simulation study using the Roe-Metz model was performed to compare the type I error rate, power, and confidence interval coverage of the three test statistics. The type I error rates for all three methods are close to the nominal level but tend to be slightly conservative. The statistical power is nearly identical for the three methods. The coverage of 95% confidence intervals falls close to the nominal coverage for small and large sample sizes. The split-plot multireader, multicase study design can be statistically efficient compared to the factorial design, reducing the number of interpretations required per reader. Three methods of analysis, shown to have nominal type I error rates, similar power, and nominal confidence interval coverage, are available for this study design. Copyright © 2012 AUR. All rights reserved.

  4. Mechanical properties of silicate glasses exposed to a low-Earth orbit

    NASA Technical Reports Server (NTRS)

    Wiedlocher, David E.; Tucker, Dennis S.; Nichols, Ron; Kinser, Donald L.

    1992-01-01

    The effects of a 5.8-year exposure to the low-Earth-orbit environment on the mechanical properties of commercial optical fused silica, low-iron soda-lime-silica, Pyrex 7740, Vycor 7913, BK-7, and the glass ceramic Zerodur were examined. Mechanical testing employed the ASTM-F-394 piston-on-3-ball method in a liquid nitrogen environment. Samples were exposed on the Long Duration Exposure Facility (LDEF) in two locations. Impacts were observed on all specimens except Vycor. Weibull analysis as well as a standard statistical evaluation were conducted. The Weibull analysis revealed no differences between control samples and the two exposed samples. We thus concluded that radiation components of the Earth-orbital environment did not degrade the mechanical strength of the samples examined, within the limits of experimental error. The upper bound of strength degradation for meteorite-impacted samples, based upon statistical analysis and observation, was 50 percent.

  5. Differentiation of commercial vermiculite based on statistical analysis of bulk chemical data: Fingerprinting vermiculite from Libby, Montana U.S.A

    USGS Publications Warehouse

    Gunter, M.E.; Singleton, E.; Bandli, B.R.; Lowers, H.A.; Meeker, G.P.

    2005-01-01

    Major-, minor-, and trace-element compositions, as determined by X-ray fluorescence (XRF) analysis, were obtained on 34 samples of vermiculite to ascertain whether chemical differences exist to the extent of determining the source of commercial products. The sample set included ores from four deposits, seven commercially available garden products, and insulation from four attics. The trace-element distributions of Ba, Cr, and V can be used to distinguish the Libby vermiculite samples from the garden products. In general, the overall composition of the Libby and South Carolina deposits appeared similar, but differed from the South Africa and China deposits based on simple statistical methods. Cluster analysis provided a good distinction of the four ore types, grouped the four attic samples with the Libby ore, and, with less certainty, grouped the garden samples with the South Africa ore.

  6. Statistical analysis tables for truncated or censored samples

    NASA Technical Reports Server (NTRS)

    Cohen, A. C.; Cooley, C. G.

    1971-01-01

    Compilation describes characteristics of truncated and censored samples, and presents six illustrations of practical use of tables in computing mean and variance estimates for normal distribution using selected samples.

  7. Determining the Statistical Significance of Relative Weights

    ERIC Educational Resources Information Center

    Tonidandel, Scott; LeBreton, James M.; Johnson, Jeff W.

    2009-01-01

    Relative weight analysis is a procedure for estimating the relative importance of correlated predictors in a regression equation. Because the sampling distribution of relative weights is unknown, researchers using relative weight analysis are unable to make judgments regarding the statistical significance of the relative weights. J. W. Johnson…

  8. Sexual Harassment Retaliation Climate DEOCS 4.1 Construct Validity Summary

    DTIC Science & Technology

    2017-08-01

    exploratory factor analysis, and bivariate correlations (sample 1); 2) To determine the factor structure of the remaining (final) questions via ... statistics, reliability analysis, exploratory factor analysis, and bivariate correlations of the prospective Sexual Harassment Retaliation Climate ... reported by the survey requester). For information regarding the composition of the sample, refer to Table 1 (Sample 1 Demographics).

  9. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhang, Fangyan; Zhang, Song; Chung Wong, Pak

    Effectively visualizing large graphs and capturing the statistical properties are two challenging tasks. To aid in these two tasks, many sampling approaches for graph simplification have been proposed, falling into three categories: node sampling, edge sampling, and traversal-based sampling. It is still unknown which approach is the best. We evaluate commonly used graph sampling methods through a combined visual and statistical comparison of graphs sampled at various rates. We conduct our evaluation on three graph models: random graphs, small-world graphs, and scale-free graphs. Initial results indicate that the effectiveness of a sampling method is dependent on the graph model, the size of the graph, and the desired statistical property. This benchmark study can be used as a guideline in choosing the appropriate method for a particular graph sampling task, and the results presented can be incorporated into graph visualization and analysis tools.
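    Two of the three sampling categories named above can be sketched directly with networkx; the graph model, sampling rate, and compared statistic are arbitrary choices for illustration.

    ```python
    # Node sampling vs. edge sampling on a scale-free test graph.
    import random
    import networkx as nx

    random.seed(0)
    G = nx.barabasi_albert_graph(n=1000, m=3, seed=0)  # scale-free model
    rate = 0.2

    # Node sampling: random node subset plus its induced edges.
    nodes = random.sample(list(G.nodes()), int(rate * G.number_of_nodes()))
    node_sampled = G.subgraph(nodes).copy()

    # Edge sampling: random edge subset plus the incident endpoints.
    edges = random.sample(list(G.edges()), int(rate * G.number_of_edges()))
    edge_sampled = nx.Graph(edges)

    for name, g in [("node", node_sampled), ("edge", edge_sampled)]:
        degs = [d for _, d in g.degree()]
        print(name, g.number_of_nodes(), g.number_of_edges(),
              sum(degs) / len(degs))  # compare a statistical property
    ```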

  10. Analysis of defect structure in silicon. Characterization of samples from UCP ingot 5848-13C

    NASA Technical Reports Server (NTRS)

    Natesh, R.; Guyer, T.; Stringfellow, G. B.

    1982-01-01

    Statistically significant quantitative structural-imperfection measurements were made on samples from ubiquitous crystalline process (UCP) Ingot 5848-13C. Important trends were noticed among the measured data, cell efficiency, and diffusion length. Grain boundary substructure appears to have an important effect on the conversion efficiency of solar cells from Semix material. Quantitative microscopy measurements give statistically significant information compared to other microanalytical techniques. A surface preparation technique to obtain proper contrast of structural defects suitable for QTM analysis was perfected.

  11. Why weight? Modelling sample and observational level variability improves power in RNA-seq analyses

    PubMed Central

    Liu, Ruijie; Holik, Aliaksei Z.; Su, Shian; Jansz, Natasha; Chen, Kelan; Leong, Huei San; Blewitt, Marnie E.; Asselin-Labat, Marie-Liesse; Smyth, Gordon K.; Ritchie, Matthew E.

    2015-01-01

    Variations in sample quality are frequently encountered in small RNA-sequencing experiments, and pose a major challenge in a differential expression analysis. Removal of high variation samples reduces noise, but at a cost of reducing power, thus limiting our ability to detect biologically meaningful changes. Similarly, retaining these samples in the analysis may not reveal any statistically significant changes due to the higher noise level. A compromise is to use all available data, but to down-weight the observations from more variable samples. We describe a statistical approach that facilitates this by modelling heterogeneity at both the sample and observational levels as part of the differential expression analysis. At the sample level this is achieved by fitting a log-linear variance model that includes common sample-specific or group-specific parameters that are shared between genes. The estimated sample variance factors are then converted to weights and combined with observational level weights obtained from the mean–variance relationship of the log-counts-per-million using ‘voom’. A comprehensive analysis involving both simulations and experimental RNA-sequencing data demonstrates that this strategy leads to a universally more powerful analysis and fewer false discoveries when compared to conventional approaches. This methodology has wide application and is implemented in the open-source ‘limma’ package. PMID:25925576

  12. Standard deviation and standard error of the mean.

    PubMed

    Lee, Dong Kyu; In, Junyong; Lee, Sangseok

    2015-06-01

    In most clinical and experimental studies, the standard deviation (SD) and the estimated standard error of the mean (SEM) are used to present the characteristics of sample data and to explain statistical analysis results. However, some authors occasionally muddle the distinctive usage between the SD and SEM in medical literature. Because the process of calculating the SD and SEM includes different statistical inferences, each of them has its own meaning. SD is the dispersion of data in a normal distribution. In other words, SD indicates how accurately the mean represents sample data. However, the meaning of SEM includes statistical inference based on the sampling distribution. SEM is the SD of the theoretical distribution of the sample means (the sampling distribution). While either SD or SEM can be applied to describe data and statistical results, one should be aware of reasonable methods with which to use SD and SEM. We aim to elucidate the distinctions between SD and SEM and to provide proper usage guidelines for both, which summarize data and describe statistical results.

  13. Standard deviation and standard error of the mean

    PubMed Central

    In, Junyong; Lee, Sangseok

    2015-01-01

    In most clinical and experimental studies, the standard deviation (SD) and the estimated standard error of the mean (SEM) are used to present the characteristics of sample data and to explain statistical analysis results. However, some authors occasionally muddle the distinctive usage between the SD and SEM in medical literature. Because the process of calculating the SD and SEM includes different statistical inferences, each of them has its own meaning. SD is the dispersion of data in a normal distribution. In other words, SD indicates how accurately the mean represents sample data. However, the meaning of SEM includes statistical inference based on the sampling distribution. SEM is the SD of the theoretical distribution of the sample means (the sampling distribution). While either SD or SEM can be applied to describe data and statistical results, one should be aware of reasonable methods with which to use SD and SEM. We aim to elucidate the distinctions between SD and SEM and to provide proper usage guidelines for both, which summarize data and describe statistical results. PMID:26045923
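    The distinction reduces to one line of arithmetic, as the short sketch below shows with arbitrary data:

    ```python
    # SD describes dispersion of the data; SEM = SD / sqrt(n) describes
    # uncertainty in the sample mean. Values are arbitrary.
    import numpy as np
    from scipy import stats

    x = np.array([4.2, 5.1, 4.8, 5.5, 4.9, 5.0, 4.6, 5.2])

    sd = x.std(ddof=1)           # sample standard deviation
    sem = sd / np.sqrt(len(x))   # standard error of the mean
    print(f"SD = {sd:.3f}, SEM = {sem:.3f}")
    print(f"scipy check: {stats.sem(x):.3f}")  # same value
    ```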

  14. It's all relative: ranking the diversity of aquatic bacterial communities.

    PubMed

    Shaw, Allison K; Halpern, Aaron L; Beeson, Karen; Tran, Bao; Venter, J Craig; Martiny, Jennifer B H

    2008-09-01

    The study of microbial diversity patterns is hampered by the enormous diversity of microbial communities and the lack of resources to sample them exhaustively. For many questions about richness and evenness, however, one only needs to know the relative order of diversity among samples rather than total diversity. We used 16S libraries from the Global Ocean Survey to investigate the ability of 10 diversity statistics (including rarefaction, non-parametric, parametric, curve extrapolation and diversity indices) to assess the relative diversity of six aquatic bacterial communities. Overall, we found that the statistics yielded remarkably similar rankings of the samples for a given sequence similarity cut-off. This correspondence, despite the different underlying assumptions of the statistics, suggests that diversity statistics are a useful tool for ranking samples of microbial diversity. In addition, sequence similarity cut-off influenced the diversity ranking of the samples, demonstrating that diversity statistics can also be used to detect differences in phylogenetic structure among microbial communities. Finally, a subsampling analysis suggests that further sequencing from these particular clone libraries would not have substantially changed the richness rankings of the samples.
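    Ranking samples by a diversity index can be sketched in a few lines; the index below is Shannon entropy, one common choice, and the count vectors are invented rather than taken from the Global Ocean Survey libraries.

    ```python
    # Rank hypothetical aquatic samples by Shannon diversity H'.
    import numpy as np

    def shannon(counts):
        """Shannon diversity H' from a vector of taxon counts."""
        p = counts / counts.sum()
        p = p[p > 0]
        return -(p * np.log(p)).sum()

    samples = {
        "coastal":    np.array([50, 30, 10, 5, 5]),
        "open_ocean": np.array([25, 20, 20, 20, 15]),
        "estuary":    np.array([90, 5, 3, 1, 1]),
    }

    for name in sorted(samples, key=lambda s: shannon(samples[s]),
                       reverse=True):
        print(f"{name}: H' = {shannon(samples[name]):.3f}")
    ```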

  15. Appropriate Statistical Analysis for Two Independent Groups of Likert-Type Data

    ERIC Educational Resources Information Center

    Warachan, Boonyasit

    2011-01-01

    The objective of this research was to determine the robustness and statistical power of three different methods for testing the hypothesis that ordinal samples of five and seven Likert categories come from equal populations. The three methods are the two sample t-test with equal variances, the Mann-Whitney test, and the Kolmogorov-Smirnov test. In…

  16. Consistent Tolerance Bounds for Statistical Distributions

    NASA Technical Reports Server (NTRS)

    Mezzacappa, M. A.

    1983-01-01

    Assumption that sample comes from population with particular distribution is made with confidence C if data lie between certain bounds. These "confidence bounds" depend on C and assumption about distribution of sampling errors around regression line. Graphical test criteria using tolerance bounds are applied in industry where statistical analysis influences product development and use. Applied to evaluate equipment life.

  17. A Civilian/Military Trauma Institute: National Trauma Coordinating Center

    DTIC Science & Technology

    2015-12-01

    zip codes was used in “proximity to violence” analysis. Data were analyzed using SPSS (version 20.0, SPSS Inc., Chicago, IL). Multivariable linear...number of adverse events and serious events was not statistically higher in one group, the incidence of deep venous thrombosis (DVT) was statistically ...subjects the lack of statistical difference on multivariate analysis may be related to an underpowered sample size. It was recommended that the

  18. The Statistical Power of Planned Comparisons.

    ERIC Educational Resources Information Center

    Benton, Roberta L.

    Basic principles underlying statistical power are examined; and issues pertaining to effect size, sample size, error variance, and significance level are highlighted via the use of specific hypothetical examples. Analysis of variance (ANOVA) and related methods remain popular, although other procedures sometimes have more statistical power against…

  19. Analysis of the Einstein sample of early-type galaxies

    NASA Technical Reports Server (NTRS)

    Eskridge, Paul B.; Fabbiano, Giuseppina

    1993-01-01

    The EINSTEIN galaxy catalog contains x-ray data for 148 early-type (E and S0) galaxies. A detailed analysis of the global properties of this sample is presented. By comparing the x-ray properties with other tracers of the ISM, as well as with observables related to the stellar dynamics and populations of the sample, we expect to determine more clearly the physical relationships that determine the evolution of early-type galaxies. Previous studies with smaller samples have explored the relationships between x-ray luminosity (L(sub X)) and luminosities in other bands. Using our larger sample and the statistical techniques of survival analysis, a number of these earlier analyses were repeated. For our full sample, a strong statistical correlation is found between L(sub X) and L(sub B) (the probability that the null hypothesis is upheld is P < 10⁻⁴ from a variety of rank correlation tests). Regressions with several algorithms yield consistent results.

  20. Methods for collection and analysis of aquatic biological and microbiological samples

    USGS Publications Warehouse

    Greeson, Phillip E.; Ehlke, T.A.; Irwin, G.A.; Lium, B.W.; Slack, K.V.

    1977-01-01

    Chapter A4 contains methods used by the U.S. Geological Survey to collect, preserve, and analyze waters to determine their biological and microbiological properties. Part 1 discusses biological sampling and sampling statistics. The statistical procedures are accompanied by examples. Part 2 consists of detailed descriptions of more than 45 individual methods, including those for bacteria, phytoplankton, zooplankton, seston, periphyton, macrophytes, benthic invertebrates, fish and other vertebrates, cellular contents, productivity, and bioassays. Each method is summarized, and the application, interferences, apparatus, reagents, collection, analysis, calculations, reporting of results, precision and references are given. Part 3 consists of a glossary. Part 4 is a list of taxonomic references.

  1. Evaluating the efficiency of environmental monitoring programs

    USGS Publications Warehouse

    Levine, Carrie R.; Yanai, Ruth D.; Lampman, Gregory G.; Burns, Douglas A.; Driscoll, Charles T.; Lawrence, Gregory B.; Lynch, Jason; Schoch, Nina

    2014-01-01

    Statistical uncertainty analyses can be used to improve the efficiency of environmental monitoring, allowing sampling designs to maximize information gained relative to resources required for data collection and analysis. In this paper, we illustrate four methods of data analysis appropriate to four types of environmental monitoring designs. To analyze a long-term record from a single site, we applied a general linear model to weekly stream chemistry data at Biscuit Brook, NY, to simulate the effects of reducing sampling effort and to evaluate statistical confidence in the detection of change over time. To illustrate a detectable difference analysis, we analyzed a one-time survey of mercury concentrations in loon tissues in lakes in the Adirondack Park, NY, demonstrating the effects of sampling intensity on statistical power and the selection of a resampling interval. To illustrate a bootstrapping method, we analyzed the plot-level sampling intensity of forest inventory at the Hubbard Brook Experimental Forest, NH, to quantify the sampling regime needed to achieve a desired confidence interval. Finally, to analyze time-series data from multiple sites, we assessed the number of lakes and the number of samples per year needed to monitor change over time in Adirondack lake chemistry using a repeated-measures mixed-effects model. Evaluations of time series and synoptic long-term monitoring data can help determine whether sampling should be re-allocated in space or time to optimize the use of financial and human resources.
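    The bootstrapping analysis described for the forest-inventory case can be sketched as follows; the plot values and effort levels are simulated, not the Hubbard Brook data.

    ```python
    # How does the 95% CI of mean plot biomass narrow with sampling effort?
    import numpy as np

    rng = np.random.default_rng(42)
    plots = rng.lognormal(mean=5.0, sigma=0.4, size=200)  # simulated plots

    def ci_width(data, n_plots, n_boot=2000):
        """Width of the 95% bootstrap CI of the mean at a given effort."""
        means = [rng.choice(data, size=n_plots, replace=True).mean()
                 for _ in range(n_boot)]
        lo, hi = np.percentile(means, [2.5, 97.5])
        return hi - lo

    for n in (10, 25, 50, 100, 200):
        print(f"{n:3d} plots -> 95% CI width {ci_width(plots, n):.1f}")
    ```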

  2. Characterizing microstructural features of biomedical samples by statistical analysis of Mueller matrix images

    NASA Astrophysics Data System (ADS)

    He, Honghui; Dong, Yang; Zhou, Jialing; Ma, Hui

    2017-03-01

    As one of the salient features of light, polarization contains abundant structural and optical information about media. Recently, as a comprehensive description of polarization properties, Mueller matrix polarimetry has been applied to various biomedical studies such as cancerous tissue detection. In previous works, it has been found that the structural information encoded in the 2D Mueller matrix images can be represented by other transformed parameters with a more explicit relationship to certain microstructural features. In this paper, we present a statistical analysis method to transform the 2D Mueller matrix images into frequency distribution histograms (FDHs) and their central moments to reveal the dominant structural features of samples quantitatively. The experimental results for porcine heart, intestine, stomach, and liver tissues demonstrate that the transformation parameters and central moments based on the statistical analysis of Mueller matrix elements have simple relationships to the dominant microstructural properties of biomedical samples, including the density and orientation of fibrous structures and the depolarization power, diattenuation and absorption abilities. It is shown in this paper that the statistical analysis of 2D images of Mueller matrix elements may provide quantitative or semi-quantitative criteria for biomedical diagnosis.

  3. How Attitudes towards Statistics Courses and the Field of Statistics Predicts Statistics Anxiety among Undergraduate Social Science Majors: A Validation of the Statistical Anxiety Scale

    ERIC Educational Resources Information Center

    O'Bryant, Monique J.

    2017-01-01

    The aim of this study was to validate an instrument that can be used by instructors or social scientists who are interested in evaluating statistics anxiety. The psychometric properties of the English version of the Statistical Anxiety Scale (SAS) were examined through a confirmatory factor analysis of scores from a sample of 323 undergraduate…

  4. Role of microstructure on twin nucleation and growth in HCP titanium: A statistical study

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Arul Kumar, M.; Wroński, M.; McCabe, Rodney James

    In this study, a detailed statistical analysis is performed using Electron Back Scatter Diffraction (EBSD) to establish the effect of microstructure on twin nucleation and growth in deformed commercial purity hexagonal close packed (HCP) titanium. Rolled titanium samples are compressed along rolling, transverse and normal directions to establish statistical correlations for {10–12}, {11–21}, and {11–22} twins. A recently developed automated EBSD-twinning analysis software is employed for the statistical analysis. Finally, the analysis provides the following key findings: (I) grain size and strain dependence is different for twin nucleation and growth; (II) twinning statistics can be generalized for the HCP metals magnesium, zirconium and titanium; and (III) complex microstructure, where grain shape and size distribution is heterogeneous, requires multi-point statistical correlations.

  5. Role of microstructure on twin nucleation and growth in HCP titanium: A statistical study

    DOE PAGES

    Arul Kumar, M.; Wroński, M.; McCabe, Rodney James; ...

    2018-02-01

    In this study, a detailed statistical analysis is performed using Electron Back Scatter Diffraction (EBSD) to establish the effect of microstructure on twin nucleation and growth in deformed commercial purity hexagonal close packed (HCP) titanium. Rolled titanium samples are compressed along rolling, transverse and normal directions to establish statistical correlations for {10–12}, {11–21}, and {11–22} twins. A recently developed automated EBSD-twinning analysis software is employed for the statistical analysis. Finally, the analysis provides the following key findings: (I) grain size and strain dependence is different for twin nucleation and growth; (II) twinning statistics can be generalized for the HCP metals magnesium, zirconium and titanium; and (III) complex microstructure, where grain shape and size distribution is heterogeneous, requires multi-point statistical correlations.

  6. Trend analysis and selected summary statistics of annual mean streamflow for 38 selected long-term U.S. Geological Survey streamgages in Texas, water years 1916-2012

    USGS Publications Warehouse

    Asquith, William H.; Barbie, Dana L.

    2014-01-01

    Selected summary statistics (L-moments) and estimates of respective sampling variances were computed for the 35 streamgages lacking statistically significant trends. From the L-moments and estimated sampling variances, weighted means or regional values were computed for each L-moment. An example application is included demonstrating how the L-moments could be used to evaluate the magnitude and frequency of annual mean streamflow.
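    For readers unfamiliar with L-moments, the first two (the mean and the L-scale) can be estimated from probability-weighted moments as sketched below; the streamflow values are invented and this is not the report's code.

    ```python
    # Sample estimates of l1 (mean) and l2 (L-scale) via
    # probability-weighted moments b0 and b1, with l2 = 2*b1 - b0.
    import numpy as np

    def first_two_l_moments(x):
        x = np.sort(np.asarray(x, dtype=float))
        n = len(x)
        b0 = x.mean()
        # b1 uses weights (i - 1)/(n - 1) on the ascending order statistics.
        b1 = (x * (np.arange(n) / (n - 1))).mean()
        return b0, 2.0 * b1 - b0

    annual_mean_flow = [112.0, 98.0, 140.0, 87.0, 123.0, 105.0, 131.0, 95.0]
    l1, l2 = first_two_l_moments(annual_mean_flow)
    print(f"l1 = {l1:.1f}, l2 = {l2:.1f}")
    ```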

  7. How Statistics "Excel" Online.

    ERIC Educational Resources Information Center

    Chao, Faith; Davis, James

    2000-01-01

    Discusses the use of Microsoft Excel software and provides examples of its use in an online statistics course at Golden Gate University in the areas of randomness and probability, sampling distributions, confidence intervals, and regression analysis. (LRW)

  8. Equivalent statistics and data interpretation.

    PubMed

    Francis, Gregory

    2017-08-01

    Recent reform efforts in psychological science have led to a plethora of choices for scientists to analyze their data. A scientist making an inference about their data must now decide whether to report a p value, summarize the data with a standardized effect size and its confidence interval, report a Bayes Factor, or use other model comparison methods. To make good choices among these options, it is necessary for researchers to understand the characteristics of the various statistics used by the different analysis frameworks. Toward that end, this paper makes two contributions. First, it shows that for the case of a two-sample t test with known sample sizes, many different summary statistics are mathematically equivalent in the sense that they are based on the very same information in the data set. When the sample sizes are known, the p value provides as much information about a data set as the confidence interval of Cohen's d or a JZS Bayes factor. Second, this equivalence means that different analysis methods differ only in their interpretation of the empirical data. At first glance, it might seem that mathematical equivalence of the statistics suggests that it does not matter much which statistic is reported, but the opposite is true because the appropriateness of a reported statistic is relative to the inference it promotes. Accordingly, scientists should choose an analysis method appropriate for their scientific investigation. A direct comparison of the different inferential frameworks provides some guidance for scientists to make good choices and improve scientific practice.
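    The claimed equivalence is easy to verify numerically; the sketch below converts the two-sample t statistic into Cohen's d using only the sample sizes, with arbitrary simulated data.

    ```python
    # With n1 and n2 known, t and Cohen's d are interconvertible:
    # d = t * sqrt(1/n1 + 1/n2).
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    n1 = n2 = 12
    g1 = rng.normal(0.5, 1.0, n1)
    g2 = rng.normal(0.0, 1.0, n2)

    t, p = stats.ttest_ind(g1, g2)
    d_from_t = t * np.sqrt(1 / n1 + 1 / n2)

    sp = np.sqrt(((n1 - 1) * g1.var(ddof=1)
                  + (n2 - 1) * g2.var(ddof=1)) / (n1 + n2 - 2))
    d_from_data = (g1.mean() - g2.mean()) / sp

    print(f"t = {t:.3f}, p = {p:.4f}")
    print(f"d from t: {d_from_t:.3f}, d from data: {d_from_data:.3f}")
    ```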

  9. Students' Appreciation of Expectation and Variation as a Foundation for Statistical Understanding

    ERIC Educational Resources Information Center

    Watson, Jane M.; Callingham, Rosemary A.; Kelly, Ben A.

    2007-01-01

    This study presents the results of a partial credit Rasch analysis of in-depth interview data exploring statistical understanding of 73 school students in 6 contextual settings. The use of Rasch analysis allowed the exploration of a single underlying variable across contexts, which included probability sampling, representation of temperature…

  10. Statistical methodology for the analysis of dye-switch microarray experiments

    PubMed Central

    Mary-Huard, Tristan; Aubert, Julie; Mansouri-Attia, Nadera; Sandra, Olivier; Daudin, Jean-Jacques

    2008-01-01

    Background In individually dye-balanced microarray designs, each biological sample is hybridized on two different slides, once with Cy3 and once with Cy5. While this strategy ensures an automatic correction of the gene-specific labelling bias, it also induces dependencies between log-ratio measurements that must be taken into account in the statistical analysis. Results We present two original statistical procedures for the statistical analysis of individually balanced designs. These procedures are compared with the usual ML and REML mixed model procedures proposed in most statistical toolboxes, on both simulated and real data. Conclusion The UP procedure we propose as an alternative to usual mixed model procedures is more efficient and significantly faster to compute. This result provides some useful guidelines for the analysis of complex designs. PMID:18271965

  11. The stability of hydrogen ion and specific conductance in filtered wet-deposition samples stored at ambient temperatures

    USGS Publications Warehouse

    Gordon, J.D.; Schroder, L.J.; Morden-Moore, A. L.; Bowersox, V.C.

    1995-01-01

    Separate experiments by the U.S. Geological Survey (USGS) and the Illinois State Water Survey Central Analytical Laboratory (CAL) independently assessed the stability of hydrogen ion and specific conductance in filtered wet-deposition samples stored at ambient temperatures. The USGS experiment represented a test of sample stability under a diverse range of conditions, whereas the CAL experiment was a controlled test of sample stability. In the experiment by the USGS, a statistically significant (α = 0.05) relation between [H+] and time was found for the composited filtered, natural, wet-deposition solution when all reported values are included in the analysis. However, if two outlying pH values most likely representing measurement error are excluded from the analysis, the change in [H+] over time was not statistically significant. In the experiment by the CAL, randomly selected samples were reanalyzed between July 1984 and February 1991. The original analysis and reanalysis pairs revealed that [H+] differences, although very small, were statistically different from zero, whereas specific-conductance differences were not. Nevertheless, the results of the CAL reanalysis project indicate there appears to be no consistent, chemically significant degradation in sample integrity with regard to [H+] and specific conductance while samples are stored at room temperature at the CAL. Based on the results of the CAL and USGS studies, short-term (45-60 day) stability of [H+] and specific conductance in natural filtered wet-deposition samples that are shipped and stored unchilled at ambient temperatures was satisfactory.

  12. Frontiers of Two-Dimensional Correlation Spectroscopy. Part 1. New concepts and noteworthy developments

    NASA Astrophysics Data System (ADS)

    Noda, Isao

    2014-07-01

    A comprehensive survey review of new and noteworthy developments, which have advanced the frontiers of the field of 2D correlation spectroscopy during the last four years, is compiled. This review covers books, proceedings, and review articles published on 2D correlation spectroscopy, a number of significant conceptual developments in the field, data pretreatment methods and other pertinent topics, as well as patent and publication trends and citation activities. Developments discussed include projection 2D correlation analysis, concatenated 2D correlation, and correlation under multiple perturbation effects, as well as orthogonal sample design, prediction of 2D correlation spectra, manipulation and comparison of 2D spectra, correlation strategies based on segmented data blocks, such as moving-window analysis, features like determination of sequential order and enhanced spectral resolution, statistical 2D spectroscopy using covariance and other statistical metrics, hetero-correlation analysis, and the sample-sample correlation technique. Data pretreatment operations prior to 2D correlation analysis are discussed, including the correction for physical effects, background and baseline subtraction, selection of the reference spectrum, normalization and scaling of data, derivative spectra and deconvolution techniques, and smoothing and noise reduction. Other pertinent topics include chemometrics and statistical considerations, peak position shift phenomena, variable sampling increments, computation and software, and display schemes, such as color-coded format, slice and power spectra, tabulation, and other schemes.

  13. Arsenic health risk assessment in drinking water and source apportionment using multivariate statistical techniques in Kohistan region, northern Pakistan.

    PubMed

    Muhammad, Said; Tahir Shah, M; Khan, Sardar

    2010-10-01

    The present study was conducted in the Kohistan region, where mafic and ultramafic rocks (Kohistan island arc and Indus suture zone) and metasedimentary rocks (Indian plate) are exposed. Water samples were collected from springs, streams, and the Indus river and analyzed for physical parameters, anions, cations, and arsenic (As(3+), As(5+) and total arsenic). The water quality in the Kohistan region was evaluated by comparing the physico-chemical parameters with permissible limits set by the Pakistan Environmental Protection Agency and the World Health Organization. Most of the studied parameters were found within their respective permissible limits; however, in some samples the iron and arsenic concentrations exceeded them. For the health risk assessment of arsenic, the average daily dose, hazard quotient (HQ) and cancer risk were calculated using standard statistical formulas. HQ values were found to be >1 in the samples collected from Jabba and Dubair, while HQ values were <1 in the rest of the samples. This level of contamination corresponds to low chronic risk and medium cancer risk when compared with US EPA guidelines. Furthermore, the inter-dependence of physico-chemical parameters and pollution load was also evaluated using multivariate statistical techniques such as one-way ANOVA, correlation analysis, regression analysis, cluster analysis and principal component analysis. Copyright © 2010 Elsevier Ltd. All rights reserved.
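
    The average daily dose (ADD), hazard quotient and cancer risk mentioned above follow standard US EPA risk-assessment formulas. The sketch below illustrates the arithmetic with a hypothetical arsenic concentration and default ingestion parameters; none of the numbers are taken from the study.

```python
# Hedged sketch of a drinking-water arsenic risk calculation using the
# standard US EPA formulas; the concentration below is a made-up value,
# not a measurement from the Kohistan survey.

def average_daily_dose(conc_mg_per_l, intake_l_per_day=2.0, body_weight_kg=72.0):
    """ADD in mg/kg/day for water ingestion."""
    return conc_mg_per_l * intake_l_per_day / body_weight_kg

def hazard_quotient(add, reference_dose=0.0003):
    """HQ = ADD / RfD; 0.0003 mg/kg/day is the US EPA oral RfD for arsenic."""
    return add / reference_dose

def cancer_risk(add, slope_factor=1.5):
    """Lifetime excess risk; 1.5 (mg/kg/day)^-1 is the US EPA oral slope factor for arsenic."""
    return add * slope_factor

conc = 0.05  # hypothetical arsenic concentration, mg/L
add = average_daily_dose(conc)
print(f"ADD = {add:.5f} mg/kg/day")
print(f"HQ = {hazard_quotient(add):.1f}")       # HQ > 1 flags potential non-cancer risk
print(f"Cancer risk = {cancer_risk(add):.2e}")  # compared against the 1e-6 to 1e-4 range
```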

  14. PREDICTED GROUND WATER, SOIL AND SOIL GAS IMPACTS FROM U.S. GASOLINES, 2004, FIRST ANALYSIS OF THE AUTUMNAL DATA

    EPA Science Inventory

    Ninety-six gasoline samples were collected from around the U.S. in Autumn 2004. A detailed hydrocarbon analysis was performed on each sample, resulting in a data set of approximately 300 chemicals per sample. Statistical analyses were performed on the entire suite of reported chem...

  15. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Plemons, R.E.; Hopwood, W.H. Jr.; Hamilton, J.H.

    For a number of years the Oak Ridge Y-12 Plant Laboratory has been analyzing coal, predominantly for the utilities department of the Y-12 Plant. All laboratory procedures, except a Leco sulfur method which used the Leco Instruction Manual as a reference, were written based on the ASTM coal analyses. Sulfur is analyzed at the present time by two methods, gravimetric and Leco. The laboratory has two major endeavors for monitoring the quality of its coal analyses. (1) A control program run by the Plant Statistical Quality Control Department: Quality Control submits one sample for every nine samples submitted by the utilities departments, and the laboratory analyzes a control sample along with the utilities samples. (2) An exchange program with the DOE Coal Analysis Laboratory in Bruceton, Pennsylvania: the Y-12 Laboratory submits to the DOE Coal Laboratory, on even-numbered months, a sample that Y-12 has analyzed, and the DOE Coal Laboratory submits, on odd-numbered months, one of their analyzed samples to the Y-12 Plant Laboratory to be analyzed. The results of these control and exchange programs are monitored not only by laboratory personnel but also by Statistical Quality Control personnel, who provide statistical evaluations. After analysis and reporting of results, all utilities samples are retained by the laboratory until the coal contracts have been settled. The utilities departments have responsibility for the initiation and preparation of the coal samples. The samples normally received by the laboratory have been ground to 4-mesh, reduced to 0.5-gallon quantities, and sealed in air-tight containers. Sample identification numbers and a Request for Analysis are generated by the utilities departments.

  16. Why weight? Modelling sample and observational level variability improves power in RNA-seq analyses.

    PubMed

    Liu, Ruijie; Holik, Aliaksei Z; Su, Shian; Jansz, Natasha; Chen, Kelan; Leong, Huei San; Blewitt, Marnie E; Asselin-Labat, Marie-Liesse; Smyth, Gordon K; Ritchie, Matthew E

    2015-09-03

    Variations in sample quality are frequently encountered in small RNA-sequencing experiments, and pose a major challenge in a differential expression analysis. Removal of high variation samples reduces noise, but at a cost of reducing power, thus limiting our ability to detect biologically meaningful changes. Similarly, retaining these samples in the analysis may not reveal any statistically significant changes due to the higher noise level. A compromise is to use all available data, but to down-weight the observations from more variable samples. We describe a statistical approach that facilitates this by modelling heterogeneity at both the sample and observational levels as part of the differential expression analysis. At the sample level this is achieved by fitting a log-linear variance model that includes common sample-specific or group-specific parameters that are shared between genes. The estimated sample variance factors are then converted to weights and combined with observational level weights obtained from the mean-variance relationship of the log-counts-per-million using 'voom'. A comprehensive analysis involving both simulations and experimental RNA-sequencing data demonstrates that this strategy leads to a universally more powerful analysis and fewer false discoveries when compared to conventional approaches. This methodology has wide application and is implemented in the open-source 'limma' package. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
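
    The central idea, down-weighting observations from more variable samples instead of discarding them, can be illustrated with plain weighted least squares, where each sample's weight is the reciprocal of its estimated variance factor. This is a toy sketch of the principle on synthetic numbers, not the published voom/limma implementation.

```python
# Toy illustration of sample-level precision weighting in a two-group
# comparison; a sketch of the general idea, not the voom/limma method.
import numpy as np

rng = np.random.default_rng(0)
group = np.array([0, 0, 0, 1, 1, 1])
# Hypothetical log-expression values; the third control sample is noisier.
sample_sd = np.array([1.0, 1.0, 3.0, 1.0, 1.0, 1.0])
y = 5 + 0.8 * group + rng.normal(0, sample_sd)

# Inverse-variance weights: more variable samples contribute less.
w = 1.0 / sample_sd**2

# Weighted least squares via the normal equations.
X = np.column_stack([np.ones_like(group), group])
W = np.diag(w)
beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
print(f"Weighted group-effect estimate: {beta[1]:.3f}")
```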

  17. Chemometric and multivariate statistical analysis of time-of-flight secondary ion mass spectrometry spectra from complex Cu-Fe sulfides.

    PubMed

    Kalegowda, Yogesh; Harmer, Sarah L

    2012-03-20

    Time-of-flight secondary ion mass spectrometry (TOF-SIMS) spectra of mineral samples are complex, comprised of large mass ranges and many peaks. Consequently, characterization and classification analysis of these systems is challenging. In this study, different chemometric and statistical data evaluation methods, based on monolayer sensitive TOF-SIMS data, have been tested for the characterization and classification of copper-iron sulfide minerals (chalcopyrite, chalcocite, bornite, and pyrite) at different flotation pulp conditions (feed, conditioned feed, and Eh modified). The complex mass spectral data sets were analyzed using the following chemometric and statistical techniques: principal component analysis (PCA); principal component-discriminant functional analysis (PC-DFA); soft independent modeling of class analogy (SIMCA); and k-Nearest Neighbor (k-NN) classification. PCA was found to be an important first step in multivariate analysis, providing insight into both the relative grouping of samples and the elemental/molecular basis for those groupings. For samples exposed to oxidative conditions (at Eh ~430 mV), each technique (PCA, PC-DFA, SIMCA, and k-NN) was found to produce excellent classification. For samples at reductive conditions (at Eh ~ -200 mV SHE), k-NN and SIMCA produced the most accurate classification. Phase identification of particles that contain the same elements but a different crystal structure in a mixed multimetal mineral system has been achieved.
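
    The PCA-then-classify workflow described above is generic enough to sketch with scikit-learn. The spectra below are synthetic stand-ins for TOF-SIMS peak intensities, with two artificial "mineral classes" replacing the real copper-iron sulfides.

```python
# Generic PCA + k-NN classification pipeline of the kind applied to the
# TOF-SIMS spectra; the data here are synthetic, not mineral spectra.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
n_per_class, n_peaks = 20, 200
# Two synthetic "mineral classes" with different mean peak patterns.
class_means = rng.normal(0, 1, size=(2, n_peaks))
X = np.vstack([m + rng.normal(0, 0.5, size=(n_per_class, n_peaks))
               for m in class_means])
y = np.repeat([0, 1], n_per_class)

model = make_pipeline(StandardScaler(),
                      PCA(n_components=5),       # dimension reduction first
                      KNeighborsClassifier(n_neighbors=3))
scores = cross_val_score(model, X, y, cv=5)
print(f"Cross-validated accuracy: {scores.mean():.2f}")
```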

  18. 34 CFR Appendix A to Subpart N of... - Sample Default Prevention Plan

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... relevant default prevention statistics, including a statistical analysis of the borrowers who default on...'s delinquency status by obtaining reports from data managers and FFEL Program lenders. 5. Enhance... academic study. III. Statistics for Measuring Progress 1. The number of students enrolled at your...

  19. APA's Learning Objectives for Research Methods and Statistics in Practice: A Multimethod Analysis

    ERIC Educational Resources Information Center

    Tomcho, Thomas J.; Rice, Diana; Foels, Rob; Folmsbee, Leah; Vladescu, Jason; Lissman, Rachel; Matulewicz, Ryan; Bopp, Kara

    2009-01-01

    Research methods and statistics courses constitute a core undergraduate psychology requirement. We analyzed course syllabi and faculty self-reported coverage of both research methods and statistics course learning objectives to assess the concordance with APA's learning objectives (American Psychological Association, 2007). We obtained a sample of…

  20. Distribution of the two-sample t-test statistic following blinded sample size re-estimation.

    PubMed

    Lu, Kaifeng

    2016-05-01

    We consider the blinded sample size re-estimation based on the simple one-sample variance estimator at an interim analysis. We characterize the exact distribution of the standard two-sample t-test statistic at the final analysis. We describe a simulation algorithm for the evaluation of the probability of rejecting the null hypothesis at a given treatment effect. We compare the blinded sample size re-estimation method with two unblinded methods with respect to the empirical type I error, the empirical power, and the empirical distribution of the standard deviation estimator and final sample size. We characterize the type I error inflation across the range of standardized non-inferiority margins for non-inferiority trials, and derive the adjusted significance level to ensure type I error control for a given sample size of the internal pilot study. We show that the adjusted significance level increases as the sample size of the internal pilot study increases. Copyright © 2016 John Wiley & Sons, Ltd.
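
    The blinded re-estimation step itself is simple: pool the interim observations without unblinding, estimate the variance with the one-sample estimator, and recompute the target sample size. The sketch below uses the familiar normal-approximation formula for a two-sample comparison; the pilot data and effect size are illustrative, and the paper's exact t-based evaluation is not reproduced here.

```python
# Sketch of blinded sample size re-estimation with the simple one-sample
# variance estimator; delta and the pilot data below are illustrative.
import numpy as np
from scipy import stats

def reestimate_n(pooled_obs, delta, alpha=0.05, power=0.9):
    """Per-group n for a two-sample comparison, using the blinded (lumped)
    variance estimate that ignores treatment labels."""
    s2 = np.var(pooled_obs, ddof=1)          # one-sample variance estimator
    z_a = stats.norm.ppf(1 - alpha / 2)
    z_b = stats.norm.ppf(power)
    return int(np.ceil(2 * s2 * (z_a + z_b) ** 2 / delta ** 2))

rng = np.random.default_rng(2)
pilot = rng.normal(10, 4, size=60)           # internal pilot, labels hidden
print(reestimate_n(pilot, delta=2.0))
```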

  1. Statistical evaluation of vibration analysis techniques

    NASA Technical Reports Server (NTRS)

    Milner, G. Martin; Miller, Patrice S.

    1987-01-01

    An evaluation methodology is presented for a selection of candidate vibration analysis techniques applicable to machinery representative of the environmental control and life support system of advanced spacecraft; illustrative results are given. Attention is given to the statistical analysis of small sample experiments, the quantification of detection performance for diverse techniques through the computation of probability of detection versus probability of false alarm, and the quantification of diagnostic performance.
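
    The detection-performance summary described, probability of detection versus probability of false alarm, is what an ROC curve traces out. A minimal sketch with synthetic detector scores standing in for vibration-analysis outputs:

```python
# Sketch of a detection-performance summary as probability of detection
# versus probability of false alarm (an ROC curve); scores are synthetic.
import numpy as np
from sklearn.metrics import auc, roc_curve

rng = np.random.default_rng(9)
healthy = rng.normal(0.0, 1, 200)          # detector output, no fault
faulty = rng.normal(1.5, 1, 200)           # detector output, fault present
scores = np.concatenate([healthy, faulty])
truth = np.concatenate([np.zeros(200), np.ones(200)])

pfa, pd, _ = roc_curve(truth, scores)      # false-alarm vs detection prob.
print(f"Area under ROC: {auc(pfa, pd):.3f}")
```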

  2. A knowledge-based T2-statistic to perform pathway analysis for quantitative proteomic data

    PubMed Central

    Chen, Yi-Hau

    2017-01-01

    Approaches to identify significant pathways from high-throughput quantitative data have been developed in recent years. Still, the analysis of proteomic data remains difficult because of limited sample size. This limitation also leads to the common practice of using a competitive null, which fundamentally treats genes or proteins as independent units. The independence assumption ignores the associations among biomolecules with similar functions or cellular localization, as well as the interactions among them manifested as changes in expression ratios. Consequently, these methods often underestimate the associations among biomolecules and cause false positives in practice. Some studies incorporate the sample covariance matrix into the calculation to address this issue. However, the sample covariance may not be a precise estimate if the sample size is very limited, which is usually the case for data produced by mass spectrometry. In this study, we introduce a multivariate test under a self-contained null to perform pathway analysis for quantitative proteomic data. The covariance matrix used in the test statistic is constructed from the confidence scores retrieved from the STRING database or the HitPredict database. We also design an integrating procedure to retain pathways of sufficient evidence as a pathway group. The performance of the proposed T2-statistic is demonstrated using five published experimental datasets: the T-cell activation, the cAMP/PKA signaling, the myoblast differentiation, and the effect of dasatinib on the BCR-ABL pathway are proteomic datasets produced by mass spectrometry; and the protective effect of myocilin via the MAPK signaling pathway is a gene expression dataset of limited sample size. Compared with other popular statistics, the proposed T2-statistic yields more accurate descriptions in agreement with the discussion of the original publication. We implemented the T2-statistic into an R package T2GA, which is available at https://github.com/roqe/T2GA. PMID:28622336

  3. A knowledge-based T2-statistic to perform pathway analysis for quantitative proteomic data.

    PubMed

    Lai, En-Yu; Chen, Yi-Hau; Wu, Kun-Pin

    2017-06-01

    Approaches to identify significant pathways from high-throughput quantitative data have been developed in recent years. Still, the analysis of proteomic data remains difficult because of limited sample size. This limitation also leads to the common practice of using a competitive null, which fundamentally treats genes or proteins as independent units. The independence assumption ignores the associations among biomolecules with similar functions or cellular localization, as well as the interactions among them manifested as changes in expression ratios. Consequently, these methods often underestimate the associations among biomolecules and cause false positives in practice. Some studies incorporate the sample covariance matrix into the calculation to address this issue. However, the sample covariance may not be a precise estimate if the sample size is very limited, which is usually the case for data produced by mass spectrometry. In this study, we introduce a multivariate test under a self-contained null to perform pathway analysis for quantitative proteomic data. The covariance matrix used in the test statistic is constructed from the confidence scores retrieved from the STRING database or the HitPredict database. We also design an integrating procedure to retain pathways of sufficient evidence as a pathway group. The performance of the proposed T2-statistic is demonstrated using five published experimental datasets: the T-cell activation, the cAMP/PKA signaling, the myoblast differentiation, and the effect of dasatinib on the BCR-ABL pathway are proteomic datasets produced by mass spectrometry; and the protective effect of myocilin via the MAPK signaling pathway is a gene expression dataset of limited sample size. Compared with other popular statistics, the proposed T2-statistic yields more accurate descriptions in agreement with the discussion of the original publication. We implemented the T2-statistic into an R package T2GA, which is available at https://github.com/roqe/T2GA.
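
    The heart of the method is a quadratic form of the pathway's expression-change vector against a knowledge-based covariance matrix. The sketch below uses a small hypothetical 4-protein pathway with made-up interaction confidences standing in for STRING or HitPredict scores; it illustrates the shape of the statistic, not the T2GA package itself.

```python
# Sketch of a knowledge-based T2-type quadratic-form statistic; the
# interaction scores and expression changes are invented for illustration.
import numpy as np
from scipy import stats

# Hypothetical standardized expression changes for a 4-protein pathway.
z = np.array([1.2, 0.9, 1.5, 0.3])

# Hypothetical pairwise interaction confidences in [0, 1], used here as
# correlations (diagonal fixed at 1), standing in for STRING scores.
C = np.array([[1.0, 0.7, 0.5, 0.1],
              [0.7, 1.0, 0.6, 0.2],
              [0.5, 0.6, 1.0, 0.1],
              [0.1, 0.2, 0.1, 1.0]])

t2 = z @ np.linalg.solve(C, z)       # T2 = z' C^{-1} z
p = stats.chi2.sf(t2, df=len(z))     # chi-square reference, self-contained null
print(f"T2 = {t2:.2f}, p = {p:.4f}")
```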

  4. An information-theoretic approach to the modeling and analysis of whole-genome bisulfite sequencing data.

    PubMed

    Jenkinson, Garrett; Abante, Jordi; Feinberg, Andrew P; Goutsias, John

    2018-03-07

    DNA methylation is a stable form of epigenetic memory used by cells to control gene expression. Whole genome bisulfite sequencing (WGBS) has emerged as a gold-standard experimental technique for studying DNA methylation by producing high resolution genome-wide methylation profiles. Statistical modeling and analysis are employed to computationally extract and quantify information from these profiles in an effort to identify regions of the genome that demonstrate crucial or aberrant epigenetic behavior. However, the performance of most currently available methods for methylation analysis is hampered by their inability to directly account for statistical dependencies between neighboring methylation sites, thus ignoring significant information available in WGBS reads. We present a powerful information-theoretic approach for genome-wide modeling and analysis of WGBS data based on the 1D Ising model of statistical physics. This approach takes into account correlations in methylation by utilizing a joint probability model that encapsulates all information available in WGBS methylation reads and produces accurate results even when applied to single WGBS samples with low coverage. Using the Shannon entropy, our approach provides a rigorous quantification of methylation stochasticity in individual WGBS samples genome-wide. Furthermore, it utilizes the Jensen-Shannon distance to evaluate differences in methylation distributions between a test and a reference sample. Differential performance assessment using simulated and real human lung normal/cancer data demonstrates a clear superiority of our approach over DSS, a recently proposed method for WGBS data analysis. Critically, these results demonstrate that marginal methods become statistically invalid when correlations are present in the data. This contribution demonstrates clear benefits and the necessity of modeling joint probability distributions of methylation using the 1D Ising model of statistical physics and of quantifying methylation stochasticity using concepts from information theory. By employing this methodology, substantial improvement of DNA methylation analysis can be achieved by effectively taking into account the massive amount of statistical information available in WGBS data, which is largely ignored by existing methods.
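
    The Jensen-Shannon distance used to compare a test against a reference sample, and the Shannon entropy used to quantify methylation stochasticity, are both easy to compute once methylation-level distributions are in hand. The two distributions below are hypothetical, not derived from WGBS data.

```python
# Jensen-Shannon distance between two hypothetical methylation-level
# distributions; scipy's jensenshannon returns the distance (the square
# root of the JS divergence; base 2 keeps it in [0, 1]).
import numpy as np
from scipy.spatial.distance import jensenshannon

# Hypothetical probabilities of methylation levels {0, 0.25, 0.5, 0.75, 1}.
p_normal = np.array([0.70, 0.10, 0.05, 0.05, 0.10])
p_cancer = np.array([0.30, 0.15, 0.15, 0.15, 0.25])

print(f"JS distance: {jensenshannon(p_normal, p_cancer, base=2):.3f}")

# Shannon entropy quantifies methylation stochasticity within one sample.
def entropy(p):
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

print(f"Entropy (normal): {entropy(p_normal):.3f} bits")
print(f"Entropy (cancer): {entropy(p_cancer):.3f} bits")
```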

  5. Balanced mechanical resonator for powder handling device

    NASA Technical Reports Server (NTRS)

    Sarrazin, Philippe C. (Inventor); Brunner, Will M. (Inventor)

    2012-01-01

    A system incorporating a balanced mechanical resonator and a method for vibration of a sample composed of granular material to generate motion of a powder sample inside the sample holder for obtaining improved analysis statistics, without imparting vibration to the sample holder support.

  6. Computerized EEG analysis for studying the effect of drugs on the central nervous system.

    PubMed

    Rosadini, G; Cavazza, B; Rodriguez, G; Sannita, W G; Siccardi, A

    1977-11-01

    Samples of our experience in quantitative pharmaco-EEG are reviewed to discuss and define its applicability and limits. Simple processing systems, such as the computation of Hjorth's descriptors, are useful for on-line monitoring of drug-induced EEG modifications which are also evident on visual analysis. Power spectral analysis is suitable for identifying and quantifying EEG effects not evident on visual inspection. It demonstrated how the EEG effects of compounds in a long-acting formulation vary according to the sampling time and the explored cerebral area. EEG modifications not detected by power spectral analysis can be defined by statistically comparing (F test) the spectral values of the EEG from a single lead at the different samples (longitudinal comparison), or the spectral values from different leads at any sample (intrahemispheric comparison). The presently available procedures of quantitative pharmaco-EEG are effective when applied to the study of multilead EEG recordings in a statistically significant sample of the population. They do not seem reliable for monitoring or directing neuropsychiatric therapies in single patients, due to the individual variability of drug effects.
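
    The power spectral analysis step can be sketched with Welch's method from scipy; the signal below is a synthetic EEG-like trace in which the "drug effect" is an artificial boost of 10 Hz (alpha-band) power, not a real recording.

```python
# Power spectral analysis of a synthetic EEG-like signal with Welch's
# method; the "treatment" artificially doubles the alpha-band component.
import numpy as np
from scipy.signal import welch

fs = 256                                    # sampling rate, Hz
t = np.arange(0, 30, 1 / fs)                # 30 s of signal
rng = np.random.default_rng(10)
baseline = np.sin(2 * np.pi * 10 * t) + rng.normal(0, 1, t.size)
treated = 2 * np.sin(2 * np.pi * 10 * t) + rng.normal(0, 1, t.size)

for name, sig in (("baseline", baseline), ("treated", treated)):
    f, pxx = welch(sig, fs=fs, nperseg=1024)
    alpha = pxx[(f >= 8) & (f <= 12)].sum()  # power in the alpha band
    print(f"{name}: alpha-band power = {alpha:.2f}")
```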

  7. Explorations in statistics: the log transformation.

    PubMed

    Curran-Everett, Douglas

    2018-06-01

    Learning about statistics is a lot like learning about science: the learning is more meaningful if you can actively explore. This thirteenth installment of Explorations in Statistics explores the log transformation, an established technique that rescales the actual observations from an experiment so that the assumptions of some statistical analysis are better met. A general assumption in statistics is that the variability of some response Y is homogeneous across groups or across some predictor variable X. If the variability (the standard deviation) varies in rough proportion to the mean value of Y, a log transformation can equalize the standard deviations. Moreover, if the actual observations from an experiment conform to a skewed distribution, then a log transformation can make the theoretical distribution of the sample mean more consistent with a normal distribution. This is important: the results of a one-sample t test are meaningful only if the theoretical distribution of the sample mean is roughly normal. If we log-transform our observations, then we want to confirm the transformation was useful. We can do this if we use the Box-Cox method, if we bootstrap the sample mean and the statistic t itself, and if we assess the residual plots from the statistical model of the actual and transformed sample observations.
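
    The workflow described, transform, check skewness, confirm with the Box-Cox method, can be sketched with scipy; the skewed sample below is simulated.

```python
# Sketch of the log-transform workflow: simulate a skewed response, apply
# a log transformation, then use the Box-Cox method to check whether the
# log (lambda = 0) is a reasonable choice.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
y = rng.lognormal(mean=2.0, sigma=0.6, size=40)   # skewed observations

log_y = np.log(y)
print(f"Skewness before: {stats.skew(y):.2f}, after log: {stats.skew(log_y):.2f}")

# Box-Cox maximum-likelihood estimate of lambda; a value near 0 supports
# the log transformation.
_, lam = stats.boxcox(y)
print(f"Box-Cox lambda-hat: {lam:.2f}")
```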

  8. Experimental Investigations of Non-Stationary Properties In Radiometer Receivers Using Measurements of Multiple Calibration References

    NASA Technical Reports Server (NTRS)

    Racette, Paul; Lang, Roger; Zhang, Zhao-Nan; Zacharias, David; Krebs, Carolyn A. (Technical Monitor)

    2002-01-01

    Radiometers must be periodically calibrated because the receiver response fluctuates. Many techniques exist to correct for the time-varying response of a radiometer receiver. An analytical technique has been developed that uses generalized least squares regression (LSR) to predict the performance of a wide variety of calibration algorithms. The total measurement uncertainty, including the uncertainty of the calibration, can be computed using LSR. The uncertainties of the calibration samples used in the regression are based upon treating the receiver fluctuations as non-stationary processes. Signals originating from the different sources of emission are treated as simultaneously existing random processes; thus, the radiometer output is a series of samples obtained from these random processes. The samples are treated as random variables, but because the underlying processes are non-stationary, the statistics of the samples are treated as non-stationary. The statistics of the calibration samples depend upon the time for which the samples are to be applied. The statistics of the random variables are equated to the mean statistics of the non-stationary processes over the interval defined by the time of the calibration sample and when it is applied. This analysis opens the opportunity for experimental investigation into the underlying properties of receiver non-stationarity through the use of multiple calibration references. In this presentation we discuss the application of LSR to the analysis of various calibration algorithms, requirements for experimental verification of the theory, and preliminary results from analyzing experimental measurements.

  9. IGESS: a statistical approach to integrating individual-level genotype data and summary statistics in genome-wide association studies.

    PubMed

    Dai, Mingwei; Ming, Jingsi; Cai, Mingxuan; Liu, Jin; Yang, Can; Wan, Xiang; Xu, Zongben

    2017-09-15

    Results from genome-wide association studies (GWAS) suggest that a complex phenotype is often affected by many variants with small effects, known as 'polygenicity'. Tens of thousands of samples are often required to ensure the statistical power to identify these variants with small effects. However, it is often the case that a research group can only get approval for access to individual-level genotype data with a limited sample size (e.g. a few hundred or thousand). Meanwhile, summary statistics generated using single-variant-based analysis are becoming publicly available, and the sample sizes associated with these summary statistics datasets are usually quite large. How to make the most efficient use of existing abundant data resources largely remains an open question. In this study, we propose a statistical approach, IGESS, to increase the statistical power of identifying risk variants and improve the accuracy of risk prediction by integrating individual-level genotype data and summary statistics. An efficient algorithm based on variational inference is developed to handle the genome-wide analysis. Through comprehensive simulation studies, we demonstrated the advantages of IGESS over methods which take either individual-level data or summary statistics data as input. We applied IGESS to perform an integrative analysis of Crohn's disease from WTCCC and summary statistics from other studies. IGESS was able to significantly increase the statistical power of identifying risk variants and improve the risk prediction accuracy from 63.2% (±0.4%) to 69.4% (±0.1%) using about 240 000 variants. The IGESS software is available at https://github.com/daviddaigithub/IGESS . zbxu@xjtu.edu.cn or xwan@comp.hkbu.edu.hk or eeyang@hkbu.edu.hk. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  10. Biostatistical analysis of quantitative immunofluorescence microscopy images.

    PubMed

    Giles, C; Albrecht, M A; Lam, V; Takechi, R; Mamo, J C

    2016-12-01

    Semiquantitative immunofluorescence microscopy has become a key methodology in biomedical research. Typical statistical workflows are considered in the context of avoiding pseudo-replication and marginalising experimental error. However, immunofluorescence microscopy naturally generates hierarchically structured data that can be leveraged to improve statistical power and enrich biological interpretation. Herein, we describe a robust distribution fitting procedure and compare several statistical tests, outlining their potential advantages/disadvantages in the context of biological interpretation. Further, we describe tractable procedures for power analysis that incorporates the underlying distribution, sample size and number of images captured per sample. The procedures outlined have significant potential for increasing understanding of biological processes and decreasing both ethical and financial burden through experimental optimization. © 2016 The Authors Journal of Microscopy © 2016 Royal Microscopical Society.
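
    One standard way to fold "number of images captured per sample" into a power analysis is the design-effect correction for clustered observations, shown below with illustrative numbers. This is a textbook formula offered as a sketch, not necessarily the exact procedure of the paper.

```python
# Sketch of a power analysis for hierarchically structured imaging data
# using the standard design-effect correction; numbers are illustrative.
import numpy as np
from scipy import stats

def n_samples_required(effect_size, images_per_sample, icc, alpha=0.05, power=0.8):
    """Biological samples per group for a two-group comparison, when each
    sample contributes several correlated images (intraclass correlation icc)."""
    z = stats.norm.ppf(1 - alpha / 2) + stats.norm.ppf(power)
    n_obs = 2 * (z / effect_size) ** 2           # independent observations per group
    deff = 1 + (images_per_sample - 1) * icc     # design effect for clustering
    return int(np.ceil(n_obs * deff / images_per_sample))

# More images per sample help, but with diminishing returns set by the ICC.
for k in (1, 5, 10):
    print(f"{k:2d} images/sample -> {n_samples_required(0.8, k, icc=0.3)} samples per group")
```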

  11. An Analysis of Methods Used to Examine Gender Differences in Computer-Related Behavior.

    ERIC Educational Resources Information Center

    Kay, Robin

    1992-01-01

    Review of research investigating gender differences in computer-related behavior examines statistical and methodological flaws. Issues addressed include sample selection, sample size, scale development, scale quality, the use of univariate and multivariate analyses, regressional analysis, construct definition, construct testing, and the…

  12. Experimental toxicology: Issues of statistics, experimental design, and replication.

    PubMed

    Briner, Wayne; Kirwan, Jeral

    2017-01-01

    The difficulty of replicating experiments has drawn considerable attention. Issues with replication occur for a variety of reasons ranging from experimental design to laboratory errors to inappropriate statistical analysis. Here we review a variety of guidelines for statistical analysis, design, and execution of experiments in toxicology. In general, replication can be improved by using hypothesis driven experiments with adequate sample sizes, randomization, and blind data collection techniques. Copyright © 2016 Elsevier B.V. All rights reserved.

  13. Statistical properties of relative weight distributions of four salmonid species and their sampling implications

    USGS Publications Warehouse

    Hyatt, M.W.; Hubert, W.A.

    2001-01-01

    We assessed relative weight (Wr) distributions among 291 samples of stock-to-quality-length brook trout Salvelinus fontinalis, brown trout Salmo trutta, rainbow trout Oncorhynchus mykiss, and cutthroat trout O. clarki from lentic and lotic habitats. Statistics describing Wr sample distributions varied slightly among species and habitat types. The average sample was leptokurtic and slightly skewed to the right with a standard deviation of about 10, but the shapes of Wr distributions varied widely among samples. Twenty-two percent of the samples had nonnormal distributions, suggesting the need to evaluate sample distributions before applying statistical tests to determine whether assumptions are met. In general, our findings indicate that samples of about 100 stock-to-quality-length fish are needed to obtain confidence interval widths of four Wr units around the mean. Power analysis revealed that samples of about 50 stock-to-quality-length fish are needed to detect a 2% change in mean Wr at a relatively high level of power (beta = 0.01, alpha = 0.05).
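
    The "about 100 fish" figure is consistent with the usual normal-approximation sample-size formula for a confidence interval of fixed width. Using the reported SD of about 10 and a total width of four Wr units:

```python
# Reproducing the order of magnitude of the sample-size recommendation:
# n for a 95% CI of total width w around a mean, given SD sigma.
import math
from scipy import stats

sigma, width = 10.0, 4.0                 # values reported in the abstract
z = stats.norm.ppf(0.975)                # 1.96 for a 95% interval
n = (2 * z * sigma / width) ** 2         # half-width = z * sigma / sqrt(n)
print(math.ceil(n))                      # -> 97, i.e. roughly 100 fish
```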

  14. Water quality analysis of the Rapur area, Andhra Pradesh, South India using multivariate techniques

    NASA Astrophysics Data System (ADS)

    Nagaraju, A.; Sreedhar, Y.; Thejaswi, A.; Sayadi, Mohammad Hossein

    2017-10-01

    Groundwater samples from the Rapur area were collected from different sites to evaluate the major ion chemistry. Large numbers of data can lead to difficulties in the integration, interpretation, and representation of the results. Two multivariate statistical methods, hierarchical cluster analysis (HCA) and factor analysis (FA), were applied to evaluate their usefulness for classifying and identifying the geochemical processes controlling groundwater geochemistry. Four statistically significant clusters were obtained from 30 sampling stations, of which two are most important: cluster 1 (pH, Si, CO3, Mg, SO4, Ca, K, HCO3, alkalinity, Na, Na + K, Cl, and hardness) and cluster 2 (EC and TDS), which are released into the study area from different sources. The application of different multivariate statistical techniques, such as principal component analysis (PCA), assists in the interpretation of complex data matrices for a better understanding of the water quality of a study area. From PCA, it is clear that the first factor (factor 1), which accounted for 36.2% of the total variance, had high positive loadings on EC, Mg, Cl, TDS, and hardness. Based on the PCA scores, four significant cluster groups of sampling locations were detected on the basis of the similarity of their water quality.
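
    The HCA step is generic enough to sketch with scipy: standardize the parameters, build a linkage, and cut the tree into clusters. The data matrix below is synthetic, not the Rapur measurements.

```python
# Hierarchical cluster analysis of the kind applied to the groundwater
# data: standardize parameters, build a linkage, cut into clusters.
# The data matrix below is synthetic, not the study's measurements.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.stats import zscore

rng = np.random.default_rng(8)
# 30 hypothetical stations x 6 water-quality parameters, two latent groups.
X = np.vstack([rng.normal(0, 1, (15, 6)), rng.normal(3, 1, (15, 6))])
Xz = zscore(X, axis=0)                      # put parameters on a common scale

Z = linkage(Xz, method="ward")              # Ward linkage on standardized data
labels = fcluster(Z, t=2, criterion="maxclust")
print("Cluster sizes:", np.bincount(labels)[1:])
```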

  15. Characterization of Surface Water and Groundwater Quality in the Lower Tano River Basin Using Statistical and Isotopic Approach.

    NASA Astrophysics Data System (ADS)

    Edjah, Adwoba; Stenni, Barbara; Cozzi, Giulio; Turetta, Clara; Dreossi, Giuliano; Tetteh Akiti, Thomas; Yidana, Sandow

    2017-04-01

    This research is part of a PhD project, "Hydrogeological Assessment of the Lower Tano River Basin for Sustainable Economic Usage, Ghana, West Africa". In this study, surface water and groundwater quality in the Lower Tano river basin was investigated, based on selected sampling sites associated with mining activities and the development of oil and gas. A statistical approach was applied to characterize the quality of surface water and groundwater, and water stable isotopes, natural tracers of the hydrological cycle, were used to investigate the origin of groundwater recharge in the basin. The study revealed that Pb and Ni values of the surface water and groundwater samples exceeded the WHO standards for drinking water. In addition, a water quality index (WQI) based on physicochemical parameters (EC, TDS, pH) and major ions (Ca2+, Na+, Mg2+, HCO3-, NO3-, Cl-, SO42-, K+) indicated good quality water for 60% of the sampled surface water and groundwater. Other indices, such as the heavy metal pollution index (HPI), degree of contamination (Cd), and heavy metal evaluation index (HEI), based on trace element concentrations in the water samples, revealed that 90% of the surface water and groundwater samples belong to a high level of pollution. Principal component analysis (PCA) also suggests that the water quality in the basin is likely affected by rock-water interaction and anthropogenic activities (sea water intrusion). This was confirmed by further statistical analysis (cluster analysis and correlation matrix) of the water quality parameters. The spatial distribution of water quality parameters, trace elements, and the results obtained from the statistical analysis was mapped using a geographical information system (GIS). In addition, the isotopic analysis of the sampled surface water and groundwater revealed that most were of meteoric origin with little or no isotopic variation. It is expected that the outcomes of this research will form a baseline for appropriate water quality management decisions in the Lower Tano river basin. Keywords: Water stable isotopes, Trace elements, Multivariate statistics, Evaluation indices, Lower Tano river basin.

  16. BrightStat.com: free statistics online.

    PubMed

    Stricker, Daniel

    2008-10-01

    Powerful software for statistical analysis is expensive. Here I present BrightStat, statistical software running on the Internet which is free of charge. BrightStat's goals and its main capabilities and functionalities are outlined. Three different sample runs, a Friedman test, a chi-square test, and a step-wise multiple regression, are presented. The results obtained by BrightStat are compared with results computed by SPSS, one of the global leaders in statistical software, and VassarStats, a collection of scripts for data analysis running on the Internet. Elementary statistics is an inherent part of academic education, and BrightStat is an alternative to commercial products.
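
    Two of the three sample runs mentioned (the Friedman test and the chi-square test) can be reproduced in a few lines of open-source Python with scipy; the data below are arbitrary illustrations, not BrightStat's examples.

```python
# Two of the analyses named in the abstract, reproduced with scipy on
# small synthetic datasets (values are arbitrary illustrations).
import numpy as np
from scipy import stats

# Friedman test: three repeated measurements on eight subjects.
a = [7.0, 9.9, 8.5, 5.1, 10.3, 8.6, 7.2, 8.1]
b = [5.3, 5.7, 4.7, 3.5, 7.7, 5.0, 6.1, 6.2]
c = [4.9, 7.6, 5.5, 2.8, 8.4, 6.1, 5.7, 6.0]
print(stats.friedmanchisquare(a, b, c))

# Chi-square test of independence on a 2x2 contingency table.
table = np.array([[30, 10], [20, 25]])
chi2, p, dof, _ = stats.chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```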

  17. Writing to Learn Statistics in an Advanced Placement Statistics Course

    ERIC Educational Resources Information Center

    Northrup, Christian Glenn

    2012-01-01

    This study investigated the use of writing in a statistics classroom to learn if writing provided a rich description of problem-solving processes of students as they solved problems. Through analysis of 329 written samples provided by students, it was determined that writing provided a rich description of problem-solving processes and enabled…

  18. Applying Statistical Process Control to Clinical Data: An Illustration.

    ERIC Educational Resources Information Center

    Pfadt, Al; And Others

    1992-01-01

    Principles of statistical process control are applied to a clinical setting through the use of control charts to detect changes, as part of treatment planning and clinical decision-making processes. The logic of control chart analysis is derived from principles of statistical inference. Sample charts offer examples of evaluating baselines and…

  19. Web-Based Statistical Sampling and Analysis

    ERIC Educational Resources Information Center

    Quinn, Anne; Larson, Karen

    2016-01-01

    Consistent with the Common Core State Standards for Mathematics (CCSSI 2010), the authors write that they have asked students to do statistics projects with real data. To obtain real data, their students use the free Web-based app, Census at School, created by the American Statistical Association (ASA) to help promote civic awareness among school…

  20. Two-sample statistics for testing the equality of survival functions against improper semi-parametric accelerated failure time alternatives: an application to the analysis of a breast cancer clinical trial.

    PubMed

    Broët, Philippe; Tsodikov, Alexander; De Rycke, Yann; Moreau, Thierry

    2004-06-01

    This paper presents two-sample statistics suited for testing equality of survival functions against improper semi-parametric accelerated failure time alternatives. These tests are designed for comparing either the short- or the long-term effect of a prognostic factor, or both. These statistics are obtained as partial likelihood score statistics from a time-dependent Cox model. As a consequence, the proposed tests can be very easily implemented using widely available software. A breast cancer clinical trial is presented as an example to demonstrate the utility of the proposed tests.

  1. A preliminary study on identification of Thai rice samples by INAA and statistical analysis

    NASA Astrophysics Data System (ADS)

    Kongsri, S.; Kukusamude, C.

    2017-09-01

    This study aims to investigate the elemental compositions of 93 Thai rice samples using instrumental neutron activation analysis (INAA) and to identify rice according to type and cultivar using statistical analysis. As, Mg, Cl, Al, Br, Mn, K, Rb and Zn in Thai jasmine rice and Sung Yod rice samples were successfully determined by INAA. The accuracy and precision of the INAA method were verified with SRM 1568a Rice Flour; all elements were found to be in good agreement with the certified values, the precisions in terms of %RSD were lower than 7%, and the LODs ranged from 0.01 to 29 mg kg-1. The concentrations of the 9 elements in the Thai rice samples were evaluated and used as chemical indicators to identify the type of rice. The results showed that Mg, Cl, As, Br, Mn, K, Rb, and Zn concentrations differ significantly between Thai jasmine rice and Sung Yod rice samples, whereas there was no evidence at the 95% confidence level that Al concentrations differ. Our results may provide preliminary information for the discrimination of rice samples and a useful database of Thai rice.

  2. The Analysis of Organizational Diagnosis on Based Six Box Model in Universities

    ERIC Educational Resources Information Center

    Hamid, Rahimi; Siadat, Sayyed Ali; Reza, Hoveida; Arash, Shahin; Ali, Nasrabadi Hasan; Azizollah, Arbabisarjou

    2011-01-01

    Purpose: The analysis of organizational diagnosis based on the six box model at universities. Research method: The research method was a descriptive survey. The statistical population consisted of 1544 faculty members of universities, from which 218 persons were chosen as the sample through a stratified random sampling method. The research instrument was an organizational…

  3. A statistical evaluation of spectral fingerprinting methods using analysis of variance and principal component analysis

    USDA-ARS?s Scientific Manuscript database

    Six methods were compared with respect to spectral fingerprinting of a well-characterized series of broccoli samples. Spectral fingerprints were acquired for finely-powdered solid samples using Fourier transform-infrared (IR) and Fourier transform-near infrared (NIR) spectrometry and for aqueous met...

  4. [New design of the Health Survey of Catalonia (Spain, 2010-2014): a step forward in health planning and evaluation].

    PubMed

    Alcañiz-Zanón, Manuela; Mompart-Penina, Anna; Guillén-Estany, Montserrat; Medina-Bustos, Antonia; Aragay-Barbany, Josep M; Brugulat-Guiteras, Pilar; Tresserras-Gaju, Ricard

    2014-01-01

    This article presents the genesis of the Health Survey of Catalonia (Spain, 2010-2014) with its semiannual subsamples and explains the basic characteristics of its multistage sampling design. In comparison with previous surveys, the organizational advantages of this new statistical operation include rapid data availability and the ability to continuously monitor the population. The main benefits are timeliness in the production of indicators and the possibility of introducing new topics through the supplemental questionnaire as a function of needs. Limitations consist of the complexity of the sample design and the lack of longitudinal follow-up of the sample. Suitable sampling weights for each specific subsample are necessary for any statistical analysis of micro-data. Accuracy in the analysis of territorial disaggregation or population subgroups increases if annual samples are accumulated. Copyright © 2013 SESPAS. Published by Elsevier Espana. All rights reserved.

  5. Statistical methods for astronomical data with upper limits. I - Univariate distributions

    NASA Technical Reports Server (NTRS)

    Feigelson, E. D.; Nelson, P. I.

    1985-01-01

    The statistical treatment of univariate censored data is discussed. A heuristic derivation of the Kaplan-Meier maximum-likelihood estimator from first principles is presented which results in an expression amenable to analytic error analysis. Methods for comparing two or more censored samples are given along with simple computational examples, stressing the fact that most astronomical problems involve upper limits while the standard mathematical methods require lower limits. The application of univariate survival analysis to six data sets in the recent astrophysical literature is described, and various aspects of the use of survival analysis in astronomy, such as the limitations of various two-sample tests and the role of parametric modelling, are discussed.
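
    A compact implementation of the Kaplan-Meier product-limit estimator for right-censored data is sketched below; as the article notes, astronomical upper limits (left censoring) are usually handled by negating the values first. The data are hypothetical.

```python
# Minimal Kaplan-Meier product-limit estimator for right-censored data;
# the survival estimate drops only at observed (uncensored) event times.
import numpy as np

def kaplan_meier(times, observed):
    """Return event times and the KM survival estimate just after each."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(observed, dtype=bool)
    s, uniq, surv = 1.0, [], []
    for t in np.unique(times[events]):
        at_risk = np.sum(times >= t)            # still under observation at t
        deaths = np.sum((times == t) & events)  # events exactly at t
        s *= 1.0 - deaths / at_risk
        uniq.append(t)
        surv.append(s)
    return np.array(uniq), np.array(surv)

# Hypothetical data: 1 = detection (event), 0 = censored observation.
t = [2, 3, 3, 5, 7, 8, 11, 12]
d = [1, 1, 0, 1, 0, 1, 1, 0]
for ti, si in zip(*kaplan_meier(t, d)):
    print(f"t = {ti:4.1f}  S(t) = {si:.3f}")
```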

  6. Addendum to Sampling and Analysis Plan (SAP) for Assessment of LANL-Derived Residual Radionuclides in Soils within Tract A-16-d for Land Conveyance and Transfer for Sewage Treatment Facility Area

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Whicker, Jeffrey Jay; Gillis, Jessica Mcdonnel; Ruedig, Elizabeth

    This report summarizes the sampling design used, associated statistical assumptions, as well as general guidelines for conducting post-sampling data analysis. Sampling plan components presented here include how many sampling locations to choose and where within the sampling area to collect those samples. The type of medium to sample (i.e., soil, groundwater, etc.) and how to analyze the samples (in-situ, fixed laboratory, etc.) are addressed in other sections of the sampling plan.

  7. Taking a statistical approach

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wild, M.; Rouhani, S.

    1995-02-01

    A typical site investigation entails extensive sampling and monitoring. In the past, sampling plans have been designed on purely ad hoc bases, leading to significant expenditures and, in some cases, collection of redundant information. In many instances, sampling costs exceed the true worth of the collected data. The US Environmental Protection Agency (EPA) therefore has advocated the use of geostatistics to provide a logical framework for sampling and analysis of environmental data. Geostatistical methodology uses statistical techniques for the spatial analysis of a variety of earth-related data. Geostatistics was developed by the mining industry to estimate ore concentrations, and the same procedure is effective in quantifying environmental contaminants in soils for risk assessments. Unlike classical statistical techniques, geostatistics offers procedures to incorporate the underlying spatial structure of the investigated field. Sample points spaced close together tend to be more similar than samples spaced further apart. This can guide sampling strategies and determine complex contaminant distributions. Geostatistical techniques can be used to evaluate site conditions on the basis of regular, irregular, random and even spatially biased samples. In most environmental investigations, it is desirable to concentrate sampling in areas of known or suspected contamination. The rigorous mathematical procedures of geostatistics allow for accurate estimates at unsampled locations, potentially reducing sampling requirements. The use of geostatistics serves as a decision-aiding and planning tool and can significantly reduce short-term site assessment costs and long-term sampling and monitoring needs, as well as lead to more accurate and realistic remedial design criteria.
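
    The spatial structure the passage describes, where samples close together tend to be more similar than samples spaced further apart, is quantified by the empirical semivariogram, the basic building block of geostatistical estimation. A minimal sketch on synthetic two-dimensional data:

```python
# Empirical semivariogram: half the mean squared difference between
# sample pairs, binned by separation distance. Data below are synthetic.
import numpy as np

rng = np.random.default_rng(4)
xy = rng.uniform(0, 100, size=(200, 2))              # sample coordinates
# Synthetic spatially correlated field: smooth trend plus noise.
z = np.sin(xy[:, 0] / 20) + np.cos(xy[:, 1] / 25) + rng.normal(0, 0.2, 200)

# All pairwise distances and squared value differences.
d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
sq = (z[:, None] - z[None, :]) ** 2
iu = np.triu_indices(len(z), k=1)                    # each pair counted once
d, sq = d[iu], sq[iu]

# Semivariance rises with lag when nearby samples are more similar.
bins = np.arange(0, 60, 10)
for lo, hi in zip(bins[:-1], bins[1:]):
    mask = (d >= lo) & (d < hi)
    gamma = 0.5 * sq[mask].mean()
    print(f"lag {lo:2.0f}-{hi:2.0f}: gamma = {gamma:.3f}  (n = {mask.sum()})")
```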

  8. Measuring, Understanding, and Responding to Covert Social Networks: Passive and Active Tomography

    DTIC Science & Technology

    2017-11-29

    Methods for generating a random sample of networks with desired properties are important tools for the analysis of social, biological, and information… on Theoretical Foundations for Statistical Network Analysis at the Isaac Newton Institute for Mathematical Sciences at Cambridge U. (organized by… Problems span three disciplines (social sciences, statistics, EECS); scientific focus is needed at the interfaces.

  9. An evaluation of the quality of statistical design and analysis of published medical research: results from a systematic survey of general orthopaedic journals.

    PubMed

    Parsons, Nick R; Price, Charlotte L; Hiskens, Richard; Achten, Juul; Costa, Matthew L

    2012-04-25

    The application of statistics in reported research in trauma and orthopaedic surgery has become ever more important and complex. Despite the extensive use of statistical analysis, it is still a subject which is often not conceptually well understood, resulting in clear methodological flaws and inadequate reporting in many papers. A detailed statistical survey sampled 100 representative orthopaedic papers using a validated questionnaire that assessed the quality of the trial design and statistical analysis methods. The survey found evidence of failings in study design, statistical methodology and presentation of the results. Overall, in 17% (95% confidence interval; 10-26%) of the studies investigated the conclusions were not clearly justified by the results, in 39% (30-49%) of studies a different analysis should have been undertaken and in 17% (10-26%) a different analysis could have made a difference to the overall conclusions. It is only by an improved dialogue between statistician, clinician, reviewer and journal editor that the failings in design methodology and analysis highlighted by this survey can be addressed.

  10. Spatial Autocorrelation Approaches to Testing Residuals from Least Squares Regression.

    PubMed

    Chen, Yanguang

    2016-01-01

    In geostatistics, the Durbin-Watson test is frequently employed to detect the presence of residual serial correlation in least squares regression analyses. However, the Durbin-Watson statistic is only suitable for ordered time or spatial series: if the variables comprise cross-sectional data coming from spatial random sampling, the test will be ineffectual because the value of the Durbin-Watson statistic depends on the sequence of data points. This paper develops two new statistics for testing serial correlation of residuals from least squares regression based on spatial samples. By analogy with the new form of Moran's index, an autocorrelation coefficient is defined with a standardized residual vector and a normalized spatial weight matrix. Then, by analogy with the Durbin-Watson statistic, two types of new serial correlation indices are constructed. As a case study, the two newly presented statistics are applied to a spatial sample of 29 of China's regions. The results show that the new spatial autocorrelation models can be used to test the serial correlation of residuals from regression analysis. In practice, the new statistics can make up for the deficiencies of the Durbin-Watson test.
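
    A Moran-type autocorrelation of regression residuals with a row-normalized spatial weight matrix can be computed in a few lines; the sketch below is a generic version of that quantity on synthetic data, not the paper's exact indices.

```python
# Moran's-I-style autocorrelation of OLS residuals with a row-normalized
# spatial weight matrix; the data are synthetic.
import numpy as np

rng = np.random.default_rng(5)
n = 29                                         # same size as the case study
coords = rng.uniform(0, 10, size=(n, 2))
x = rng.normal(size=n)
y = 2 + 0.5 * x + rng.normal(size=n)

# OLS residuals.
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ beta

# Inverse-distance spatial weights, zero diagonal, rows normalized to 1.
d = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
with np.errstate(divide="ignore"):
    W = np.where(d > 0, 1.0 / d, 0.0)
W /= W.sum(axis=1, keepdims=True)

z = (e - e.mean()) / e.std()                   # standardized residual vector
I = z @ W @ z / (z @ z)                        # Moran-type autocorrelation
print(f"Residual spatial autocorrelation: {I:.3f}")
```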

  11. Reliability and statistical power analysis of cortical and subcortical FreeSurfer metrics in a large sample of healthy elderly.

    PubMed

    Liem, Franziskus; Mérillat, Susan; Bezzola, Ladina; Hirsiger, Sarah; Philipp, Michel; Madhyastha, Tara; Jäncke, Lutz

    2015-03-01

    FreeSurfer is a tool to quantify cortical and subcortical brain anatomy automatically and noninvasively. Previous studies have reported reliability and statistical power analyses in relatively small samples or only selected one aspect of brain anatomy. Here, we investigated reliability and statistical power of cortical thickness, surface area, volume, and the volume of subcortical structures in a large sample (N=189) of healthy elderly subjects (64+ years). Reliability (intraclass correlation coefficient) of cortical and subcortical parameters is generally high (cortical: ICCs>0.87, subcortical: ICCs>0.95). Surface-based smoothing increases reliability of cortical thickness maps, while it decreases reliability of cortical surface area and volume. Nevertheless, statistical power of all measures benefits from smoothing. When aiming to detect a 10% difference between groups, the number of subjects required to test effects with sufficient power over the entire cortex varies between cortical measures (cortical thickness: N=39, surface area: N=21, volume: N=81; 10mm smoothing, power=0.8, α=0.05). For subcortical regions this number is between 16 and 76 subjects, depending on the region. We also demonstrate the advantage of within-subject designs over between-subject designs. Furthermore, we publicly provide a tool that allows researchers to perform a priori power analysis and sensitivity analysis to help evaluate previously published studies and to design future studies with sufficient statistical power. Copyright © 2014 Elsevier Inc. All rights reserved.
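
    Reliability in the intraclass-correlation sense can be computed directly from a test-retest design. The sketch below applies the standard one-way random-effects ICC formula to simulated thickness-like measurements; it is not the FreeSurfer pipeline or the paper's published tool.

```python
# One-way random-effects ICC from a test-retest design (two scans per
# subject); a standard-formula sketch on simulated data.
import numpy as np

rng = np.random.default_rng(6)
n_subj = 50
true = rng.normal(2.5, 0.25, n_subj)           # "true" cortical thickness, mm
scan1 = true + rng.normal(0, 0.05, n_subj)     # measurement noise, scan 1
scan2 = true + rng.normal(0, 0.05, n_subj)     # measurement noise, scan 2

data = np.column_stack([scan1, scan2])
k = data.shape[1]
grand = data.mean()
subj_means = data.mean(axis=1)
ms_between = k * np.sum((subj_means - grand) ** 2) / (n_subj - 1)
ms_within = np.sum((data - subj_means[:, None]) ** 2) / (n_subj * (k - 1))
icc = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
print(f"ICC(1,1) = {icc:.3f}")
```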

  12. MetaComp: comprehensive analysis software for comparative meta-omics including comparative metagenomics.

    PubMed

    Zhai, Peng; Yang, Longshu; Guo, Xiao; Wang, Zhe; Guo, Jiangtao; Wang, Xiaoqi; Zhu, Huaiqiu

    2017-10-02

    During the past decade, the development of high-throughput nucleic acid sequencing and mass spectrometry analysis techniques has enabled the characterization of microbial communities through metagenomics, metatranscriptomics, metaproteomics and metabolomics data. To reveal the diversity of microbial communities and the interactions between living conditions and microbes, it is necessary to introduce comparative analysis based upon the integration of all four types of data mentioned above. Comparative meta-omics, especially comparative metagenomics, has been established as a routine process to highlight the significant differences in taxon composition and functional gene abundance among microbiota samples. Meanwhile, biologists are increasingly concerned about the correlations between meta-omics features and environmental factors, which may further decipher the adaptation strategy of a microbial community. We developed a graphical comprehensive analysis software package named MetaComp comprising a series of statistical analysis approaches with visualized results for metagenomics and other meta-omics data comparison. This software is capable of reading files generated by a variety of upstream programs. After data loading, analyses such as multivariate statistics, hypothesis testing of two-sample, multi-sample as well as two-group sample designs, and a novel function, regression analysis of environmental factors, are offered. Here, regression analysis regards meta-omic features as independent variables and environmental factors as dependent variables. Moreover, MetaComp is capable of automatically choosing an appropriate two-group sample test based upon the traits of the input abundance profiles. We further evaluate the performance of its choice and exhibit applications for metagenomics, metaproteomics and metabolomics samples. MetaComp is an integrative software package applicable to all meta-omics data that distills the influence of the living environment on a microbial community by regression analysis. Moreover, since the automatically chosen two-group sample test is verified to perform well, MetaComp is friendly to users without adequate statistical training. These improvements aim to overcome the new challenges of the big data era for all meta-omics data. MetaComp is available at: http://cqb.pku.edu.cn/ZhuLab/MetaComp/ and https://github.com/pzhaipku/MetaComp/ .

  13. Monitoring of bread cooling by statistical analysis of laser speckle patterns

    NASA Astrophysics Data System (ADS)

    Lyubenova, Tanya; Stoykova, Elena; Nacheva, Elena; Ivanov, Branimir; Panchev, Ivan; Sainov, Ventseslav

    2013-03-01

    The phenomenon of laser speckle can be used for detection and visualization of physical or biological activity in various objects (e.g. fruits, seeds, coatings) through statistical description of speckle dynamics. The paper presents the results of non-destructive monitoring of bread cooling by co-occurrence matrix and temporal structure function analysis of speckle patterns which have been recorded continuously within a few days. In total, 72960 and 39680 images were recorded and processed for two similar bread samples respectively. The experiments proved the expected steep decrease of activity related to the processes in the bread samples during the first several hours and revealed its oscillating character within the next few days. Characterization of activity over the bread sample surface was also obtained.

  14. Comparison of four-hour and twenty-four-hour refrigerated storage of nonpotable water for fecal coliform analysis.

    PubMed Central

    Standridge, J H; Lesar, D J

    1977-01-01

    The problem of extending the storage time of water samples for fecal coliform analysis was addressed. Included in this report is a literature review of the storage problem. Twenty-eight samples were analyzed in replicate to determine the effect of 24-h storage of water samples at 4 degrees C. A new statistical approach to data analysis, coupled with the concept of practical acceptability, is presented. According to our results, many samples can successfully be stored at 4 degrees C for 24 h. PMID:335972

  15. Multi-Reader ROC studies with Split-Plot Designs: A Comparison of Statistical Methods

    PubMed Central

    Obuchowski, Nancy A.; Gallas, Brandon D.; Hillis, Stephen L.

    2012-01-01

    Rationale and Objectives Multi-reader imaging trials often use a factorial design, where study patients undergo testing with all imaging modalities and readers interpret the results of all tests for all patients. A drawback of the design is the large number of interpretations required of each reader. Split-plot designs have been proposed as an alternative, in which one or a subset of readers interprets all images of a sample of patients, while other readers interpret the images of other samples of patients. In this paper we compare three methods of analysis for the split-plot design. Materials and Methods Three statistical methods are presented: the Obuchowski-Rockette method modified for the split-plot design, a newly proposed marginal-mean ANOVA approach, and an extension of the three-sample U-statistic method. A simulation study using the Roe-Metz model was performed to compare the type I error rate, power and confidence interval coverage of the three test statistics. Results The type I error rates for all three methods are close to the nominal level but tend to be slightly conservative. The statistical power is nearly identical for the three methods. The coverage of 95% CIs falls close to the nominal coverage for small and large sample sizes. Conclusions The split-plot MRMC study design can be statistically efficient compared with the factorial design, reducing the number of interpretations required per reader. Three methods of analysis, shown to have nominal type I error rate, similar power, and nominal CI coverage, are available for this study design. PMID:23122570

  16. The large sample size fallacy.

    PubMed

    Lantz, Björn

    2013-06-01

    Significance in the statistical sense has little to do with significance in the common practical sense. Statistical significance is a necessary but not a sufficient condition for practical significance. Hence, results that are extremely statistically significant may be highly nonsignificant in practice. The degree of practical significance is generally determined by the size of the observed effect, not the p-value. The results of studies based on large samples are often characterized by extreme statistical significance despite small or even trivial effect sizes. Interpreting such results as significant in practice without further analysis is referred to as the large sample size fallacy in this article. The aim of this article is to explore the relevance of the large sample size fallacy in contemporary nursing research. Relatively few nursing articles display explicit measures of observed effect sizes or include a qualitative discussion of observed effect sizes. Statistical significance is often treated as an end in itself. Effect sizes should generally be calculated and presented along with p-values for statistically significant results, and observed effect sizes should be discussed qualitatively through direct and explicit comparisons with the effects in related literature. © 2012 Nordic College of Caring Science.
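
    The fallacy is easy to demonstrate numerically. A short Python sketch (synthetic data, purely illustrative) contrasting an extreme p-value with a trivial standardized effect size:

      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(0)
      n = 200_000                                  # very large samples
      a = rng.normal(loc=0.00, scale=1.0, size=n)
      b = rng.normal(loc=0.02, scale=1.0, size=n)  # trivially small true effect

      t, p = stats.ttest_ind(a, b, equal_var=False)

      # Cohen's d: standardized mean difference using the pooled SD.
      pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
      d = (b.mean() - a.mean()) / pooled_sd

      print(f"p = {p:.2e}, Cohen's d = {d:.3f}")
      # Typically p is far below 0.001, yet d is about 0.02, a negligible effect.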

  17. An Analysis of Variance Framework for Matrix Sampling.

    ERIC Educational Resources Information Center

    Sirotnik, Kenneth

    Significant cost savings can be achieved with the use of matrix sampling in estimating population parameters from psychometric data. The statistical design is intuitively simple, using the framework of the two-way classification analysis of variance technique. For example, the mean and variance are derived from the performance of a certain grade…

  18. Introducing Undergraduate Students to Metabolomics Using a NMR-Based Analysis of Coffee Beans

    ERIC Educational Resources Information Center

    Sandusky, Peter Olaf

    2017-01-01

    Metabolomics applies multivariate statistical analysis to sets of high-resolution spectra taken over a population of biologically derived samples. The objective is to distinguish subpopulations within the overall sample population, and possibly also to identify biomarkers. While metabolomics has become part of the standard analytical toolbox in…

  19. Application of Multivariate Statistical Analysis to Biomarkers in Se-Turkey Crude Oils

    NASA Astrophysics Data System (ADS)

    Gürgey, K.; Canbolat, S.

    2017-11-01

    Twenty-four crude oil samples were collected from 24 oil fields distributed in different districts of SE Turkey. API and sulphur content (%), stable carbon isotope, gas chromatography (GC), and gas chromatography-mass spectrometry (GC-MS) data were used to construct a geochemical data matrix. The aim of this study is to examine the genetic grouping or correlations in the crude oil samples, and hence the number of source rocks present in SE Turkey. To achieve these aims, two multivariate statistical analysis techniques (Principal Component Analysis [PCA] and Cluster Analysis) were applied to the data matrix of 24 samples and 8 source-specific biomarker variables/parameters. The results showed that there are 3 genetically different oil groups: Batman-Nusaybin Oils, Adıyaman-Kozluk Oils and Diyarbakir Oils, in addition to one mixed group. These groupings imply that at least three different source rocks are present in South-Eastern (SE) Turkey. Grouping of the crude oil samples appears to be consistent with the geographic locations of the oil fields, the subsurface stratigraphy, and the geology of the area.
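
    A minimal sketch of the two techniques named above, applied to a matrix of the stated shape (the data here are random placeholders, not the published geochemical measurements):

      import numpy as np
      from sklearn.preprocessing import StandardScaler
      from sklearn.decomposition import PCA
      from scipy.cluster.hierarchy import linkage, fcluster

      rng = np.random.default_rng(1)
      X = rng.normal(size=(24, 8))     # 24 oils x 8 biomarker parameters

      Xz = StandardScaler().fit_transform(X)           # standardize variables
      scores = PCA(n_components=2).fit_transform(Xz)   # coordinates for a score plot

      Z = linkage(Xz, method="ward")                   # agglomerative clustering
      groups = fcluster(Z, t=4, criterion="maxclust")  # e.g. 3 groups plus 1 mixed
      print(groups)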

  20. Reporting quality of statistical methods in surgical observational studies: protocol for systematic review.

    PubMed

    Wu, Robert; Glen, Peter; Ramsay, Tim; Martel, Guillaume

    2014-06-28

    Observational studies dominate the surgical literature. Statistical adjustment is an important strategy to account for confounders in observational studies. Research has shown that published articles are often poor in statistical quality, which may jeopardize their conclusions. The Statistical Analyses and Methods in the Published Literature (SAMPL) guidelines have been published to help establish standards for statistical reporting. This study will seek to determine whether the quality of statistical adjustment and the reporting of these methods are adequate in surgical observational studies. We hypothesize that incomplete reporting will be found in all surgical observational studies, and that the quality and reporting of these methods will be of lower quality in surgical journals when compared with medical journals. Finally, this work will seek to identify predictors of high-quality reporting. This work will examine the top five general surgical and medical journals, based on a 5-year impact factor (2007-2012). All observational studies investigating an intervention related to an essential component area of general surgery (defined by the American Board of Surgery), with an exposure, outcome, and comparator, will be included in this systematic review. Essential elements related to statistical reporting and quality were extracted from the SAMPL guidelines and include domains such as intent of analysis, primary analysis, multiple comparisons, numbers and descriptive statistics, association and correlation analyses, linear regression, logistic regression, Cox proportional hazard analysis, analysis of variance, survival analysis, propensity analysis, and independent and correlated analyses. Each article will be scored as a proportion based on fulfilling criteria in relevant analyses used in the study. A logistic regression model will be built to identify variables associated with high-quality reporting. A comparison will be made between the scores of surgical observational studies published in medical versus surgical journals. Secondary outcomes will pertain to individual domains of analysis. Sensitivity analyses will be conducted. This study will explore the reporting and quality of statistical analyses in surgical observational studies published in the most referenced surgical and medical journals in 2013 and examine whether variables (including the type of journal) can predict high-quality reporting.

  1. Differentiation of chocolates according to the cocoa's geographical origin using chemometrics.

    PubMed

    Cambrai, Amandine; Marcic, Christophe; Morville, Stéphane; Sae Houer, Pierre; Bindler, Françoise; Marchioni, Eric

    2010-02-10

    The determination of the geographical origin of cocoa used to produce chocolate has been assessed through the analysis of the volatile compounds of chocolate samples. Analysis of the volatile content and its statistical processing by multivariate methods tended to form independent groups for both Africa and Madagascar, even if some of the chocolate samples analyzed appeared in a mixed zone together with those from America. This analysis also allowed a clear separation between Caribbean chocolates and those from other origins. Eight compounds (such as linalool or (E,E)-2,4-decadienal) characteristic of chocolate's different geographical origins were also identified. The method described in this work (hydrodistillation, GC analysis, and statistical treatment) may improve the control of the geographical origin of chocolate during its long production process.

  2. Assessing human metal accumulations in an urban superfund site.

    PubMed

    Hailer, M Katie; Peck, Christopher P; Calhoun, Michael W; West, Robert F; James, Kyle J; Siciliano, Steven D

    2017-09-01

    Butte, Montana is part of the largest superfund site in the continental United States. Open-pit mining continues in close proximity to Butte's urban population. This study seeks to establish baseline metal concentrations in the hair and blood of individuals living in Butte, MT and possible routes of exposure. Volunteers from Butte (n=116) and Bozeman (n=86) were recruited to submit hair and blood samples and asked to complete a lifestyle survey. Elemental analysis of hair and blood samples was performed by ICP-MS. Three air monitors were stationed in Butte to collect particulate, and the filters were analyzed by ICP-MS. Soil samples from the yards of Butte volunteers were quantified by ICP-MS. Hair analysis revealed concentrations of Al, As, Cd, Cu, Mn, Mo, and U to be statistically elevated in Butte's population. Blood analysis revealed that the concentration of As was also statistically elevated in the Butte population. Multiple regression analysis was performed for the elements As, Cu, and Mn for hair and blood samples. Soil samples revealed detectable levels of As, Pb, Cu, Mn, and Cd, with As and Cu levels being higher than expected in some of the samples. Air sampling revealed consistently elevated As and Mn levels in the larger particulate sampled as compared to average U.S. ambient air data. Copyright © 2017 Elsevier B.V. All rights reserved.

  3. Small sample estimation of the reliability function for technical products

    NASA Astrophysics Data System (ADS)

    Lyamets, L. L.; Yakimenko, I. V.; Kanishchev, O. A.; Bliznyuk, O. A.

    2017-12-01

    It is demonstrated that, in the absence of large statistical samples obtained as a result of testing complex technical products for failure, statistical estimation of the reliability function of initial elements can be made by the method of moments. A formal description of the moments method is given and its advantages in the analysis of small censored samples are discussed. A modified algorithm is proposed for the implementation of the moments method with the use of only the moments at which the failures of initial elements occur.
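
    As a concrete (if simplified) sketch of moment-based reliability estimation, the fragment below fits a Weibull lifetime model to a small, complete sample of failure times by matching the first two sample moments; the authors' modified algorithm for censored data is not reproduced here, and the failure times are hypothetical:

      import numpy as np
      from scipy.special import gamma
      from scipy.optimize import brentq

      def weibull_moments_fit(times):
          """Moment estimates (shape k, scale lam) for a Weibull lifetime model."""
          m, v = np.mean(times), np.var(times, ddof=1)
          cv2 = v / m**2
          # The squared coefficient of variation depends on the shape only.
          f = lambda k: gamma(1 + 2 / k) / gamma(1 + 1 / k) ** 2 - 1 - cv2
          k = brentq(f, 0.1, 20.0)
          lam = m / gamma(1 + 1 / k)
          return k, lam

      def reliability(t, k, lam):
          return np.exp(-(t / lam) ** k)   # R(t) = P(T > t)

      times = np.array([52., 68., 75., 91., 104., 110., 130.])  # hypothetical failures
      k, lam = weibull_moments_fit(times)
      print(k, lam, reliability(100.0, k, lam))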

  4. Two-Sample Statistics for Testing the Equality of Survival Functions Against Improper Semi-parametric Accelerated Failure Time Alternatives: An Application to the Analysis of a Breast Cancer Clinical Trial

    PubMed Central

    BROËT, PHILIPPE; TSODIKOV, ALEXANDER; DE RYCKE, YANN; MOREAU, THIERRY

    2010-01-01

    This paper presents two-sample statistics suited for testing equality of survival functions against improper semi-parametric accelerated failure time alternatives. These tests are designed for comparing either the short- or the long-term effect of a prognostic factor, or both. These statistics are obtained as partial likelihood score statistics from a time-dependent Cox model. As a consequence, the proposed tests can be very easily implemented using widely available software. A breast cancer clinical trial is presented as an example to demonstrate the utility of the proposed tests. PMID:15293627

  5. Protein Multiplexed Immunoassay Analysis with R.

    PubMed

    Breen, Edmond J

    2017-01-01

    Plasma samples from 177 control and type 2 diabetes patients collected at three Australian hospitals are screened for 14 analytes using six custom-made multiplex kits across 60 96-well plates. In total 354 samples were collected from the patients, representing one baseline and one end point sample from each patient. R methods and source code for analyzing the analyte fluorescence response obtained from these samples by Luminex Bio-Plex® xMAP multiplexed immunoassay technology are disclosed. Techniques and R procedures for reading Bio-Plex® result files for statistical analysis and data visualization are also presented. The need for technical replicates and the number of technical replicates are addressed, as well as plate layout design strategies. Multinomial regression is used to determine plate-to-sample covariate balance. Methods for matching clinical covariate information to Bio-Plex® results and vice versa are given, as are methods for measuring and inspecting the quality of the fluorescence responses. Both fixed and mixed-effect approaches to immunoassay statistical differential analysis are presented and discussed. A random-effect approach to outlier analysis and detection is also shown. The bioinformatics R methodology presented here provides a foundation for rigorous and reproducible analysis of the fluorescence response obtained from multiplexed immunoassays.

  6. Multi-trait analysis of genome-wide association summary statistics using MTAG.

    PubMed

    Turley, Patrick; Walters, Raymond K; Maghzian, Omeed; Okbay, Aysu; Lee, James J; Fontana, Mark Alan; Nguyen-Viet, Tuan Anh; Wedow, Robbee; Zacher, Meghan; Furlotte, Nicholas A; Magnusson, Patrik; Oskarsson, Sven; Johannesson, Magnus; Visscher, Peter M; Laibson, David; Cesarini, David; Neale, Benjamin M; Benjamin, Daniel J

    2018-02-01

    We introduce multi-trait analysis of GWAS (MTAG), a method for joint analysis of summary statistics from genome-wide association studies (GWAS) of different traits, possibly from overlapping samples. We apply MTAG to summary statistics for depressive symptoms (Neff = 354,862), neuroticism (N = 168,105), and subjective well-being (N = 388,538). As compared to the 32, 9, and 13 genome-wide significant loci identified in the single-trait GWAS (most of which are themselves novel), MTAG increases the number of associated loci to 64, 37, and 49, respectively. Moreover, association statistics from MTAG yield more informative bioinformatics analyses and increase the variance explained by polygenic scores by approximately 25%, matching theoretical expectations.

  7. A Statistical Analysis Plan to Support the Joint Forward Area Air Defense Test.

    DTIC Science & Technology

    1984-08-02

    by establishing a specific significance level prior to performing the statistical test (traditionally, α levels are set at .01 or .05). What is often...undesirable increase in β. For constant α levels, the power (1 − β) of a statistical test can be increased by increasing the sample size of the test. [Ref...] (Figure residue: flowchart for performing a k-sample comparison test (ANOVA) on MOP "A" factor levels to determine whether differences exist among the MOP "A" levels.)
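
    The trade-off sketched in this excerpt (fix the significance level α, then buy power 1 − β by increasing the sample size) can be reproduced with standard software; a small illustration using statsmodels, with an assumed medium effect size:

      from statsmodels.stats.power import TTestIndPower

      analysis = TTestIndPower()

      # Sample size per group for a medium effect (d = 0.5) at alpha = .05, power = .80.
      n = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.80)
      print(round(n))   # roughly 64 per group

      # Conversely, power achieved at a fixed alpha as the group size grows.
      for nobs in (20, 50, 100):
          pw = analysis.solve_power(effect_size=0.5, alpha=0.05, nobs1=nobs)
          print(nobs, round(pw, 2))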

  8. The Importance of Practice in the Development of Statistics.

    DTIC Science & Technology

    1983-01-01

    NRC Technical Summary Report #2471, THE IMPORTANCE OF PRACTICE IN THE DEVELOPMENT OF STATISTICS...component analysis, bioassay, limits for a ratio, quality control, sampling inspection, non-parametric tests, transformation theory, ARIMA time series...models, sequential tests, cumulative sum charts, data analysis plotting techniques, and a resolution of the Bayes-frequentist controversy. It appears

  9. Directions for new developments on statistical design and analysis of small population group trials.

    PubMed

    Hilgers, Ralf-Dieter; Roes, Kit; Stallard, Nigel

    2016-06-14

    Most statistical design and analysis methods for clinical trials have been developed and evaluated where at least several hundreds of patients could be recruited. These methods may not be suitable to evaluate therapies if the sample size is unavoidably small, which is usually termed a small population. The specific sample size cut-off, where the standard methods fail, needs to be investigated. In this paper, the authors present their view on new developments for design and analysis of clinical trials in small population groups, where conventional statistical methods may be inappropriate, e.g., because of lack of power or poor adherence to asymptotic approximations due to sample size restrictions. Following the EMA/CHMP guideline on clinical trials in small populations, we consider directions for new developments in the area of statistical methodology for design and analysis of small population clinical trials. We relate the findings to the research activities of three projects, Asterix, IDeAl, and InSPiRe, which have received funding since 2013 within the FP7-HEALTH-2013-INNOVATION-1 framework of the EU. As not all aspects of the wide research area of small population clinical trials can be addressed, we focus on areas where we feel advances are needed and feasible. The general framework of the EMA/CHMP guideline on small population clinical trials stimulates a number of research areas. These serve as the basis for the three projects, Asterix, IDeAl, and InSPiRe, which use various approaches to develop new statistical methodology for design and analysis of small population clinical trials. Small population clinical trials refer to trials with a limited number of patients. Small populations may result from rare diseases or specific subtypes of more common diseases. New statistical methodology needs to be tailored to these specific situations. The main results from the three projects will constitute a useful toolbox for improved design and analysis of small population clinical trials. They address various challenges presented by the EMA/CHMP guideline as well as recent discussions about extrapolation. There is a need for involvement of the patients' perspective in the planning and conduct of small population clinical trials for a successful therapy evaluation.

  10. Statistical Considerations of Food Allergy Prevention Studies.

    PubMed

    Bahnson, Henry T; du Toit, George; Lack, Gideon

    Clinical studies to prevent the development of food allergy have recently helped reshape public policy recommendations on the early introduction of allergenic foods. These trials are also prompting new research, and it is therefore important to address the unique design and analysis challenges of prevention trials. We highlight statistical concepts and give recommendations that clinical researchers may wish to adopt when designing future study protocols and analysis plans for prevention studies. Topics include selecting a study sample, addressing internal and external validity, improving statistical power, choosing alpha and beta, analysis innovations to address dilution effects, and analysis methods to deal with poor compliance, dropout, and missing data. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.

  11. A proposed method to minimize waste from institutional radiation safety surveillance programs through the application of expected value statistics.

    PubMed

    Emery, R J

    1997-03-01

    Institutional radiation safety programs routinely use wipe test sampling and liquid scintillation counting analysis to indicate the presence of removable radioactive contamination. Significant volumes of liquid waste can be generated by such surveillance activities, and the subsequent disposal of these materials can sometimes be difficult and costly. In settings where large numbers of negative results are regularly obtained, the limited grouping of samples for analysis based on expected value statistical techniques is possible. To demonstrate the plausibility of the approach, single wipe samples exposed to varying amounts of contamination were analyzed concurrently with nine non-contaminated samples. Although the sample grouping inevitably leads to increased quenching with liquid scintillation counting systems, the effect did not impact the ability to detect removable contamination in amounts well below recommended action levels. Opportunities to further improve this cost effective semi-quantitative screening procedure are described, including improvements in sample collection procedures, enhancing sample-counting media contact through mixing and extending elution periods, increasing sample counting times, and adjusting institutional action levels.
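
    The expected-value reasoning behind limited grouping can be made concrete with a small calculation. The sketch below uses the classic Dorfman two-stage pooling model as a stand-in for the paper's procedure; the pool size of ten matches the one-plus-nine grouping described, while the contamination rate is an illustrative assumption:

      def expected_analyses(n, k, p):
          """Expected number of LSC analyses when n wipes are screened in pools
          of k and each wipe is independently contaminated with probability p.

          Each pool costs one analysis; a positive pool (probability
          1 - (1 - p)**k) triggers k individual follow-up analyses.
          """
          pools = n / k
          p_pool_positive = 1 - (1 - p) ** k
          return pools * (1 + k * p_pool_positive)

      n, p = 1000, 0.005   # 1000 wipes, 0.5% contamination rate (assumed)
      for k in (2, 5, 10, 20):
          print(k, round(expected_analyses(n, k, p)))
      # With p this small, pools of ~10 cut the number of analyses (and the
      # scintillation fluid waste) by roughly 85% versus 1000 individual counts.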

  12. Analyzing thematic maps and mapping for accuracy

    USGS Publications Warehouse

    Rosenfield, G.H.

    1982-01-01

    Two problems which exist while attempting to test the accuracy of thematic maps and mapping are: (1) evaluating the accuracy of thematic content, and (2) evaluating the effects of the variables on thematic mapping. Statistical analysis techniques are applicable to both these problems and include techniques for sampling the data and determining their accuracy. In addition, techniques for hypothesis testing, or inferential statistics, are used when comparing the effects of variables. A comprehensive and valid accuracy test of a classification project, such as thematic mapping from remotely sensed data, includes the following components of statistical analysis: (1) sample design, including the sample distribution, sample size, size of the sample unit, and sampling procedure; and (2) accuracy estimation, including estimation of the variance and confidence limits. Careful consideration must be given to the minimum sample size necessary to validate the accuracy of a given classification category. The results of an accuracy test are presented in a contingency table sometimes called a classification error matrix. Usually the rows represent the interpretation, and the columns represent the verification. The diagonal elements represent the correct classifications. The remaining elements of the rows represent errors by commission, and the remaining elements of the columns represent the errors of omission. For tests of hypothesis that compare variables, the general practice has been to use only the diagonal elements from several related classification error matrices. These data are arranged in the form of another contingency table. The columns of the table represent the different variables being compared, such as different scales of mapping. The rows represent the blocking characteristics, such as the various categories of classification. The values in the cells of the tables might be the counts of correct classification or the binomial proportions of these counts divided by either the row totals or the column totals from the original classification error matrices. In hypothesis testing, when the results of tests of multiple sample cases prove to be significant, some form of statistical test must be used to separate any results that differ significantly from the others. In the past, many analyses of the data in this error matrix were made by comparing the relative magnitudes of the percentage of correct classifications, for either individual categories, the entire map, or both. More rigorous analyses have used data transformations and/or two-way classification analysis of variance. A more sophisticated step in data analysis techniques would be to use the entire classification error matrices using the methods of discrete multivariate analysis or of multivariate analysis of variance.
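
    A minimal sketch of the accuracy-estimation step described above: overall accuracy with binomial confidence limits, plus commission and omission errors, computed from a hypothetical classification error matrix:

      import numpy as np
      from statsmodels.stats.proportion import proportion_confint

      # Rows = interpretation, columns = verification (hypothetical counts).
      cm = np.array([[48,  3,  1],
                     [ 5, 37,  4],
                     [ 2,  6, 44]])

      correct, total = np.trace(cm), cm.sum()
      acc = correct / total
      lo, hi = proportion_confint(correct, total, alpha=0.05, method="wilson")
      print(f"overall accuracy {acc:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")

      # Errors of commission (row-wise) and omission (column-wise).
      commission = 1 - np.diag(cm) / cm.sum(axis=1)
      omission = 1 - np.diag(cm) / cm.sum(axis=0)
      print(commission.round(2), omission.round(2))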

  13. [Practical aspects regarding sample size in clinical research].

    PubMed

    Vega Ramos, B; Peraza Yanes, O; Herrera Correa, G; Saldívar Toraya, S

    1996-01-01

    Knowledge of the right sample size lets us judge whether the results published in medical papers rest on a suitable design and support their conclusions according to the statistical analysis. To estimate the sample size we must consider the type I error, type II error, variance, the size of the effect, and the significance and power of the test. To decide which mathematical formula to use, we must define the kind of study: a prevalence study, a study of mean values, or a comparative one. In this paper we explain some basic topics of statistics and describe four simple examples of sample size estimation.
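
    For a prevalence study, the usual formula is n = z²p(1−p)/d², where p is the expected prevalence, d the desired precision and z the normal quantile for the chosen confidence level; a worked example in Python (the prevalence and precision figures are illustrative):

      from scipy.stats import norm

      def n_for_prevalence(p, d, alpha=0.05):
          """Sample size to estimate a prevalence p to within +/- d
          with 100(1 - alpha)% confidence: n = z^2 * p * (1 - p) / d^2."""
          z = norm.ppf(1 - alpha / 2)
          return z**2 * p * (1 - p) / d**2

      # Expected prevalence 20%, precision +/- 5 percentage points.
      print(round(n_for_prevalence(0.20, 0.05)))   # about 246 subjects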

  14. Augmenting Latent Dirichlet Allocation and Rank Threshold Detection with Ontologies

    DTIC Science & Technology

    2010-03-01

    Probabilistic Latent Semantic Indexing (PLSI) is an automated indexing information retrieval model [20]. It is based on a statistical latent class model which is...uses a statistical foundation that is more accurate in finding hidden semantic relationships [20]. The model uses factor analysis of count data, number...principle of statistical inference which asserts that all of the information in a sample is contained in the likelihood function [20]. The statistical

  15. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gilbert, Richard O.

    The application of statistics to environmental pollution monitoring studies requires a knowledge of statistical analysis methods particularly well suited to pollution data. This book fills that need by providing sampling plans, statistical tests, parameter estimation procedure techniques, and references to pertinent publications. Most of the statistical techniques are relatively simple, and examples, exercises, and case studies are provided to illustrate procedures. The book is logically divided into three parts. Chapters 1, 2, and 3 are introductory chapters. Chapters 4 through 10 discuss field sampling designs and Chapters 11 through 18 deal with a broad range of statistical analysis procedures. Some statistical techniques given here are not commonly seen in statistics books. For example, see methods for handling correlated data (Sections 4.5 and 11.12), for detecting hot spots (Chapter 10), and for estimating a confidence interval for the mean of a lognormal distribution (Section 13.2). Also, Appendix B lists a computer code that estimates and tests for trends over time at one or more monitoring stations using nonparametric methods (Chapters 16 and 17). Unfortunately, some important topics could not be included because of their complexity and the need to limit the length of the book. For example, only brief mention could be made of time series analysis using Box-Jenkins methods and of kriging techniques for estimating spatial and spatial-time patterns of pollution, although multiple references on these topics are provided. Also, no discussion of methods for assessing risks from environmental pollution could be included.

  16. Identifying biologically relevant differences between metagenomic communities.

    PubMed

    Parks, Donovan H; Beiko, Robert G

    2010-03-15

    Metagenomics is the study of genetic material recovered directly from environmental samples. Taxonomic and functional differences between metagenomic samples can highlight the influence of ecological factors on patterns of microbial life in a wide range of habitats. Statistical hypothesis tests can help us distinguish ecological influences from sampling artifacts, but knowledge of only the P-value from a statistical hypothesis test is insufficient to make inferences about biological relevance. Current reporting practices for pairwise comparative metagenomics are inadequate, and better tools are needed for comparative metagenomic analysis. We have developed a new software package, STAMP, for comparative metagenomics that supports best practices in analysis and reporting. Examination of a pair of iron mine metagenomes demonstrates that deeper biological insights can be gained using statistical techniques available in our software. An analysis of the functional potential of 'Candidatus Accumulibacter phosphatis' in two enhanced biological phosphorus removal metagenomes identified several subsystems that differ between the A. phosphatis strains in these related communities, including phosphate metabolism, secretion and metal transport. Python source code and binaries are freely available from our website at http://kiwi.cs.dal.ca/Software/STAMP. Contact: beiko@cs.dal.ca. Supplementary data are available at Bioinformatics online.

  17. Some Tests of Randomness with Applications

    DTIC Science & Technology

    1981-02-01

    freedom. For further details, the reader is referred to Gnanadesikan (1977, p. 169) wherein other relevant tests are also given. Graphical tests, as...sample from a gamma distribution. J. Am. Statist. Assoc. 71, 480-7. Gnanadesikan, R. (1977). Methods for Statistical Data Analysis of Multivariate

  18. Provenance establishment of coffee using solution ICP-MS and ICP-AES.

    PubMed

    Valentin, Jenna L; Watling, R John

    2013-11-01

    Statistical interpretation of the concentrations of 59 elements, determined using solution based inductively coupled plasma mass spectrometry (ICP-MS) and inductively coupled plasma emission spectroscopy (ICP-AES), was used to establish the provenance of coffee samples from 15 countries across five continents. Data confirmed that the harvest year, degree of ripeness and whether the coffees were green or roasted had little effect on the elemental composition of the coffees. The application of linear discriminant analysis and principal component analysis of the elemental concentrations permitted up to 96.9% correct classification of the coffee samples according to their continent of origin. When samples from each continent were considered separately, up to 100% correct classification of coffee samples into their countries, and plantations of origin was achieved. This research demonstrates the potential of using elemental composition, in combination with statistical classification methods, for accurate provenance establishment of coffee. Copyright © 2013 Elsevier Ltd. All rights reserved.
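
    The classification step described above can be sketched with scikit-learn; the element-concentration matrix and origin labels below are random placeholders for the 59-element profiles, and leave-one-out cross-validation stands in for the study's validation scheme:

      import numpy as np
      from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
      from sklearn.model_selection import LeaveOneOut, cross_val_score
      from sklearn.pipeline import make_pipeline
      from sklearn.preprocessing import StandardScaler

      rng = np.random.default_rng(2)
      X = rng.normal(size=(60, 15))        # samples x element concentrations
      y = rng.integers(0, 5, size=60)      # continent-of-origin labels

      clf = make_pipeline(StandardScaler(), LinearDiscriminantAnalysis())
      scores = cross_val_score(clf, X, y, cv=LeaveOneOut())
      print(f"correct classification rate: {scores.mean():.1%}")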

  19. Sample Size Calculations for Precise Interval Estimation of the Eta-Squared Effect Size

    ERIC Educational Resources Information Center

    Shieh, Gwowen

    2015-01-01

    Analysis of variance is one of the most frequently used statistical analyses in the behavioral, educational, and social sciences, and special attention has been paid to the selection and use of an appropriate effect size measure of association in analysis of variance. This article presents the sample size procedures for precise interval estimation…

  20. Using Multi-Group Confirmatory Factor Analysis to Evaluate Cross-Cultural Research: Identifying and Understanding Non-Invariance

    ERIC Educational Resources Information Center

    Brown, Gavin T. L.; Harris, Lois R.; O'Quin, Chrissie; Lane, Kenneth E.

    2017-01-01

    Multi-group confirmatory factor analysis (MGCFA) allows researchers to determine whether a research inventory elicits similar response patterns across samples. If statistical equivalence in responding is found, then scale score comparisons become possible and samples can be said to be from the same population. This paper illustrates the use of…

  1. Sampling benthic macroinvertebrates in a large flood-plain river: Considerations of study design, sample size, and cost

    USGS Publications Warehouse

    Bartsch, L.A.; Richardson, W.B.; Naimo, T.J.

    1998-01-01

    Estimation of benthic macroinvertebrate populations over large spatial scales is difficult due to the high variability in abundance and the cost of sample processing and taxonomic analysis. To determine a cost-effective, statistically powerful sample design, we conducted an exploratory study of the spatial variation of benthic macroinvertebrates in a 37 km reach of the Upper Mississippi River. We sampled benthos at 36 sites within each of two strata, contiguous backwater and channel border. Three standard ponar (525 cm²) grab samples were obtained at each site ('Original Design'). Analysis of variance and sampling cost of strata-wide estimates for abundance of Oligochaeta, Chironomidae, and total invertebrates showed that only one ponar sample per site ('Reduced Design') yielded essentially the same abundance estimates as the Original Design, while reducing the overall cost by 63%. A posteriori statistical power analysis (alpha = 0.05, beta = 0.20) on the Reduced Design estimated that at least 18 sites per stratum were needed to detect differences in mean abundance between contiguous backwater and channel border areas for Oligochaeta, Chironomidae, and total invertebrates. Statistical power was nearly identical for the three taxonomic groups. The abundances of several taxa of concern (e.g., Hexagenia mayflies and Musculium fingernail clams) were too spatially variable to estimate power with our method. Resampling simulations indicated that to achieve adequate sampling precision for Oligochaeta, at least 36 sample sites per stratum would be required, whereas a sampling precision of 0.2 would not be attained with any sample size for Hexagenia in channel border areas, or Chironomidae and Musculium in both strata, given the variance structure of the original samples. Community-wide diversity indices (Brillouin and 1-Simpson's) increased as sample area per site increased. The backwater area had higher diversity than the channel border area. The number of sampling sites required to sample benthic macroinvertebrates during our sampling period depended on the study objective and ranged from 18 to more than 40 sites per stratum. No single sampling regime would efficiently and adequately sample all components of the macroinvertebrate community.

  2. Assessing signal-to-noise in quantitative proteomics: multivariate statistical analysis in DIGE experiments.

    PubMed

    Friedman, David B

    2012-01-01

    All quantitative proteomics experiments measure variation between samples. When performing large-scale experiments that involve multiple conditions or treatments, the experimental design should include the appropriate number of individual biological replicates from each condition to enable the distinction between a relevant biological signal from technical noise. Multivariate statistical analyses, such as principal component analysis (PCA), provide a global perspective on experimental variation, thereby enabling the assessment of whether the variation describes the expected biological signal or the unanticipated technical/biological noise inherent in the system. Examples will be shown from high-resolution multivariable DIGE experiments where PCA was instrumental in demonstrating biologically significant variation as well as sample outliers, fouled samples, and overriding technical variation that would not be readily observed using standard univariate tests.

  3. Expression Profiling of Nonpolar Lipids in Meibum From Patients With Dry Eye: A Pilot Study

    PubMed Central

    Chen, Jianzhong; Keirsey, Jeremy K.; Green, Kari B.; Nichols, Kelly K.

    2017-01-01

    Purpose The purpose of this investigation was to characterize differentially expressed lipids in meibum samples from patients with dry eye disease (DED) in order to better understand the underlying pathologic mechanisms. Methods Meibum samples were collected from postmenopausal women with DED (PW-DED; n = 5) and a control group of postmenopausal women without DED (n = 4). Lipid profiles were analyzed by direct infusion full-scan electrospray ionization mass spectrometry (ESI-MS). An initial analysis of 145 representative peaks from four classes of lipids in PW-DED samples revealed that additional manual corrections for peak overlap and isotopes only slightly affected the statistical analysis. Therefore, analysis of uncorrected data, which can be applied to a greater number of peaks, was used to compare more than 500 lipid peaks common to PW-DED and control samples. Statistical analysis of peak intensities identified several lipid species that differed significantly between the two groups. Data from contact lens wearers with DED (CL-DED; n = 5) were also analyzed. Results Many species of the two types of diesters (DE) and very long chain wax esters (WE) were decreased by ∼20% in PW-DED, whereas levels of triacylglycerols were increased by an average of 39% ± 3% in meibum from PW-DED compared to that in the control group. Approximately the same reduction (20%) of similar DE and WE was observed for CL-DED. Conclusions Statistical analysis of peak intensities from direct infusion ESI-MS results identified differentially expressed lipids in meibum from dry eye patients. Further studies are warranted to support these findings. PMID:28426869

  4. Geostatistics - a tool applied to the distribution of Legionella pneumophila in a hospital water system.

    PubMed

    Laganà, Pasqualina; Moscato, Umberto; Poscia, Andrea; La Milia, Daniele Ignazio; Boccia, Stefania; Avventuroso, Emanuela; Delia, Santi

    2015-01-01

    Legionnaires' disease is normally acquired by inhalation of legionellae from a contaminated environmental source. Water systems of large buildings, such as hospitals, are often contaminated with legionellae and therefore represent a potential risk for the hospital population. The aim of this study was to evaluate the potential contamination of Legionella pneumophila (LP) in a large hospital in Italy through georeferential statistical analysis to assess the possible sources of dispersion and, consequently, the risk of exposure for both health care staff and patients. LP serogroups 1 and 2-14 distribution was considered in the wards housed on two consecutive floors of the hospital building. On the basis of information provided by 53 bacteriological analyses, a 'random' grid of points was chosen and spatial geostatistics (FAI-k Kriging) was applied and compared with the results of classical statistical analysis. Over 50% of the examined samples were positive for Legionella pneumophila. LP 1 was isolated in 69% of samples from the ground floor and in 60% of samples from the first floor; LP 2-14 in 36% of samples from the ground floor and 24% from the first. The iso-estimation maps show clearly the most contaminated pipe and the difference in the diffusion of the different L. pneumophila serogroups. Experimental work has demonstrated that geostatistical methods applied to the microbiological analysis of water matrices allow better modeling of the phenomenon under study, greater potential for risk management, and a greater choice of methods of prevention and environmental recovery than classical statistical analysis.

  5. A robust and efficient statistical method for genetic association studies using case and control samples from multiple cohorts

    PubMed Central

    2013-01-01

    Background The theoretical basis of genome-wide association studies (GWAS) is statistical inference of linkage disequilibrium (LD) between any polymorphic marker and a putative disease locus. Most methods widely implemented for such analyses are vulnerable to several key demographic factors and deliver poor statistical power for detecting genuine associations as well as a high false positive rate. Here, we present a likelihood-based statistical approach that accounts properly for the non-random nature of case–control samples with regard to genotypic distributions at the loci in the populations under study and confers flexibility to test for genetic association in the presence of different confounding factors such as population structure and non-randomness of samples. Results We implemented this novel method together with several popular methods in the literature of GWAS to re-analyze recently published Parkinson's disease (PD) case–control samples. The real data analysis and computer simulation show that the new method confers not only significantly improved statistical power for detecting the associations but also robustness to the difficulties stemming from non-random sampling and genetic structures when compared to its rivals. In particular, the new method detected 44 significant SNPs within 25 chromosomal regions of size < 1 Mb, but only 6 SNPs in two of these regions were previously detected by the trend test based methods. It discovered two SNPs located 1.18 Mb and 0.18 Mb from the PD candidates, FGF20 and PARK8, without invoking false positive risk. Conclusions We developed a novel likelihood-based method which provides adequate estimation of LD and other population model parameters from case and control samples, eases the integration of these samples from multiple genetically divergent populations, and thus confers statistically robust and powerful analyses of GWAS. On the basis of simulation studies and analysis of real datasets, we demonstrated significant improvement of the new method over the non-parametric trend test, which is the most popularly implemented in the literature of GWAS. PMID:23394771

  6. Interpretation of correlations in clinical research.

    PubMed

    Hung, Man; Bounsanga, Jerry; Voss, Maren Wright

    2017-11-01

    Critically analyzing research is a key skill in evidence-based practice and requires knowledge of research methods, results interpretation, and applications, all of which rely on a foundation based in statistics. Evidence-based practice makes high demands on trained medical professionals to interpret an ever-expanding array of research evidence. As clinical training emphasizes medical care rather than statistics, it is useful to review the basics of statistical methods and what they mean for interpreting clinical studies. We reviewed the basic concepts of correlational associations, violations of normality, unobserved variable bias, sample size, and alpha inflation. The foundations of causal inference were discussed and sound statistical analyses were examined. We discuss four ways in which correlational analysis is misused, including causal inference overreach, over-reliance on significance, alpha inflation, and sample size bias. Recent published studies in the medical field provide evidence of causal assertion overreach drawn from correlational findings. The findings present a primer on the assumptions and nature of correlational methods of analysis and urge clinicians to exercise appropriate caution as they critically analyze the evidence before them and evaluate evidence that supports practice. Critically analyzing new evidence requires statistical knowledge in addition to clinical knowledge. Studies can overstate relationships, expressing causal assertions when only correlational evidence is available. Failure to account for the effect of sample size in the analyses tends to overstate the importance of predictive variables. It is important not to overemphasize the statistical significance without consideration of effect size and whether differences could be considered clinically meaningful.
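
    One of the recommendations above, reporting interval estimates and effect sizes rather than bare p-values, looks like this in practice (synthetic data; the Fisher z-transformation interval is the standard large-sample approach):

      import numpy as np
      from scipy import stats

      def pearson_with_ci(x, y, alpha=0.05):
          """Pearson r with its p-value and a Fisher-z confidence interval."""
          r, p = stats.pearsonr(x, y)
          z = np.arctanh(r)                 # Fisher transformation
          se = 1 / np.sqrt(len(x) - 3)
          zc = stats.norm.ppf(1 - alpha / 2)
          return r, p, (np.tanh(z - zc * se), np.tanh(z + zc * se))

      rng = np.random.default_rng(3)
      x = rng.normal(size=150)
      y = 0.3 * x + rng.normal(size=150)
      print(pearson_with_ci(x, y))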

  7. Anomaly detection for machine learning redshifts applied to SDSS galaxies

    NASA Astrophysics Data System (ADS)

    Hoyle, Ben; Rau, Markus Michael; Paech, Kerstin; Bonnett, Christopher; Seitz, Stella; Weller, Jochen

    2015-10-01

    We present an analysis of anomaly detection for machine learning redshift estimation. Anomaly detection allows the removal of poor training examples, which can adversely influence redshift estimates. Anomalous training examples may be photometric galaxies with incorrect spectroscopic redshifts, or galaxies with one or more poorly measured photometric quantities. We select 2.5 million `clean' SDSS DR12 galaxies with reliable spectroscopic redshifts, and 6730 `anomalous' galaxies with spectroscopic redshift measurements which are flagged as unreliable. We contaminate the clean base galaxy sample with galaxies with unreliable redshifts and attempt to recover the contaminating galaxies using the Elliptical Envelope technique. We then train four machine learning architectures for redshift analysis on both the contaminated sample and on the preprocessed `anomaly-removed' sample and measure redshift statistics on a clean validation sample generated without any preprocessing. We find an improvement on all measured statistics of up to 80 per cent when training on the anomaly-removed sample as compared with training on the contaminated sample for each of the machine learning routines explored. We further describe a method to estimate the contamination fraction of a base data sample.
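
    The Elliptical Envelope step corresponds to scikit-learn's EllipticEnvelope estimator; a minimal sketch with synthetic features standing in for the photometric quantities (the contamination fraction is an illustrative assumption):

      import numpy as np
      from sklearn.covariance import EllipticEnvelope

      rng = np.random.default_rng(4)
      clean = rng.normal(0, 1, size=(5000, 4))    # well-measured training examples
      bad = rng.uniform(-8, 8, size=(150, 4))     # contaminating examples
      X = np.vstack([clean, bad])

      detector = EllipticEnvelope(contamination=0.03, random_state=0)
      labels = detector.fit_predict(X)            # +1 inlier, -1 anomaly

      X_kept = X[labels == 1]                     # train the redshift model on these
      print(X.shape, "->", X_kept.shape)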

  8. A random-sum Wilcoxon statistic and its application to analysis of ROC and LROC data.

    PubMed

    Tang, Liansheng Larry; Balakrishnan, N

    2011-01-01

    The Wilcoxon-Mann-Whitney statistic is commonly used for a distribution-free comparison of two groups. One requirement for its use is that the sample sizes of the two groups are fixed. This is violated in some applications such as medical imaging studies and diagnostic marker studies; in the former, the violation occurs since the number of correctly localized abnormal images is random, while in the latter the violation is due to some subjects not having observable measurements. For this reason, we propose here a random-sum Wilcoxon statistic for comparing two groups in the presence of ties, and derive its variance as well as its asymptotic distribution for large sample sizes. The proposed statistic includes the regular Wilcoxon rank-sum statistic as a special case. Finally, we apply the proposed statistic for summarizing location response operating characteristic data from a liver computed tomography study, and also for summarizing diagnostic accuracy of biomarker data.
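
    For the fixed-sample-size case that the proposed statistic generalizes, the ordinary Wilcoxon-Mann-Whitney statistic and its ROC interpretation can be computed directly (synthetic marker values, illustrative only):

      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(5)
      healthy = rng.normal(1.0, 0.5, size=40)     # marker values, group 1
      diseased = rng.normal(1.4, 0.5, size=35)    # marker values, group 2

      # Wilcoxon-Mann-Whitney test; ties are handled by midranks.
      u, p = stats.mannwhitneyu(diseased, healthy, alternative="two-sided")

      # U/(n1*n2) estimates P(diseased marker > healthy marker),
      # i.e. the area under the ROC curve.
      auc = u / (len(diseased) * len(healthy))
      print(f"U = {u:.0f}, p = {p:.4f}, AUC = {auc:.3f}")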

  9. Rapid analysis of pharmaceutical drugs using LIBS coupled with multivariate analysis.

    PubMed

    Tiwari, P K; Awasthi, S; Kumar, R; Anand, R K; Rai, P K; Rai, A K

    2018-02-01

    Type 2 diabetes drug tablets of various brands containing voglibose at dose strengths of 0.2 and 0.3 mg have been examined using the laser-induced breakdown spectroscopy (LIBS) technique. Statistical methods, namely principal component analysis (PCA) and partial least squares regression analysis (PLSR), have been employed on the LIBS spectral data for classifying the drug samples and developing calibration models. We have developed a ratio-based calibration model applying PLSR, in which the relative spectral intensity ratios H/C, H/N and O/N are used. Further, the developed model has been employed to predict the relative concentration of elements in unknown drug samples. The experiment has been performed in air and argon atmospheres, respectively, and the obtained results have been compared. The present model provides a rapid spectroscopic method for drug analysis with high statistical significance for online control and measurement processes in a wide variety of pharmaceutical industrial applications.

  10. [Evaluation of using statistical methods in selected national medical journals].

    PubMed

    Sych, Z

    1996-01-01

    The paper covers an evaluation of the frequency with which statistical methods were applied in works published in six selected national medical journals in the years 1988-1992. The following journals were chosen for analysis: Klinika Oczna, Medycyna Pracy, Pediatria Polska, Polski Tygodnik Lekarski, Roczniki Państwowego Zakładu Higieny, and Zdrowie Publiczne. An appropriate number of works, up to the average in the remaining medical journals, was randomly selected from the respective volumes of Pol. Tyg. Lek. The studies did not include works in which no statistical analysis was implemented, which applied both to national and international publications. This exemption was also extended to review papers, casuistic ones, reviews of books, handbooks, monographs, reports from scientific congresses, and papers on historical topics. The number of works was defined in each volume. Next, analysis was performed to establish how a suitable sample was selected in the respective studies, differentiating two categories: random and target selection. Attention was also paid to the presence of a control sample in the individual works, and to the completeness of the sample characteristics, with three categories: complete, partial and lacking. In evaluating the analyzed works an effort was made to present the results of the studies in tables and figures (Tab. 1, 3). Analysis was carried out of the rate of employing statistical methods in the analyzed works in the relevant volumes of the six selected national medical journals for the years 1988-1992, simultaneously determining the number of works in which no statistical methods were used, as well as the frequency with which individual statistical methods were applied in the scrutinized works. Prominence was given to fundamental methods of descriptive statistics (measures of position, measures of dispersion) as well as the most important methods of mathematical statistics, such as parametric tests of significance, analysis of variance (in single and dual classifications), non-parametric tests of significance, and correlation and regression. Works that used multiple correlation, multiple regression, or more complex methods of studying relationships among two or more variables were counted among the works whose statistical methods comprised correlation and regression, as well as other methods, e.g. statistical methods used in epidemiology (coefficients of incidence and morbidity, standardization of coefficients, survival tables), factor analysis by the Jacobi-Hotelling method, taxonomic methods, and others. On the basis of the performed studies it was established that the frequency of employing statistical methods in the six selected national medical journals in the years 1988-1992 was 61.1-66.0% of the analyzed works (Tab. 3), generally close to the frequencies reported for English-language medical journals. On the whole, no significant differences were disclosed in the frequency of the applied statistical methods (Tab. 4) or in the frequency of random sampling (Tab. 3) in the analyzed works appearing in the medical journals in the respective years 1988-1992. The most frequently used statistical methods in the analyzed works for 1988-1992 were measures of position (44.2-55.6%) and measures of dispersion (32.5-38.5%), as well as parametric tests of significance (26.3-33.1% of the works analyzed) (Tab. 4). To increase the frequency and reliability of the statistical methods used, the teaching of biostatistics should be expanded in medical studies and in postgraduate training designed for physicians and scientific-didactic workers.

  11. An Automated System for Chromosome Analysis

    NASA Technical Reports Server (NTRS)

    Castleman, K. R.; Melnyk, J. H.

    1976-01-01

    The design, construction, and testing of a complete system to produce karyotypes and chromosome measurement data from human blood samples, and to provide a basis for statistical analysis of quantitative chromosome measurement data are described.

  12. Spatial Autocorrelation Approaches to Testing Residuals from Least Squares Regression

    PubMed Central

    Chen, Yanguang

    2016-01-01

    In geo-statistics, the Durbin-Watson test is frequently employed to detect the presence of residual serial correlation from least squares regression analyses. However, the Durbin-Watson statistic is only suitable for ordered time or spatial series. If the variables comprise cross-sectional data coming from spatial random sampling, the test will be ineffectual because the value of Durbin-Watson’s statistic depends on the sequence of data points. This paper develops two new statistics for testing serial correlation of residuals from least squares regression based on spatial samples. By analogy with the new form of Moran’s index, an autocorrelation coefficient is defined with a standardized residual vector and a normalized spatial weight matrix. Then by analogy with the Durbin-Watson statistic, two types of new serial correlation indices are constructed. As a case study, the two newly presented statistics are applied to a spatial sample of 29 China’s regions. These results show that the new spatial autocorrelation models can be used to test the serial correlation of residuals from regression analysis. In practice, the new statistics can make up for the deficiencies of the Durbin-Watson test. PMID:26800271
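
    A minimal sketch of the kind of index described, an autocorrelation coefficient built from a standardized residual vector and a row-normalized spatial weight matrix (toy adjacency data; the paper's exact variants are not reproduced):

      import numpy as np

      def residual_spatial_autocorrelation(e, W):
          """Moran-style serial correlation index for regression residuals.

          e : residual vector from a least squares fit
          W : spatial weight matrix (adjacency or inverse-distance weights)
          """
          e = (e - e.mean()) / e.std()             # standardized residuals
          W = W / W.sum(axis=1, keepdims=True)     # row-normalized weights
          return (e @ W @ e) / (e @ e)

      rng = np.random.default_rng(6)
      n = 29                                       # e.g. 29 regions
      e = rng.normal(size=n)                       # placeholder residuals
      W = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)  # chain adjacency
      print(residual_spatial_autocorrelation(e, W))   # near 0 for random residuals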

  13. Current State and Development Trends of Education Policy Research in China in the Last Decade (2004-2013): A Statistical Analysis of Papers from Eight Core Chinese Journals

    ERIC Educational Resources Information Center

    Ling, Guo

    2017-01-01

    The author conducted sampling and statistical analysis of papers on education policy research collected by the China National Knowledge Infrastructure in the period 2004-2013. As to the current state of education policy research in China, the number of papers correlates positively with the year; the papers are concentrated in…

  14. Methodological quality of behavioural weight loss studies: a systematic review

    PubMed Central

    Lemon, S. C.; Wang, M. L.; Haughton, C. F.; Estabrook, D. P.; Frisard, C. F.; Pagoto, S. L.

    2018-01-01

    Summary This systematic review assessed the methodological quality of behavioural weight loss intervention studies conducted among adults and associations between quality and statistically significant weight loss outcome, strength of intervention effectiveness and sample size. Searches for trials published between January, 2009 and December, 2014 were conducted using PUBMED, MEDLINE and PSYCINFO and identified ninety studies. Methodological quality indicators included study design, anthropometric measurement approach, sample size calculations, intent-to-treat (ITT) analysis, loss to follow-up rate, missing data strategy, sampling strategy, report of treatment receipt and report of intervention fidelity (mean = 6.3). Indicators most commonly utilized included randomized design (100%), objectively measured anthropometrics (96.7%), ITT analysis (86.7%) and reporting treatment adherence (76.7%). Most studies (62.2%) had a follow-up rate >75% and reported a loss to follow-up analytic strategy or minimal missing data (69.9%). Describing intervention fidelity (34.4%) and sampling from a known population (41.1%) were least common. Methodological quality was not associated with reporting a statistically significant result, effect size or sample size. This review found the published literature of behavioural weight loss trials to be of high quality for specific indicators, including study design and measurement. Areas identified for improvement include the use of more rigorous statistical approaches to loss to follow-up and better fidelity reporting. PMID:27071775

  15. ASURV: Astronomical SURVival Statistics

    NASA Astrophysics Data System (ADS)

    Feigelson, E. D.; Nelson, P. I.; Isobe, T.; LaValley, M.

    2014-06-01

    ASURV (Astronomical SURVival Statistics) provides astronomy survival analysis for right- and left-censored data including the maximum-likelihood Kaplan-Meier estimator and several univariate two-sample tests, bivariate correlation measures, and linear regressions. ASURV is written in FORTRAN 77, and is stand-alone and does not call any specialized libraries.

  16. "Magnitude-based inference": a statistical review.

    PubMed

    Welsh, Alan H; Knight, Emma J

    2015-04-01

    We consider "magnitude-based inference" and its interpretation by examining in detail its use in the problem of comparing two means. We extract from the spreadsheets, which are provided to users of the analysis (http://www.sportsci.org/), a precise description of how "magnitude-based inference" is implemented. We compare the implemented version of the method with general descriptions of it and interpret the method in familiar statistical terms. We show that "magnitude-based inference" is not a progressive improvement on modern statistics. The additional probabilities introduced are not directly related to the confidence interval but, rather, are interpretable either as P values for two different nonstandard tests (for different null hypotheses) or as approximate Bayesian calculations, which also lead to a type of test. We also discuss sample size calculations associated with "magnitude-based inference" and show that the substantial reduction in sample sizes claimed for the method (30% of the sample size obtained from standard frequentist calculations) is not justifiable so the sample size calculations should not be used. Rather than using "magnitude-based inference," a better solution is to be realistic about the limitations of the data and use either confidence intervals or a fully Bayesian analysis.

  17. Evaluation of Solid Rocket Motor Component Data Using a Commercially Available Statistical Software Package

    NASA Technical Reports Server (NTRS)

    Stefanski, Philip L.

    2015-01-01

    Commercially available software packages today allow users to quickly perform the routine evaluations of (1) descriptive statistics to numerically and graphically summarize both sample and population data, (2) inferential statistics that draws conclusions about a given population from samples taken of it, (3) probability determinations that can be used to generate estimates of reliability allowables, and finally (4) the setup of designed experiments and analysis of their data to identify significant material and process characteristics for application in both product manufacturing and performance enhancement. This paper presents examples of analysis and experimental design work that has been conducted using Statgraphics® statistical software to obtain useful information with regard to solid rocket motor propellants and internal insulation material. Data were obtained from a number of programs (Shuttle, Constellation, and Space Launch System) and sources that include solid propellant burn rate strands, tensile specimens, sub-scale test motors, full-scale operational motors, rubber insulation specimens, and sub-scale rubber insulation analog samples. Besides facilitating the experimental design process to yield meaningful results, statistical software has demonstrated its ability to quickly perform complex data analyses and yield significant findings that might otherwise have gone unnoticed. One caveat to these successes is that useful results not only derive from the inherent power of the software package, but also from the skill and understanding of the data analyst.

  18. Searching for molecular markers in head and neck squamous cell carcinomas (HNSCC) by statistical and bioinformatic analysis of larynx-derived SAGE libraries

    PubMed Central

    Silveira, Nelson JF; Varuzza, Leonardo; Machado-Lima, Ariane; Lauretto, Marcelo S; Pinheiro, Daniel G; Rodrigues, Rodrigo V; Severino, Patrícia; Nobrega, Francisco G; Silva, Wilson A; de B Pereira, Carlos A; Tajara, Eloiza H

    2008-01-01

    Background: Head and neck squamous cell carcinoma (HNSCC) is one of the most common malignancies in humans. The average 5-year survival rate is one of the lowest among aggressive cancers, showing no significant improvement in recent years. When detected early, HNSCC has a good prognosis, but most patients present metastatic disease at the time of diagnosis, which significantly reduces survival rate. Despite extensive research, no molecular markers are currently available for diagnostic or prognostic purposes. Methods: Aiming to identify differentially-expressed genes involved in laryngeal squamous cell carcinoma (LSCC) development and progression, we generated individual Serial Analysis of Gene Expression (SAGE) libraries from a metastatic and non-metastatic larynx carcinoma, as well as from a normal larynx mucosa sample. Approximately 54,000 unique tags were sequenced in three libraries. Results: Statistical data analysis identified a subset of 1,216 differentially expressed tags between tumor and normal libraries, and 894 differentially expressed tags between metastatic and non-metastatic carcinomas. Three genes displaying differential regulation, one down-regulated (KRT31) and two up-regulated (BST2, MFAP2), as well as one with a non-significant differential expression pattern (GNA15) in our SAGE data were selected for real-time polymerase chain reaction (PCR) in a set of HNSCC samples. Consistent with our statistical analysis, quantitative PCR confirmed the upregulation of BST2 and MFAP2 and the downregulation of KRT31 when samples of HNSCC were compared to tumor-free surgical margins. As expected, GNA15 presented a non-significant differential expression pattern when tumor samples were compared to normal tissues. Conclusion: To the best of our knowledge, this is the first study reporting SAGE data in head and neck squamous cell tumors. Statistical analysis was effective in identifying differentially expressed genes reportedly involved in cancer development. The differential expression of a subset of genes was confirmed in additional larynx carcinoma samples and in carcinomas from a distinct head and neck subsite. This result suggests the existence of potential common biomarkers for prognosis and targeted-therapy development in this heterogeneous type of tumor. PMID:19014460

  19. Visual Sample Plan Version 7.0 User's Guide

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Matzke, Brett D.; Newburn, Lisa LN; Hathaway, John E.

    2014-03-01

    This user's guide describes Visual Sample Plan (VSP) Version 7.0 and provides instructions for using the software. VSP selects the appropriate number and location of environmental samples to ensure that the results of statistical tests performed to provide input to risk decisions have the required confidence and performance. VSP Version 7.0 provides sample-size equations or algorithms needed by specific statistical tests appropriate for specific environmental sampling objectives. It also provides data quality assessment and statistical analysis functions to support evaluation of the data and determine whether the data support decisions regarding sites suspected of contamination. The easy-to-use program is highly visual and graphic. VSP runs on personal computers with Microsoft Windows operating systems (XP, Vista, Windows 7, and Windows 8). Designed primarily for project managers and users without expertise in statistics, VSP is applicable to two- and three-dimensional populations to be sampled (e.g., rooms and buildings, surface soil, a defined layer of subsurface soil, water bodies, and other similar applications) for studies of environmental quality. VSP is also applicable for designing sampling plans for assessing chem/rad/bio threat and hazard identification within rooms and buildings, and for designing geophysical surveys for unexploded ordnance (UXO) identification.
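
    As a flavor of the kind of sample-size equation such software implements, a minimal sketch of the simplest normal-approximation formula for a one-sample test of a mean against an action level; the function and inputs are illustrative assumptions, not VSP's actual code, and VSP covers many more designs:

        from scipy.stats import norm

        def sample_size_one_sample_mean(sigma, delta, alpha=0.05, beta=0.10):
            """Normal-approximation sample size for a one-sample test of a
            mean: detect a shift of size delta with false positive rate
            alpha and false negative rate beta (power = 1 - beta)."""
            z_a = norm.ppf(1 - alpha)
            z_b = norm.ppf(1 - beta)
            n = ((z_a + z_b) * sigma / delta) ** 2
            return int(n) + 1   # round up to be conservative

        # e.g. sd = 2.0 concentration units, detect a 1.0-unit exceedance
        print(sample_size_one_sample_mean(sigma=2.0, delta=1.0))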

  20. Preparing for the first meeting with a statistician.

    PubMed

    De Muth, James E

    2008-12-15

    Practical statistical issues that should be considered when performing data collection and analysis are reviewed. The meeting with a statistician should take place early in the research development before any study data are collected. The process of statistical analysis involves establishing the research question, formulating a hypothesis, selecting an appropriate test, sampling correctly, collecting data, performing tests, and making decisions. Once the objectives are established, the researcher can determine the characteristics or demographics of the individuals required for the study, how to recruit volunteers, what type of data are needed to answer the research question(s), and the best methods for collecting the required information. There are two general types of statistics: descriptive and inferential. Presenting data in a more palatable format for the reader is called descriptive statistics. Inferential statistics involve making an inference or decision about a population based on results obtained from a sample of that population. In order for the results of a statistical test to be valid, the sample should be representative of the population from which it is drawn. When collecting information about volunteers, researchers should only collect information that is directly related to the study objectives. Important information that a statistician will require first is an understanding of the type of variables involved in the study and which variables can be controlled by researchers and which are beyond their control. Data can be presented in one of four different measurement scales: nominal, ordinal, interval, or ratio. Hypothesis testing involves two mutually exclusive and exhaustive statements related to the research question. Statisticians should not be replaced by computer software, and they should be consulted before any research data are collected. When preparing to meet with a statistician, the pharmacist researcher should be familiar with the steps of statistical analysis and consider several questions related to the study to be conducted.

  1. Monitoring Method of Cow Anthrax Based on Gis and Spatial Statistical Analysis

    NASA Astrophysics Data System (ADS)

    Li, Lin; Yang, Yong; Wang, Hongbin; Dong, Jing; Zhao, Yujun; He, Jianbin; Fan, Honggang

    Geographic information system (GIS) is a computer application system that can manipulate spatial information and has been used in many fields involving spatial information management. Many methods and models have been established for analyzing animal disease distributions and temporal-spatial transmission, and the application of GIS in animal disease epidemiology has brought great benefits; GIS is now an important tool in animal disease epidemiological research. The spatial analysis functions of GIS can be widened and strengthened by spatial statistical analysis, allowing deeper exploration, analysis, manipulation, and interpretation of the spatial pattern and spatial correlation of animal disease. In this paper, we analyzed the spatial distribution characteristics of cow anthrax in a target district (called district A because the epidemic data are confidential), combining spatial statistical analysis with a GIS built for cow anthrax in this district. Cow anthrax is a biogeochemical disease whose geographical distribution is closely related to the environmental factors of its habitats and shows distinct spatial characteristics; correct analysis of its spatial distribution therefore plays a very important role in monitoring, prevention, and control. However, classic statistical methods are difficult to apply in some areas because of the pastoral nomadic context: the high mobility of livestock and the lack of suitable sampling currently make rigorous random sampling methods nearly impossible to apply. It is thus necessary to develop an alternative sampling method that can overcome the shortage of samples while meeting the requirement of randomness. The GIS software ArcGIS 9.1 was used to overcome the lack of data on sampling sites. Using ArcGIS 9.1 and GeoDa to analyze the spatial distribution of cow anthrax in district A, we reached two conclusions about its density: (1) it shows spatial clustering, and (2) it exhibits strong spatial autocorrelation. We established a prediction model to estimate the anthrax distribution based on the spatial characteristics of cow anthrax density. Compared with the observed distribution, the model's predictions agree well, indicating that it is feasible in practice. GIS-based methods can thus be applied effectively to cow anthrax monitoring and investigation, and the prediction model based on spatial statistics provides a foundation for studies of other spatially structured animal diseases.
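
    Global Moran's I is the standard statistic behind the spatial-autocorrelation finding reported here (GeoDa computes it, among others); a minimal hand-rolled sketch with hypothetical districts, case densities, and contiguity weights:

        import numpy as np

        def morans_i(values, weights):
            """Global Moran's I for values x with spatial weight matrix W:
            I = (n / sum(W)) * sum_ij(w_ij z_i z_j) / sum_i(z_i^2)."""
            x = np.asarray(values, dtype=float)
            w = np.asarray(weights, dtype=float)
            z = x - x.mean()
            num = (w * np.outer(z, z)).sum()
            return len(x) / w.sum() * num / (z ** 2).sum()

        # Hypothetical: 4 districts, binary contiguity weights, case densities
        density = [12.0, 10.5, 3.2, 2.8]
        W = np.array([[0, 1, 0, 0],
                      [1, 0, 1, 0],
                      [0, 1, 0, 1],
                      [0, 0, 1, 0]])
        print(morans_i(density, W))   # > 0 suggests positive spatial autocorrelation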

  2. US Geological Survey nutrient preservation experiment : experimental design, statistical analysis, and interpretation of analytical results

    USGS Publications Warehouse

    Patton, Charles J.; Gilroy, Edward J.

    1999-01-01

    Data on which this report is based, including nutrient concentrations in synthetic reference samples determined concurrently with those in real samples, are extensive (greater than 20,000 determinations) and have been published separately. In addition to confirming the well-documented instability of nitrite in acidified samples, this study also demonstrates that when biota are removed from samples at collection sites by 0.45-micrometer membrane filtration, subsequent preservation with sulfuric acid or mercury (II) provides no statistically significant improvement in nutrient concentration stability during storage at 4 degrees Celsius for 30 days. Biocide preservation had no statistically significant effect on the 30-day stability of phosphorus concentrations in whole-water splits from any of the 15 stations, but did stabilize Kjeldahl nitrogen concentrations in whole-water splits from three data-collection stations where ammonium accounted for at least half of the measured Kjeldahl nitrogen.

  3. Tools for Basic Statistical Analysis

    NASA Technical Reports Server (NTRS)

    Luz, Paul L.

    2005-01-01

    Statistical Analysis Toolset is a collection of eight Microsoft Excel spreadsheet programs, each of which performs calculations pertaining to an aspect of statistical analysis. These programs present input and output data in user-friendly, menu-driven formats, with automatic execution. The following types of calculations are performed: Descriptive statistics are computed for a set of data x(i) (i = 1, 2, 3 . . . ) entered by the user. Normal Distribution Estimates will calculate the statistical value that corresponds to cumulative probability values, given a sample mean and standard deviation of the normal distribution. Normal Distribution from Two Data Points will extend and generate a cumulative normal distribution for the user, given two data points and their associated probability values. Two programs perform two-way analysis of variance (ANOVA) with no replication or generalized ANOVA for two factors with four levels and three repetitions. Linear Regression-ANOVA will fit data to the linear equation y = f(x) and perform an ANOVA to check its significance.
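
    A minimal Python equivalent of the Linear Regression-ANOVA calculation, on hypothetical data (the toolset itself is Excel-based):

        import numpy as np
        from scipy import stats

        x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
        y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])   # hypothetical data

        # Least-squares fit y = a + b*x
        b, a = np.polyfit(x, y, 1)
        y_hat = a + b * x

        # ANOVA for the regression: partition total SS into regression + residual
        ss_total = ((y - y.mean()) ** 2).sum()
        ss_resid = ((y - y_hat) ** 2).sum()
        ss_reg = ss_total - ss_resid
        df_reg, df_resid = 1, len(x) - 2
        F = (ss_reg / df_reg) / (ss_resid / df_resid)
        p = stats.f.sf(F, df_reg, df_resid)
        print(f"y = {a:.2f} + {b:.2f} x,  F = {F:.1f},  p = {p:.4f}")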

  4. The Effect of Unequal Samples, Heterogeneity of Covariance Matrices, and Number of Variables on Discriminant Analysis Classification Tables and Related Statistics.

    ERIC Educational Resources Information Center

    Spearing, Debra; Woehlke, Paula

    To assess the effect on discriminant analysis in terms of correct classification into two groups, the following parameters were systematically altered using Monte Carlo techniques: sample sizes; proportions of one group to the other; number of independent variables; and covariance matrices. The pairing of the off diagonals (or covariances) with…

  5. Chance-corrected classification for use in discriminant analysis: Ecological applications

    USGS Publications Warehouse

    Titus, K.; Mosher, J.A.; Williams, B.K.

    1984-01-01

    A method for evaluating the classification table from a discriminant analysis is described. The statistic, kappa, is useful to ecologists in that it removes the effects of chance. It is useful even with equal group sample sizes although the need for a chance-corrected measure of prediction becomes greater with more dissimilar group sample sizes. Examples are presented.
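
    A minimal sketch of the chance-corrected statistic described, computed from a discriminant-analysis classification table with hypothetical counts; kappa = (p_o - p_e) / (1 - p_e):

        import numpy as np

        def cohens_kappa(table):
            """Chance-corrected classification agreement from a confusion matrix."""
            t = np.asarray(table, dtype=float)
            n = t.sum()
            p_observed = np.trace(t) / n
            # expected agreement by chance, from the row/column margins
            p_expected = (t.sum(axis=0) * t.sum(axis=1)).sum() / n ** 2
            return (p_observed - p_expected) / (1 - p_expected)

        # Hypothetical 2-group classification table (rows: actual, cols: predicted)
        print(cohens_kappa([[40, 10],
                            [15, 35]]))   # 0.5 here; 0 = chance, 1 = perfect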

  6. Helioseismology of pre-emerging active regions. III. Statistical analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Barnes, G.; Leka, K. D.; Braun, D. C.

    The subsurface properties of active regions (ARs) prior to their appearance at the solar surface may shed light on the process of AR formation. Helioseismic holography has been applied to samples taken from two populations of regions on the Sun (pre-emergence and without emergence), each sample having over 100 members, that were selected to minimize systematic bias, as described in Paper I. Paper II showed that there are statistically significant signatures in the average helioseismic properties that precede the formation of an AR. This paper describes a more detailed analysis of the samples of pre-emergence regions and regions without emergence based on discriminant analysis. The property that is best able to distinguish the populations is found to be the surface magnetic field, even a day before the emergence time. However, after accounting for the correlations between the surface field and the quantities derived from helioseismology, there is still evidence of a helioseismic precursor to AR emergence that is present for at least a day prior to emergence, although the analysis presented cannot definitively determine the subsurface properties prior to emergence due to the small sample sizes.

  7. Statistical differences between relative quantitative molecular fingerprints from microbial communities.

    PubMed

    Portillo, M C; Gonzalez, J M

    2008-08-01

    Molecular fingerprints of microbial communities are a common method for the analysis and comparison of environmental samples. The significance of differences between microbial community fingerprints was analyzed considering the presence of different phylotypes and their relative abundance. A method is proposed by simulating coverage of the analyzed communities as a function of sampling size applying a Cramér-von Mises statistic. Comparisons were performed by a Monte Carlo testing procedure. As an example, this procedure was used to compare several sediment samples from freshwater ponds using a relative quantitative PCR-DGGE profiling technique. The method was able to discriminate among different samples based on their molecular fingerprints, and confirmed the lack of differences between aliquots from a single sample.
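
    One simple way to Monte Carlo such a comparison, assuming a Cramér-von Mises-type distance on cumulative relative abundances; this is an illustrative sketch under those assumptions, not the authors' exact procedure:

        import numpy as np

        def cvm_distance(p, q):
            """Cramer-von Mises-type distance between two relative-abundance
            profiles: sum of squared differences of cumulative abundances."""
            return ((np.cumsum(p) - np.cumsum(q)) ** 2).sum()

        def permutation_test(counts_a, counts_b, n_perm=9999, seed=0):
            rng = np.random.default_rng(seed)
            a, b = np.asarray(counts_a, float), np.asarray(counts_b, float)
            observed = cvm_distance(a / a.sum(), b / b.sum())
            pooled = a + b
            hits = 0
            for _ in range(n_perm):
                # randomly reassign the pooled counts of each phylotype
                ra = rng.binomial(pooled.astype(int), a.sum() / pooled.sum())
                rb = pooled - ra
                d = cvm_distance(ra / max(ra.sum(), 1), rb / max(rb.sum(), 1))
                if d >= observed:
                    hits += 1
            return (hits + 1) / (n_perm + 1)   # Monte Carlo p-value

        # Hypothetical DGGE band intensities (phylotype abundances), two samples
        print(permutation_test([30, 25, 20, 5, 0], [10, 15, 25, 25, 25]))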

  8. Monte Carlo Simulation for Perusal and Practice.

    ERIC Educational Resources Information Center

    Brooks, Gordon P.; Barcikowski, Robert S.; Robey, Randall R.

    The meaningful investigation of many problems in statistics can be solved through Monte Carlo methods. Monte Carlo studies can help solve problems that are mathematically intractable through the analysis of random samples from populations whose characteristics are known to the researcher. Using Monte Carlo simulation, the values of a statistic are…

  9. STATISTICAL TECHNIQUES FOR DETERMINATION AND PREDICTION OF FUNDAMENTAL FISH ASSEMBLAGES OF THE MID-ATLANTIC HIGHLANDS

    EPA Science Inventory

    A statistical software tool, Stream Fish Community Predictor (SFCP), based on EMAP stream sampling in the mid-Atlantic Highlands, was developed to predict stream fish communities using stream and watershed characteristics. Step one in the tool development was a cluster analysis t...

  10. Influence of flavor solvent on flavor release and perception in sugar-free chewing gum.

    PubMed

    Potineni, Rajesh V; Peterson, Devin G

    2008-05-14

    The influence of flavor solvent [triacetin (TA), propylene glycol (PG), medium-chain triglycerides (MCT), or no flavor solvent (NFS)] on the flavor release profile, the textural properties, and the sensory perception of a sugar-free chewing gum was investigated. Time course analysis of the exhaled breath and saliva during chewing gum mastication indicated that flavor solvent addition or type did not influence the aroma release profile; however, the sorbitol release rate was statistically lower for the TA formulated sample in comparison to those with PG, MCT, or NFS. Sensory time-intensity analysis also indicated that the TA formulated sample was statistically lower in perceived sweetness intensity, in comparison with the other chewing gum samples, and also had lower cinnamon-like aroma intensity, presumably due to an effect of sweetness intensity on aroma perception. Measurement of the chewing gum macroscopic texture by compression analysis during consumption was not correlated to the unique flavor release properties of the TA-chewing gum. However, a relationship between gum base plasticity and retention of sugar alcohol during mastication was proposed to explain the different flavor properties of the TA sample.

  11. Singular spectrum analysis in nonlinear dynamics, with applications to paleoclimatic time series

    NASA Technical Reports Server (NTRS)

    Vautard, R.; Ghil, M.

    1989-01-01

    Two dimensions of a dynamical system given by experimental time series are distinguished. Statistical dimension gives a theoretical upper bound for the minimal number of degrees of freedom required to describe the attractor up to the accuracy of the data, taking into account sampling and noise problems. The dynamical dimension is the intrinsic dimension of the attractor and does not depend on the quality of the data. Singular Spectrum Analysis (SSA) provides estimates of the statistical dimension. SSA also describes the main physical phenomena reflected by the data. It gives adaptive spectral filters associated with the dominant oscillations of the system and clarifies the noise characteristics of the data. SSA is applied to four paleoclimatic records. The principal climatic oscillations and the regime changes in their amplitude are detected. About 10 degrees of freedom are statistically significant in the data. Large noise and insufficient sample length do not allow reliable estimates of the dynamical dimension.
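
    The core of SSA is embedding the series in a trajectory matrix of lagged copies and examining its singular spectrum; the number of singular values standing above the noise floor estimates the statistical dimension. A minimal sketch with an illustrative window length and synthetic data:

        import numpy as np

        def singular_spectrum(series, window):
            """Singular spectrum of a time series: embed it in a trajectory
            (Hankel) matrix of lagged copies and take its singular values."""
            x = np.asarray(series, dtype=float)
            n_rows = len(x) - window + 1
            trajectory = np.array([x[i:i + window] for i in range(n_rows)])
            return np.linalg.svd(trajectory, compute_uv=False)

        # Synthetic record: two oscillations plus noise
        t = np.arange(500)
        x = (np.sin(2 * np.pi * t / 50) + 0.5 * np.sin(2 * np.pi * t / 17)
             + 0.3 * np.random.default_rng(0).normal(size=500))
        sv = singular_spectrum(x, window=60)
        print(np.round(sv[:8], 1))  # a pair of large values per oscillation, then a noise floor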

  12. A Comparison of Analytical and Data Preprocessing Methods for Spectral Fingerprinting

    PubMed Central

    LUTHRIA, DEVANAND L.; MUKHOPADHYAY, SUDARSAN; LIN, LONG-ZE; HARNLY, JAMES M.

    2013-01-01

    Spectral fingerprinting, as a method of discriminating between plant cultivars and growing treatments for a common set of broccoli samples, was compared for six analytical instruments. Spectra were acquired for finely powdered solid samples using Fourier transform infrared (FT-IR) and Fourier transform near-infrared (NIR) spectrometry. Spectra were also acquired for unfractionated aqueous methanol extracts of the powders using molecular absorption in the ultraviolet (UV) and visible (VIS) regions and mass spectrometry with negative (MS−) and positive (MS+) ionization. The spectra were analyzed using nested one-way analysis of variance (ANOVA) and principal component analysis (PCA) to statistically evaluate the quality of discrimination. All six methods showed statistically significant differences between the cultivars and treatments. The significance of the statistical tests was improved by the judicious selection of spectral regions (IR and NIR), masses (MS+ and MS−), and derivatives (IR, NIR, UV, and VIS). PMID:21352644

  13. Laser Velocimeter Measurements and Analysis in Turbulent Flows with Combustion. Part 2.

    DTIC Science & Technology

    1983-07-01

    Mean velocities and turbulence intensities were found to be statistically accurate to ±1% and ±13%, respectively, given the sampling error for this sample size. Although the statistical error was found to be rather small, additional errors can be present.

  14. GLIMMPSE Lite: Calculating Power and Sample Size on Smartphone Devices

    PubMed Central

    Munjal, Aarti; Sakhadeo, Uttara R.; Muller, Keith E.; Glueck, Deborah H.; Kreidler, Sarah M.

    2014-01-01

    Researchers seeking to develop complex statistical applications for mobile devices face a common set of difficult implementation issues. In this work, we discuss general solutions to the design challenges. We demonstrate the utility of the solutions for a free mobile application designed to provide power and sample size calculations for univariate, one-way analysis of variance (ANOVA), GLIMMPSE Lite. Our design decisions provide a guide for other scientists seeking to produce statistical software for mobile platforms. PMID:25541688

  15. In vivo Comet assay--statistical analysis and power calculations of mice testicular cells.

    PubMed

    Hansen, Merete Kjær; Sharma, Anoop Kumar; Dybdahl, Marianne; Boberg, Julie; Kulahci, Murat

    2014-11-01

    The in vivo Comet assay is a sensitive method for evaluating DNA damage. A recurrent concern is how to analyze the data appropriately and efficiently. A popular approach is to summarize the raw data into a summary statistic prior to the statistical analysis. However, consensus on which summary statistic to use has yet to be reached. Another important consideration concerns the assessment of proper sample sizes in the design of Comet assay studies. This study aims to identify a statistic suitably summarizing the % tail DNA of mice testicular samples in Comet assay studies. A second aim is to provide curves for this statistic outlining the number of animals and gels to use. The current study was based on 11 compounds administered via oral gavage in three doses to male mice: CAS no. 110-26-9, CAS no. 512-56-1, CAS no. 111873-33-7, CAS no. 79-94-7, CAS no. 115-96-8, CAS no. 598-55-0, CAS no. 636-97-5, CAS no. 85-28-9, CAS no. 13674-87-8, CAS no. 43100-38-5 and CAS no. 60965-26-6. Testicular cells were examined using the alkaline version of the Comet assay and the DNA damage was quantified as % tail DNA using a fully automatic scoring system. From the raw data 23 summary statistics were examined. A linear mixed-effects model was fitted to the summarized data and the estimated variance components were used to generate power curves as a function of sample size. The statistic that most appropriately summarized the within-sample distributions was the median of the log-transformed data, as it most consistently conformed to the assumptions of the statistical model. Power curves for 1.5-, 2-, and 2.5-fold changes of the highest dose group compared to the control group when 50 and 100 cells were scored per gel are provided to aid in the design of future Comet assay studies on testicular cells. Copyright © 2014 Elsevier B.V. All rights reserved.
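
    A minimal sketch of the recommended summary step, the median of log-transformed % tail DNA per gel; the offset guarding against log(0) is our assumption, not necessarily the authors' treatment, and the data are hypothetical:

        import numpy as np

        def summarize_gel(tail_dna_percent, offset=1.0):
            """Median of log-transformed % tail DNA for the cells scored on
            one gel; the offset guarding against log(0) is an assumption."""
            x = np.asarray(tail_dna_percent, dtype=float)
            return np.median(np.log(x + offset))

        # Hypothetical % tail DNA for 10 cells scored on one gel
        print(summarize_gel([0.5, 1.2, 0.0, 3.4, 2.2, 0.8, 5.0, 1.1, 0.9, 2.7]))

    These per-gel summaries would then enter the linear mixed-effects model, with animal and gel as random effects, from which the variance components behind the power curves are estimated.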

  16. Fast and accurate imputation of summary statistics enhances evidence of functional enrichment

    PubMed Central

    Pasaniuc, Bogdan; Zaitlen, Noah; Shi, Huwenbo; Bhatia, Gaurav; Gusev, Alexander; Pickrell, Joseph; Hirschhorn, Joel; Strachan, David P.; Patterson, Nick; Price, Alkes L.

    2014-01-01

    Motivation: Imputation using external reference panels (e.g. 1000 Genomes) is a widely used approach for increasing power in genome-wide association studies and meta-analysis. Existing hidden Markov models (HMM)-based imputation approaches require individual-level genotypes. Here, we develop a new method for Gaussian imputation from summary association statistics, a type of data that is becoming widely available. Results: In simulations using 1000 Genomes (1000G) data, this method recovers 84% (54%) of the effective sample size for common (>5%) and low-frequency (1–5%) variants [increasing to 87% (60%) when summary linkage disequilibrium information is available from target samples] versus the gold standard of 89% (67%) for HMM-based imputation, which cannot be applied to summary statistics. Our approach accounts for the limited sample size of the reference panel, a crucial step to eliminate false-positive associations, and it is computationally very fast. As an empirical demonstration, we apply our method to seven case–control phenotypes from the Wellcome Trust Case Control Consortium (WTCCC) data and a study of height in the British 1958 birth cohort (1958BC). Gaussian imputation from summary statistics recovers 95% (105%) of the effective sample size (as quantified by the ratio of χ2 association statistics) compared with HMM-based imputation from individual-level genotypes at the 227 (176) published single nucleotide polymorphisms (SNPs) in the WTCCC (1958BC height) data. In addition, for publicly available summary statistics from large meta-analyses of four lipid traits, we publicly release imputed summary statistics at 1000G SNPs, which could not have been obtained using previously published methods, and demonstrate their accuracy by masking subsets of the data. We show that 1000G imputation using our approach increases the magnitude and statistical evidence of enrichment at genic versus non-genic loci for these traits, as compared with an analysis without 1000G imputation. Thus, imputation of summary statistics will be a valuable tool in future functional enrichment analyses. Availability and implementation: Publicly available software package available at http://bogdan.bioinformatics.ucla.edu/software/. Contact: bpasaniuc@mednet.ucla.edu or aprice@hsph.harvard.edu Supplementary information: Supplementary materials are available at Bioinformatics online. PMID:24990607
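
    The core of Gaussian imputation from summary statistics is the conditional mean of a multivariate normal: z-scores at untyped SNPs are imputed as R_ut (R_tt)^(-1) z_typed, with the LD matrices R taken from a reference panel and a ridge term accounting for its finite sample size. A minimal sketch under those assumptions (names, data, and the ridge value are illustrative):

        import numpy as np

        def impute_z(z_typed, ld_tt, ld_ut, lam=0.1):
            """Impute association z-scores at untyped SNPs as the conditional
            mean of a multivariate normal: R_ut (R_tt + lam*I)^(-1) z_typed.
            lam is a ridge term for the finite reference-panel size."""
            reg = ld_tt + lam * np.eye(len(z_typed))
            return ld_ut @ np.linalg.solve(reg, z_typed)

        # Hypothetical: 3 typed SNPs and 1 untyped SNP in moderate LD with them
        z_typed = np.array([3.1, 2.8, 0.4])
        R_tt = np.array([[1.0, 0.8, 0.1],
                         [0.8, 1.0, 0.1],
                         [0.1, 0.1, 1.0]])
        R_ut = np.array([[0.7, 0.6, 0.05]])
        print(impute_z(z_typed, R_tt, R_ut))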

  17. Mathematical background and attitudes toward statistics in a sample of Spanish college students.

    PubMed

    Carmona, José; Martínez, Rafael J; Sánchez, Manuel

    2005-08-01

    To examine the relation of mathematical background and initial attitudes toward statistics of Spanish college students in social sciences the Survey of Attitudes Toward Statistics was given to 827 students. Multivariate analyses tested the effects of two indicators of mathematical background (amount of exposure and achievement in previous courses) on the four subscales. Analysis suggested grades in previous courses are more related to initial attitudes toward statistics than the number of mathematics courses taken. Mathematical background was related with students' affective responses to statistics but not with their valuing of statistics. Implications of possible research are discussed.

  18. Ranking metrics in gene set enrichment analysis: do they matter?

    PubMed

    Zyla, Joanna; Marczyk, Michal; Weiner, January; Polanska, Joanna

    2017-05-12

    Many methods exist for describing the complex relations between changes of gene expression in molecular pathways or gene ontologies under different experimental conditions. Among them, Gene Set Enrichment Analysis seems to be one of the most commonly used (over 10,000 citations). An important parameter that can affect the final result is the choice of the metric used to rank genes. Applying a default ranking metric may lead to poor results. In this work, 28 benchmark data sets were used to evaluate the sensitivity and false positive rate of gene set analysis for 16 different ranking metrics, including new proposals. Furthermore, the robustness of the chosen methods to sample size was tested. Using the k-means clustering algorithm, a group of four metrics with the highest performance in terms of overall sensitivity, overall false positive rate, and computational load was established: the absolute value of the Moderated Welch Test statistic, the Minimum Significant Difference, the absolute value of the Signal-To-Noise ratio, and the Baumgartner-Weiss-Schindler test statistic. For false positive rate estimation, all selected ranking metrics were robust with respect to sample size. For sensitivity, the absolute values of the Moderated Welch Test statistic and the Signal-To-Noise ratio gave stable results, while the Baumgartner-Weiss-Schindler statistic and the Minimum Significant Difference performed better for larger sample sizes. Finally, the Gene Set Enrichment Analysis method with all tested ranking metrics was parallelised and implemented in MATLAB, and is available at https://github.com/ZAEDPolSl/MrGSEA. Choosing a ranking metric in Gene Set Enrichment Analysis has a critical impact on the results of pathway enrichment analysis. The absolute value of the Moderated Welch Test statistic has the best overall sensitivity, and the Minimum Significant Difference has the best overall specificity of gene set analysis. When the number of non-normally distributed genes is high, the Baumgartner-Weiss-Schindler test statistic gives better outcomes; it also finds more enriched pathways than the other tested metrics, which may lead to new biological discoveries.
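
    Two of the top-ranked metrics are simple to state; a minimal sketch of the absolute Signal-To-Noise ratio and the (unmoderated) Welch t statistic for one gene. The moderated version adds a variance-shrinkage term not reproduced here, and the data are hypothetical:

        import numpy as np

        def abs_signal_to_noise(a, b):
            """|Signal-To-Noise|: |mean difference| / (sd_a + sd_b)."""
            a, b = np.asarray(a, float), np.asarray(b, float)
            return abs(a.mean() - b.mean()) / (a.std(ddof=1) + b.std(ddof=1))

        def abs_welch_t(a, b):
            """|Welch t|: |mean difference| / sqrt(var_a/n_a + var_b/n_b)."""
            a, b = np.asarray(a, float), np.asarray(b, float)
            se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
            return abs(a.mean() - b.mean()) / se

        # Hypothetical expression of one gene in two conditions
        case = [7.1, 7.4, 6.9, 7.8]
        ctrl = [5.9, 6.2, 6.0, 6.4]
        print(abs_signal_to_noise(case, ctrl), abs_welch_t(case, ctrl))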

  19. 45 CFR 1356.71 - Federal review of the eligibility of children in foster care and the eligibility of foster care...

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... by ACF statistical staff from the Adoption and Foster Care Analysis and Reporting System (AFCARS... primary review utilizing probability sampling methodologies. Usually, the chosen methodology will be simple random sampling, but other probability samples may be utilized, when necessary and appropriate. (3...

  20. Mathematical Ties That Bind.

    ERIC Educational Resources Information Center

    House, Peggy A.

    1994-01-01

    Describes some mathematical investigations of the necktie which includes applications of geometry, statistics, data analysis, sampling, probability, symmetry, proportion, problem solving, and business. (MKR)

  1. Study design and statistical analysis of data in human population studies with the micronucleus assay.

    PubMed

    Ceppi, Marcello; Gallo, Fabio; Bonassi, Stefano

    2011-01-01

    The most common study design performed in population studies based on the micronucleus (MN) assay is the cross-sectional study, which is largely performed to evaluate the DNA damaging effects of exposure to genotoxic agents in the workplace, in the environment, as well as from diet or lifestyle factors. Sample size is still a critical issue in the design of MN studies, since most recent studies considering gene-environment interaction often require a sample size of several hundred subjects, which is in many cases difficult to achieve. The control of confounding is another major threat to the validity of causal inference. The most popular confounders considered in population studies using MN are age, gender, and smoking habit. Extensive attention is given to the assessment of effect modification, given the increasing inclusion of biomarkers of genetic susceptibility in the study design. Selected issues concerning the statistical treatment of data are addressed in this mini-review, starting from data description, which is a critical step of statistical analysis, since it allows possible errors in the dataset to be detected and the validity of assumptions required for more complex analyses to be checked. Basic issues in the statistical analysis of biomarkers are extensively evaluated, including methods to explore the dose-response relationship between two continuous variables and inferential analysis. A critical approach to the use of parametric and non-parametric methods is presented, before addressing the issue of the most suitable multivariate models to fit MN data. In the last decade, the quality of statistical analysis of MN data has certainly evolved, although even nowadays only a small number of studies apply the Poisson model, which is the most suitable method for the analysis of MN data.
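
    A minimal sketch of the Poisson regression the authors identify as most suitable for MN counts, fit with statsmodels on simulated data including the usual confounders; all coefficients and data here are illustrative:

        import numpy as np
        import pandas as pd
        import statsmodels.api as sm

        # Hypothetical cross-sectional MN study: counts of micronucleated cells
        rng = np.random.default_rng(0)
        n = 200
        data = pd.DataFrame({
            "exposed": rng.integers(0, 2, n),
            "age": rng.uniform(20, 60, n),
            "smoker": rng.integers(0, 2, n),
        })
        rate = np.exp(0.2 + 0.4 * data["exposed"] + 0.01 * data["age"] + 0.3 * data["smoker"])
        data["mn_count"] = rng.poisson(rate)

        # Poisson regression of MN counts on exposure, adjusted for confounders
        X = sm.add_constant(data[["exposed", "age", "smoker"]])
        fit = sm.GLM(data["mn_count"], X, family=sm.families.Poisson()).fit()
        print(np.exp(fit.params))   # rate ratios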

  2. Classification of edible oils by employing 31P and 1H NMR spectroscopy in combination with multivariate statistical analysis. A proposal for the detection of seed oil adulteration in virgin olive oils.

    PubMed

    Vigli, Georgia; Philippidis, Angelos; Spyros, Apostolos; Dais, Photis

    2003-09-10

    A combination of (1)H NMR and (31)P NMR spectroscopy and multivariate statistical analysis was used to classify 192 samples from 13 types of vegetable oils, namely, hazelnut, sunflower, corn, soybean, sesame, walnut, rapeseed, almond, palm, groundnut, safflower, coconut, and virgin olive oils from various regions of Greece. 1,2-Diglycerides, 1,3-diglycerides, the ratio of 1,2-diglycerides to total diglycerides, acidity, iodine value, and fatty acid composition determined upon analysis of the respective (1)H NMR and (31)P NMR spectra were selected as variables to establish a classification/prediction model by employing discriminant analysis. This model, obtained from a training set of 128 samples, resulted in a significant discrimination among the different classes of oils, and 100% correct assignments were obtained for the 64-sample validation set. Different artificial mixtures of olive-hazelnut, olive-corn, olive-sunflower, and olive-soybean oils were prepared and analyzed by (1)H NMR and (31)P NMR spectroscopy. Subsequent discriminant analysis of the data allowed detection of adulteration as low as 5% w/w, provided that fresh virgin olive oil samples were used, as reflected by their high 1,2-diglycerides to total diglycerides ratio (D > or = 0.90).
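
    A minimal sketch of the discriminant-analysis step on hypothetical NMR-derived features; the study used 1,2-diglyceride ratios, acidity, iodine value, and fatty-acid composition, but the values and two-class setup here are invented:

        import numpy as np
        from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

        rng = np.random.default_rng(0)
        # Hypothetical per-sample features: 1,2-DG/total-DG ratio, acidity, iodine value
        olive = rng.normal([0.92, 0.4, 85.0], [0.03, 0.1, 3.0], size=(40, 3))
        seed = rng.normal([0.70, 0.8, 120.0], [0.05, 0.2, 5.0], size=(40, 3))
        X = np.vstack([olive, seed])
        y = np.array(["olive"] * 40 + ["seed"] * 40)

        # Train on alternating samples, validate on the rest
        lda = LinearDiscriminantAnalysis().fit(X[::2], y[::2])
        print("validation accuracy:", lda.score(X[1::2], y[1::2]))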

  3. Reexamining Sample Size Requirements for Multivariate, Abundance-Based Community Research: When Resources are Limited, the Research Does Not Have to Be.

    PubMed

    Forcino, Frank L; Leighton, Lindsey R; Twerdy, Pamela; Cahill, James F

    2015-01-01

    Community ecologists commonly perform multivariate techniques (e.g., ordination, cluster analysis) to assess patterns and gradients of taxonomic variation. A critical requirement for a meaningful statistical analysis is accurate information on the taxa found within an ecological sample. However, oversampling (too many individuals counted per sample) also comes at a cost, particularly for ecological systems in which identification and quantification is substantially more resource consuming than the field expedition itself. In such systems, an increasingly larger sample size will eventually result in diminishing returns in improving any pattern or gradient revealed by the data, but will also lead to continually increasing costs. Here, we examine 396 datasets: 44 previously published and 352 created datasets. Using meta-analytic and simulation-based approaches, the research within the present paper seeks (1) to determine minimal sample sizes required to produce robust multivariate statistical results when conducting abundance-based, community ecology research. Furthermore, we seek (2) to determine the dataset parameters (i.e., evenness, number of taxa, number of samples) that require larger sample sizes, regardless of resource availability. We found that in the 44 previously published and the 220 created datasets with randomly chosen abundances, a conservative estimate of a sample size of 58 produced the same multivariate results as all larger sample sizes. However, this minimal number varies as a function of evenness, where increased evenness resulted in increased minimal sample sizes. Sample sizes as small as 58 individuals are sufficient for a broad range of multivariate abundance-based research. In cases when resource availability is the limiting factor for conducting a project (e.g., small university, time to conduct the research project), statistically viable results can still be obtained with less of an investment.

  4. [Statistical analysis of German radiologic periodicals: developmental trends in the last 10 years].

    PubMed

    Golder, W

    1999-09-01

    To identify which statistical tests are applied in German radiological publications, to what extent their use has changed during the last decade, and which factors might be responsible for this development. The major articles published in "ROFO" and "DER RADIOLOGE" during 1988, 1993 and 1998 were reviewed for statistical content. The contributions were classified by principal focus and radiological subspecialty. The methods used were assigned to descriptive, basal and advanced statistics. Sample size, significance level and power were established. The use of experts' assistance was monitored. Finally, we calculated the so-called cumulative accessibility of the publications. 525 contributions were found to be eligible. In 1988, 87% used descriptive statistics only, 12.5% basal, and 0.5% advanced statistics. The corresponding figures in 1993 and 1998 are 62 and 49%, 32 and 41%, and 6 and 10%, respectively. Statistical techniques were most likely to be used in research on musculoskeletal imaging and articles dedicated to MRI. Six basic categories of statistical methods account for the complete statistical analysis appearing in 90% of the articles. ROC analysis is the single most common advanced technique. Authors make increasingly use of statistical experts' opinion and programs. During the last decade, the use of statistical methods in German radiological journals has fundamentally improved, both quantitatively and qualitatively. Presently, advanced techniques account for 20% of the pertinent statistical tests. This development seems to be promoted by the increasing availability of statistical analysis software.

  5. Informal Statistics Help Desk

    NASA Technical Reports Server (NTRS)

    Young, M.; Koslovsky, M.; Schaefer, Caroline M.; Feiveson, A. H.

    2017-01-01

    Back by popular demand, the JSC Biostatistics Laboratory and LSAH statisticians are offering an opportunity to discuss your statistical challenges and needs. Take the opportunity to meet the individuals offering expert statistical support to the JSC community. Join us for an informal conversation about any questions you may have encountered with issues of experimental design, analysis, or data visualization. Get answers to common questions about sample size, repeated measures, statistical assumptions, missing data, multiple testing, time-to-event data, and when to trust the results of your analyses.

  6. Gene coexpression measures in large heterogeneous samples using count statistics.

    PubMed

    Wang, Y X Rachel; Waterman, Michael S; Huang, Haiyan

    2014-11-18

    With the advent of high-throughput technologies making large-scale gene expression data readily available, developing appropriate computational tools to process these data and distill insights into systems biology has been an important part of the "big data" challenge. Gene coexpression is one of the earliest techniques developed that is still widely in use for functional annotation, pathway analysis, and, most importantly, the reconstruction of gene regulatory networks, based on gene expression data. However, most coexpression measures do not specifically account for local features in expression profiles. For example, it is very likely that the patterns of gene association may change or only exist in a subset of the samples, especially when the samples are pooled from a range of experiments. We propose two new gene coexpression statistics based on counting local patterns of gene expression ranks to take into account the potentially diverse nature of gene interactions. In particular, one of our statistics is designed for time-course data with local dependence structures, such as time series coupled over a subregion of the time domain. We provide asymptotic analysis of their distributions and power, and evaluate their performance against a wide range of existing coexpression measures on simulated and real data. Our new statistics are fast to compute, robust against outliers, and show comparable and often better general performance.

  7. The Content of Statistical Requirements for Authors in Biomedical Research Journals

    PubMed Central

    Liu, Tian-Yi; Cai, Si-Yu; Nie, Xiao-Lu; Lyu, Ya-Qi; Peng, Xiao-Xia; Feng, Guo-Shuang

    2016-01-01

    Background: Robust statistical designing, sound statistical analysis, and standardized presentation are important to enhance the quality and transparency of biomedical research. This systematic review was conducted to summarize the statistical reporting requirements introduced by biomedical research journals with an impact factor of 10 or above, so that researchers are able to give statistical issues serious consideration not only at the stage of data analysis but also at the stage of methodological design. Methods: Detailed statistical instructions for authors were downloaded from the homepage of each of the included journals or obtained from the editors directly via email. Then, we described the types and numbers of statistical guidelines introduced by different press groups. Items of statistical reporting guidelines as well as particular requirements were summarized in frequency, and were grouped into design, method of analysis, and presentation, respectively. Finally, updated statistical guidelines and particular requirements for improvement were summed up. Results: In total, 21 of 23 press groups introduced at least one statistical guideline. More than half of the press groups update their statistical instructions for authors gradually as new statistical reporting guidelines are issued. In addition, 16 press groups, covering 44 journals, address particular statistical requirements. Most of the particular requirements focused on the performance of statistical analysis and transparency in statistical reporting, including "address issues relevant to research design, including participant flow diagram, eligibility criteria, and sample size estimation," and "statistical methods and the reasons." Conclusions: Statistical requirements for authors are becoming increasingly comprehensive. Statistical requirements for authors remind researchers that they should give sufficient consideration not only to statistical methods during the research design, but also to standardized statistical reporting, which would be beneficial in providing stronger evidence and making a greater critical appraisal of evidence more accessible. PMID:27748343

  8. The Content of Statistical Requirements for Authors in Biomedical Research Journals.

    PubMed

    Liu, Tian-Yi; Cai, Si-Yu; Nie, Xiao-Lu; Lyu, Ya-Qi; Peng, Xiao-Xia; Feng, Guo-Shuang

    2016-10-20

    Robust statistical designing, sound statistical analysis, and standardized presentation are important to enhance the quality and transparency of biomedical research. This systematic review was conducted to summarize the statistical reporting requirements introduced by biomedical research journals with an impact factor of 10 or above, so that researchers are able to give statistical issues serious consideration not only at the stage of data analysis but also at the stage of methodological design. Detailed statistical instructions for authors were downloaded from the homepage of each of the included journals or obtained from the editors directly via email. Then, we described the types and numbers of statistical guidelines introduced by different press groups. Items of statistical reporting guidelines as well as particular requirements were summarized in frequency, and were grouped into design, method of analysis, and presentation, respectively. Finally, updated statistical guidelines and particular requirements for improvement were summed up. In total, 21 of 23 press groups introduced at least one statistical guideline. More than half of the press groups update their statistical instructions for authors gradually as new statistical reporting guidelines are issued. In addition, 16 press groups, covering 44 journals, address particular statistical requirements. Most of the particular requirements focused on the performance of statistical analysis and transparency in statistical reporting, including "address issues relevant to research design, including participant flow diagram, eligibility criteria, and sample size estimation," and "statistical methods and the reasons." Statistical requirements for authors are becoming increasingly comprehensive. Statistical requirements for authors remind researchers that they should give sufficient consideration not only to statistical methods during the research design, but also to standardized statistical reporting, which would be beneficial in providing stronger evidence and making a greater critical appraisal of evidence more accessible.

  9. Statistical power in parallel group point exposure studies with time-to-event outcomes: an empirical comparison of the performance of randomized controlled trials and the inverse probability of treatment weighting (IPTW) approach.

    PubMed

    Austin, Peter C; Schuster, Tibor; Platt, Robert W

    2015-10-15

    Estimating statistical power is an important component of the design of both randomized controlled trials (RCTs) and observational studies. Methods for estimating statistical power in RCTs have been well described and can be implemented simply. In observational studies, statistical methods must be used to remove the effects of confounding that can occur due to non-random treatment assignment. Inverse probability of treatment weighting (IPTW) using the propensity score is an attractive method for estimating the effects of treatment using observational data. However, sample size and power calculations have not been adequately described for these methods. We used an extensive series of Monte Carlo simulations to compare the statistical power of an IPTW analysis of an observational study with time-to-event outcomes with that of an analysis of a similarly-structured RCT. We examined the impact of four factors on the statistical power function: number of observed events, prevalence of treatment, the marginal hazard ratio, and the strength of the treatment-selection process. We found that, on average, an IPTW analysis had lower statistical power compared to an analysis of a similarly-structured RCT. The difference in statistical power increased as the magnitude of the treatment-selection model increased. The statistical power of an IPTW analysis tended to be lower than the statistical power of a similarly-structured RCT.
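
    A minimal sketch of forming stabilized IPTW weights from a logistic propensity model, on simulated data; in a design like this one, the weights would then enter a weighted time-to-event analysis with a robust variance estimator:

        import numpy as np
        from sklearn.linear_model import LogisticRegression

        rng = np.random.default_rng(0)
        n = 1000
        x = rng.normal(size=(n, 2))                     # confounders
        p_treat = 1 / (1 + np.exp(-(0.5 * x[:, 0] - 0.3 * x[:, 1])))
        treated = rng.binomial(1, p_treat)              # non-random treatment

        # Propensity score from a logistic model of treatment on confounders
        ps = LogisticRegression().fit(x, treated).predict_proba(x)[:, 1]

        # Stabilized inverse-probability-of-treatment weights
        p_marginal = treated.mean()
        w = np.where(treated == 1, p_marginal / ps, (1 - p_marginal) / (1 - ps))
        print("mean weight among treated:", w[treated == 1].mean().round(2))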

  10. The prior statistics of object colors.

    PubMed

    Koenderink, Jan J

    2010-02-01

    The prior statistics of object colors is of much interest because extensive statistical investigations of reflectance spectra reveal highly non-uniform structure in color space common to several very different databases. This common structure is due to the visual system rather than to the statistics of environmental structure. Analysis involves an investigation of the proper sample space of spectral reflectance factors and of the statistical consequences of the projection of spectral reflectances on the color solid. Even in the case of reflectance statistics that are translationally invariant with respect to the wavelength dimension, the statistics of object colors is highly non-uniform. The qualitative nature of this non-uniformity is due to trichromacy.

  11. Multivariate statistical analysis of the polyphenolic constituents in kiwifruit juices to trace fruit varieties and geographical origins.

    PubMed

    Guo, Jing; Yuan, Yahong; Dou, Pei; Yue, Tianli

    2017-10-01

    Fifty-one kiwifruit juice samples of seven kiwifruit varieties from five regions in China were analyzed to determine their polyphenols contents and to trace fruit varieties and geographical origins by multivariate statistical analysis. Twenty-one polyphenols belonging to four compound classes were determined by ultra-high-performance liquid chromatography coupled with ultra-high-resolution TOF mass spectrometry. (-)-Epicatechin, (+)-catechin, procyanidin B1 and caffeic acid derivatives were the predominant phenolic compounds in the juices. Principal component analysis (PCA) allowed a clear separation of the juices according to kiwifruit varieties. Stepwise linear discriminant analysis (SLDA) yielded satisfactory categorization of samples, provided 100% success rate according to kiwifruit varieties and 92.2% success rate according to geographical origins. The result showed that polyphenolic profiles of kiwifruit juices contain enough information to trace fruit varieties and geographical origins. Copyright © 2017 Elsevier Ltd. All rights reserved.

  12. Integrated Assessment and Improvement of the Quality Assurance System for the Cosworth Casting Process

    NASA Astrophysics Data System (ADS)

    Yousif, Dilon

    The purpose of this study was to improve the Quality Assurance (QA) System at the Nemak Windsor Aluminum Plant (WAP). The project used the Six Sigma method based on Define, Measure, Analyze, Improve, and Control (DMAIC). Analysis of the in-process melt at WAP was based on chemical, thermal, and mechanical testing. The control limits for the W319 Al Alloy were statistically recalculated using the composition measured under stable conditions. The "Chemistry Viewer" software was developed for statistical analysis of alloy composition. This software features the Silicon Equivalency (SiBQ) developed by the IRC. The Melt Sampling Device (MSD) was designed and evaluated at WAP to overcome traditional sampling limitations. The Thermal Analysis "Filters" software was developed for cooling curve analysis of the 3XX Al Alloy(s) using IRC techniques. The impact of low melting point impurities on the start of melting was evaluated using the Universal Metallurgical Simulator and Analyzer (UMSA).

  13. A First Survey on the Abundance of Plastics Fragments and Particles on Two Sandy Beaches in Kuching, Sarawak, Malaysia

    NASA Astrophysics Data System (ADS)

    Noik, V. James; Mohd Tuah, P.

    2015-04-01

    Plastic fragments and particles, as an emerging environmental contaminant and pollutant, have gained scientific attention in recent decades due to their potential threats to biota. This study aims to elucidate the presence, abundance, and temporal change of plastic fragments and particles on two selected beaches, namely Santubong and Trombol in Kuching, at two sampling times. Morphological and polymer identification assessments of the recovered plastics were also conducted. Overall, statistical comparison revealed that the abundance of plastic fragments/debris at the two sampling stations was not significantly different (p > 0.05). Likewise, statistical analysis of temporal changes in abundance yielded no significant difference for most sampling sites at each station, except STB-S2. Morphological examination showed that the plastic fragments and debris were diverse in shape, size, color, and surface fatigue. FTIR fingerprinting analysis showed that polypropylene and polyethylene were the dominant polymers among the plastic debris on both beaches.

  14. Differences in results of analyses of concurrent and split stream-water samples collected and analyzed by the US Geological Survey and the Illinois Environmental Protection Agency, 1985-91

    USGS Publications Warehouse

    Melching, C.S.; Coupe, R.H.

    1995-01-01

    During water years 1985-91, the U.S. Geological Survey (USGS) and the Illinois Environmental Protection Agency (IEPA) cooperated in the collection and analysis of concurrent and split stream-water samples from selected sites in Illinois. Concurrent samples were collected independently by field personnel from each agency at the same time and sent to the IEPA laboratory, whereas the split samples were collected by USGS field personnel and divided into aliquots that were sent to each agency's laboratory for analysis. The water-quality data from these programs were examined by means of the Wilcoxon signed ranks test to identify statistically significant differences between results of the USGS and IEPA analyses. The data sets for constituents and properties identified by the Wilcoxon test as having significant differences were further examined by use of the paired t-test, mean relative percentage difference, and scattergrams to determine if the differences were important. Of the 63 constituents and properties in the concurrent-sample analysis, differences in only 2 (pH and ammonia) were statistically significant and large enough to concern water-quality engineers and planners. Of the 27 constituents and properties in the split-sample analysis, differences in 9 (turbidity, dissolved potassium, ammonia, total phosphorus, dissolved aluminum, dissolved barium, dissolved iron, dissolved manganese, and dissolved nickel) were statistically significant and large enough to concern water-quality engineers and planners. The differences in concentration between pairs of concurrent samples were compared to the precision of the laboratory or field method used, and the differences in concentration between pairs of split samples were compared to the precision of the laboratory method used and the interlaboratory precision of measuring a given concentration or property. Consideration of method precision indicated that differences between concurrent samples were insignificant for all concentrations and properties except pH, and that differences between split samples were significant for all concentrations and properties. Consideration of interlaboratory precision indicated that the differences between the split samples were not unusually large. The results for the split samples illustrate the difficulty in obtaining comparable and accurate water-quality data.
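
    A minimal sketch of the paired test used to flag constituents, together with the mean relative percentage difference used to judge practical importance; the concentrations are hypothetical:

        import numpy as np
        from scipy.stats import wilcoxon

        # Hypothetical paired split-sample results (mg/L) from the two laboratories
        usgs = np.array([0.52, 0.48, 0.61, 0.55, 0.49, 0.58, 0.60, 0.47])
        iepa = np.array([0.50, 0.49, 0.57, 0.52, 0.47, 0.55, 0.58, 0.46])

        # Wilcoxon signed ranks test on the paired differences
        stat, p = wilcoxon(usgs, iepa)
        print(f"W = {stat}, p = {p:.3f}")

        # Mean relative percentage difference, as a measure of practical importance
        mrpd = 100 * np.mean((usgs - iepa) / ((usgs + iepa) / 2))
        print(f"mean relative % difference = {mrpd:.1f}%")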

  15. Quantity of dates trumps quality of dates for dense Bayesian radiocarbon sediment chronologies - Gas ion source 14C dating instructed by simultaneous Bayesian accumulation rate modeling

    NASA Astrophysics Data System (ADS)

    Rosenheim, B. E.; Firesinger, D.; Roberts, M. L.; Burton, J. R.; Khan, N.; Moyer, R. P.

    2016-12-01

    Radiocarbon (14C) sediment core chronologies benefit from a high density of dates, even when the precision of individual dates is sacrificed. This is demonstrated by a combined approach of rapid 14C analysis of CO2 gas generated from carbonates and organic material coupled with Bayesian statistical modeling. Analysis of 14C is facilitated by the gas ion source on the Continuous Flow Accelerator Mass Spectrometry (CFAMS) system at the Woods Hole Oceanographic Institution's National Ocean Sciences Accelerator Mass Spectrometry facility. This instrument can produce a 14C determination with a precision of +/- 100 14C yr every 4-5 minutes, with limited sample handling (dissolution of carbonates and/or combustion of organic carbon in evacuated containers). Rapid analysis allows over-preparation of samples to include replicates at each depth and/or comparison of different sample types at particular depths in a sediment or peat core. Analysis priority is given to depths that have the least chronologic precision as determined by Bayesian modeling of the chronology of calibrated ages. Use of such a statistical approach to determine the order in which samples are run ensures that the chronology constantly improves, so long as material is available for the analysis of chronologic weak points. Ultimately, the accuracy of the chronology is determined by the material that is actually being dated, and our combined approach allows testing of different constituents of the organic carbon pool and the carbonate minerals within a core. We will present preliminary results from a deep-sea sediment core abundant in deep-sea foraminifera as well as coastal wetland peat cores to demonstrate statistical improvements in sediment- and peat-core chronologies obtained by increasing the quantity and decreasing the quality of individual dates.

  16. A Guerilla Guide to Common Problems in ‘Neurostatistics’: Essential Statistical Topics in Neuroscience

    PubMed Central

    Smith, Paul F.

    2017-01-01

    Effective inferential statistical analysis is essential for high quality studies in neuroscience. However, recently, neuroscience has been criticised for the poor use of experimental design and statistical analysis. Many of the statistical issues confronting neuroscience are similar to other areas of biology; however, there are some that occur more regularly in neuroscience studies. This review attempts to provide a succinct overview of some of the major issues that arise commonly in the analyses of neuroscience data. These include: the non-normal distribution of the data; inequality of variance between groups; extensive correlation in data for repeated measurements across time or space; excessive multiple testing; inadequate statistical power due to small sample sizes; pseudo-replication; and an over-emphasis on binary conclusions about statistical significance as opposed to effect sizes. Statistical analysis should be viewed as just another neuroscience tool, which is critical to the final outcome of the study. Therefore, it needs to be done well and it is a good idea to be proactive and seek help early, preferably before the study even begins. PMID:29371855

  17. A Guerilla Guide to Common Problems in 'Neurostatistics': Essential Statistical Topics in Neuroscience.

    PubMed

    Smith, Paul F

    2017-01-01

    Effective inferential statistical analysis is essential for high quality studies in neuroscience. However, recently, neuroscience has been criticised for the poor use of experimental design and statistical analysis. Many of the statistical issues confronting neuroscience are similar to other areas of biology; however, there are some that occur more regularly in neuroscience studies. This review attempts to provide a succinct overview of some of the major issues that arise commonly in the analyses of neuroscience data. These include: the non-normal distribution of the data; inequality of variance between groups; extensive correlation in data for repeated measurements across time or space; excessive multiple testing; inadequate statistical power due to small sample sizes; pseudo-replication; and an over-emphasis on binary conclusions about statistical significance as opposed to effect sizes. Statistical analysis should be viewed as just another neuroscience tool, which is critical to the final outcome of the study. Therefore, it needs to be done well and it is a good idea to be proactive and seek help early, preferably before the study even begins.
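
    To make one of the listed issues concrete, excessive multiple testing, here is a minimal sketch of a Holm correction applied to a family of illustrative p-values:

        from statsmodels.stats.multitest import multipletests

        pvals = [0.001, 0.012, 0.030, 0.045, 0.20, 0.41]  # e.g., one test per brain region (illustrative)
        reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method="holm")

        for p, pa, r in zip(pvals, p_adj, reject):
            print(f"raw p={p:.3f}  adjusted p={pa:.3f}  reject H0: {r}")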

  18. Extracting Hydrologic Understanding from the Unique Space-time Sampling of the Surface Water and Ocean Topography (SWOT) Mission

    NASA Astrophysics Data System (ADS)

    Nickles, C.; Zhao, Y.; Beighley, E.; Durand, M. T.; David, C. H.; Lee, H.

    2017-12-01

    The Surface Water and Ocean Topography (SWOT) satellite mission is jointly developed by NASA and the French space agency (CNES), with participation from the Canadian and UK space agencies, to serve both the hydrology and oceanography communities. The SWOT mission will sample global surface water extents and elevations (lakes/reservoirs, rivers, estuaries, oceans, sea and land ice) at a finer spatial resolution than is currently possible, enabling hydrologic discovery, model advancements and new applications that are not currently possible or even conceivable. Although the mission will provide global coverage, analysis and interpolation of the data generated from the irregular space/time sampling represent a significant challenge. In this study, we explore the applicability of the unique space/time sampling for understanding river discharge dynamics throughout the Ohio River Basin. River network topology, SWOT sampling (i.e., orbit and identified SWOT river reaches) and spatial interpolation concepts are used to quantify the fraction of river reaches effectively sampled on each day of the three-year mission. Streamflow statistics for SWOT-generated river discharge time series are compared to continuous daily river discharge series. Relationships are presented to transform SWOT-generated streamflow statistics to equivalent continuous daily discharge time series statistics, intended to support hydrologic applications using low-flow and annual flow duration statistics.
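
    A hedged, entirely synthetic illustration of the core sampling question, how flow statistics computed from an irregular satellite revisit compare with those from the continuous daily record (no actual SWOT orbit data are used):

        import numpy as np

        rng = np.random.default_rng(0)
        days = np.arange(3 * 365)
        daily_q = 100 + 80 * np.sin(2 * np.pi * days / 365) + rng.lognormal(2, 1, days.size)

        # Irregular revisit: roughly 2-3 observations per 21-day repeat cycle (assumed).
        obs_days = days[rng.random(days.size) < 3 / 21]
        swot_q = daily_q[obs_days]

        for q in (0.05, 0.50, 0.95):  # low-flow, median, high-flow duration statistics
            print(f"quantile {q:.2f}: daily={np.quantile(daily_q, q):7.1f}  "
                  f"sampled={np.quantile(swot_q, q):7.1f}")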

  19. Identification of key micro-organisms involved in Douchi fermentation by statistical analysis and their use in an experimental fermentation.

    PubMed

    Chen, C; Xiang, J Y; Hu, W; Xie, Y B; Wang, T J; Cui, J W; Xu, Y; Liu, Z; Xiang, H; Xie, Q

    2015-11-01

    To screen and identify safe micro-organisms used during Douchi fermentation, and to verify the feasibility of producing high-quality Douchi using these identified micro-organisms. PCR-denaturing gradient gel electrophoresis (DGGE) and an automatic amino-acid analyser were used to investigate the microbial diversity and the content of free amino acids (FAAs) in 10 commercial Douchi samples. The correlations between microbial communities and FAAs were analysed statistically. Ten strains with significant positive correlations were identified. An experimental Douchi fermentation using the identified strains was then carried out, and the nutritional composition of the Douchi was analysed. Results showed that FAAs and the relative content of isoflavone aglycones in the verification Douchi samples were generally higher than those in the commercial Douchi samples. Our study indicated that fungi, yeasts, Bacillus and lactic acid bacteria are the key players in Douchi fermentation, and that with identified probiotic micro-organisms participating in fermentation, a higher quality Douchi product was produced. This is the first report to analyse and confirm the key micro-organisms in Douchi fermentation by statistical analysis. This work proves fermentation micro-organisms to be the key factor influencing Douchi quality, and demonstrates the feasibility of fermenting Douchi using identified starter micro-organisms. © 2015 The Society for Applied Microbiology.

  20. Implementation of novel statistical procedures and other advanced approaches to improve analysis of CASA data.

    PubMed

    Ramón, M; Martínez-Pastor, F

    2018-04-23

    Computer-aided sperm analysis (CASA) produces a wealth of data that is frequently ignored. The use of multiparametric statistical methods can help explore these datasets, unveiling the subpopulation structure of sperm samples. In this review we analyse the significance of the internal heterogeneity of sperm samples and its relevance. We also provide a brief description of the statistical tools used for extracting sperm subpopulations from the datasets, namely unsupervised clustering (with non-hierarchical, hierarchical and two-step methods) and the most advanced supervised methods, based on machine learning. The former has allowed exploration of subpopulation patterns in many species, whereas the latter offers further possibilities, especially for functional studies and the practical use of subpopulation analysis. We also consider novel approaches, such as the use of geometric morphometrics or imaging flow cytometry. Finally, although applying clustering analyses to the data provided by CASA systems yields valuable information on sperm samples, there are several caveats. Protocols for capturing and analysing motility or morphometry should be standardised and adapted to each experiment, and the algorithms should be open in order to allow comparison of results between laboratories. Moreover, we must be aware of new technology that could change the paradigm for studying sperm motility and morphology.
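
    A minimal sketch of the unsupervised clustering step: k-means on standardized CASA motility descriptors. The data are synthetic, and VCL/VSL/ALH are standard CASA parameter names used here purely for illustration:

        import numpy as np
        from sklearn.preprocessing import StandardScaler
        from sklearn.cluster import KMeans

        rng = np.random.default_rng(1)
        # Two notional subpopulations: fast-linear vs slow-erratic spermatozoa (synthetic).
        fast = rng.normal([150, 120, 2.0], [15, 12, 0.3], size=(100, 3))
        slow = rng.normal([80, 40, 4.5], [10, 8, 0.5], size=(100, 3))
        X = StandardScaler().fit_transform(np.vstack([fast, slow]))  # columns: VCL, VSL, ALH

        labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
        print(np.bincount(labels))  # size of each recovered subpopulation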

  1. Statistical Assessment of a Paired-site Approach for Verification of Carbon and Nitrogen Sequestration on CRP Land

    NASA Astrophysics Data System (ADS)

    Kucharik, C.; Roth, J.

    2002-12-01

    The threat of global climate change has provoked policy-makers to consider plausible strategies to slow the accumulation of greenhouse gases, especially carbon dioxide, in the atmosphere. One such idea involves the sequestration of atmospheric carbon (C) in degraded agricultural soils as part of the Conservation Reserve Program (CRP). While the potential for significant C sequestration in CRP grassland ecosystems has been demonstrated, the paired-site sampling approach traditionally used to quantify soil C changes has not been evaluated with robust statistical analysis. In this study, 14 paired CRP (> 8 years old) and cropland sites in Dane County, Wisconsin (WI) were used to assess whether a paired-site sampling design could detect statistically significant differences (ANOVA) in mean soil organic C and total nitrogen (N) storage. We compared surface (0 to 10 cm) bulk density, and sampled soils (0 to 5, 5 to 10, and 10 to 25 cm) for textural differences and chemical analysis of organic matter (OM), soil organic C (SOC), total N, and pH. The CRP contributed to lowering soil bulk density by 13% (p < 0.0001) and increased SOC and OM storage (kg m-2) by 13 to 17% in the 0 to 5 cm layer (p = 0.1). We tested the statistical power associated with ANOVA for measured soil properties, and calculated minimum detectable differences (MDD). We concluded that 40 to 65 paired sites and soil sampling in 5 cm increments near the surface were needed to achieve an 80% confidence level (α = 0.05; β = 0.20) in soil C and N sequestration rates. Because soil C and total N storage was highly variable among these sites (CVs > 20%), only a 23 to 29% change in existing total organic C and N pools could be reliably detected. While C and N sequestration (247 kg C ha-1 yr-1 and 17 kg N ha-1 yr-1) may be occurring and confined to the surface 5 cm as part of the WI CRP, our sampling design did not statistically support the desired 80% confidence level. We conclude that the use of statistical power analysis is essential to ensure a high level of confidence in soil C and N sequestration rates that are quantified using paired plots.
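
    The kind of power calculation behind the "40 to 65 paired sites" figure can be sketched for a paired design by working on the within-pair differences; the effect sizes below are illustrative, not the study's values:

        from statsmodels.stats.power import TTestPower

        analysis = TTestPower()   # one-sample/paired t-test power
        for dz in (0.35, 0.45):   # standardized mean of paired differences (assumed)
            n = analysis.solve_power(effect_size=dz, alpha=0.05, power=0.80)
            print(f"effect size {dz}: {n:.0f} paired sites needed")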

  2. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Solc, J.

    The reclamation effort typically deals with consequences of mining activity instead of being planned well before the mining. Detailed assessment of principal hydro- and geochemical processes participating in pore and groundwater chemistry evolution was carried out at three surface mine localities in North Dakota: the Fritz mine, the Indian Head mine, and the Velva mine. The geochemical model MINTEQA2 and advanced statistical analysis coupled with traditional interpretive techniques were used to determine site-specific environmental characteristics and to compare the differences between study sites. Multivariate statistical analysis indicates that sulfate, magnesium, calcium, the gypsum saturation index, and sodium contribute the most to overall differences in groundwater chemistry between study sites. Soil paste extract pH and EC measurements performed on over 3700 samples document extremely acidic soils at the Fritz mine. The number of samples with pH <5.5 reaches 80%-90% of total samples from discrete depths near the top of the soil profile at the Fritz mine. Soil samples from Indian Head and Velva do not indicate acidity below the pH 5.5 limit. The percentage of samples with EC > 3 mS cm-1 is between 20% and 40% at the Fritz mine and below 20% for samples from Indian Head and Velva. The results of geochemical modeling indicate an increased tendency for gypsum saturation within the vadose zone, particularly within the lands disturbed by mining activity. This trend is directly associated with increased concentrations of sulfate anions as a result of mineral oxidation. Geochemical modeling, statistical analysis, and soil extract pH and EC measurements proved to be reliable, fast, and relatively cost-effective tools for the assessment of soil acidity, the extent of the oxidation zone, and the potential for negative impact on pore and groundwater chemistry.

  3. Sampling surface and subsurface particle-size distributions in wadable gravel- and cobble-bed streams for analyses in sediment transport, hydraulics, and streambed monitoring

    Treesearch

    Kristin Bunte; Steven R. Abt

    2001-01-01

    This document provides guidance for sampling surface and subsurface sediment from wadable gravel- and cobble-bed streams. After a short introduction to stream types and classifications of gravel-bed rivers, the document explains the field and laboratory measurement of particle sizes and the statistical analysis of particle-size distributions. Analysis of particle...

  4. Weighting by Inverse Variance or by Sample Size in Random-Effects Meta-Analysis

    ERIC Educational Resources Information Center

    Marin-Martinez, Fulgencio; Sanchez-Meca, Julio

    2010-01-01

    Most of the statistical procedures in meta-analysis are based on the estimation of average effect sizes from a set of primary studies. The optimal weight for averaging a set of independent effect sizes is the inverse variance of each effect size, but in practice these weights have to be estimated, being affected by sampling error. When assuming a…
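
    The weighting scheme under discussion, in its simplest fixed-effect form with known variances (illustrative numbers):

        import numpy as np

        effects = np.array([0.30, 0.45, 0.10, 0.60])    # per-study effect sizes (illustrative)
        variances = np.array([0.02, 0.05, 0.01, 0.08])  # their sampling variances

        w = 1.0 / variances                             # optimal inverse-variance weights
        pooled = np.sum(w * effects) / np.sum(w)
        se = np.sqrt(1.0 / np.sum(w))
        print(f"pooled effect = {pooled:.3f} (SE {se:.3f})")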

  5. Considerations for the design, analysis and presentation of in vivo studies.

    PubMed

    Ranstam, J; Cook, J A

    2017-03-01

    To describe, explain and give practical suggestions regarding important principles and key methodological challenges in the study design, statistical analysis, and reporting of results from in vivo studies. Pre-specifying endpoints and analysis, recognizing the common underlying assumption of statistically independent observations, performing sample size calculations, and addressing multiplicity issues are important parts of an in vivo study. A clear reporting of results and informative graphical presentations of data are other important parts. Copyright © 2016 Osteoarthritis Research Society International. Published by Elsevier Ltd. All rights reserved.

  6. Significant Association of Urinary Toxic Metals and Autism-Related Symptoms—A Nonlinear Statistical Analysis with Cross Validation

    PubMed Central

    Adams, James; Kruger, Uwe; Geis, Elizabeth; Gehn, Eva; Fimbres, Valeria; Pollard, Elena; Mitchell, Jessica; Ingram, Julie; Hellmers, Robert; Quig, David; Hahn, Juergen

    2017-01-01

    Introduction A number of previous studies examined a possible association of toxic metals and autism, and over half of those studies suggest that toxic metal levels are different in individuals with Autism Spectrum Disorders (ASD). Additionally, several studies found that those levels correlate with the severity of ASD. Methods In order to further investigate these points, this paper performs the most detailed statistical analysis to date of a data set in this field. First morning urine samples were collected from 67 children and adults with ASD and 50 neurotypical controls of similar age and gender. The samples were analyzed to determine the levels of 10 urinary toxic metals (UTM). Autism-related symptoms were assessed with eleven behavioral measures. Statistical analysis was used to distinguish between participants on the ASD spectrum and neurotypical participants based upon the UTM data alone. The analysis also included examining the association of autism severity with toxic metal excretion data using linear and nonlinear analysis. “Leave-one-out” cross-validation was used to ensure statistical independence of results. Results and Discussion Average excretion levels of several toxic metals (lead, tin, thallium, antimony) were significantly higher in the ASD group. However, ASD classification using univariate statistics proved difficult due to large variability, but nonlinear multivariate statistical analysis significantly improved ASD classification with Type I/II errors of 15% and 18%, respectively. These results clearly indicate that the urinary toxic metal excretion profiles of participants in the ASD group were significantly different from those of the neurotypical participants. Similarly, nonlinear methods determined a significantly stronger association between the behavioral measures and toxic metal excretion. The association was strongest for the Aberrant Behavior Checklist (including subscales on Irritability, Stereotypy, Hyperactivity, and Inappropriate Speech), but significant associations were found for UTM with all eleven autism-related assessments, with cross-validation R2 values ranging from 0.12 to 0.48. PMID:28068407
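
    "Leave-one-out" cross-validation as used above, in a minimal sketch with synthetic features standing in for the urinary toxic-metal profiles:

        import numpy as np
        from sklearn.model_selection import LeaveOneOut
        from sklearn.linear_model import LogisticRegression

        rng = np.random.default_rng(3)
        X = rng.normal(size=(40, 10))    # 10 metals, 40 participants (synthetic)
        y = rng.integers(0, 2, size=40)  # ASD vs neurotypical label (synthetic)

        correct = 0
        for train, test in LeaveOneOut().split(X):
            # Refit with the held-out participant excluded, then predict them.
            model = LogisticRegression(max_iter=1000).fit(X[train], y[train])
            correct += int(model.predict(X[test])[0] == y[test][0])
        print(f"LOO accuracy: {correct / len(y):.2f}")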

  7. Advanced microwave soil moisture studies. [Big Sioux River Basin, Iowa

    NASA Technical Reports Server (NTRS)

    Dalsted, K. J.; Harlan, J. C.

    1983-01-01

    Comparisons of low-level L-band brightness temperature (TB) and thermal infrared (TIR) data, as well as the following data sets: soil map and land cover data, direct soil moisture measurement, and a computer-generated contour map, were statistically evaluated using regression analysis and linear discriminant analysis. Regression analysis of footprint data shows that statistical groupings of ground variables (soil features and land cover) hold promise for qualitative assessment of soil moisture and for reducing variance within the sampling space. Dry conditions appear to be more conducive to producing meaningful statistics than wet conditions. Regression analysis using field-averaged TB and TIR data did not approach the higher R-squared values obtained using within-field variations. The linear discriminant analysis indicates some capacity to distinguish categories, with the results being somewhat better on a field basis than on a footprint basis.

  8. The use of multicomponent statistical analysis in hydrogeological environmental research.

    PubMed

    Lambrakis, Nicolaos; Antonakos, Andreas; Panagopoulos, George

    2004-04-01

    The present article examines the possibilities of investigating NO3- spread in aquifers by applying multicomponent statistical methods (factor, cluster, and discriminant analysis) to hydrogeological, hydrochemical, and environmental parameters. A 4-R-Mode factor model determined from the analysis showed its usefulness in investigating hydrogeological parameters affecting NO3- concentration, such as its dilution by upcoming groundwater of the recharge areas. The relationship between NO3- concentration and agricultural activities can be determined sufficiently by the first factor, which relies on NO3- and SO42- of the same origin: that of agricultural fertilizers. The other three factors of the R-Mode analysis are not connected directly to the NO3- problem. They do, however, by extracting the role of the unsaturated zone, show an interesting relationship between organic matter content, thickness, and saturated hydraulic conductivity. The application of Hierarchical Cluster Analysis, based on all possible combinations of classification method, showed two main groups of samples. The first group comprises samples from the edges and the second from the central part of the study area. The application of Discriminant Analysis showed that NO3- and SO42- ions are the most significant variables in the discriminant function. Therefore, the first group is considered to comprise all samples from areas not influenced by fertilizers, lying on the edges of contaminating activities such as crop cultivation, while the second comprises all the other samples.

  9. "Using Power Tables to Compute Statistical Power in Multilevel Experimental Designs"

    ERIC Educational Resources Information Center

    Konstantopoulos, Spyros

    2009-01-01

    Power computations for one-level experimental designs that assume simple random samples are greatly facilitated by power tables such as those presented in Cohen's book about statistical power analysis. However, in education and the social sciences experimental designs have naturally nested structures and multilevel models are needed to compute the…

  10. An On-Line Virtual Environment for Teaching Statistical Sampling and Analysis

    ERIC Educational Resources Information Center

    Marsh, Michael T.

    2009-01-01

    Regardless of the related discipline, students in statistics courses invariably have difficulty understanding the connection between the numerical values calculated for end-of-the-chapter exercises and their usefulness in decision making. This disconnect is, in part, due to the lack of time and opportunity to actually design the experiments and…

  11. STATISTICAL ANALYSIS OF TANK 5 FLOOR SAMPLE RESULTS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shine, E.

    2012-03-14

    Sampling has been completed for the characterization of the residual material on the floor of Tank 5 in the F-Area Tank Farm at the Savannah River Site (SRS), near Aiken, SC. The sampling was performed by Savannah River Remediation (SRR) LLC using a stratified random sampling plan with volume-proportional compositing. The plan consisted of partitioning the residual material on the floor of Tank 5 into three non-overlapping strata: two strata enclosed accumulations, and a third stratum consisted of a thin layer of material outside the regions of the two accumulations. Each of three composite samples was constructed from five primary sample locations of residual material on the floor of Tank 5. Three of the primary samples were obtained from the stratum containing the thin layer of material, and one primary sample was obtained from each of the two strata containing an accumulation. This report documents the statistical analyses of the analytical results for the composite samples. The objective of the analysis is to determine the mean concentrations and upper 95% confidence (UCL95) bounds for the mean concentrations for a set of analytes in the tank residuals. The statistical procedures employed in the analyses were consistent with the Environmental Protection Agency (EPA) technical guidance by Singh and others [2010]. Savannah River National Laboratory (SRNL) measured the sample bulk density, nonvolatile beta, gross alpha, radionuclide, inorganic, and anion concentrations three times for each of the composite samples. The analyte concentration data were partitioned into three separate groups for further analysis: analytes with every measurement above their minimum detectable concentrations (MDCs), analytes with no measurements above their MDCs, and analytes with a mixture of some measurement results above and below their MDCs. The means, standard deviations, and UCL95s were computed for the analytes in the two groups that had at least some measurements above their MDCs. The identification of distributions and the selection of UCL95 procedures generally followed the protocol in Singh, Armbya, and Singh [2010]. When all of an analyte's measurements lie below their MDCs, only a summary of the MDCs can be provided. The measurement results reported by SRNL are listed in Appendix A, and the results of this analysis are reported in Appendix B. The data were generally found to follow a normal distribution, and to be homogeneous across composite samples.

  12. Statistical Analysis of Tank 5 Floor Sample Results

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shine, E. P.

    2013-01-31

    Sampling has been completed for the characterization of the residual material on the floor of Tank 5 in the F-Area Tank Farm at the Savannah River Site (SRS), near Aiken, SC. The sampling was performed by Savannah River Remediation (SRR) LLC using a stratified random sampling plan with volume-proportional compositing. The plan consisted of partitioning the residual material on the floor of Tank 5 into three non-overlapping strata: two strata enclosed accumulations, and a third stratum consisted of a thin layer of material outside the regions of the two accumulations. Each of three composite samples was constructed from five primary sample locations of residual material on the floor of Tank 5. Three of the primary samples were obtained from the stratum containing the thin layer of material, and one primary sample was obtained from each of the two strata containing an accumulation. This report documents the statistical analyses of the analytical results for the composite samples. The objective of the analysis is to determine the mean concentrations and upper 95% confidence (UCL95) bounds for the mean concentrations for a set of analytes in the tank residuals. The statistical procedures employed in the analyses were consistent with the Environmental Protection Agency (EPA) technical guidance by Singh and others [2010]. Savannah River National Laboratory (SRNL) measured the sample bulk density, nonvolatile beta, gross alpha, and the radionuclide, elemental, and chemical concentrations three times for each of the composite samples. The analyte concentration data were partitioned into three separate groups for further analysis: analytes with every measurement above their minimum detectable concentrations (MDCs), analytes with no measurements above their MDCs, and analytes with a mixture of some measurement results above and below their MDCs. The means, standard deviations, and UCL95s were computed for the analytes in the two groups that had at least some measurements above their MDCs. The identification of distributions and the selection of UCL95 procedures generally followed the protocol in Singh, Armbya, and Singh [2010]. When all of an analyte's measurements lie below their MDCs, only a summary of the MDCs can be provided. The measurement results reported by SRNL are listed, and the results of this analysis are reported. The data were generally found to follow a normal distribution, and to be homogeneous across composite samples.

  13. Statistical Analysis Of Tank 5 Floor Sample Results

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shine, E. P.

    2012-08-01

    Sampling has been completed for the characterization of the residual material on the floor of Tank 5 in the F-Area Tank Farm at the Savannah River Site (SRS), near Aiken, SC. The sampling was performed by Savannah River Remediation (SRR) LLC using a stratified random sampling plan with volume-proportional compositing. The plan consisted of partitioning the residual material on the floor of Tank 5 into three non-overlapping strata: two strata enclosed accumulations, and a third stratum consisted of a thin layer of material outside the regions of the two accumulations. Each of three composite samples was constructed from five primary sample locations of residual material on the floor of Tank 5. Three of the primary samples were obtained from the stratum containing the thin layer of material, and one primary sample was obtained from each of the two strata containing an accumulation. This report documents the statistical analyses of the analytical results for the composite samples. The objective of the analysis is to determine the mean concentrations and upper 95% confidence (UCL95) bounds for the mean concentrations for a set of analytes in the tank residuals. The statistical procedures employed in the analyses were consistent with the Environmental Protection Agency (EPA) technical guidance by Singh and others [2010]. Savannah River National Laboratory (SRNL) measured the sample bulk density, nonvolatile beta, gross alpha, and the radionuclide, elemental, and chemical concentrations three times for each of the composite samples. The analyte concentration data were partitioned into three separate groups for further analysis: analytes with every measurement above their minimum detectable concentrations (MDCs), analytes with no measurements above their MDCs, and analytes with a mixture of some measurement results above and below their MDCs. The means, standard deviations, and UCL95s were computed for the analytes in the two groups that had at least some measurements above their MDCs. The identification of distributions and the selection of UCL95 procedures generally followed the protocol in Singh, Armbya, and Singh [2010]. When all of an analyte's measurements lie below their MDCs, only a summary of the MDCs can be provided. The measurement results reported by SRNL are listed in Appendix A, and the results of this analysis are reported in Appendix B. The data were generally found to follow a normal distribution, and to be homogeneous across composite samples.
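
    The UCL95 computation that all three Tank 5 reports rely on, in its simplest normal-theory form. EPA guidance selects among several UCL procedures depending on the identified distribution, so this t-based version covers only the normal case, and the values below are illustrative:

        import numpy as np
        from scipy import stats

        x = np.array([1.9, 2.1, 2.4, 2.0, 2.2, 2.3, 1.8, 2.5, 2.2])  # illustrative concentrations
        n, mean, sd = x.size, x.mean(), x.std(ddof=1)

        # One-sided 95% upper confidence limit on the mean (normal data).
        ucl95 = mean + stats.t.ppf(0.95, df=n - 1) * sd / np.sqrt(n)
        print(f"mean = {mean:.3f}, UCL95 = {ucl95:.3f}")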

  14. A crash course on data analysis in asteroseismology

    NASA Astrophysics Data System (ADS)

    Appourchaux, Thierry

    2014-02-01

    In this course, I try to provide a few basics required for performing data analysis in asteroseismology. First, I address how one can properly treat time series: the sampling, the filtering effect, the use of the Fourier transform, and the associated statistics. Second, I address how one can apply statistics for decision making and for parameter estimation, either in a frequentist or a Bayesian framework. Last, I review how these basic principles have been applied (or not) in asteroseismology.

  15. Statistical analysis of oil percolation through pressboard measured by optical recording

    NASA Astrophysics Data System (ADS)

    Rogalski, Przemysław; Kozak, Czesław

    2017-08-01

    The paper presents a measuring station used to measure the percolation of transformer oil through electrotechnical pressboard. The percolation rate of Nytro Taurus insulating oil, manufactured by Nynas, through pressboard made by Pucaro was investigated. Approximately 60 samples of Pucaro pressboard, widely used for insulation of power transformers, were measured. Statistical analysis of oil percolation times was performed. The measurements made it possible to determine the distribution of capillary diameters occurring in the pressboard.

  16. Sample processing, protocol, and statistical analysis of the time-of-flight secondary ion mass spectrometry (ToF-SIMS) of protein, cell, and tissue samples.

    PubMed

    Barreto, Goncalo; Soininen, Antti; Sillat, Tarvo; Konttinen, Yrjö T; Kaivosoja, Emilia

    2014-01-01

    Time-of-flight secondary ion mass spectrometry (ToF-SIMS) is increasingly being used in the analysis of biological samples. For example, it has been applied to distinguish healthy and osteoarthritic human cartilage. This chapter discusses the ToF-SIMS principle and instrumentation, including the three modes of analysis in ToF-SIMS. ToF-SIMS sets certain requirements for the samples to be analyzed; for example, the samples have to be vacuum compatible. Accordingly, sample processing steps for different biological samples (i.e., proteins, cells, frozen and paraffin-embedded tissues, and extracellular matrix) for ToF-SIMS are presented. Multivariate analysis of the ToF-SIMS data and the necessary data preprocessing steps (peak selection, data normalization, mean-centering, and scaling and transformation) are discussed in this chapter.
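
    The preprocessing chain named above (normalization, mean-centering, scaling) ahead of multivariate analysis, sketched on a synthetic peak-intensity matrix (rows are spectra, columns are selected peaks):

        import numpy as np
        from sklearn.decomposition import PCA

        rng = np.random.default_rng(7)
        peaks = rng.lognormal(mean=1.0, sigma=0.5, size=(30, 200))  # synthetic intensities

        tic = peaks.sum(axis=1, keepdims=True)   # normalize each spectrum to total counts
        norm = peaks / tic
        centered = norm - norm.mean(axis=0)      # mean-center each peak
        scaled = centered / norm.std(axis=0)     # unit-variance (auto) scaling

        scores = PCA(n_components=2).fit_transform(scaled)
        print(scores[:3])                        # first spectra in PC space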

  17. Hydrogeochemistry and water quality of the Kordkandi-Duzduzan plain, NW Iran: application of multivariate statistical analysis and PoS index.

    PubMed

    Soltani, Shahla; Asghari Moghaddam, Asghar; Barzegar, Rahim; Kazemian, Naeimeh; Tziritis, Evangelos

    2017-08-18

    Kordkandi-Duzduzan plain is one of the fertile plains of East Azarbaijan Province, NW of Iran. Groundwater is an important resource for drinking and agricultural purposes due to the lack of surface water resources in the region. The main objectives of the present study are to identify the hydrogeochemical processes and the potential sources of major, minor, and trace metals and metalloids such as Cr, Mn, Cd, Fe, Al, and As by using joint hydrogeochemical techniques and multivariate statistical analysis, and to evaluate groundwater quality deterioration with the use of the PoS environmental index. To achieve these objectives, 23 groundwater samples were collected in September 2015. The Piper diagram shows that mixed Ca-Mg-Cl is the dominant groundwater type, and some of the samples have Ca-HCO3, Ca-Cl, and Na-Cl types. Multivariate statistical analyses indicate that weathering and dissolution of different rocks and minerals (e.g., silicates, gypsum, and halite), ion exchange, and agricultural activities influence the hydrogeochemistry of the study area. The cluster analysis divides the samples into two distinct clusters which are completely different in EC (and its dependent variables such as Na+, K+, Ca2+, Mg2+, SO42-, and Cl-), Cd, and Cr according to the ANOVA statistical test. Based on the median values, the concentrations of pH, NO3-, SiO2, and As in cluster 1 are elevated compared with those of cluster 2, while their maximum values occur in cluster 2. According to the PoS index, the dominant parameter that controls quality deterioration is As, with 60% of contribution. Samples of lowest PoS values are located in the southern and northern parts (recharge area) while samples of the highest values are located in the discharge area and the eastern part.

  18. Statistical scaling of pore-scale Lagrangian velocities in natural porous media.

    PubMed

    Siena, M; Guadagnini, A; Riva, M; Bijeljic, B; Pereira Nunes, J P; Blunt, M J

    2014-08-01

    We investigate the scaling behavior of sample statistics of pore-scale Lagrangian velocities in two different rock samples, Bentheimer sandstone and Estaillades limestone. The samples are imaged using X-ray computed tomography with micron-scale resolution. The scaling analysis relies on the study of the way qth-order sample structure functions (statistical moments of order q of absolute increments) of Lagrangian velocities depend on separation distances, or lags, traveled along the mean flow direction. In the sandstone block, sample structure functions of all orders exhibit a power-law scaling within a clearly identifiable intermediate range of lags. Sample structure functions associated with the limestone block display two diverse power-law regimes, which we infer to be related to two overlapping spatially correlated structures. In both rocks and for all orders q, we observe linear relationships between logarithmic structure functions of successive orders at all lags (a phenomenon that is typically known as extended power scaling, or extended self-similarity). The scaling behavior of Lagrangian velocities is compared with the one exhibited by porosity and specific surface area, which constitute two key pore-scale geometric observables. The statistical scaling of the local velocity field reflects the behavior of these geometric observables, with the occurrence of power-law-scaling regimes within the same range of lags for sample structure functions of Lagrangian velocity, porosity, and specific surface area.
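
    A sketch of the structure-function analysis on a synthetic one-dimensional velocity series (not the imaged rock samples): qth-order moments of absolute increments versus lag, with the power-law exponent zeta(q) estimated from a log-log fit:

        import numpy as np

        rng = np.random.default_rng(11)
        v = np.cumsum(rng.normal(size=4096))  # toy correlated velocity signal

        lags = np.unique(np.logspace(0, 2.5, 20).astype(int))
        for q in (1, 2, 3):
            # S_q(l) = mean(|v(x + l) - v(x)|**q), the qth-order structure function.
            s_q = [np.mean(np.abs(v[l:] - v[:-l]) ** q) for l in lags]
            # Power-law scaling S_q(l) ~ l**zeta(q): slope of the log-log regression.
            zeta_q = np.polyfit(np.log(lags), np.log(s_q), 1)[0]
            print(f"q={q}: zeta(q) ~ {zeta_q:.2f}")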

  19. Analysis of chemical warfare agents. II. Use of thiols and statistical experimental design for the trace level determination of vesicant compounds in air samples.

    PubMed

    Muir, Bob; Quick, Suzanne; Slater, Ben J; Cooper, David B; Moran, Mary C; Timperley, Christopher M; Carrick, Wendy A; Burnell, Christopher K

    2005-03-18

    Thermal desorption with gas chromatography-mass spectrometry (TD-GC-MS) remains the technique of choice for analysis of trace concentrations of analytes in air samples. This paper describes the development and application of a method for analysing the vesicant compounds sulfur mustard and Lewisites I-III. 3,4-Dimercaptotoluene and butanethiol were used to spike sorbent tubes before vesicant vapours were sampled; Lewisite I and II reacted with the thiols while sulfur mustard and Lewisite III did not. Statistical experimental design was used to optimise the thermal desorption parameters, and the optimised method was used to determine vesicant compounds in headspace samples taken from a decontamination trial. 3,4-Dimercaptotoluene reacted with Lewisites I and II to give a common derivative with a limit of detection (LOD) of 260 microg m(-3), while butanethiol gave distinct derivatives with limits of detection around 30 microg m(-3).

  20. Triacylglycerol Analysis in Human Milk and Other Mammalian Species: Small-Scale Sample Preparation, Characterization, and Statistical Classification Using HPLC-ELSD Profiles.

    PubMed

    Ten-Doménech, Isabel; Beltrán-Iturat, Eduardo; Herrero-Martínez, José Manuel; Sancho-Llopis, Juan Vicente; Simó-Alfonso, Ernesto Francisco

    2015-06-24

    In this work, a method for the separation of triacylglycerols (TAGs) present in human milk and from other mammalian species by reversed-phase high-performance liquid chromatography using a core-shell particle packed column with UV and evaporative light-scattering detectors is described. Under optimal conditions, a mobile phase containing acetonitrile/n-pentanol at 10 °C gave an excellent resolution among more than 50 TAG peaks. A small-scale method for fat extraction in these milks (particularly of interest for human milk samples) using minimal amounts of sample and reagents was also developed. The proposed extraction protocol and the traditional method were compared, giving similar results, with respect to the total fat and relative TAG contents. Finally, a statistical study based on linear discriminant analysis on the TAG composition of different types of milks (human, cow, sheep, and goat) was carried out to differentiate the samples according to their mammalian origin.

  1. Evaluation of a segment-based LANDSAT full-frame approach to crop area estimation

    NASA Technical Reports Server (NTRS)

    Bauer, M. E. (Principal Investigator); Hixson, M. M.; Davis, S. M.

    1981-01-01

    As the registration of LANDSAT full frames enters the realm of current technology, sampling methods should be examined which utilize other than the segment data used for LACIE. The effect of separating the functions of sampling for training and sampling for area estimation was examined. The frame selected for analysis was acquired over north central Iowa on August 9, 1978. A stratification of the full frame was defined. Training data came from segments within the frame. Two classification and estimation procedures were compared: statistics developed on one segment were used to classify that segment, and pooled statistics from the segments were used to classify a systematic sample of pixels. Comparisons to USDA/ESCS estimates illustrate that the full-frame sampling approach can provide accurate and precise area estimates.

  2. A methodological analysis of chaplaincy research: 2000-2009.

    PubMed

    Galek, Kathleen; Flannelly, Kevin J; Jankowski, Katherine R B; Handzo, George F

    2011-01-01

    The present article presents a comprehensive review and analysis of quantitative research conducted in the United States on chaplaincy and closely related topics published between 2000 and 2009. A combined search strategy identified 49 quantitative studies in 13 journals. The analysis focuses on the methodological sophistication of the studies, compared to earlier research on chaplaincy and pastoral care. Cross-sectional surveys of convenience samples still dominate the field, but sample sizes have increased somewhat over the past three decades. Reporting of the validity and reliability of measures continues to be low, although reporting of response rates has improved. Improvements in the use of inferential statistics and statistical controls were also observed, compared to previous research. The authors conclude that more experimental research is needed on chaplaincy, along with an increased use of hypothesis testing, regardless of the research designs that are used.

  3. Drinking water quality assessment.

    PubMed

    Aryal, J; Gautam, B; Sapkota, N

    2012-09-01

    Drinking water quality is a great public health concern because it is a major risk factor for the high incidence of diarrheal diseases in Nepal. In recent years, the prevalence rate of diarrhoea has been found to be highest in Myagdi district. This study was carried out to assess the quality of drinking water from different natural sources, reservoirs, and collection taps at Arthunge VDC of Myagdi district. A cross-sectional study was carried out using a random sampling method in Arthunge VDC of Myagdi district from January to June 2010. 84 water samples representing natural sources, reservoirs, and collection taps from the study area were collected. The physico-chemical and microbiological analysis was performed following standard techniques set by APHA (1998), and statistical analysis was carried out using SPSS 11.5. The results were also compared with national and WHO guidelines. Of the 84 water samples (from natural sources, reservoirs, and tap water) analyzed, the drinking water quality parameters (except arsenic and total coliform) of all water samples were found to be within the WHO and national standards. 15.48% of water samples (13 samples) showed pH higher than the WHO permissible guideline values. Similarly, 85.71% of water samples (72 samples) showed arsenic values higher than the WHO value. Further, the statistical analysis showed no significant difference (P > 0.05) in physico-chemical parameters and total coliform count between collection-tap water samples of winter (January 2010) and summer (June 2010). The microbiological examination of water samples revealed the presence of total coliform in 86.90% of water samples. The results obtained from physico-chemical analysis of water samples were within national and WHO standards except for arsenic. The study also found coliform contamination to be the key problem with drinking water.

  4. A Monte Carlo study of Weibull reliability analysis for space shuttle main engine components

    NASA Technical Reports Server (NTRS)

    Abernethy, K.

    1986-01-01

    The incorporation of a number of additional capabilities into an existing Weibull analysis computer program and the results of a Monte Carlo computer simulation study to evaluate the usefulness of the Weibull methods using samples with a very small number of failures and extensive censoring are discussed. Since the censoring mechanism inherent in the Space Shuttle Main Engine (SSME) data is hard to analyze, it was decided to use a random censoring model, generating censoring times from a uniform probability distribution. Some of the statistical techniques and computer programs that are used in the SSME Weibull analysis are described. The documented methods were supplemented by adding computer calculations of approximate (using iterative methods) confidence intervals for several parameters of interest. These calculations are based on a likelihood ratio statistic which is asymptotically a chi-squared statistic with one degree of freedom. The assumptions built into the computer simulations are described, along with the simulation program and the techniques used in it. Simulation results are tabulated for various combinations of Weibull shape parameters and numbers of failures in the samples.
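
    A small sketch of the censored-data setup used in the simulation study: Weibull maximum-likelihood fitting under random uniform censoring, where failures contribute the log-density and censored units the log-survival function (all parameter values are illustrative):

        import numpy as np
        from scipy.optimize import minimize

        rng = np.random.default_rng(5)
        shape_true, scale_true = 1.5, 100.0
        t = scale_true * rng.weibull(shape_true, size=20)  # true failure times
        c = rng.uniform(0, 150, size=20)                   # uniform random censoring times
        obs = np.minimum(t, c)
        failed = t <= c                                    # True where a failure was observed

        def negloglik(params):
            k, lam = np.exp(params)                        # optimize on the log scale
            z = obs / lam
            logpdf = np.log(k / lam) + (k - 1) * np.log(z) - z**k   # Weibull log-density
            logsf = -z**k                                           # Weibull log-survival
            return -(logpdf[failed].sum() + logsf[~failed].sum())

        res = minimize(negloglik, x0=np.log([1.0, np.median(obs)]))
        print("shape, scale MLE:", np.exp(res.x))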

  5. BCM: toolkit for Bayesian analysis of Computational Models using samplers.

    PubMed

    Thijssen, Bram; Dijkstra, Tjeerd M H; Heskes, Tom; Wessels, Lodewyk F A

    2016-10-21

    Computational models in biology are characterized by a large degree of uncertainty. This uncertainty can be analyzed with Bayesian statistics, however, the sampling algorithms that are frequently used for calculating Bayesian statistical estimates are computationally demanding, and each algorithm has unique advantages and disadvantages. It is typically unclear, before starting an analysis, which algorithm will perform well on a given computational model. We present BCM, a toolkit for the Bayesian analysis of Computational Models using samplers. It provides efficient, multithreaded implementations of eleven algorithms for sampling from posterior probability distributions and for calculating marginal likelihoods. BCM includes tools to simplify the process of model specification and scripts for visualizing the results. The flexible architecture allows it to be used on diverse types of biological computational models. In an example inference task using a model of the cell cycle based on ordinary differential equations, BCM is significantly more efficient than existing software packages, allowing more challenging inference problems to be solved. BCM represents an efficient one-stop-shop for computational modelers wishing to use sampler-based Bayesian statistics.

  6. The problem of pseudoreplication in neuroscientific studies: is it affecting your analysis?

    PubMed Central

    2010-01-01

    Background Pseudoreplication occurs when observations are not statistically independent, but treated as if they are. This can occur when there are multiple observations on the same subjects, when samples are nested or hierarchically organised, or when measurements are correlated in time or space. Analysis of such data without taking these dependencies into account can lead to meaningless results, and examples can easily be found in the neuroscience literature. Results A single issue of Nature Neuroscience provided a number of examples and is used as a case study to highlight how pseudoreplication arises in neuroscientific studies, why the analyses in these papers are incorrect, and appropriate analytical methods are provided. 12% of papers had pseudoreplication and a further 36% were suspected of having pseudoreplication, but it was not possible to determine for certain because insufficient information was provided. Conclusions Pseudoreplication can undermine the conclusions of a statistical analysis, and it would be easier to detect if the sample size, degrees of freedom, the test statistic, and precise p-values are reported. This information should be a requirement for all publications. PMID:20074371
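
    The standard remedy for pseudoreplication from repeated observations on the same subjects is a mixed-effects model; a minimal sketch with a random intercept per animal, on synthetic data:

        import numpy as np
        import pandas as pd
        import statsmodels.formula.api as smf

        rng = np.random.default_rng(9)
        n_animals, cells_per_animal = 8, 20
        animal = np.repeat(np.arange(n_animals), cells_per_animal)
        group = animal % 2                                 # treatment vs control animals
        animal_effect = rng.normal(0, 1.0, n_animals)[animal]
        y = 0.5 * group + animal_effect + rng.normal(0, 1.0, animal.size)

        df = pd.DataFrame({"y": y, "group": group, "animal": animal})
        # Random intercept per animal: cells from one animal are not treated as independent.
        fit = smf.mixedlm("y ~ group", df, groups=df["animal"]).fit()
        print(fit.summary())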

  7. “Magnitude-based Inference”: A Statistical Review

    PubMed Central

    Welsh, Alan H.; Knight, Emma J.

    2015-01-01

    ABSTRACT Purpose We consider “magnitude-based inference” and its interpretation by examining in detail its use in the problem of comparing two means. Methods We extract from the spreadsheets, which are provided to users of the analysis (http://www.sportsci.org/), a precise description of how “magnitude-based inference” is implemented. We compare the implemented version of the method with general descriptions of it and interpret the method in familiar statistical terms. Results and Conclusions We show that “magnitude-based inference” is not a progressive improvement on modern statistics. The additional probabilities introduced are not directly related to the confidence interval but, rather, are interpretable either as P values for two different nonstandard tests (for different null hypotheses) or as approximate Bayesian calculations, which also lead to a type of test. We also discuss sample size calculations associated with “magnitude-based inference” and show that the substantial reduction in sample sizes claimed for the method (30% of the sample size obtained from standard frequentist calculations) is not justifiable so the sample size calculations should not be used. Rather than using “magnitude-based inference,” a better solution is to be realistic about the limitations of the data and use either confidence intervals or a fully Bayesian analysis. PMID:25051387

  8. Multivariate analysis and geochemical approach for assessment of metal pollution state in sediment cores.

    PubMed

    Jamshidi-Zanjani, Ahmad; Saeedi, Mohsen

    2017-07-01

    Vertical distribution of metals (Cu, Zn, Cr, Fe, Mn, Pb, Ni, Cd, and Li) in four sediment core samples (C1, C2, C3, and C4) from Anzali international wetland located southwest of the Caspian Sea was examined. Background concentration of each metal was calculated according to different statistical approaches. The results of multivariate statistical analysis showed that Fe and Mn might have significant role in the fate of Ni and Zn in sediment core samples. Different sediment quality indexes were utilized to assess metal pollution in sediment cores. Moreover, a new sediment quality index named aggregative toxicity index (ATI) based on sediment quality guidelines (SQGs) was developed to assess the degree of metal toxicity in an aggregative manner. The increasing pattern of metal pollution and their toxicity degree in upper layers of core samples indicated increasing effects of anthropogenic sources in the study area.

  9. The statistical analysis of global climate change studies

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hardin, J.W.

    1992-01-01

    The focus of this work is to contribute to the enhancement of the relationship between climatologists and statisticians. The analysis of global change data has been underway for many years by atmospheric scientists. Much of this analysis includes a heavy reliance on statistics and statistical inference. Some specific climatological analyses are presented and the dependence on statistics is documented before the analysis is undertaken. The first problem presented involves the fluctuation-dissipation theorem and its application to global climate models. This problem has a sound theoretical niche in the literature of both climate modeling and physics, but a statistical analysis in which the data is obtained from the model to show graphically the relationship has not been undertaken. It is under this motivation that the author presents this problem. A second problem concerning the standard errors in estimating global temperatures is purely statistical in nature although very little materials exists for sampling on such a frame. This problem not only has climatological and statistical ramifications, but political ones as well. It is planned to use these results in a further analysis of global warming using actual data collected on the earth. In order to simplify the analysis of these problems, the development of a computer program, MISHA, is presented. This interactive program contains many of the routines, functions, graphics, and map projections needed by the climatologist in order to effectively enter the arena of data visualization.

  10. Wavelet analysis of polarization maps of polycrystalline biological fluids networks

    NASA Astrophysics Data System (ADS)

    Ushenko, Y. A.

    2011-12-01

    The optical model of human joint synovial fluid is proposed. The statistical (statistical moments), correlation (autocorrelation function), and self-similar (log-log dependencies of the power spectrum) structure of two-dimensional polarization distributions (polarization maps) of synovial fluid has been analyzed. It has been shown that differentiation of polarization maps of joint synovial fluid samples in different physiological states requires scale-discriminative analysis. To extract the small-scale domain structure of synovial fluid polarization maps, wavelet analysis has been used. The set of parameters, which characterize the statistical, correlation, and self-similar structure of the wavelet coefficient distributions at different scales of the polarization domains, has been determined for diagnostics and differentiation of polycrystalline network transformations connected with pathological processes.

  11. Detection of semi-volatile organic compounds in permeable ...

    EPA Pesticide Factsheets

    Abstract The Edison Environmental Center (EEC) has a research and demonstration permeable parking lot comprised of three different permeable systems: permeable asphalt, porous concrete, and interlocking concrete permeable pavers. Water quality and quantity analysis has been ongoing since January 2010. This paper describes a subset of the water quality analysis, the analysis of semivolatile organic compounds (SVOCs), to determine whether hydrocarbons were present in water infiltrated through the permeable surfaces. SVOCs were analyzed in samples collected on 11 dates over a 3-year period, from 2/8/2010 to 4/1/2013. Results are broadly divided into three categories: 42 chemicals were never detected; 12 chemicals (11 chemical tests) were detected at a rate of 10% or less; and 22 chemicals were detected at a frequency of 10% or greater (ranging from 10% to 66.5% detections). Fundamental and exploratory statistical analyses were performed on these latter results by grouping results by surface type. The statistical analyses were limited due to the low frequency of detections and dilutions of samples, which impacted detection limits. The infiltrate data through the three permeable surfaces were analyzed as non-parametric data by the Kaplan-Meier estimation method for fundamental statistics; there were some statistically observable differences in concentration between pavement types when using the Tarone-Ware Comparison Hypothesis Test. Additionally Spearman Rank order non-parame

  12. Experimental Design in Clinical 'Omics Biomarker Discovery.

    PubMed

    Forshed, Jenny

    2017-11-03

    This tutorial highlights some issues in the experimental design of clinical 'omics biomarker discovery, how to avoid bias and get as true quantities as possible from biochemical analyses, and how to select samples to improve the chance of answering the clinical question at issue. This includes the importance of defining clinical aim and end point, knowing the variability in the results, randomization of samples, sample size, statistical power, and how to avoid confounding factors by including clinical data in the sample selection, that is, how to avoid unpleasant surprises at the point of statistical analysis. The aim of this Tutorial is to help translational clinical and preclinical biomarker candidate research and to improve the validity and potential of future biomarker candidate findings.

  13. Use of the Analysis of the Volatile Faecal Metabolome in Screening for Colorectal Cancer

    PubMed Central

    2015-01-01

    Diagnosis of colorectal cancer requires an invasive and expensive colonoscopy, which is usually carried out after a positive screening test. Unfortunately, existing screening tests lack specificity and sensitivity, hence many unnecessary colonoscopies are performed. Here we report on a potential new screening test for colorectal cancer based on the analysis of volatile organic compounds (VOCs) in the headspace of faecal samples. Faecal samples were obtained from subjects who had a positive faecal occult blood test (FOBT). Subjects subsequently had colonoscopies performed to classify them into low risk (non-cancer) and high risk (colorectal cancer) groups. Volatile organic compounds were analysed by selected ion flow tube mass spectrometry (SIFT-MS) and the data were then analysed using both univariate and multivariate statistical methods. Ions most likely from hydrogen sulphide, dimethyl sulphide and dimethyl disulphide are statistically significantly higher in samples from high risk rather than low risk subjects. Results using multivariate methods show that the test gives a correct classification of 75% with 78% specificity and 72% sensitivity on FOBT positive samples, offering a potentially effective alternative to FOBT. PMID:26086914

  14. Temporal and spatial assessment of river surface water quality using multivariate statistical techniques: a study in Can Tho City, a Mekong Delta area, Vietnam.

    PubMed

    Phung, Dung; Huang, Cunrui; Rutherford, Shannon; Dwirahmadi, Febi; Chu, Cordia; Wang, Xiaoming; Nguyen, Minh; Nguyen, Nga Huy; Do, Cuong Manh; Nguyen, Trung Hieu; Dinh, Tuan Anh Diep

    2015-05-01

    The present study is an evaluation of temporal/spatial variations of surface water quality using multivariate statistical techniques, comprising cluster analysis (CA), principal component analysis (PCA), factor analysis (FA) and discriminant analysis (DA). Eleven water quality parameters were monitored at 38 different sites in Can Tho City, a Mekong Delta area of Vietnam from 2008 to 2012. Hierarchical cluster analysis grouped the 38 sampling sites into three clusters, representing mixed urban-rural areas, agricultural areas and industrial zone. FA/PCA resulted in three latent factors for the entire research location, three for cluster 1, four for cluster 2, and four for cluster 3 explaining 60, 60.2, 80.9, and 70% of the total variance in the respective water quality. The varifactors from FA indicated that the parameters responsible for water quality variations are related to erosion from disturbed land or inflow of effluent from sewage plants and industry, discharges from wastewater treatment plants and domestic wastewater, agricultural activities and industrial effluents, and contamination by sewage waste with faecal coliform bacteria through sewer and septic systems. Discriminant analysis (DA) revealed that nephelometric turbidity units (NTU), chemical oxygen demand (COD) and NH₃ are the discriminating parameters in space, affording 67% correct assignation in spatial analysis; pH and NO₂ are the discriminating parameters according to season, assigning approximately 60% of cases correctly. The findings suggest a possible revised sampling strategy that can reduce the number of sampling sites and the indicator parameters responsible for large variations in water quality. This study demonstrates the usefulness of multivariate statistical techniques for evaluation of temporal/spatial variations in water quality assessment and management.
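
    A compact, synthetic sketch of the cluster-analysis-plus-PCA portion of this workflow (a 38-site by 11-parameter matrix mirroring the study's dimensions, filled with random values rather than the monitoring data):

        import numpy as np
        from scipy.cluster.hierarchy import linkage, fcluster
        from sklearn.preprocessing import StandardScaler
        from sklearn.decomposition import PCA

        rng = np.random.default_rng(13)
        X = StandardScaler().fit_transform(rng.normal(size=(38, 11)))  # sites x parameters

        # Hierarchical (Ward) clustering of sites, cut into three groups.
        clusters = fcluster(linkage(X, method="ward"), t=3, criterion="maxclust")
        # PCA to summarize how much variance a few latent factors explain.
        explained = PCA(n_components=3).fit(X).explained_variance_ratio_

        print("cluster sizes:", np.bincount(clusters)[1:])
        print("variance explained by 3 PCs:", explained.round(2))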

  15. Statistical assessment on a combined analysis of GRYN-ROMN-UCBN upland vegetation vital signs

    USGS Publications Warehouse

    Irvine, Kathryn M.; Rodhouse, Thomas J.

    2014-01-01

    As of 2013, the Rocky Mountain and Upper Columbia Basin Inventory and Monitoring Networks have multiple years of vegetation data, the Greater Yellowstone Network has three years of vegetation data, and monitoring is ongoing in all three networks. Our primary objective is to assess whether a combined analysis of these data aimed at exploring correlations with climate and weather data is feasible. We summarize the core survey design elements across protocols and point out the major statistical challenges for a combined analysis at present. The dissimilarity in response designs between the ROMN and UCBN-GRYN network protocols presents a statistical challenge that has not yet been resolved. However, the UCBN and GRYN data are compatible as they implement a similar response design; therefore, a combined analysis is feasible and will be pursued in the future. When data collected by different networks are combined, the merged dataset is likely best described by a complex survey design, characterized by unequal probability sampling, varying stratification, and clustering (see Lohr 2010, Chapter 7, for a general overview). Statistical analysis of complex survey data requires modifications to standard methods, one of which is to include survey design weights within a statistical model. We focus on this issue for a combined analysis of upland vegetation from these networks, leaving other topics for future research. We conduct a simulation study on the possible effects of equal versus unequal probability selection of points on parameter estimates of temporal trend, using available packages within the R statistical computing environment. We find that, as written, using lmer or lm for trend detection in a continuous response, and clm and clmm for visually estimated cover classes, with “raw” GRTS design weights specified for the weight argument leads to substantially different results and/or computational instability. However, when only fixed effects are of interest, the survey package (svyglm and svyolr) may be suitable for a model-assisted analysis of trend. We provide possible directions for future research into combined analysis for ordinal and continuous vital sign indicators.
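
    The point about design weights can be illustrated outside R as well. The sketch below fits a design-weighted linear trend in Python with statsmodels, using synthetic inverse-inclusion-probability weights; it is a loose analogue of a model-assisted analysis, not the networks' actual procedure, and proper design-based standard errors would need additional machinery beyond a WLS fit.

    ```python
    # Design-weighted linear trend via weighted least squares.
    # Weights and data are synthetic placeholders.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    n = 200
    year = rng.integers(0, 5, size=n).astype(float)   # survey occasion
    y = 10 + 0.5 * year + rng.normal(0, 2, size=n)    # continuous response
    w = 1.0 / rng.uniform(0.05, 0.5, size=n)          # inverse-inclusion-probability weights

    X = sm.add_constant(year)
    fit = sm.WLS(y, X, weights=w).fit()
    print(fit.params)    # weighted intercept and trend estimate
    ```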

  16. Hair Mineral Analysis and Disruptive Behavior in Clinically Normal Young Men.

    ERIC Educational Resources Information Center

    Struempler, Richard E.; And Others

    1985-01-01

    Forty young navy recruits were selected for hair mineral analysis on the basis of three criteria: mental test scores, demerits during training, and premature discharge from the navy. Statistical analysis revealed several significant relationships between behavioral criteria and mineral measures. Findings confirmed, in a nonclinical sample, hair…

  17. Soils element activities for the period October 1973--September 1974

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Fowler, E.B.; Essington, E.H.; White, M.G.

    Soils Element activities were conducted on behalf of the U. S. Atomic Energy Commission's Nevada Applied Ecology Group (NAEG) program to provide source term information for the other program elements and maintain continuous cognizance of program requirements for sampling, sample preparation, and analysis. Activities included presentation of papers; participation in workshops; analysis of soil, vegetation, and animal tissue samples for ²³⁸Pu, ²³⁹⁻²⁴⁰Pu, ²⁴¹Am, ¹³⁷Cs, ⁶⁰Co, and gamma scan for routine and laboratory quality control purposes; preparation and analysis of animal tissue samples for NAEG laboratory certification; studies on a number of analytical, sample preparation, and sample collection procedures; and contributions to the evaluation of procedures for calculation of specialized counting statistics. (auth)

  18. Multivariate analysis: greater insights into complex systems

    USDA-ARS?s Scientific Manuscript database

    Many agronomic researchers measure and collect multiple response variables in an effort to understand the more complex nature of the system being studied. Multivariate (MV) statistical methods encompass the simultaneous analysis of all random variables (RV) measured on each experimental or sampling ...

  19. Compound-Specific Isotope Analysis of Diesel Fuels in a Forensic Investigation

    NASA Astrophysics Data System (ADS)

    Muhammad, Syahidah; Frew, Russell; Hayman, Alan

    2015-02-01

    Compound-specific isotope analysis (CSIA) offers great potential as a tool to provide chemical evidence in a forensic investigation. Many attempts to trace environmental oil spills were successful where isotopic values were particularly distinct. However, difficulties arise when a large data set is analyzed and the isotopic differences between samples are subtle. In the present study, discrimination of diesel oils involved in a diesel theft case was carried out to infer the relatedness of the samples to potential source samples. This discriminatory analysis used a suite of hydrocarbon diagnostic indices, alkanes, to generate carbon and hydrogen isotopic data of the compositions of the compounds, which were then processed using multivariate statistical analyses to infer the relatedness of the data set. The results from this analysis were put into context by comparing the data with the δ13C and δ2H of alkanes in commercial diesel samples obtained from various locations in the South Island of New Zealand. Based on the isotopic character of the alkanes, it is suggested that the diesel fuels involved in the diesel theft case were distinguishable. This manuscript shows that CSIA, when used in tandem with multivariate statistical analysis, provides a defensible means to differentiate and source-apportion qualitatively similar oils at the molecular level. This approach was able to overcome confounding challenges posed by the near single-point source of origin, i.e., the very subtle differences in isotopic values between the samples.

  20. Quantifying Morphological Features of α-U3O8 with Image Analysis for Nuclear Forensics.

    PubMed

    Olsen, Adam M; Richards, Bryony; Schwerdt, Ian; Heffernan, Sean; Lusk, Robert; Smith, Braxton; Jurrus, Elizabeth; Ruggiero, Christy; McDonald, Luther W

    2017-03-07

    Morphological changes in U3O8 based on calcination temperature have been quantified, enabling a morphological feature to serve as a signature of processing history in nuclear forensics. Five separate calcination temperatures were used to synthesize α-U3O8, and each sample was characterized using powder X-ray diffraction (p-XRD) and scanning electron microscopy (SEM). The p-XRD spectra were used to evaluate the purity of the synthesized U-oxide; the morphological analysis for materials (MAMA) software was utilized to quantitatively characterize the particle shape and size from the SEM images. Analysis comparing the particle attributes, such as particle area, at each of the temperatures was completed using the Kolmogorov-Smirnov two-sample test (K-S test). These results illustrate a distinct statistical difference between each calcination temperature. To provide a framework for forensic analysis of an unknown sample, the sample distributions at each temperature were compared to randomly selected distributions (100, 250, 500, and 750 particles) from each synthesized temperature to determine if they were statistically different. It was found that 750 particles were required to differentiate between all of the synthesized temperatures with a confidence interval of 99.0%. This study provides the first quantitative morphological analysis of U-oxides and reveals the potential strength of morphological particle analysis in nuclear forensics by providing a framework for a more rapid characterization of interdicted uranium oxide samples.
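
    The K-S comparison of particle-size distributions is straightforward to reproduce in outline. The sketch below, using synthetic lognormal particle areas rather than the study's image-analysis data, runs scipy's two-sample K-S test on the full distributions and then on a 750-particle subsample, mirroring the framework described above.

    ```python
    # Two-sample K-S test on synthetic particle-area distributions from
    # two hypothetical calcination temperatures, then on a 750-particle
    # subsample treated as the "unknown".
    import numpy as np
    from scipy.stats import ks_2samp

    rng = np.random.default_rng(2)
    areas_T1 = rng.lognormal(mean=0.0, sigma=0.5, size=5000)
    areas_T2 = rng.lognormal(mean=0.2, sigma=0.5, size=5000)

    print(ks_2samp(areas_T1, areas_T2))            # full-sample comparison
    unknown = rng.choice(areas_T2, size=750, replace=False)
    print(ks_2samp(areas_T1, unknown))             # 750-particle subsample
    ```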

  1. Compound-specific isotope analysis of diesel fuels in a forensic investigation

    PubMed Central

    Muhammad, Syahidah A.; Frew, Russell D.; Hayman, Alan R.

    2015-01-01

    Compound-specific isotope analysis (CSIA) offers great potential as a tool to provide chemical evidence in a forensic investigation. Many attempts to trace environmental oil spills were successful where isotopic values were particularly distinct. However, difficulties arise when a large data set is analyzed and the isotopic differences between samples are subtle. In the present study, discrimination of diesel oils involved in a diesel theft case was carried out to infer the relatedness of the samples to potential source samples. This discriminatory analysis used a suite of hydrocarbon diagnostic indices, alkanes, to generate carbon and hydrogen isotopic data of the compositions of the compounds, which were then processed using multivariate statistical analyses to infer the relatedness of the data set. The results from this analysis were put into context by comparing the data with the δ13C and δ2H of alkanes in commercial diesel samples obtained from various locations in the South Island of New Zealand. Based on the isotopic character of the alkanes, it is suggested that the diesel fuels involved in the diesel theft case were distinguishable. This manuscript shows that CSIA, when used in tandem with multivariate statistical analysis, provides a defensible means to differentiate and source-apportion qualitatively similar oils at the molecular level. This approach was able to overcome confounding challenges posed by the near single-point source of origin, i.e., the very subtle differences in isotopic values between the samples. PMID:25774366

  2. A statistical analysis of seat belt effectiveness in 1973-1975 model cars involved in towaway crashes. Volume 1

    DOT National Transportation Integrated Search

    1976-09-01

    Standardized injury rates and seat belt effectiveness measures are derived from a probability sample of towaway accidents involving 1973-1975 model cars. The data were collected in five different geographic regions. Weighted sample size available for...

  3. Scaling up to address data science challenges

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wendelberger, Joanne R.

    Statistics and Data Science provide a variety of perspectives and technical approaches for exploring and understanding Big Data. Partnerships between scientists from different fields such as statistics, machine learning, computer science, and applied mathematics can lead to innovative approaches for addressing problems involving increasingly large amounts of data in a rigorous and effective manner that takes advantage of advances in computing. This article explores various challenges in Data Science and highlights statistical approaches that can facilitate the analysis of large-scale data, including sampling and data reduction methods, techniques for effective analysis and visualization of large-scale simulations, and algorithms and procedures for efficient processing.

  4. Scaling up to address data science challenges

    DOE PAGES

    Wendelberger, Joanne R.

    2017-04-27

    Statistics and Data Science provide a variety of perspectives and technical approaches for exploring and understanding Big Data. Partnerships between scientists from different fields such as statistics, machine learning, computer science, and applied mathematics can lead to innovative approaches for addressing problems involving increasingly large amounts of data in a rigorous and effective manner that takes advantage of advances in computing. This article explores various challenges in Data Science and highlights statistical approaches that can facilitate the analysis of large-scale data, including sampling and data reduction methods, techniques for effective analysis and visualization of large-scale simulations, and algorithms and procedures for efficient processing.

  5. Statistical Analysis of Compressive and Flexural Test Results on the Sustainable Adobe Reinforced with Steel Wire Mesh

    NASA Astrophysics Data System (ADS)

    Jokhio, Gul A.; Syed Mohsin, Sharifah M.; Gul, Yasmeen

    2018-04-01

    It has been established that Adobe, in addition to being sustainable and economical, provides better indoor air quality than modern synthetic materials without extensive energy expenditure. The material, however, suffers from weak structural behaviour when subjected to adverse loading conditions. A wide range of mechanical properties has been reported in the literature, owing to a lack of research and standardization. The present paper presents the statistical analysis of the results that were obtained through compressive and flexural tests on Adobe samples. Adobe specimens with and without wire mesh reinforcement were tested and the results were reported. The statistical analysis of these results yields several noteworthy findings. It has been found that the compressive strength of Adobe increases by about 43% after adding a single layer of wire mesh reinforcement, and this increase is statistically significant. The flexural response of Adobe has also shown improvement with the addition of wire mesh reinforcement; however, the statistical significance of this improvement could not be established.
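
    Whether a strength gain of this kind is statistically significant can be checked with a two-sample test. The sketch below applies a Welch t-test to synthetic compressive-strength values; the abstract does not report raw data, so the magnitudes and sample sizes here are assumptions for illustration.

    ```python
    # Welch two-sample t-test on synthetic compressive strengths (MPa)
    # for unreinforced vs mesh-reinforced adobe; magnitudes assumed.
    import numpy as np
    from scipy.stats import ttest_ind

    rng = np.random.default_rng(3)
    plain = rng.normal(loc=1.4, scale=0.20, size=12)        # unreinforced
    reinforced = rng.normal(loc=2.0, scale=0.25, size=12)   # one mesh layer

    t, p = ttest_ind(reinforced, plain, equal_var=False)
    print(f"t={t:.2f}, p={p:.4f}")    # small p -> significant increase
    ```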

  6. Total RNA Sequencing Analysis of DCIS Progressing to Invasive Breast Cancer

    DTIC Science & Technology

    2015-09-01

    EPICOPY to obtain reliable copy number variation (CNV) data from the methylome array data, thereby decreasing the DNA requirements by half...in the R statistical environment. Samples were assessed for good performance on the array using detection p-values, a metric implemented by...Illumina to identify probes detected with confidence. Samples with fewer than 90% of probes detected were removed from the analysis, and probes undetected in any

  7. Chemical Species, Micromorphology, and XRD Fingerprint Analysis of Tibetan Medicine Zuotai Containing Mercury

    PubMed Central

    Li, Cen; Yang, Hongxia; Xiao, Yuancan; Zhandui; Sanglao; Wang, Zhang; Ladan, Duojie; Bi, Hongtao

    2016-01-01

    Zuotai (gTso thal) is one of the famous drugs containing mercury in Tibetan medicine. However, little is known so far about the chemical basis of its pharmacodynamics or the intrinsic links among samples from different sources. Given this, energy dispersive spectrometry of X-ray (EDX), scanning electron microscopy (SEM), atomic force microscopy (AFM), and powder X-ray diffraction (XRD) were used to assay the elements, micromorphology, and phase composition of nine Zuotai samples from different regions, respectively; the XRD fingerprint features of Zuotai were analyzed by multivariate statistical analysis. The EDX results show that Zuotai contains Hg, S, O, Fe, Al, Cu, and other elements. SEM and AFM observations suggest that Zuotai is a kind of ancient nanodrug. Its particles are mainly in the range of 100–800 nm and commonly aggregate further into 1–30 μm loosely amorphous particles. XRD testing shows that β-HgS, S8, and α-HgS are its main phase compositions. XRD fingerprint analysis indicates that the similarity degrees of the nine samples are very high, and the results of multivariate statistical analysis are broadly consistent with sample sources. The present research has revealed the physicochemical characteristics of Zuotai, and it would play a positive role in interpreting this mysterious Tibetan drug. PMID:27738409

  8. Chemical Species, Micromorphology, and XRD Fingerprint Analysis of Tibetan Medicine Zuotai Containing Mercury.

    PubMed

    Li, Cen; Yang, Hongxia; Du, Yuzhi; Xiao, Yuancan; Zhandui; Sanglao; Wang, Zhang; Ladan, Duojie; Bi, Hongtao; Wei, Lixin

    2016-01-01

    Zuotai (gTso thal) is one of the famous drugs containing mercury in Tibetan medicine. However, little is known so far about the chemical basis of its pharmacodynamics or the intrinsic links among samples from different sources. Given this, energy dispersive spectrometry of X-ray (EDX), scanning electron microscopy (SEM), atomic force microscopy (AFM), and powder X-ray diffraction (XRD) were used to assay the elements, micromorphology, and phase composition of nine Zuotai samples from different regions, respectively; the XRD fingerprint features of Zuotai were analyzed by multivariate statistical analysis. The EDX results show that Zuotai contains Hg, S, O, Fe, Al, Cu, and other elements. SEM and AFM observations suggest that Zuotai is a kind of ancient nanodrug. Its particles are mainly in the range of 100-800 nm and commonly aggregate further into 1-30 μm loosely amorphous particles. XRD testing shows that β-HgS, S8, and α-HgS are its main phase compositions. XRD fingerprint analysis indicates that the similarity degrees of the nine samples are very high, and the results of multivariate statistical analysis are broadly consistent with sample sources. The present research has revealed the physicochemical characteristics of Zuotai, and it would play a positive role in interpreting this mysterious Tibetan drug.

  9. Four hundred or more participants needed for stable contingency table estimates of clinical prediction rule performance.

    PubMed

    Kent, Peter; Boyle, Eleanor; Keating, Jennifer L; Albert, Hanne B; Hartvigsen, Jan

    2017-02-01

    To quantify variability in the results of statistical analyses based on contingency tables and discuss the implications for the choice of sample size for studies that derive clinical prediction rules. An analysis of three pre-existing sets of large cohort data (n = 4,062-8,674) was performed. In each data set, repeated random sampling of various sample sizes, from n = 100 up to n = 2,000, was performed 100 times at each sample size, and the variability in estimates of sensitivity, specificity, positive and negative likelihood ratios, posttest probabilities, odds ratios, and risk/prevalence ratios for each sample size was calculated. There were very wide, and statistically significant, differences in estimates derived from contingency tables from the same data set when calculated from samples of fewer than 400 people, and typically this variability stabilized in samples of 400-600 people. Although estimates of prevalence also varied significantly in samples below 600 people, that relationship explains only a small component of the variability in these statistical parameters. To reduce sample-specific variability, contingency tables should consist of 400 participants or more when used to derive clinical prediction rules or test their performance. Copyright © 2016 Elsevier Inc. All rights reserved.
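
    The instability the authors describe is easy to reproduce by simulation. The sketch below draws repeated subsamples of increasing size from a synthetic cohort with fixed prevalence and test properties (values assumed for illustration) and prints the spread of the estimated sensitivity, which shrinks markedly as n passes several hundred.

    ```python
    # Repeated subsampling from a synthetic cohort: spread of estimated
    # sensitivity by sample size. Prevalence and test properties assumed.
    import numpy as np

    rng = np.random.default_rng(4)
    N = 8000
    disease = rng.random(N) < 0.30                    # 30% prevalence
    test = np.where(disease, rng.random(N) < 0.80,    # 80% sensitivity
                             rng.random(N) < 0.10)    # 90% specificity

    for n in (100, 200, 400, 800):
        estimates = []
        for _ in range(100):
            idx = rng.choice(N, size=n, replace=False)
            estimates.append(test[idx][disease[idx]].mean())
        print(f"n={n:4d}  sensitivity SD={np.std(estimates):.3f}")
    ```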

  10. Statistical parsimony networks and species assemblages in Cephalotrichid nemerteans (nemertea).

    PubMed

    Chen, Haixia; Strand, Malin; Norenburg, Jon L; Sun, Shichun; Kajihara, Hiroshi; Chernyshev, Alexey V; Maslakova, Svetlana A; Sundberg, Per

    2010-09-21

    It has been suggested that statistical parsimony network analysis can be used to get an indication of the species represented in a set of nucleotide data, and the approach has been used to discuss species boundaries in some taxa. Based on 635 base pairs of the mitochondrial protein-coding gene cytochrome c oxidase I (COI), we analyzed 152 nemertean specimens using statistical parsimony network analysis with the connection probability set to 95%. The analysis revealed 15 distinct networks together with seven singletons. Statistical parsimony yielded three networks supporting the species status of Cephalothrix rufifrons, C. major and C. spiralis as they are currently delineated by morphological characters and geographical location. Many other networks contained haplotypes from nearby geographical locations. Cladistic structure from maximum likelihood analysis overall supported the network analysis, but indicated a false positive result where subnetworks should have been connected into one network/species. This is probably caused by undersampling of the intraspecific haplotype diversity. Statistical parsimony network analysis provides a rapid and useful tool for detecting possible undescribed/cryptic species among cephalotrichid nemerteans based on the COI gene. It should be combined with phylogenetic analysis to get indications of false positive results, i.e., subnetworks that would have been connected with more extensive haplotype sampling.

  11. Methodological approaches in analysing observational data: A practical example on how to address clustering and selection bias.

    PubMed

    Trutschel, Diana; Palm, Rebecca; Holle, Bernhard; Simon, Michael

    2017-11-01

    Because not every scientific question on effectiveness can be answered with randomised controlled trials, research methods that minimise bias in observational studies are required. Two major concerns influence the internal validity of effect estimates: selection bias and clustering. Hence, to reduce the bias of the effect estimates, more sophisticated statistical methods are needed. Our aim is to introduce statistical approaches such as propensity score matching and mixed models into representative real-world analysis, and to present their implementation in the statistical software R so that the results can be reproduced. We perform a two-level analytic strategy to address the problems of bias and clustering: (i) generalised models with different abilities to adjust for dependencies are used to analyse binary data, and (ii) the genetic matching and covariate adjustment methods are used to adjust for selection bias. Hence, we analyse the data from two population samples: the sample produced by the matching method and the full sample. The different analysis methods in this article present different results but still point in the same direction. In our example, the estimate of the probability of receiving a case conference is higher in the treatment group than in the control group. Both strategies, genetic matching and covariate adjustment, have their limitations but complement each other to provide the whole picture. The statistical approaches were feasible for reducing bias but were nevertheless limited by the sample used. For each study and obtained sample, the pros and cons of the different methods have to be weighed. Copyright © 2017 The Author(s). Published by Elsevier Ltd. All rights reserved.
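
    As a compact illustration of one flavour of the selection-bias adjustment discussed above, the sketch below performs simple 1:1 nearest-neighbour propensity score matching (with replacement) on synthetic data. The paper itself uses genetic matching in R, so this is an analogous technique, not the authors' procedure.

    ```python
    # 1:1 nearest-neighbour propensity score matching (with replacement)
    # on synthetic data with three observed confounders.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(5)
    n = 500
    X = rng.normal(size=(n, 3))                         # observed confounders
    p_treat = 1 / (1 + np.exp(-(X @ np.array([0.8, -0.5, 0.3]))))
    treated = rng.random(n) < p_treat                   # confounded assignment

    # Estimate propensity scores, then match each treated unit to the
    # control with the closest score.
    ps = LogisticRegression().fit(X, treated).predict_proba(X)[:, 1]
    controls = np.where(~treated)[0]
    pairs = [(i, controls[np.argmin(np.abs(ps[controls] - ps[i]))])
             for i in np.where(treated)[0]]
    print(f"{len(pairs)} treated units matched to nearest controls")
    ```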

  12. Rapid Classification and Identification of Multiple Microorganisms with Accurate Statistical Significance via High-Resolution Tandem Mass Spectrometry

    NASA Astrophysics Data System (ADS)

    Alves, Gelio; Wang, Guanghui; Ogurtsov, Aleksey Y.; Drake, Steven K.; Gucek, Marjan; Sacks, David B.; Yu, Yi-Kuo

    2018-06-01

    Rapid and accurate identification and classification of microorganisms is of paramount importance to public health and safety. With the advance of mass spectrometry (MS) technology, the speed of identification can be greatly improved. However, the increasing number of microbes sequenced is complicating correct microbial identification even in a simple sample due to the large number of candidates present. To properly untwine candidate microbes in samples containing one or more microbes, one needs to go beyond apparent morphology or simple "fingerprinting"; to correctly prioritize the candidate microbes, one needs to have accurate statistical significance in microbial identification. We meet these challenges by using peptide-centric representations of microbes to better separate them and by augmenting our earlier analysis method that yields accurate statistical significance. Here, we present an updated analysis workflow that uses tandem MS (MS/MS) spectra for microbial identification or classification. We have demonstrated, using 226 MS/MS publicly available data files (each containing from 2500 to nearly 100,000 MS/MS spectra) and 4000 additional MS/MS data files, that the updated workflow can correctly identify multiple microbes at the genus and often the species level for samples containing more than one microbe. We have also shown that the proposed workflow computes accurate statistical significances, i.e., E values for identified peptides and unified E values for identified microbes. Our updated analysis workflow MiCId, a freely available software for Microorganism Classification and Identification, is available for download at https://www.ncbi.nlm.nih.gov/CBBresearch/Yu/downloads.html.

  13. Rapid Classification and Identification of Multiple Microorganisms with Accurate Statistical Significance via High-Resolution Tandem Mass Spectrometry.

    PubMed

    Alves, Gelio; Wang, Guanghui; Ogurtsov, Aleksey Y; Drake, Steven K; Gucek, Marjan; Sacks, David B; Yu, Yi-Kuo

    2018-06-05

    Rapid and accurate identification and classification of microorganisms is of paramount importance to public health and safety. With the advance of mass spectrometry (MS) technology, the speed of identification can be greatly improved. However, the increasing number of microbes sequenced is complicating correct microbial identification even in a simple sample due to the large number of candidates present. To properly untwine candidate microbes in samples containing one or more microbes, one needs to go beyond apparent morphology or simple "fingerprinting"; to correctly prioritize the candidate microbes, one needs to have accurate statistical significance in microbial identification. We meet these challenges by using peptide-centric representations of microbes to better separate them and by augmenting our earlier analysis method that yields accurate statistical significance. Here, we present an updated analysis workflow that uses tandem MS (MS/MS) spectra for microbial identification or classification. We have demonstrated, using 226 MS/MS publicly available data files (each containing from 2500 to nearly 100,000 MS/MS spectra) and 4000 additional MS/MS data files, that the updated workflow can correctly identify multiple microbes at the genus and often the species level for samples containing more than one microbe. We have also shown that the proposed workflow computes accurate statistical significances, i.e., E values for identified peptides and unified E values for identified microbes. Our updated analysis workflow MiCId, a freely available software for Microorganism Classification and Identification, is available for download at https://www.ncbi.nlm.nih.gov/CBBresearch/Yu/downloads.html.

  14. Optimized Design and Analysis of Sparse-Sampling fMRI Experiments

    PubMed Central

    Perrachione, Tyler K.; Ghosh, Satrajit S.

    2013-01-01

    Sparse-sampling is an important methodological advance in functional magnetic resonance imaging (fMRI), in which silent delays are introduced between MR volume acquisitions, allowing for the presentation of auditory stimuli without contamination by acoustic scanner noise and for overt vocal responses without motion-induced artifacts in the functional time series. As such, the sparse-sampling technique has become a mainstay of principled fMRI research into the cognitive and systems neuroscience of speech, language, hearing, and music. Despite being in use for over a decade, there has been little systematic investigation of the acquisition parameters, experimental design considerations, and statistical analysis approaches that bear on the results and interpretation of sparse-sampling fMRI experiments. In this report, we examined how design and analysis choices related to the duration of repetition time (TR) delay (an acquisition parameter), stimulation rate (an experimental design parameter), and model basis function (an analysis parameter) act independently and interactively to affect the neural activation profiles observed in fMRI. First, we conducted a series of computational simulations to explore the parameter space of sparse design and analysis with respect to these variables; second, we validated the results of these simulations in a series of sparse-sampling fMRI experiments. Overall, these experiments suggest the employment of three methodological approaches that can, in many situations, substantially improve the detection of neurophysiological response in sparse fMRI: (1) Sparse analyses should utilize a physiologically informed model that incorporates hemodynamic response convolution to reduce model error. (2) The design of sparse fMRI experiments should maintain a high rate of stimulus presentation to maximize effect size. (3) TR delays of short to intermediate length can be used between acquisitions of sparse-sampled functional image volumes to increase the number of samples and improve statistical power. PMID:23616742
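
    Recommendation (1), convolving the design with a hemodynamic response model rather than assuming an unmodelled boxcar, can be sketched in a few lines. The double-gamma parameters below are commonly used SPM-style defaults assumed for illustration, not values taken from this report.

    ```python
    # Convolve a rapid sparse-design stimulus train with a canonical
    # double-gamma HRF (default-style parameters, assumed here).
    import numpy as np
    from scipy.stats import gamma

    dt = 0.1                                        # seconds per sample
    t = np.arange(0, 32, dt)
    hrf = gamma.pdf(t, 6) - gamma.pdf(t, 16) / 6.0  # peak ~5-6 s, undershoot
    hrf /= hrf.max()                                # normalize peak to 1

    design = np.zeros(600)                          # 60 s of design time
    design[::40] = 1.0                              # one stimulus every 4 s

    regressor = np.convolve(design, hrf)[: design.size]  # model regressor
    print(regressor.max().round(3))
    ```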

  15. Optimized design and analysis of sparse-sampling FMRI experiments.

    PubMed

    Perrachione, Tyler K; Ghosh, Satrajit S

    2013-01-01

    Sparse-sampling is an important methodological advance in functional magnetic resonance imaging (fMRI), in which silent delays are introduced between MR volume acquisitions, allowing for the presentation of auditory stimuli without contamination by acoustic scanner noise and for overt vocal responses without motion-induced artifacts in the functional time series. As such, the sparse-sampling technique has become a mainstay of principled fMRI research into the cognitive and systems neuroscience of speech, language, hearing, and music. Despite being in use for over a decade, there has been little systematic investigation of the acquisition parameters, experimental design considerations, and statistical analysis approaches that bear on the results and interpretation of sparse-sampling fMRI experiments. In this report, we examined how design and analysis choices related to the duration of repetition time (TR) delay (an acquisition parameter), stimulation rate (an experimental design parameter), and model basis function (an analysis parameter) act independently and interactively to affect the neural activation profiles observed in fMRI. First, we conducted a series of computational simulations to explore the parameter space of sparse design and analysis with respect to these variables; second, we validated the results of these simulations in a series of sparse-sampling fMRI experiments. Overall, these experiments suggest the employment of three methodological approaches that can, in many situations, substantially improve the detection of neurophysiological response in sparse fMRI: (1) Sparse analyses should utilize a physiologically informed model that incorporates hemodynamic response convolution to reduce model error. (2) The design of sparse fMRI experiments should maintain a high rate of stimulus presentation to maximize effect size. (3) TR delays of short to intermediate length can be used between acquisitions of sparse-sampled functional image volumes to increase the number of samples and improve statistical power.

  16. Color quality of pigments in cochineals (Dactylopius coccus Costa). Geographical origin characterization using multivariate statistical analysis.

    PubMed

    Méndez, Jesús; González, Mónica; Lobo, M Gloria; Carnero, Aurelio

    2004-03-10

    The commercial value of a cochineal (Dactylopius coccus Costa) sample is associated with its color quality. Because the cochineal is a legal food colorant, its color quality is generally understood as its pigment content: simply put, the higher this content, the more valuable the sample is to the market. In an effort to devise a way to measure the color quality of a cochineal, the present study evaluates different parameters of color measurement such as chromatic attributes (L* and a*), percentage of carminic acid, tint determination, and the chromatographic profile of pigments. Tint determination did not achieve this objective because this parameter does not correlate with carminic acid content. On the other hand, carminic acid showed a highly significant correlation (r = −0.922, p < 0.001) with L* values determined from powdered cochineal samples. Combining the information from the spectrophotometric determination of carminic acid with the pigment profile acquired by liquid chromatography (LC) and the composition of the red and yellow pigment groups, also acquired by LC, enables greater accuracy in judging the quality of the final sample. As a result of this study, it was possible to achieve the separation of cochineal samples according to geographical origin using two statistical techniques: cluster analysis and principal component analysis.

  17. Classification of Malaysia aromatic rice using multivariate statistical analysis

    NASA Astrophysics Data System (ADS)

    Abdullah, A. H.; Adom, A. H.; Shakaff, A. Y. Md; Masnan, M. J.; Zakaria, A.; Rahim, N. A.; Omar, O.

    2015-05-01

    Aromatic rice (Oryza sativa L.) is considered the best quality premium rice. Its varieties are preferred by consumers because of criteria such as shape, colour, distinctive aroma and flavour. The price of aromatic rice is higher than that of ordinary rice due to its special growth requirements, for instance specific climate and soil. Presently, aromatic rice quality is identified by its key elements and isotopic variables. The rice can also be classified via Gas Chromatography Mass Spectrometry (GC-MS) or human sensory panels. However, the use of human sensory panels has significant drawbacks such as lengthy training time, proneness to fatigue as the number of samples increases, and inconsistency. GC-MS analysis, on the other hand, requires detailed procedures and lengthy analysis and is quite costly. This paper presents the application of an in-house developed Electronic Nose (e-nose) to classify new aromatic rice varieties. The e-nose is used to classify the variety of aromatic rice based on the samples' odour. Samples were taken from several rice varieties. The instrument utilizes multivariate statistical data analysis, including Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA) and K-Nearest Neighbours (KNN), to classify the unknown rice samples. The Leave-One-Out (LOO) validation approach is applied to evaluate the ability of KNN to perform recognition and classification of the unspecified samples. Visual observation of the PCA and LDA plots of the rice proves that the instrument was able to separate the samples into different clusters accordingly. The results of LDA and KNN, with low misclassification error, support the above findings, and we may conclude that the e-nose was successfully applied to the classification of the aromatic rice varieties.
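
    The KNN-with-leave-one-out validation step lends itself to a short sketch. The code below runs it on synthetic "e-nose" feature vectors for three hypothetical rice classes; the feature dimensions and class separations are assumptions, not the instrument's output.

    ```python
    # KNN classification with leave-one-out validation on synthetic
    # "e-nose" features for three hypothetical rice varieties.
    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.model_selection import LeaveOneOut, cross_val_score

    rng = np.random.default_rng(6)
    X = np.vstack([rng.normal(loc=m, size=(20, 8)) for m in (0.0, 1.0, 2.0)])
    y = np.repeat([0, 1, 2], 20)               # three variety labels

    knn = KNeighborsClassifier(n_neighbors=3)
    acc = cross_val_score(knn, X, y, cv=LeaveOneOut()).mean()
    print(f"LOO accuracy: {acc:.2%}")
    ```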

  18. New features added to EVALIDator: ratio estimation and county choropleth maps

    Treesearch

    Patrick D. Miles; Mark H. Hansen

    2012-01-01

    The EVALIDator Web application, developed in 2007, provides estimates and sampling errors for many user selected forest statistics from the Forest Inventory and Analysis Database (FIADB). Among the statistics estimated are forest area, number of trees, biomass, volume, growth, removals, and mortality. A new release of EVALIDator, developed in 2012, has an option to...

  19. Batch reporting of forest inventory statistics using the EVALIDator

    Treesearch

    Patrick D. Miles

    2015-01-01

    The EVALIDator Web application, developed in 2007, provides estimates and sampling errors of forest statistics (e.g., forest area, number of trees, tree biomass) from data stored in the Forest Inventory and Analysis database. In response to user demand, new features have been added to the EVALIDator. The most recent additions are 1) the ability to generate multiple...

  20. A critique of Rasch residual fit statistics.

    PubMed

    Karabatsos, G

    2000-01-01

    In test analysis involving the Rasch model, a large degree of importance is placed on the "objective" measurement of individual abilities and item difficulties. The degree to which the objectivity properties are attained, of course, depends on the degree to which the data fit the Rasch model. It is therefore important to utilize fit statistics that accurately and reliably detect the person-item response inconsistencies that threaten the measurement objectivity of persons and items. Given this argument, it is somewhat surprising that far more emphasis is placed on the objective measurement of persons and items than on the measurement quality of Rasch fit statistics. This paper provides a critical analysis of the residual fit statistics of the Rasch model, arguably the most often used fit statistics, in an effort to illustrate that the task of Rasch fit analysis is not as simple and straightforward as it appears to be. The faulty statistical properties of the residual fit statistics do not allow either a convenient or a straightforward approach to Rasch fit analysis. For instance, given a residual fit statistic, the use of a single minimum critical value for misfit diagnosis across different testing situations, where the situations vary in sample and test properties, leads to both the overdetection and underdetection of misfit. To improve this situation, it is argued that psychometricians need to implement residual-free Rasch fit statistics that are based on the number of Guttman response errors, or use indices that are statistically optimal in detecting measurement disturbances.
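
    For readers unfamiliar with the statistics under critique, the sketch below computes standardized Rasch residuals and the common unweighted (outfit) and information-weighted (infit) mean-squares for one person, using assumed ability and difficulty values; it illustrates the quantities being discussed, not the paper's proposed residual-free alternatives.

    ```python
    # Standardized Rasch residuals plus outfit/infit mean-squares for one
    # person; ability and difficulty values are assumed for illustration.
    import numpy as np

    theta = 0.5                                    # person ability (assumed)
    b = np.array([-1.0, -0.3, 0.2, 0.9, 1.5])      # item difficulties (assumed)
    x = np.array([1, 1, 0, 1, 0])                  # observed 0/1 responses

    p = 1 / (1 + np.exp(-(theta - b)))             # Rasch success probabilities
    z = (x - p) / np.sqrt(p * (1 - p))             # standardized residuals
    outfit = np.mean(z ** 2)                       # unweighted mean-square
    infit = np.sum((x - p) ** 2) / np.sum(p * (1 - p))  # information-weighted
    print(f"outfit={outfit:.2f}, infit={infit:.2f}")    # ~1 suggests model fit
    ```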

  1. Evidence for plant-derived xenomiRs based on a large-scale analysis of public small RNA sequencing data from human samples.

    PubMed

    Zhao, Qi; Liu, Yuanning; Zhang, Ning; Hu, Menghan; Zhang, Hao; Joshi, Trupti; Xu, Dong

    2018-01-01

    In recent years, an increasing number of studies have reported the presence of plant miRNAs in human samples, which has resulted in the hypothesis that plant-derived exogenous microRNAs (xenomiRs) exist. However, this hypothesis is not widely accepted in the scientific community due to possible sample contamination and small sample sizes lacking rigorous statistical analysis. This study provides a systematic statistical test that can validate (or invalidate) the plant-derived xenomiR hypothesis by analyzing 388 small RNA sequencing data sets from human samples in 11 types of body fluids/tissues. A total of 166 types of plant miRNAs were found in at least one human sample, of which 14 represented more than 80% of the total plant miRNA abundance in human samples. Plant miRNA profiles were found to be tissue-specific in different human samples. Meanwhile, the plant miRNAs identified from the microbiome have a negligible abundance compared to those from humans, while plant miRNA profiles in human samples were significantly different from those in plants, suggesting that sample contamination is an unlikely explanation for all the plant miRNAs detected in human samples. This study also provides a set of testable synthetic miRNAs with isotopes that can be detected in situ after being fed to animals.

  2. ALK-FISH borderline cases in non-small cell lung cancer: Implications for diagnostics and clinical decision making.

    PubMed

    von Laffert, Maximilian; Stenzinger, Albrecht; Hummel, Michael; Weichert, Wilko; Lenze, Dido; Warth, Arne; Penzel, Roland; Herbst, Hermann; Kellner, Udo; Jurmeister, Philipp; Schirmacher, Peter; Dietel, Manfred; Klauschen, Frederick

    2015-12-01

    Fluorescence in-situ hybridization (FISH) for the detection of ALK rearrangements in non-small cell lung cancer (NSCLC) is based on cut-off criteria that at first sight appear clear-cut (≥15% of tumor cells) for split signals (SS) and single red signals (SRS). However, NSCLC with SS counts around the cut-off may cause interpretation problems. Tissue microarrays containing 753 surgically resected NSCLCs were independently tested for ALK alterations by FISH and immunohistochemistry (IHC). Our analysis focused on samples with SS/SRS in the range between 10% and 20% (the ALK-FISH borderline group). To better understand the role of these samples in routine diagnostics, we performed statistical analyses to systematically estimate the probability of ALK-FISH misclassification (false negative or positive) for different numbers of evaluated tumor cell nuclei (30, 50, 100, and 200). 94.3% (710/753) of the cases were classified as unequivocally (<10% or ≥20%) ALK-FISH-negative (93%; 700/753) or positive (1.3%; 10/753) and showed concordant IHC results. 5.7% (43/753) of the samples showed SS/SRS between 10% and 20% of the tumor cells. Of these, 7% (3/43; ALK-FISH: 14%, 18% and 20%) were positive by ALK-IHC, while 93% (40/43) had no detectable expression of the ALK protein. Statistical analysis showed that ALK-FISH misclassifications occur frequently for samples with rearrangements between 10% and 20% if ALK characterization is based on a sharp cut-off point (15%). If results in this interval are defined as equivocal (borderline), statistical sampling-related ALK-FISH misclassifications will occur in less than 1% of cases if 100 tumor cells are evaluated. While ALK status can be determined robustly for the majority of NSCLC by FISH, our analysis showed that ∼6% of the cases belong to a borderline group for which ALK-FISH evaluation has only limited reliability due to statistical sampling effects. These cases should be considered equivocal, and therapy decisions should include additional tests and clinical considerations. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
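
    The sampling argument is binomial at heart: with a true proportion p of rearranged nuclei and n nuclei counted, the chance of landing on the wrong side of the 15% cutoff can be computed directly. The values of p below are illustrative, not the study's estimates.

    ```python
    # Probability that a count of n nuclei falls on the wrong side of the
    # 15% cutoff, for illustrative true proportions p of rearranged nuclei.
    import math
    from scipy.stats import binom

    cutoff = 0.15
    for p in (0.10, 0.14, 0.18):
        for n in (30, 50, 100, 200):
            k = math.ceil(cutoff * n)         # smallest count called positive
            if p < cutoff:                    # truly negative: error if count >= k
                err = 1 - binom.cdf(k - 1, n, p)
            else:                             # truly positive: error if count < k
                err = binom.cdf(k - 1, n, p)
            print(f"p={p:.2f}  n={n:3d}  misclassification={err:.3f}")
    ```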

  3. A robust clustering algorithm for identifying problematic samples in genome-wide association studies.

    PubMed

    Bellenguez, Céline; Strange, Amy; Freeman, Colin; Donnelly, Peter; Spencer, Chris C A

    2012-01-01

    High-throughput genotyping arrays provide an efficient way to survey single nucleotide polymorphisms (SNPs) across the genome in large numbers of individuals. Downstream analysis of the data, for example in genome-wide association studies (GWAS), often involves statistical models of genotype frequencies across individuals. The complexities of the sample collection process and the potential for errors in the experimental assay can lead to biases and artefacts in an individual's inferred genotypes. Rather than attempting to model these complications, it has become standard practice to remove individuals whose genome-wide data differ from the sample at large. Here we describe a simple, but robust, statistical algorithm to identify samples with atypical summaries of genome-wide variation. Its use as a semi-automated quality control tool is demonstrated using several summary statistics, selected to identify different potential problems, and it is applied to two different genotyping platforms and sample collections. The algorithm is written in R and is freely available at www.well.ox.ac.uk/chris-spencer. Contact: chris.spencer@well.ox.ac.uk. Supplementary data are available at Bioinformatics online.

  4. Creamatocrit analysis of human milk overestimates fat and energy content when compared to a human milk analyzer using mid-infrared spectroscopy.

    PubMed

    O'Neill, Edward F; Radmacher, Paula G; Sparks, Blake; Adamkin, David H

    2013-05-01

    Human milk (HM) is the preferred feeding for human infants but may be inadequate to support the rapid growth of the very-low-birth-weight infant. The creamatocrit (CMCT) has been widely used to guide health care professionals as they analyze HM fortification; however, the CMCT method is based on an equation using assumptions for protein and carbohydrate, with fat as the only measured variable. The aim of the present study was to test the hypothesis that a human milk analyzer (HMA) would provide more accurate data for fat and energy content than analysis by CMCT. Fifty-one well-mixed samples of previously frozen expressed HM were obtained after thawing. Previously assayed "control" milk samples were thawed and also run with the unknowns. All milk samples were prewarmed at 40°C and then analyzed by both CMCT and HMA. CMCT fat results were substituted into the CMCT equation to reach a value for energy (kcal/oz). Fat results from HMA were entered into a computer model to reach a value for energy (kcal/oz). Fat and energy results were compared by paired t test with statistical significance set at P < 0.05. An additional 10 samples were analyzed locally by both methods and then sent to a certified laboratory for quantitative analysis; results for fat and energy were analyzed by 1-way analysis of variance with statistical significance set at P < 0.05. Mean fat content by CMCT (5.8 ± 1.9 g/dL) was significantly higher than by HMA (3.2 ± 1.1 g/dL, P < 0.001). Mean energy by CMCT (21.8 ± 3.4 kcal/oz) was also significantly higher than by HMA (17.1 ± 2.9, P < 0.001). Comparison of biochemical analysis with HMA on the subset of milk samples showed no statistical difference for fat and energy, whereas CMCT was significantly higher for both fat (P < 0.001) and energy (P = 0.002). The CMCT method appears to overestimate the fat and energy content of HM samples when compared with HMA and biochemical methods.
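
    The paired design (both methods applied to the same 51 samples) calls for a paired t-test. The sketch below uses synthetic fat values centred on the reported means; it reproduces the form of the comparison, not the study's data.

    ```python
    # Paired t-test of CMCT vs HMA fat values on the same samples.
    # Values are synthetic, centred on the reported means (g/dL).
    import numpy as np
    from scipy.stats import ttest_rel

    rng = np.random.default_rng(7)
    hma = rng.normal(3.2, 1.1, size=51)            # milk-analyzer fat
    cmct = hma + rng.normal(2.6, 0.8, size=51)     # creamatocrit overestimate

    t, p = ttest_rel(cmct, hma)
    print(f"t={t:.1f}, p={p:.1e}")                 # p << 0.001, as reported
    ```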

  5. A new statistic for identifying batch effects in high-throughput genomic data that uses guided principal component analysis.

    PubMed

    Reese, Sarah E; Archer, Kellie J; Therneau, Terry M; Atkinson, Elizabeth J; Vachon, Celine M; de Andrade, Mariza; Kocher, Jean-Pierre A; Eckel-Passow, Jeanette E

    2013-11-15

    Batch effects are due to probe-specific systematic variation between groups of samples (batches) resulting from experimental features that are not of biological interest. Principal component analysis (PCA) is commonly used as a visual tool to determine whether batch effects exist after applying a global normalization method. However, PCA yields linear combinations of the variables that contribute maximum variance and thus will not necessarily detect batch effects if they are not the largest source of variability in the data. We present an extension of PCA to quantify the existence of batch effects, called guided PCA (gPCA). We describe a test statistic that uses gPCA to test whether a batch effect exists. We apply our proposed test statistic derived using gPCA to simulated data and to two copy number variation case studies: the first study consisted of 614 samples from a breast cancer family study using Illumina Human 660 bead-chip arrays, whereas the second case study consisted of 703 samples from a family blood pressure study that used Affymetrix SNP Array 6.0. We demonstrate that our statistic has good statistical properties and is able to identify significant batch effects in two copy number variation case studies. We developed a new statistic that uses gPCA to identify whether batch effects exist in high-throughput genomic data. Although our examples pertain to copy number data, gPCA is general and can be used on other data types as well. The gPCA R package (available via CRAN) provides functionality and data to perform the methods in this article. Contact: reesese@vcu.edu

  6. Bioassessment Tools for Stony Corals: Monitoring Approaches and Proposed Sampling Plan for the U.S. Virgin Islands

    EPA Science Inventory

    This document describes three general approaches to the design of a sampling plan for biological monitoring of coral reefs. Status assessment, trend detection and targeted monitoring each require a different approach to site selection and statistical analysis. For status assessm...

  7. Monitoring changes in exotic vegetation

    Treesearch

    Robert D. Sutter

    1998-01-01

    Ecological monitoring provides critical information for management decisions by measuring changes in managed and unmanaged populations, communities and ecological systems. It integrates ecology, goal and objective setting, sampling design, sampling methods, and statistical analysis. It is a topic that I, with a team of Nature Conservancy ecologists, teach in a six day...

  8. Statistical Analysis of a Large Sample Size Pyroshock Test Data Set Including Post Flight Data Assessment. Revision 1

    NASA Technical Reports Server (NTRS)

    Hughes, William O.; McNelis, Anne M.

    2010-01-01

    The Earth Observing System (EOS) Terra spacecraft was launched on an Atlas IIAS launch vehicle on its mission to observe planet Earth in late 1999. Prior to launch, the new design of the spacecraft's pyroshock separation system was characterized by a series of 13 separation ground tests. The analysis methods used to evaluate this unusually large amount of shock data will be discussed in this paper, with particular emphasis on population distributions and finding statistically significant families of data, leading to an overall shock separation interface level. The wealth of ground test data also allowed a derivation of a Mission Assurance level for the flight. All of the flight shock measurements were below the EOS Terra Mission Assurance level thus contributing to the overall success of the EOS Terra mission. The effectiveness of the statistical methodology for characterizing the shock interface level and for developing a flight Mission Assurance level from a large sample size of shock data is demonstrated in this paper.

  9. A statistical comparison of two carbon fiber/epoxy fabrication techniques

    NASA Technical Reports Server (NTRS)

    Hodge, A. J.

    1991-01-01

    A statistical comparison of the compression strengths of specimens fabricated by either a platen press or an autoclave was performed on IM6/3501-6 carbon/epoxy composites of 16-ply (0,+45,90,-45)S2 lay-up configuration. The samples were cured with the same parameters and processing materials. It was found that the autoclaved panels were thicker than the platen-press-cured samples. Two hundred samples of each type of cure process were compression tested. The autoclaved samples had an average strength of 450 MPa (65.5 ksi), while the press-cured samples had an average strength of 370 MPa (54.0 ksi). A Weibull analysis of the data showed that there is only a 30% probability that the two types of cure systems yield specimens that can be considered from the same family.
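
    A two-parameter Weibull analysis of strength data can be sketched as follows; the synthetic strengths are centred on the reported means, with shape values assumed purely for illustration.

    ```python
    # Two-parameter Weibull fits to two synthetic strength samples
    # centred on the reported means (MPa); shape values are assumed.
    from scipy.stats import weibull_min

    autoclave = weibull_min.rvs(c=12, scale=470, size=200, random_state=1)
    press = weibull_min.rvs(c=12, scale=385, size=200, random_state=2)

    for name, data in (("autoclave", autoclave), ("press", press)):
        shape, loc, scale = weibull_min.fit(data, floc=0)  # location fixed at 0
        print(f"{name}: shape={shape:.1f}, scale={scale:.0f} MPa")
    ```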

  10. PIXE analysis of elements in gastric cancer and adjacent mucosa

    NASA Astrophysics Data System (ADS)

    Liu, Qixin; Zhong, Ming; Zhang, Xiaofeng; Yan, Lingnuo; Xu, Yongling; Ye, Simao

    1990-04-01

    The elemental regional distributions in 20 resected human stomach tissues were obtained using PIXE analysis. The samples were pathologically divided into four types: normal, adjacent mucosa A, adjacent mucosa B and cancer. The targets for PIXE analysis were prepared by wet digestion with a pressure bomb system. P, K, Fe, Cu, Zn and Se were measured and statistically analysed. We found significantly higher concentrations of P, K, Cu and Zn and a higher Cu/Zn ratio in cancer tissue as compared with normal tissue, but no statistically significant difference between adjacent mucosa and cancer tissue was found.

  11. Fast and accurate imputation of summary statistics enhances evidence of functional enrichment.

    PubMed

    Pasaniuc, Bogdan; Zaitlen, Noah; Shi, Huwenbo; Bhatia, Gaurav; Gusev, Alexander; Pickrell, Joseph; Hirschhorn, Joel; Strachan, David P; Patterson, Nick; Price, Alkes L

    2014-10-15

    Imputation using external reference panels (e.g. 1000 Genomes) is a widely used approach for increasing power in genome-wide association studies and meta-analysis. Existing hidden Markov models (HMM)-based imputation approaches require individual-level genotypes. Here, we develop a new method for Gaussian imputation from summary association statistics, a type of data that is becoming widely available. In simulations using 1000 Genomes (1000G) data, this method recovers 84% (54%) of the effective sample size for common (>5%) and low-frequency (1-5%) variants [increasing to 87% (60%) when summary linkage disequilibrium information is available from target samples] versus the gold standard of 89% (67%) for HMM-based imputation, which cannot be applied to summary statistics. Our approach accounts for the limited sample size of the reference panel, a crucial step to eliminate false-positive associations, and it is computationally very fast. As an empirical demonstration, we apply our method to seven case-control phenotypes from the Wellcome Trust Case Control Consortium (WTCCC) data and a study of height in the British 1958 birth cohort (1958BC). Gaussian imputation from summary statistics recovers 95% (105%) of the effective sample size (as quantified by the ratio of [Formula: see text] association statistics) compared with HMM-based imputation from individual-level genotypes at the 227 (176) published single nucleotide polymorphisms (SNPs) in the WTCCC (1958BC height) data. In addition, for publicly available summary statistics from large meta-analyses of four lipid traits, we publicly release imputed summary statistics at 1000G SNPs, which could not have been obtained using previously published methods, and demonstrate their accuracy by masking subsets of the data. We show that 1000G imputation using our approach increases the magnitude and statistical evidence of enrichment at genic versus non-genic loci for these traits, as compared with an analysis without 1000G imputation. Thus, imputation of summary statistics will be a valuable tool in future functional enrichment analyses. Publicly available software package available at http://bogdan.bioinformatics.ucla.edu/software/. bpasaniuc@mednet.ucla.edu or aprice@hsph.harvard.edu Supplementary materials are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
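
    The core of Gaussian imputation from summary statistics is a conditional-expectation formula: the untyped SNP's z-score is predicted from typed z-scores through the LD (correlation) matrix, with a ridge term reflecting the finite reference panel. The sketch below applies that formula to a synthetic LD matrix; it is a simplified illustration, not the authors' released software.

    ```python
    # Conditional-Gaussian imputation of an untyped SNP's z-score from
    # typed z-scores and a (synthetic) LD matrix, with a ridge term.
    import numpy as np

    rng = np.random.default_rng(9)
    R = np.corrcoef(rng.normal(size=(40, 6)), rowvar=False)  # 6x6 LD matrix
    z_typed = np.array([2.5, 2.1, 1.8, 0.4, -0.2])           # SNPs 0..4 typed

    lam = 0.1                                 # ridge for reference-panel noise
    S_tt = R[:5, :5] + lam * np.eye(5)        # LD among typed SNPs
    S_ut = R[5, :5]                           # LD of untyped SNP 5 vs typed

    z_hat = S_ut @ np.linalg.solve(S_tt, z_typed)   # imputed z-score
    info = S_ut @ np.linalg.solve(S_tt, S_ut)       # imputation quality
    print(f"z={z_hat:.2f}, info={info:.2f}")
    ```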

  12. Multivariate statistical assessment of heavy metal pollution sources of groundwater around a lead and zinc plant.

    PubMed

    Zamani, Abbas Ali; Yaftian, Mohammad Reza; Parizanganeh, Abdolhossein

    2012-12-17

    The contamination of groundwater by heavy metal ions around a lead and zinc plant has been studied. As a case study, groundwater contamination in the Bonab Industrial Estate (Zanjan, Iran) was investigated for iron, cobalt, nickel, copper, zinc, cadmium and lead content using differential pulse polarography (DPP). Although cobalt, copper and zinc were found in 47.8%, 100.0%, and 100.0% of the samples, respectively, none of the samples contained these metals above their maximum contaminant levels (MCLs). Cadmium was detected in 65.2% of the samples, and 17.4% of them were polluted by this metal. All samples contained detectable levels of lead and iron, with 8.7% and 13.0% of the samples, respectively, exceeding the MCLs. Nickel was also found in 78.3% of the samples, out of which 8.7% were polluted. In general, the results revealed the contamination of groundwater sources in the studied zone. The higher health risks are related to lead, nickel, and cadmium ions. Multivariate statistical techniques were applied to interpret the experimental data and describe the sources. The data analysis showed correlations and similarities between the investigated heavy metals and helped to classify these ions into groups. Cluster analysis identified five clusters among the studied heavy metals: cluster 1 consisted of Pb and Cu, cluster 3 included Cd and Fe, and each of the elements Zn, Co and Ni formed a single-member group. The same results were obtained by factor analysis. Statistical investigations revealed that anthropogenic factors, notably the lead and zinc plant, together with pedo-geochemical pollution sources, are influencing water quality in the studied area.
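
    The cluster-analysis step can be sketched by hierarchically clustering metals on the similarity of their concentration profiles across samples. The concentrations below are synthetic, so the resulting grouping is illustrative only.

    ```python
    # Hierarchical clustering of metals by the correlation of their
    # concentration profiles across groundwater samples (synthetic data).
    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    rng = np.random.default_rng(10)
    metals = ["Fe", "Co", "Ni", "Cu", "Zn", "Cd", "Pb"]
    conc = rng.lognormal(size=(7, 23))     # 7 metals x 23 samples

    Z = linkage(conc, method="average", metric="correlation")
    labels = fcluster(Z, t=5, criterion="maxclust")   # cut into five clusters
    print(dict(zip(metals, labels.tolist())))
    ```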

  13. Multivariate statistical assessment of heavy metal pollution sources of groundwater around a lead and zinc plant

    PubMed Central

    2012-01-01

    The contamination of groundwater by heavy metal ions around a lead and zinc plant has been studied. As a case study, groundwater contamination in the Bonab Industrial Estate (Zanjan, Iran) was investigated for iron, cobalt, nickel, copper, zinc, cadmium and lead content using differential pulse polarography (DPP). Although cobalt, copper and zinc were found in 47.8%, 100.0%, and 100.0% of the samples, respectively, none of the samples contained these metals above their maximum contaminant levels (MCLs). Cadmium was detected in 65.2% of the samples, and 17.4% of them were polluted by this metal. All samples contained detectable levels of lead and iron, with 8.7% and 13.0% of the samples, respectively, exceeding the MCLs. Nickel was also found in 78.3% of the samples, out of which 8.7% were polluted. In general, the results revealed the contamination of groundwater sources in the studied zone. The higher health risks are related to lead, nickel, and cadmium ions. Multivariate statistical techniques were applied to interpret the experimental data and describe the sources. The data analysis showed correlations and similarities between the investigated heavy metals and helped to classify these ions into groups. Cluster analysis identified five clusters among the studied heavy metals: cluster 1 consisted of Pb and Cu, cluster 3 included Cd and Fe, and each of the elements Zn, Co and Ni formed a single-member group. The same results were obtained by factor analysis. Statistical investigations revealed that anthropogenic factors, notably the lead and zinc plant, together with pedo-geochemical pollution sources, are influencing water quality in the studied area. PMID:23369182

  14. How Big of a Problem is Analytic Error in Secondary Analyses of Survey Data?

    PubMed

    West, Brady T; Sakshaug, Joseph W; Aurelien, Guy Alain S

    2016-01-01

    Secondary analyses of survey data collected from large probability samples of persons or establishments further scientific progress in many fields. The complex design features of these samples improve data collection efficiency, but also require analysts to account for these features when conducting analysis. Unfortunately, many secondary analysts from fields outside of statistics, biostatistics, and survey methodology do not have adequate training in this area, and as a result may apply incorrect statistical methods when analyzing these survey data sets. This in turn could lead to the publication of incorrect inferences based on the survey data that effectively negate the resources dedicated to these surveys. In this article, we build on the results of a preliminary meta-analysis of 100 peer-reviewed journal articles presenting analyses of data from a variety of national health surveys, which suggested that analytic errors may be extremely prevalent in these types of investigations. We first perform a meta-analysis of a stratified random sample of 145 additional research products analyzing survey data from the Scientists and Engineers Statistical Data System (SESTAT), which describes features of the U.S. Science and Engineering workforce, and examine trends in the prevalence of analytic error across the decades used to stratify the sample. We once again find that analytic errors appear to be quite prevalent in these studies. Next, we present several example analyses of real SESTAT data, and demonstrate that a failure to perform these analyses correctly can result in substantially biased estimates with standard errors that do not adequately reflect complex sample design features. Collectively, the results of this investigation suggest that reviewers of this type of research need to pay much closer attention to the analytic methods employed by researchers attempting to publish or present secondary analyses of survey data.

  15. How Big of a Problem is Analytic Error in Secondary Analyses of Survey Data?

    PubMed Central

    West, Brady T.; Sakshaug, Joseph W.; Aurelien, Guy Alain S.

    2016-01-01

    Secondary analyses of survey data collected from large probability samples of persons or establishments further scientific progress in many fields. The complex design features of these samples improve data collection efficiency, but also require analysts to account for these features when conducting analysis. Unfortunately, many secondary analysts from fields outside of statistics, biostatistics, and survey methodology do not have adequate training in this area, and as a result may apply incorrect statistical methods when analyzing these survey data sets. This in turn could lead to the publication of incorrect inferences based on the survey data that effectively negate the resources dedicated to these surveys. In this article, we build on the results of a preliminary meta-analysis of 100 peer-reviewed journal articles presenting analyses of data from a variety of national health surveys, which suggested that analytic errors may be extremely prevalent in these types of investigations. We first perform a meta-analysis of a stratified random sample of 145 additional research products analyzing survey data from the Scientists and Engineers Statistical Data System (SESTAT), which describes features of the U.S. Science and Engineering workforce, and examine trends in the prevalence of analytic error across the decades used to stratify the sample. We once again find that analytic errors appear to be quite prevalent in these studies. Next, we present several example analyses of real SESTAT data, and demonstrate that a failure to perform these analyses correctly can result in substantially biased estimates with standard errors that do not adequately reflect complex sample design features. Collectively, the results of this investigation suggest that reviewers of this type of research need to pay much closer attention to the analytic methods employed by researchers attempting to publish or present secondary analyses of survey data. PMID:27355817
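
    As a rough illustration of the core problem described above, the following minimal Python sketch (hypothetical data, not from the article) simulates cluster sampling and compares the naive standard error of a mean, which assumes simple random sampling, with a design-aware standard error computed from the variability of cluster means.

        import numpy as np

        rng = np.random.default_rng(1)
        n_clusters, per_cluster = 50, 20                  # e.g. 50 PSUs of 20 units
        cluster_effects = rng.normal(0.0, 1.0, size=n_clusters)
        y = (np.repeat(cluster_effects, per_cluster)
             + rng.normal(0.0, 1.0, size=n_clusters * per_cluster))

        naive_se = y.std(ddof=1) / np.sqrt(y.size)        # pretends data are SRS
        cluster_means = y.reshape(n_clusters, per_cluster).mean(axis=1)
        design_se = cluster_means.std(ddof=1) / np.sqrt(n_clusters)
        print(f"naive SE = {naive_se:.3f}, design-aware SE = {design_se:.3f}")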

  16. Introducing 3D U-statistic method for separating anomaly from background in exploration geochemical data with associated software development

    NASA Astrophysics Data System (ADS)

    Ghannadpour, Seyyed Saeed; Hezarkhani, Ardeshir

    2016-03-01

    The U-statistic method is one of the most important structural methods for separating anomaly from background. It considers the locations of samples, carries out the statistical analysis of the data without judging from a geochemical point of view, and tries to separate subpopulations and determine anomalous areas. In the present study, to use the U-statistic method under three-dimensional (3D) conditions, it is applied to the grades of two ideal test examples, taking the sample Z values (elevation) into account. This is the first time that the method has been applied under 3D conditions. To evaluate the performance of the 3D U-statistic method and to compare it with a non-structural method, the threshold-assessment method based on the median and standard deviation (MSD method) is applied to the same two test examples. Results show that the samples flagged as anomalous by the U-statistic method are more regular and less dispersed than those flagged by the MSD method, so that, based on the locations of the anomalous samples, their denser areas can be delineated as promising zones. Moreover, at a threshold of U = 0, the total error of misclassification for the U-statistic method is much smaller than that of the x̄ + n×s criterion. Finally, 3D models of the two test examples, separating anomaly from background using the 3D U-statistic method, are provided. The source code of a software program, developed in the MATLAB programming language to perform the calculations of the 3D U-spatial statistic method, is additionally provided. The software is compatible with all varieties of geochemical data and can be used in similar exploration projects.
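
    The non-structural MSD criterion used as a benchmark above is simple to state: a sample is anomalous when its grade exceeds x̄ + n×s. A minimal Python sketch follows (hypothetical grades; the spatial 3D U-statistic itself is not reproduced here).

        import numpy as np

        rng = np.random.default_rng(2)
        grades = rng.lognormal(mean=1.0, sigma=0.5, size=500)  # hypothetical grades

        def msd_anomalies(values, n=2.0):
            """Flag values above mean + n * standard deviation."""
            values = np.asarray(values, dtype=float)
            threshold = values.mean() + n * values.std(ddof=1)
            return values > threshold

        mask = msd_anomalies(grades, n=2.0)
        print(f"{mask.sum()} of {grades.size} samples flagged as anomalous")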

  17. Multi-element fingerprinting as a tool in origin authentication of four east China marine species.

    PubMed

    Guo, Lipan; Gong, Like; Yu, Yanlei; Zhang, Hong

    2013-12-01

    The contents of 25 elements in 4 types of commercial marine species from the East China Sea were determined by inductively coupled plasma mass spectrometry and atomic absorption spectrometry. The elemental composition was used to differentiate the marine species according to geographical origin by multivariate statistical analysis. The results showed that principal component analysis could distinguish samples from different areas and reveal the elements that played the most important role in origin diversity. The models established by partial least squares discriminant analysis (PLS-DA) and by a probabilistic neural network (PNN) could both precisely predict the origin of the marine species. Further study indicated that PLS-DA and PNN were efficacious in regional discrimination: the models from these 2 statistical methods, with accuracies of 97.92% and 100%, respectively, could both distinguish samples from different areas without the need for species differentiation. © 2013 Institute of Food Technologists®

  18. The analysis of morphometric data on Rocky Mountain wolves and arctic wolves using statistical methods

    NASA Astrophysics Data System (ADS)

    Ammar Shafi, Muhammad; Saifullah Rusiman, Mohd; Hamzah, Nor Shamsidah Amir; Nor, Maria Elena; Ahmad, Noor’ani; Azia Hazida Mohamad Azmi, Nur; Latip, Muhammad Faez Ab; Hilmi Azman, Ahmad

    2018-04-01

    Morphometrics is the quantitative analysis of the shape and size of specimens. Morphometric quantitative analyses are commonly used to analyse the fossil record, the shape and size of specimens, and other characteristics. The aim of the study was to find the differences between Rocky Mountain wolves and arctic wolves based on gender. The sample utilised secondary data which included seven independent variables and two dependent variables. Statistical modelling was used in the analysis, namely the analysis of variance (ANOVA) and multivariate analysis of variance (MANOVA). The results showed that there exist differences between arctic wolves and Rocky Mountain wolves based on the independent factors and gender.
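
    For readers unfamiliar with the workflow, here is a minimal Python sketch of a one-way ANOVA on a single hypothetical morphometric variable (with two groups this is equivalent to a two-sample t-test); the measurements are invented for illustration.

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(3)
        rocky_mountain = rng.normal(25.0, 2.0, size=30)   # hypothetical measurements
        arctic = rng.normal(23.5, 2.0, size=30)

        f_stat, p_value = stats.f_oneway(rocky_mountain, arctic)
        print(f"F = {f_stat:.2f}, p = {p_value:.4f}")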

  19. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Woodroffe, J. R.; Brito, T. V.; Jordanova, V. K.

    In the standard practice of neutron multiplicity counting, the first three sampled factorial moments of the event-triggered neutron count distribution are used to quantify the three main neutron source terms: the effective mass of spontaneous fissile material, the relative (α,n) production, and the induced fission source responsible for multiplication. Our study compares three methods to quantify the statistical uncertainty of the estimated mass: the bootstrap method, propagation of variance through the moments, and the statistical analysis of cycle data method. Each of the three methods was implemented on a set of four different NMC measurements, held at the JRC laboratory in Ispra, Italy, sampling four different Pu samples in a standard Plutonium Scrap Multiplicity Counter (PSMC) well counter.
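
    Of the three uncertainty methods compared, the bootstrap is the most generic. The following minimal Python sketch shows the resampling pattern with a hypothetical placeholder estimator and simulated cycle data; it is not the actual NMC analysis.

        import numpy as np

        rng = np.random.default_rng(4)
        cycle_counts = rng.poisson(lam=40, size=1000)     # hypothetical cycle data

        def mass_estimate(counts):
            # Placeholder: a real NMC analysis maps the factorial moments of
            # the count distribution to an effective mass.
            return counts.mean()

        boot = np.array([
            mass_estimate(rng.choice(cycle_counts, size=cycle_counts.size, replace=True))
            for _ in range(2000)
        ])
        print(f"estimate = {mass_estimate(cycle_counts):.3f} "
              f"+/- {boot.std(ddof=1):.3f} (bootstrap SE)")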

  20. Determination of polarimetric parameters of honey by near-infrared transflectance spectroscopy.

    PubMed

    García-Alvarez, M; Ceresuela, S; Huidobro, J F; Hermida, M; Rodríguez-Otero, J L

    2002-01-30

    NIR transflectance spectroscopy was used to determine polarimetric parameters (direct polarization, polarization after inversion, specific rotation in dry matter, and polarization due to nonmonosaccharides) and sucrose in honey. In total, 156 honey samples were collected during 1992 (45 samples), 1995 (56 samples), and 1996 (55 samples). Samples were analyzed by NIR spectroscopy and by polarimetric methods. Calibration (118 samples) and validation (38 samples) sets were made up; honeys from the three years were included in both sets. Calibrations were performed by modified partial least-squares regression, with scatter correction by the standard normal variate and detrend methods. For direct polarization, polarization after inversion, specific rotation in dry matter, and polarization due to nonmonosaccharides, good statistics (bias, SEV, and R²) were obtained for the validation set, and no statistically significant (p = 0.05) differences were found between the instrumental and polarimetric methods for these parameters. Statistical data for sucrose were not as good as those of the other parameters; therefore, NIR spectroscopy is not an effective method for quantitative analysis of sucrose in these honey samples. However, NIR spectroscopy may be an acceptable method for semiquantitative evaluation of sucrose in honeys, such as those in our study, containing up to 3% sucrose. Further work is necessary to validate the uncertainty at higher levels.

  1. A new strategy for statistical analysis-based fingerprint establishment: Application to quality assessment of Semen sojae praeparatum.

    PubMed

    Guo, Hui; Zhang, Zhen; Yao, Yuan; Liu, Jialin; Chang, Ruirui; Liu, Zhao; Hao, Hongyuan; Huang, Taohong; Wen, Jun; Zhou, Tingting

    2018-08-30

    Semen sojae praeparatum, with its homology of medicine and food, is a famous traditional Chinese medicine. A simple and effective quality fingerprint analysis, coupled with chemometric methods, was developed for the quality assessment of Semen sojae praeparatum. First, similarity analysis (SA) and hierarchical clustering analysis (HCA) were applied to select the qualitative markers that markedly influence the quality of Semen sojae praeparatum. 21 chemicals were selected and characterized by high-resolution liquid chromatography-ion trap/time-of-flight mass spectrometry (LC-IT-TOF-MS). Subsequently, principal component analysis (PCA) and orthogonal partial least squares discriminant analysis (OPLS-DA) were conducted to select the quantitative markers of Semen sojae praeparatum samples from different origins. Moreover, 11 compounds with statistical significance were determined quantitatively, providing accurate and informative data for quality evaluation. This study proposes a new strategy of "statistical analysis-based fingerprint establishment", which should be a valuable reference for further studies. Copyright © 2018 Elsevier Ltd. All rights reserved.

  2. [The principal components analysis--method to classify the statistical variables with applications in medicine].

    PubMed

    Dascălu, Cristina Gena; Antohe, Magda Ecaterina

    2009-01-01

    Based on the analysis of eigenvalues and eigenvectors, principal component analysis has the purpose of identifying the subspace of the main components of a set of parameters, which is sufficient to characterize the whole set. Interpreting the data for analysis as a cloud of points, we find through geometrical transformations the directions in which the cloud's dispersion is maximal: the lines that pass through the cloud's center of gravity and have a maximal density of points around them (found by defining an appropriate criterion function and minimizing it). This method can be successfully used to simplify the statistical analysis of questionnaires, because it helps us to select from a set of items only the most relevant ones, which cover the variations of the whole data set. For instance, in the presented sample we started from a questionnaire with 28 items and, applying principal component analysis, we identified 7 principal components, or main items, a fact that significantly simplifies the further statistical analysis of the data.
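
    A minimal Python sketch of the procedure described above, using a hypothetical 200-respondent, 28-item data matrix: center the cloud of points, eigen-decompose its covariance matrix, and keep the leading components.

        import numpy as np

        rng = np.random.default_rng(5)
        X = rng.normal(size=(200, 28))           # hypothetical: 200 respondents, 28 items
        Xc = X - X.mean(axis=0)                  # center the cloud of points

        cov = np.cov(Xc, rowvar=False)
        eigvals, eigvecs = np.linalg.eigh(cov)   # ascending eigenvalues
        order = np.argsort(eigvals)[::-1]
        eigvals, eigvecs = eigvals[order], eigvecs[:, order]

        explained = eigvals / eigvals.sum()
        k = 7                                    # keep 7 components, as in the study
        scores = Xc @ eigvecs[:, :k]             # respondent scores on the components
        print(f"first {k} components explain {explained[:k].sum():.1%} of the variance")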

  3. Conservative Tests under Satisficing Models of Publication Bias.

    PubMed

    McCrary, Justin; Christensen, Garret; Fanelli, Daniele

    2016-01-01

    Publication bias leads consumers of research to observe a selected sample of statistical estimates calculated by producers of research. We calculate critical values for statistical significance that could help to adjust after the fact for the distortions created by this selection effect, assuming that the only source of publication bias is file drawer bias. These adjusted critical values are easy to calculate and differ from unadjusted critical values by approximately 50%: rather than rejecting a null hypothesis when the t-ratio exceeds 2, the analysis suggests rejecting a null hypothesis when the t-ratio exceeds 3. Samples of published social science research indicate that on average, across research fields, approximately 30% of published t-statistics fall between the standard and adjusted cutoffs.

  4. Conservative Tests under Satisficing Models of Publication Bias

    PubMed Central

    McCrary, Justin; Christensen, Garret; Fanelli, Daniele

    2016-01-01

    Publication bias leads consumers of research to observe a selected sample of statistical estimates calculated by producers of research. We calculate critical values for statistical significance that could help to adjust after the fact for the distortions created by this selection effect, assuming that the only source of publication bias is file drawer bias. These adjusted critical values are easy to calculate and differ from unadjusted critical values by approximately 50%—rather than rejecting a null hypothesis when the t-ratio exceeds 2, the analysis suggests rejecting a null hypothesis when the t-ratio exceeds 3. Samples of published social science research indicate that on average, across research fields, approximately 30% of published t-statistics fall between the standard and adjusted cutoffs. PMID:26901834
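
    A small simulation can illustrate the selection effect under an extreme file-drawer scenario in which only null results with |t| > 2 get published (a hypothetical data-generating process, not the paper's calculation):

        import numpy as np

        rng = np.random.default_rng(6)
        t_all = rng.standard_normal(1_000_000)        # t-ratios of null effects
        pub_t = np.abs(t_all[np.abs(t_all) > 2.0])    # file drawer: only |t| > 2 appears

        # Under pure selection of null results, most published t-ratios sit just
        # above 2, so the stricter |t| > 3 cutoff screens out the bulk of them.
        print(f"share of published |t| in (2, 3]: {np.mean(pub_t <= 3.0):.2f}")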

  5. Across-cohort QC analyses of GWAS summary statistics from complex traits.

    PubMed

    Chen, Guo-Bo; Lee, Sang Hong; Robinson, Matthew R; Trzaskowski, Maciej; Zhu, Zhi-Xiang; Winkler, Thomas W; Day, Felix R; Croteau-Chonka, Damien C; Wood, Andrew R; Locke, Adam E; Kutalik, Zoltán; Loos, Ruth J F; Frayling, Timothy M; Hirschhorn, Joel N; Yang, Jian; Wray, Naomi R; Visscher, Peter M

    2016-01-01

    Genome-wide association studies (GWASs) have been successful in discovering SNP trait associations for many quantitative traits and common diseases. Typically, the effect sizes of SNP alleles are very small and this requires large genome-wide association meta-analyses (GWAMAs) to maximize statistical power. A trend towards ever-larger GWAMA is likely to continue, yet dealing with summary statistics from hundreds of cohorts increases logistical and quality control problems, including unknown sample overlap, and these can lead to both false positive and false negative findings. In this study, we propose four metrics and visualization tools for GWAMA, using summary statistics from cohort-level GWASs. We propose methods to examine the concordance between demographic information, and summary statistics and methods to investigate sample overlap. (I) We use the population genetics Fst statistic to verify the genetic origin of each cohort and their geographic location, and demonstrate using GWAMA data from the GIANT Consortium that geographic locations of cohorts can be recovered and outlier cohorts can be detected. (II) We conduct principal component analysis based on reported allele frequencies, and are able to recover the ancestral information for each cohort. (III) We propose a new statistic that uses the reported allelic effect sizes and their standard errors to identify significant sample overlap or heterogeneity between pairs of cohorts. (IV) To quantify unknown sample overlap across all pairs of cohorts, we propose a method that uses randomly generated genetic predictors that does not require the sharing of individual-level genotype data and does not breach individual privacy.

  6. Across-cohort QC analyses of GWAS summary statistics from complex traits

    PubMed Central

    Chen, Guo-Bo; Lee, Sang Hong; Robinson, Matthew R; Trzaskowski, Maciej; Zhu, Zhi-Xiang; Winkler, Thomas W; Day, Felix R; Croteau-Chonka, Damien C; Wood, Andrew R; Locke, Adam E; Kutalik, Zoltán; Loos, Ruth J F; Frayling, Timothy M; Hirschhorn, Joel N; Yang, Jian; Wray, Naomi R; Visscher, Peter M

    2017-01-01

    Genome-wide association studies (GWASs) have been successful in discovering SNP trait associations for many quantitative traits and common diseases. Typically, the effect sizes of SNP alleles are very small and this requires large genome-wide association meta-analyses (GWAMAs) to maximize statistical power. A trend towards ever-larger GWAMA is likely to continue, yet dealing with summary statistics from hundreds of cohorts increases logistical and quality control problems, including unknown sample overlap, and these can lead to both false positive and false negative findings. In this study, we propose four metrics and visualization tools for GWAMA, using summary statistics from cohort-level GWASs. We propose methods to examine the concordance between demographic information, and summary statistics and methods to investigate sample overlap. (I) We use the population genetics Fst statistic to verify the genetic origin of each cohort and their geographic location, and demonstrate using GWAMA data from the GIANT Consortium that geographic locations of cohorts can be recovered and outlier cohorts can be detected. (II) We conduct principal component analysis based on reported allele frequencies, and are able to recover the ancestral information for each cohort. (III) We propose a new statistic that uses the reported allelic effect sizes and their standard errors to identify significant sample overlap or heterogeneity between pairs of cohorts. (IV) To quantify unknown sample overlap across all pairs of cohorts, we propose a method that uses randomly generated genetic predictors that does not require the sharing of individual-level genotype data and does not breach individual privacy. PMID:27552965
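
    As a simplified illustration of the kind of pairwise check described in item (III) (this is not the paper's exact statistic), note that for two independent cohorts estimating the same SNP effect, the difference of the reported estimates divided by the pooled standard error should be approximately standard normal; systematic variance deflation then suggests sample overlap, while inflation suggests heterogeneity. A minimal Python sketch with hypothetical summary statistics:

        import numpy as np

        rng = np.random.default_rng(7)
        m = 5000                                   # hypothetical number of SNPs
        beta = rng.normal(0.0, 0.02, size=m)       # shared true effects
        se1, se2 = 0.010, 0.012
        b1 = beta + rng.normal(0.0, se1, size=m)   # cohort 1 estimates
        b2 = beta + rng.normal(0.0, se2, size=m)   # cohort 2 estimates

        z = (b1 - b2) / np.sqrt(se1**2 + se2**2)
        print(f"var(z) = {z.var():.2f}  (about 1 is expected for independent cohorts)")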

  7. Patterns of Puffery: An Analysis of Non-Fiction Blurbs

    ERIC Educational Resources Information Center

    Cronin, Blaise; La Barre, Kathryn

    2005-01-01

    The blurb is a paratextual element which has not previously been subjected to systematic analysis. We describe the nature and purpose of this publishing epiphenomenon, highlight some of the related marketing issues and ethical concerns and provide a statistical analysis of almost 2000 blurbs identified in a sample of 450 non-fiction books.…

  8. HICOSMO: cosmology with a complete sample of galaxy clusters - II. Cosmological results

    NASA Astrophysics Data System (ADS)

    Schellenberger, G.; Reiprich, T. H.

    2017-10-01

    The X-ray bright, hot gas in the potential well of a galaxy cluster enables systematic X-ray studies of samples of galaxy clusters to constrain cosmological parameters. HIFLUGCS consists of the 64 X-ray brightest galaxy clusters in the Universe, building up a local sample. Here, we utilize this sample to determine, for the first time, individual hydrostatic mass estimates for all the clusters of the sample and, by making use of the completeness of the sample, we quantify constraints on the two interesting cosmological parameters, Ωm and σ8. We apply our total hydrostatic and gas mass estimates from the X-ray analysis to a Bayesian cosmological likelihood analysis and leave several parameters free to be constrained. We find Ωm = 0.30 ± 0.01 and σ8 = 0.79 ± 0.03 (statistical uncertainties, 68 per cent credibility level) using our default analysis strategy combining both a mass function analysis and the gas mass fraction results. The main sources of biases that we correct here are (1) the influence of galaxy groups (incompleteness in parent samples and differing behaviour of the Lx-M relation), (2) the hydrostatic mass bias, (3) the extrapolation of the total mass (comparing various methods), (4) the theoretical halo mass function and (5) other physical effects (non-negligible neutrino mass). We find that galaxy groups introduce a strong bias, since their number density seems to be overpredicted by the halo mass function. On the other hand, incorporating baryonic effects does not result in a significant change in the constraints. The total (uncorrected) systematic uncertainties (∼20 per cent) clearly dominate the statistical uncertainties on cosmological parameters for our sample.

  9. A Review of ETS Differential Item Functioning Assessment Procedures: Flagging Rules, Minimum Sample Size Requirements, and Criterion Refinement. Research Report. ETS RR-12-08

    ERIC Educational Resources Information Center

    Zwick, Rebecca

    2012-01-01

    Differential item functioning (DIF) analysis is a key component in the evaluation of the fairness and validity of educational tests. The goal of this project was to review the status of ETS DIF analysis procedures, focusing on three aspects: (a) the nature and stringency of the statistical rules used to flag items, (b) the minimum sample size…

  10. Analyzing Immunoglobulin Repertoires

    PubMed Central

    Chaudhary, Neha; Wesemann, Duane R.

    2018-01-01

    Somatic assembly of T cell receptor and B cell receptor (BCR) genes produces a vast diversity of lymphocyte antigen recognition capacity. The advent of efficient high-throughput sequencing of lymphocyte antigen receptor genes has recently generated unprecedented opportunities for exploration of adaptive immune responses. With these opportunities have come significant challenges in understanding the analysis techniques that most accurately reflect underlying biological phenomena. In this regard, sample preparation and sequence analysis techniques, which have largely been borrowed and adapted from other fields, continue to evolve. Here, we review current methods and challenges of library preparation, sequencing and statistical analysis of lymphocyte receptor repertoire studies. We discuss the general steps in the process of immune repertoire generation including sample preparation, platforms available for sequencing, processing of sequencing data, measurable features of the immune repertoire, and the statistical tools that can be used for analysis and interpretation of the data. Because BCR analysis harbors additional complexities, such as immunoglobulin (Ig) (i.e., antibody) gene somatic hypermutation and class switch recombination, the emphasis of this review is on Ig/BCR sequence analysis. PMID:29593723

  11. Methods for trend analysis: Examples with problem/failure data

    NASA Technical Reports Server (NTRS)

    Church, Curtis K.

    1989-01-01

    Statistics play an important role in quality control and reliability. Consequently, the NASA standard Trend Analysis Techniques recommends a variety of statistical methodologies that can be applied to time series data. The major goal of this working handbook, using data from the MSFC Problem Assessment System, is to illustrate some of the techniques in the NASA standard, as well as some additional techniques, and to identify patterns in the data. The techniques used for trend estimation are regression (exponential, power, reciprocal, and straight-line) and Kendall's rank correlation coefficient. The important details of a statistical strategy for estimating a trend component are covered in the examples. However, careful analysis and interpretation are necessary because of small samples and frequent zero problem reports in a given time period. Further investigations to deal with these issues are being conducted.
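
    A minimal Python sketch of one of the listed techniques, Kendall's rank correlation applied to hypothetical monthly problem-report counts (the counts are invented for illustration):

        from scipy import stats

        counts = [5, 3, 4, 2, 4, 1, 2, 0, 1, 0, 1, 0]   # problem reports per month
        months = list(range(1, len(counts) + 1))

        tau, p_value = stats.kendalltau(months, counts)
        print(f"Kendall tau = {tau:.2f}, p = {p_value:.3f}")
        # tau < 0 with a small p suggests a decreasing trend; with many tied zero
        # counts, the small-sample caution noted above applies.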

  12. STAPP: Spatiotemporal analysis of plantar pressure measurements using statistical parametric mapping.

    PubMed

    Booth, Brian G; Keijsers, Noël L W; Sijbers, Jan; Huysmans, Toon

    2018-05-03

    Pedobarography produces large sets of plantar pressure samples that are routinely subsampled (e.g. using regions of interest) or aggregated (e.g. center of pressure trajectories, peak pressure images) in order to simplify statistical analysis and provide intuitive clinical measures. We hypothesize that these data reductions discard gait information that can be used to differentiate between groups or conditions. To test the hypothesis of null information loss, we created an implementation of statistical parametric mapping (SPM) for dynamic plantar pressure datasets (i.e. plantar pressure videos). Our SPM software framework brings all plantar pressure videos into anatomical and temporal correspondence, then performs statistical tests at each sampling location in space and time. As a novel element, we introduce non-linear temporal registration into the framework in order to normalize for timing differences within the stance phase. We refer to our software framework as STAPP: spatiotemporal analysis of plantar pressure measurements. Using STAPP, we tested our hypothesis on plantar pressure videos from 33 healthy subjects walking at different speeds. As walking speed increased, STAPP was able to identify significant decreases in plantar pressure at mid-stance from the heel through the lateral forefoot. The extent of these plantar pressure decreases has not previously been observed using existing plantar pressure analysis techniques. We therefore conclude that the subsampling of plantar pressure videos, a step which led to the discarding of gait information in our study, can be avoided using STAPP. Copyright © 2018 Elsevier B.V. All rights reserved.
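
    The core SPM idea, a statistical test at every sampling location in space and time after registration, can be sketched as follows in Python (hypothetical registered pressure videos; the multiple-comparisons correction a full SPM analysis would apply, e.g. random field theory or permutation maxima, is omitted):

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(8)
        # Hypothetical registered videos: (subjects, frames, rows, cols).
        slow = rng.gamma(2.0, 10.0, size=(33, 50, 16, 8))
        fast = slow * 0.9 + rng.normal(0.0, 2.0, size=slow.shape)

        # Paired t-test at every spatiotemporal sampling location.
        t_map, p_map = stats.ttest_rel(slow, fast, axis=0)
        print("locations with uncorrected p < 0.05:", int((p_map < 0.05).sum()))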

  13. Introductory Statistics Students' Conceptual Understanding of Study Design and Conclusions

    NASA Astrophysics Data System (ADS)

    Fry, Elizabeth Brondos

    Recommended learning goals for students in introductory statistics courses include the ability to recognize and explain the key role of randomness in designing studies and in drawing conclusions from those studies involving generalizations to a population or causal claims (GAISE College Report ASA Revision Committee, 2016). The purpose of this study was to explore introductory statistics students' understanding of the distinct roles that random sampling and random assignment play in study design and the conclusions that can be made from each. A study design unit lasting two and a half weeks was designed and implemented in four sections of an undergraduate introductory statistics course based on modeling and simulation. The research question that this study attempted to answer is: How does introductory statistics students' conceptual understanding of study design and conclusions (in particular, unbiased estimation and establishing causation) change after participating in a learning intervention designed to promote conceptual change in these areas? In order to answer this research question, a forced-choice assessment called the Inferences from Design Assessment (IDEA) was developed as a pretest and posttest, along with two open-ended assignments, a group quiz and a lab assignment. Quantitative analysis of IDEA results and qualitative analysis of the group quiz and lab assignment revealed that overall, students' mastery of study design concepts significantly increased after the unit, and the great majority of students successfully made the appropriate connections between random sampling and generalization, and between random assignment and causal claims. However, a small, but noticeable portion of students continued to demonstrate misunderstandings, such as confusion between random sampling and random assignment.

  14. Analysis of wind bias change with respect to time at Cape Kennedy, Florida, and Vandenberg AFB, California

    NASA Technical Reports Server (NTRS)

    Adelfang, S. I.

    1978-01-01

    A statistical analysis of the temporal variability of wind vectors at 1 km intervals from 0 to 27 km altitude is presented, after applying a digital filter to the original wind profile data sample.

  15. A new u-statistic with superior design sensitivity in matched observational studies.

    PubMed

    Rosenbaum, Paul R

    2011-09-01

    In an observational or nonrandomized study of treatment effects, a sensitivity analysis indicates the magnitude of bias from unmeasured covariates that would need to be present to alter the conclusions of a naïve analysis that presumes adjustments for observed covariates suffice to remove all bias. The power of sensitivity analysis is the probability that it will reject a false hypothesis about treatment effects allowing for a departure from random assignment of a specified magnitude; in particular, if this specified magnitude is "no departure" then this is the same as the power of a randomization test in a randomized experiment. A new family of u-statistics is proposed that includes Wilcoxon's signed rank statistic but also includes other statistics with substantially higher power when a sensitivity analysis is performed in an observational study. Wilcoxon's statistic has high power to detect small effects in large randomized experiments (that is, it often has good Pitman efficiency), but small effects are invariably sensitive to small unobserved biases. Members of this family of u-statistics that emphasize medium to large effects can have substantially higher power in a sensitivity analysis. For example, in one situation with 250 pair differences that are Normal with expectation 1/2 and variance 1, the power of a sensitivity analysis that uses Wilcoxon's statistic is 0.08 while the power of another member of the family of u-statistics is 0.66. The topic is examined by performing a sensitivity analysis in three observational studies, using an asymptotic measure called the design sensitivity, and by simulating power in finite samples. The three examples are drawn from epidemiology, clinical medicine, and genetic toxicology. © 2010, The International Biometric Society.
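
    For the randomized case (no unmeasured bias), the power of Wilcoxon's signed rank test under the quoted scenario can be simulated directly; a minimal Python sketch follows. The sensitivity-analysis power at a bias Γ > 1, which is the paper's focus, additionally requires comparing the statistic to its worst-case bound under Γ and is not shown here.

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(9)
        n_pairs, n_sims, alpha = 250, 1000, 0.05
        rejections = 0
        for _ in range(n_sims):
            d = rng.normal(0.5, 1.0, size=n_pairs)   # pair differences ~ N(1/2, 1)
            _, p = stats.wilcoxon(d, alternative="greater")
            rejections += p < alpha
        print(f"simulated randomization power: {rejections / n_sims:.2f}")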

  16. Statistical framework for detection of genetically modified organisms based on Next Generation Sequencing.

    PubMed

    Willems, Sander; Fraiture, Marie-Alice; Deforce, Dieter; De Keersmaecker, Sigrid C J; De Loose, Marc; Ruttink, Tom; Herman, Philippe; Van Nieuwerburgh, Filip; Roosens, Nancy

    2016-02-01

    Because the number and diversity of genetically modified (GM) crops has significantly increased, their analysis based on real-time PCR (qPCR) methods is becoming increasingly complex and laborious. While several pioneers have already investigated Next Generation Sequencing (NGS) as an alternative to qPCR, its practical use has not been assessed for routine analysis. In this study a statistical framework was developed to predict the number of NGS reads needed to detect transgene sequences, to prove their integration into the host genome and to identify the specific transgene event in a sample with known composition. This framework was validated by applying it to experimental data from food matrices composed of pure GM rice, processed GM rice (noodles) or a 10% GM/non-GM rice mixture, revealing some influential factors. Finally, the feasibility of NGS for routine analysis of GM crops was investigated by applying the framework to samples commonly encountered in routine analysis of GM crops. Copyright © 2015 The Authors. Published by Elsevier Ltd. All rights reserved.
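
    A toy version of the read-count reasoning (an illustrative simplification, not the paper's model): if a fraction p of all reads originates from the target sequence, the number of target reads among N total reads is binomial, and the depth needed to see at least one target read with confidence c follows from N >= log(1 - c) / log(1 - p).

        import math

        def reads_needed(p, confidence=0.99):
            """Total reads so that P(at least one target read) >= confidence."""
            return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))

        # Hypothetical target fractions, e.g. for a dilute GM ingredient.
        for p in (1e-3, 1e-4, 1e-5):
            print(f"p = {p:g}: about {reads_needed(p):,} reads needed")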

  17. Method and system of Jones-matrix mapping of blood plasma films with "fuzzy" analysis in differentiation of breast pathology changes

    NASA Astrophysics Data System (ADS)

    Zabolotna, Natalia I.; Radchenko, Kostiantyn O.; Karas, Oleksandr V.

    2018-01-01

    A method for diagnosing breast fibroadenoma is proposed, based on statistical analysis (determination and analysis of the 1st- to 4th-order statistical moments) of polarization images of the imaginary Jones-matrix elements of optically thin (attenuation coefficient τ ≤ 0.1) blood plasma films, with further intelligent differentiation based on "fuzzy" logic and discriminant analysis. The accuracy of differentiating blood plasma samples into the "norm" and "breast fibroadenoma" classes was 82.7% with linear discriminant analysis and 95.3% with the "fuzzy" logic method. The obtained results confirm the potentially high reliability of the differentiation method based on "fuzzy" analysis.

  18. Textural Analysis and Substrate Classification in the Nearshore Region of Lake Superior Using High-Resolution Multibeam Bathymetry

    NASA Astrophysics Data System (ADS)

    Dennison, Andrew G.

    Classification of the seafloor substrate can be done with a variety of methods, including visual methods (dives, drop cameras), mechanical methods (cores, grab samples), and acoustic methods (statistical analysis of echosounder returns). Acoustic methods offer a more powerful and efficient means of collecting useful information about the bottom type: larger areas can be sampled, and combining the collected data with visual and mechanical survey methods provides greater confidence in the classification of a mapped region. During a multibeam sonar survey, both bathymetric and backscatter data are collected. It is well documented that the statistical characteristics of a sonar backscatter mosaic depend on bottom type. While classifying the bottom type on the basis of backscatter alone can accurately predict and map bottom type, i.e. distinguish a muddy area from a rocky area, it lacks the ability to resolve and capture fine textural details, an important factor in many habitat mapping studies. Statistical processing of high-resolution multibeam data can capture the pertinent details about the bottom type that are rich in textural information, and further multivariate statistical processing can then isolate characteristic features and provide the basis for an accurate classification scheme. The development of a new classification method is described here, based upon the analysis of textural features in conjunction with ground-truth sampling. The processing and classification results of two geologically distinct areas in nearshore regions of Lake Superior, off the Lester River, MN, and the Amnicon River, WI, are presented here, using the Minnesota Supercomputer Institute's Mesabi computing cluster for initial processing. Processed data were then calibrated using ground-truth samples to conduct an accuracy assessment of the surveyed areas. From the analysis of high-resolution bathymetry data collected at both survey sites, it was possible to successfully calculate a series of measures that describe textural information about the lake floor. Further processing suggests that the calculated features also capture a significant amount of statistical information about the lake floor terrain. Two sources of error, an anomalous heave and a refraction error, significantly deteriorated the quality of the processed data and the resulting validation results. Ground-truth samples used to validate the classification methods at both survey sites yielded accuracy values ranging from 5 to 30 percent at the Amnicon River and between 60 and 70 percent for the Lester River. The final results suggest that this new processing methodology adequately captures textural information about the lake floor and provides an acceptable classification in the absence of significant data quality issues.

  19. Statistical Analysis of Mineral Concentration for the Geographic Identification of Garlic Samples from Sicily (Italy), Tunisia and Spain

    PubMed Central

    Vadalà, Rossella; Mottese, Antonio F.; Bua, Giuseppe D.; Salvo, Andrea; Mallamace, Domenico; Corsaro, Carmelo; Vasi, Sebastiano; Giofrè, Salvatore V.; Alfa, Maria; Cicero, Nicola; Dugo, Giacomo

    2016-01-01

    We performed a statistical analysis of the concentrations of mineral elements, measured by means of inductively coupled plasma mass spectrometry (ICP-MS), in different varieties of garlic from Spain, Tunisia, and Italy. Nubia red garlic (Sicily) is one of the best-known Italian varieties and belongs to the traditional Italian food products (P.A.T.) of the Ministry of Agriculture, Food, and Forestry. The obtained results suggest that the concentrations of the considered elements may serve as geographical indicators for discriminating the origin of the different samples. In particular, we found a relatively high content of selenium in the Nubia red garlic variety, which could make it valuable as an anticarcinogenic agent. PMID:28231115

  20. Seasonal rationalization of river water quality sampling locations: a comparative study of the modified Sanders and multivariate statistical approaches.

    PubMed

    Varekar, Vikas; Karmakar, Subhankar; Jha, Ramakar

    2016-02-01

    The design of surface water quality sampling locations is a crucial decision-making process for the rationalization of a monitoring network. The quantity, quality, and types of available data (watershed characteristics and water quality data) may affect the selection of an appropriate design methodology. The modified Sanders approach and multivariate statistical techniques [particularly factor analysis (FA)/principal component analysis (PCA)] are well-accepted and widely used techniques for the design of sampling locations. However, their performance may vary significantly with the quantity, quality, and types of available data. In this paper, an attempt has been made to evaluate the performance of these techniques while accounting for the effect of seasonal variation, in a situation with limited water quality data but extensive watershed characteristics information, as continuous and consistent river water quality data are usually difficult to obtain, whereas watershed information may be made available through the application of geospatial techniques. A case study of the Kali River, Western Uttar Pradesh, India, was selected for the analysis. The monitoring was carried out at 16 sampling locations. The discrete and diffuse pollution loads at the different sampling sites were estimated and accounted for using the modified Sanders approach, whereas the monitored physical and chemical water quality parameters were utilized as inputs for FA/PCA. The optimum numbers of sampling locations designed for the monsoon and non-monsoon seasons are eight and seven by the modified Sanders approach, and eleven and nine by FA/PCA, respectively. Little variation in the number and locations of the designed sampling sites was obtained by the two techniques, which shows the stability of the results. A geospatial analysis was also carried out to check the significance of the designed sampling locations with respect to the river basin characteristics and land use of the study area. Both methods are equally efficient; however, the modified Sanders approach outperforms FA/PCA when limited water quality but extensive watershed information is available. The available water quality dataset is limited, and the FA/PCA-based approach fails to identify monitoring locations with higher variation, as these multivariate statistical approaches are data-driven. The priority/hierarchy and number of sampling sites designed by the modified Sanders approach are well justified by the land use practices and observed river basin characteristics of the study area.

  1. Office for Civil Rights Survey Redesign: A Feasibility Survey. Contractor Report. Statistical Analysis Report.

    ERIC Educational Resources Information Center

    Mansfield, Wendy; Farris, Elizabeth

    This report provides results of a Fast Response Survey System (FRSS) study conducted by the National Center for Education Statistics for the Office for Civil Rights (OCR). The OCR wanted input for their decision-making process on possible modifications to their biennial survey of a national sample of public school districts (PSDs). The survey, the…

  2. Learning Model and Form of Assessment toward the Inferential Statistical Achievement by Controlling Numeric Thinking Skills

    ERIC Educational Resources Information Center

    Widiana, I. Wayan; Jampel, I. Nyoman

    2016-01-01

    This study aimed to find out the effect of learning model and form of assessment on inferential statistical achievement after controlling for numeric thinking skills. This was a quasi-experimental study with a sample of 130 students. The data analysis used ANCOVA. After controlling for numeric thinking skills, the results of this study show that:…

  3. Statistical Analyses of Scatterplots to Identify Important Factors in Large-Scale Simulations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kleijnen, J.P.C.; Helton, J.C.

    1999-04-01

    The robustness of procedures for identifying patterns in scatterplots generated in Monte Carlo sensitivity analyses is investigated. These procedures are based on attempts to detect increasingly complex patterns in the scatterplots under consideration and involve the identification of (1) linear relationships with correlation coefficients, (2) monotonic relationships with rank correlation coefficients, (3) trends in central tendency as defined by means, medians and the Kruskal-Wallis statistic, (4) trends in variability as defined by variances and interquartile ranges, and (5) deviations from randomness as defined by the chi-square statistic. The following two topics related to the robustness of these procedures are considered for a sequence of example analyses with a large model for two-phase fluid flow: the presence of Type I and Type II errors, and the stability of results obtained with independent Latin hypercube samples. Observations from analysis include: (1) Type I errors are unavoidable, (2) Type II errors can occur when inappropriate analysis procedures are used, (3) physical explanations should always be sought for why statistical procedures identify variables as being important, and (4) the identification of important variables tends to be stable for independent Latin hypercube samples.

  4. An introduction to Bayesian statistics in health psychology.

    PubMed

    Depaoli, Sarah; Rus, Holly M; Clifton, James P; van de Schoot, Rens; Tiemensma, Jitske

    2017-09-01

    The aim of the current article is to provide a brief introduction to Bayesian statistics within the field of health psychology. Bayesian methods are increasing in prevalence in applied fields, and they have been shown in simulation research to improve the estimation accuracy of structural equation models, latent growth curve (and mixture) models, and hierarchical linear models. Likewise, Bayesian methods can be used with small sample sizes since they do not rely on large sample theory. In this article, we discuss several important components of Bayesian statistics as they relate to health-based inquiries. We discuss the incorporation and impact of prior knowledge into the estimation process and the different components of the analysis that should be reported in an article. We present an example implementing Bayesian estimation in the context of blood pressure changes after participants experienced an acute stressor. We conclude with final thoughts on the implementation of Bayesian statistics in health psychology, including suggestions for reviewing Bayesian manuscripts and grant proposals. We have also included an extensive amount of online supplementary material to complement the content presented here, including Bayesian examples using many different software programmes and an extensive sensitivity analysis examining the impact of priors.
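
    A minimal sketch of a Bayesian update in the spirit of the blood-pressure example, using the conjugate Normal prior/Normal likelihood pair with a known sampling standard deviation (all numbers hypothetical; applied Bayesian analyses would typically use MCMC software instead):

        import numpy as np

        rng = np.random.default_rng(10)
        changes = rng.normal(6.0, 8.0, size=25)     # observed BP changes (mmHg)

        prior_mean, prior_sd = 0.0, 10.0            # weakly informative prior
        sigma = 8.0                                 # assumed known sampling SD
        n = changes.size

        post_prec = 1.0 / prior_sd**2 + n / sigma**2     # precisions add
        post_mean = (prior_mean / prior_sd**2
                     + changes.sum() / sigma**2) / post_prec
        post_sd = post_prec ** -0.5
        print(f"posterior mean = {post_mean:.2f} mmHg, posterior sd = {post_sd:.2f}")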

  5. Statistical dependency in visual scanning

    NASA Technical Reports Server (NTRS)

    Ellis, Stephen R.; Stark, Lawrence

    1986-01-01

    A method to identify statistical dependencies in the positions of eye fixations is developed and applied to eye movement data from subjects who viewed dynamic displays of air traffic and judged future relative position of aircraft. Analysis of approximately 23,000 fixations on points of interest on the display identified statistical dependencies in scanning that were independent of the physical placement of the points of interest. Identification of these dependencies is inconsistent with random-sampling-based theories used to model visual search and information seeking.

  6. Mapping cell populations in flow cytometry data for cross‐sample comparison using the Friedman–Rafsky test statistic as a distance measure

    PubMed Central

    Hsiao, Chiaowen; Liu, Mengya; Stanton, Rick; McGee, Monnie; Qian, Yu

    2015-01-01

    Flow cytometry (FCM) is a fluorescence-based single-cell experimental technology that is routinely applied in biomedical research for identifying cellular biomarkers of normal physiological responses and abnormal disease states. While many computational methods have been developed that focus on identifying cell populations in individual FCM samples, very few have addressed how the identified cell populations can be matched across samples for comparative analysis. This article presents FlowMap-FR, a novel method for cell population mapping across FCM samples. FlowMap-FR is based on the Friedman–Rafsky nonparametric test statistic (FR statistic), which quantifies the equivalence of multivariate distributions. As applied to FCM data by FlowMap-FR, the FR statistic objectively quantifies the similarity between cell populations based on the shapes, sizes, and positions of fluorescence data distributions in the multidimensional feature space. To test and evaluate the performance of FlowMap-FR, we simulated the kinds of biological and technical sample variations that are commonly observed in FCM data. The results show that FlowMap-FR is able to effectively identify equivalent cell populations between samples under scenarios of proportion differences and modest position shifts. As a statistical test, FlowMap-FR can be used to determine whether the expression of a cellular marker is statistically different between two cell populations, suggesting candidates for new cellular phenotypes by providing an objective statistical measure. In addition, FlowMap-FR can indicate situations in which inappropriate splitting or merging of cell populations has occurred during gating procedures. We compared the FR statistic with the symmetric version of the Kullback–Leibler divergence measure used in a previous population matching method with both simulated and real data. The FR statistic outperforms the symmetric version of KL-distance in distinguishing equivalent from nonequivalent cell populations. FlowMap-FR was also employed as a distance metric to match cell populations delineated by manual gating across 30 FCM samples from a benchmark FlowCAP data set. An F-measure of 0.88 was obtained, indicating high precision and recall of the FR-based population matching results. FlowMap-FR has been implemented as a standalone R/Bioconductor package so that it can be easily incorporated into current FCM data analytical workflows. © 2015 International Society for Advancement of Cytometry PMID:26274018

  7. Mapping cell populations in flow cytometry data for cross-sample comparison using the Friedman-Rafsky test statistic as a distance measure.

    PubMed

    Hsiao, Chiaowen; Liu, Mengya; Stanton, Rick; McGee, Monnie; Qian, Yu; Scheuermann, Richard H

    2016-01-01

    Flow cytometry (FCM) is a fluorescence-based single-cell experimental technology that is routinely applied in biomedical research for identifying cellular biomarkers of normal physiological responses and abnormal disease states. While many computational methods have been developed that focus on identifying cell populations in individual FCM samples, very few have addressed how the identified cell populations can be matched across samples for comparative analysis. This article presents FlowMap-FR, a novel method for cell population mapping across FCM samples. FlowMap-FR is based on the Friedman-Rafsky nonparametric test statistic (FR statistic), which quantifies the equivalence of multivariate distributions. As applied to FCM data by FlowMap-FR, the FR statistic objectively quantifies the similarity between cell populations based on the shapes, sizes, and positions of fluorescence data distributions in the multidimensional feature space. To test and evaluate the performance of FlowMap-FR, we simulated the kinds of biological and technical sample variations that are commonly observed in FCM data. The results show that FlowMap-FR is able to effectively identify equivalent cell populations between samples under scenarios of proportion differences and modest position shifts. As a statistical test, FlowMap-FR can be used to determine whether the expression of a cellular marker is statistically different between two cell populations, suggesting candidates for new cellular phenotypes by providing an objective statistical measure. In addition, FlowMap-FR can indicate situations in which inappropriate splitting or merging of cell populations has occurred during gating procedures. We compared the FR statistic with the symmetric version of Kullback-Leibler divergence measure used in a previous population matching method with both simulated and real data. The FR statistic outperforms the symmetric version of KL-distance in distinguishing equivalent from nonequivalent cell populations. FlowMap-FR was also employed as a distance metric to match cell populations delineated by manual gating across 30 FCM samples from a benchmark FlowCAP data set. An F-measure of 0.88 was obtained, indicating high precision and recall of the FR-based population matching results. FlowMap-FR has been implemented as a standalone R/Bioconductor package so that it can be easily incorporated into current FCM data analytical workflows. © The Authors. Published by Wiley Periodicals, Inc. on behalf of ISAC.
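
    A minimal Python sketch of the underlying Friedman-Rafsky two-sample test (hypothetical data; FlowMap-FR itself is the R/Bioconductor package described above): pool the samples, build a Euclidean minimum spanning tree, count the edges joining points from different samples, and compare that count with its permutation distribution, where unusually few cross-sample edges indicate a distributional difference.

        import numpy as np
        from scipy.spatial import distance_matrix
        from scipy.sparse.csgraph import minimum_spanning_tree

        rng = np.random.default_rng(11)
        a = rng.normal(0.0, 1.0, size=(100, 3))     # population 1: 100 cells, 3 markers
        b = rng.normal(0.3, 1.0, size=(100, 3))     # population 2, slightly shifted
        pooled = np.vstack([a, b])
        labels = np.array([0] * len(a) + [1] * len(b))

        mst = minimum_spanning_tree(distance_matrix(pooled, pooled))
        i, j = mst.nonzero()                        # the tree's edges

        def cross_edges(lab):
            return int(np.sum(lab[i] != lab[j]))

        observed = cross_edges(labels)
        perms = [cross_edges(rng.permutation(labels)) for _ in range(999)]
        p = (1 + sum(c <= observed for c in perms)) / (1 + len(perms))
        print(f"cross-sample MST edges = {observed}, permutation p = {p:.3f}")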

  8. Statistical considerations for plot design, sampling procedures, analysis, and quality assurance of ozone injury studies

    Treesearch

    Michael Arbaugh; Larry Bednar

    1996-01-01

    The sampling methods used to monitor ozone injury to ponderosa and Jeffrey pines depend on the objectives of the study, geographic and genetic composition of the forest, and the source and composition of air pollutant emissions. By using a standardized sampling methodology, it may be possible to compare conditions within local areas more accurately, and to apply the...

  9. Determination of element composition and extraterrestrial material occurrence in moss and lichen samples from King George Island (Antarctica) using reactor neutron activation analysis and SEM microscopy.

    PubMed

    Mróz, Tomasz; Szufa, Katarzyna; Frontasyeva, Marina V; Tselmovich, Vladimir; Ostrovnaya, Tatiana; Kornaś, Andrzej; Olech, Maria A; Mietelski, Jerzy W; Brudecki, Kamil

    2018-01-01

    Seven lichen samples (Usnea antarctica and U. aurantiacoatra) and nine moss samples (Sanionia uncinata) collected on King George Island were analyzed using instrumental neutron activation analysis, and the concentrations of major and trace elements were calculated. For some elements, the concentrations observed in the moss samples were higher than corresponding values reported from other sites in Antarctica, while in the lichens they were in the same range. Scanning electron microscopy (SEM) and statistical analysis showed a large influence of particles of volcanic origin. Interplanetary cosmic particles (ICP) were also observed in the investigated samples, as mosses and lichens are good collectors of ICP and micrometeorites.

  10. Multivariate statistical analysis of heavy metal concentration in soils of Yelagiri Hills, Tamilnadu, India--spectroscopical approach.

    PubMed

    Chandrasekaran, A; Ravisankar, R; Harikrishnan, N; Satapathy, K K; Prasad, M V R; Kanagasabapathy, K V

    2015-02-25

    Anthropogenic activities increase the accumulation of heavy metals in the soil environment. Soil pollution significantly reduces environmental quality and affects human health. In the present study, soil samples were collected at different locations of the Yelagiri Hills, Tamilnadu, India, for heavy metal analysis. The samples were analyzed for twelve selected heavy metals (Mg, Al, K, Ca, Ti, Fe, V, Cr, Mn, Co, Ni and Zn) using energy dispersive X-ray fluorescence (EDXRF) spectroscopy. Heavy metal concentrations in soil were investigated using the enrichment factor (EF), geoaccumulation index (Igeo), contamination factor (CF) and pollution load index (PLI) to determine metal accumulation, distribution and pollution status. Heavy metal toxicity risk was assessed against the soil quality guidelines (SQGs) given by the target and intervention values of the Dutch soil standards. The concentrations of Ni, Co, Zn, Cr, Mn, Fe, Ti, K, Al and Mg were mainly controlled by natural sources. Multivariate statistical methods such as the correlation matrix, principal component analysis and cluster analysis were applied for the identification of heavy metal sources (anthropogenic/natural origin). Geostatistical methods such as kriging identified hot spots of metal contamination in road areas, influenced mainly by the presence of natural rocks. Copyright © 2014 Elsevier B.V. All rights reserved.
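
    The pollution indices named above have standard textbook definitions, sketched below in Python with hypothetical concentrations: contamination factor CF = C/C_background, geoaccumulation index Igeo = log2(C/(1.5 × C_background)), and pollution load index PLI equal to the geometric mean of the CFs. The enrichment factor additionally requires a normalizing reference element and is omitted here.

        import numpy as np

        metals = ["Cr", "Mn", "Ni", "Zn"]
        measured = np.array([110.0, 900.0, 45.0, 120.0])   # mg/kg, one soil sample
        background = np.array([90.0, 850.0, 68.0, 95.0])   # reference values

        cf = measured / background                         # contamination factor
        igeo = np.log2(measured / (1.5 * background))      # geoaccumulation index
        pli = cf.prod() ** (1.0 / len(cf))                 # pollution load index

        for m, c, g in zip(metals, cf, igeo):
            print(f"{m}: CF = {c:.2f}, Igeo = {g:.2f}")
        print(f"PLI = {pli:.2f}")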

  11. Application of Digital Image Analysis to Determine Pancreatic Islet Mass and Purity in Clinical Islet Isolation and Transplantation

    PubMed Central

    Wang, Ling-jia; Kissler, Hermann J; Wang, Xiaojun; Cochet, Olivia; Krzystyniak, Adam; Misawa, Ryosuke; Golab, Karolina; Tibudan, Martin; Grzanka, Jakub; Savari, Omid; Grose, Randall; Kaufman, Dixon B; Millis, Michael; Witkowski, Piotr

    2015-01-01

    Pancreatic islet mass, represented by islet equivalents (IEQ), is the most important parameter in decision making for clinical islet transplantation. To obtain the IEQ, a sample of islets is routinely counted manually under a microscope and discarded thereafter. Islet purity, another parameter in islet processing, is routinely acquired by estimation only. In this study, we validated our digital image analysis (DIA) system, developed using the Image Pro Plus software, for islet mass and purity assessment. Application of DIA allows better compliance with current good manufacturing practice (cGMP) standards. Human islet samples were captured as calibrated digital images for the permanent record. Five trained technicians participated in the determination of IEQ and purity by the manual counting method and by DIA. IEQ counts showed statistically significant correlations between the manual method and DIA in all sample comparisons (r > 0.819 and p < 0.0001). A statistically significant difference in IEQ between the two methods was found only in the high-purity 100 μL sample group (p = 0.029). As for purity determination, statistically significant differences between manual assessment and DIA measurement were found in the high- and low-purity 100 μL samples (p < 0.005). In addition, the islet particle number (IPN) and the IEQ/IPN ratio did not differ statistically between the manual counting method and DIA. In conclusion, the DIA used in this study is a reliable technique for determining IEQ and purity. Islet samples preserved as digital images and the results produced by DIA can be permanently stored for verification, technical training and islet information exchange between different islet centers. Therefore, DIA complies better with cGMP requirements than the manual counting method. We propose DIA as a quality control tool to supplement the established standard manual method for islet counting and purity estimation. PMID:24806436

  12. Assessment of levels of bacterial contamination of large wild game meat in Europe.

    PubMed

    Membré, Jeanne-Marie; Laroche, Michel; Magras, Catherine

    2011-08-01

    The variations in prevalence and levels of pathogens and fecal contamination indicators in large wild game meat were studied to assess their potential impact on consumers. This analysis was based on hazard analysis, data generation and statistical analysis. A total of 2919 meat samples from three species (red deer, roe deer, wild boar) were collected at French game meat traders' facilities using two sampling protocols. Information was gathered on the types of meat cuts (forequarter or haunch; first sampling protocol) or type of retail-ready meat (stewing meat or roasting meat; second protocol), and also on the meat storage conditions (frozen or chilled), country of origin (eight countries) and shooting season (autumn, winter, spring). The samples were analyzed in both protocols for detection and enumeration of Escherichia coli, coagulase-positive staphylococci and Clostridium perfringens. In addition, detection and enumeration of thermotolerant coliforms and Listeria monocytogenes were performed for samples collected in the first and second protocols, respectively. The levels of bacterial contamination of the raw meat were determined by statistical analysis involving probabilistic techniques and Bayesian inference. C. perfringens was found in the highest numbers among the three indicators of microbial quality, hygiene and good handling, and L. monocytogenes in the lowest. Differences in contamination levels between game species and between meats distributed as chilled or frozen products were not significant. These results might be included in quantitative exposure assessments. Copyright © 2011 Elsevier Ltd. All rights reserved.

  13. Grain size statistics and depositional pattern of the Ecca Group sandstones, Karoo Supergroup in the Eastern Cape Province, South Africa

    NASA Astrophysics Data System (ADS)

    Baiyegunhi, Christopher; Liu, Kuiwu; Gwavava, Oswald

    2017-11-01

    Grain size analysis is a vital sedimentological tool used to unravel the hydrodynamic conditions, mode of transportation and deposition of detrital sediments. In this study, detailed grain-size analysis was carried out on thirty-five sandstone samples from the Ecca Group in the Eastern Cape Province of South Africa. Grain-size statistical parameters, bivariate analysis, linear discriminant functions, Passega diagrams and log-probability curves were used to reveal the depositional processes, sedimentation mechanisms and hydrodynamic energy conditions, and to discriminate different depositional environments. The grain-size parameters show that most of the sandstones are very fine to fine grained, moderately well sorted, mostly near-symmetrical and mesokurtic in nature. The abundance of very fine to fine grained sandstones indicates the dominance of a low-energy environment. The bivariate plots show that the samples are mostly grouped, except for the Prince Albert samples, which show a scattered trend due either to a mixture of two modes in equal proportion in bimodal sediments or to good sorting in unimodal sediments. The linear discriminant function analysis is dominantly indicative of turbidity current deposits in shallow marine environments for samples from the Prince Albert, Collingham and Ripon Formations, while the samples from the Fort Brown Formation are lacustrine or deltaic deposits. The C-M plots indicate that the sediments were deposited mainly by suspension and saltation, as well as by graded suspension. Visher diagrams show that saltation is the major process of transportation, followed by suspension.
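
    The graphic grain-size statistics referred to here are conventionally the Folk and Ward (1957) measures, computed from phi percentiles of the cumulative grain-size curve. A minimal sketch under that assumption; the sieve data are hypothetical:

        import numpy as np

        def folk_ward(phi, weight_pct):
            # Interpolate phi values at the cumulative-percent points used
            # by the Folk and Ward (1957) graphic measures.
            cum = np.cumsum(weight_pct)
            p = lambda q: np.interp(q, cum, phi)
            p5, p16, p25, p50, p75, p84, p95 = map(p, (5, 16, 25, 50, 75, 84, 95))
            mean = (p16 + p50 + p84) / 3
            sorting = (p84 - p16) / 4 + (p95 - p5) / 6.6
            skewness = ((p16 + p84 - 2 * p50) / (2 * (p84 - p16))
                        + (p5 + p95 - 2 * p50) / (2 * (p95 - p5)))
            kurtosis = (p95 - p5) / (2.44 * (p75 - p25))
            return mean, sorting, skewness, kurtosis

        # Hypothetical sieve fractions: phi midpoints and weight percent
        phi = np.array([1.0, 2.0, 3.0, 4.0])
        wt = np.array([10.0, 35.0, 40.0, 15.0])
        print(folk_ward(phi, wt))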

  14. A Framework for Establishing Standard Reference Scale of Texture by Multivariate Statistical Analysis Based on Instrumental Measurement and Sensory Evaluation.

    PubMed

    Zhi, Ruicong; Zhao, Lei; Xie, Nan; Wang, Houyin; Shi, Bolin; Shi, Jingye

    2016-01-13

    A framework for establishing a standard reference scale of texture is proposed, based on multivariate statistical analysis of instrumental measurements and sensory evaluation. Multivariate statistical analysis is conducted to rapidly select typical reference samples with the characteristics of universality, representativeness, stability, substitutability, and traceability. The reasonableness of the framework is verified by establishing a standard reference scale for the texture attribute of hardness with well-known Chinese foods. More than 100 food products in 16 categories were tested using instrumental measurement (the TPA test), and the results were analyzed with clustering analysis, principal component analysis, relative standard deviation, and analysis of variance. As a result, nine kinds of foods were selected to construct the hardness standard reference scale. The results indicate that the regression between the estimated sensory values and the instrumentally measured values is significant (R2 = 0.9765), which fits well with Stevens's theory. The research provides a reliable theoretical basis and practical guide for the quantitative establishment of standard reference scales for food texture characteristics.

  15. Complexity quantification of dense array EEG using sample entropy analysis.

    PubMed

    Ramanand, Pravitha; Nampoori, V P N; Sreenivasan, R

    2004-09-01

    In this paper, a time series complexity analysis of dense array electroencephalogram signals is carried out using the recently introduced Sample Entropy (SampEn) measure. This statistic quantifies the regularity in signals recorded from systems that can vary from the purely deterministic to the purely stochastic realm. The present analysis is conducted with the objective of gaining insight into complexity variations related to changing brain dynamics for EEG recorded in three conditions: a passive, eyes-closed state; a mental arithmetic task; and the same mental task carried out after a physical exertion task. It is observed that the statistic is a robust quantifier of complexity suited for short physiological signals such as the EEG, and it points to the specific brain regions that exhibit lowered complexity during the mental task state as compared to a passive, relaxed state. In the case of mental tasks carried out before and after the performance of a physical exercise, the statistic can detect the variations brought in by the intermediate fatigue-inducing exercise period. This enhances its utility in detecting subtle changes in the brain state that can find wider scope for applications in EEG-based brain studies.
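
    SampEn(m, r) is the negative logarithm of the conditional probability that two sequences matching for m points (within tolerance r) also match for m + 1 points, computed without self-matches. A minimal sketch, with the common defaults m = 2 and r = 0.2 times the signal's standard deviation:

        import numpy as np

        def sample_entropy(x, m=2, r=None):
            # SampEn = -ln(A / B), where B counts template matches of
            # length m and A those of length m + 1 (self-matches excluded).
            x = np.asarray(x, dtype=float)
            if r is None:
                r = 0.2 * x.std()
            def matches(length):
                t = np.array([x[i:i + length] for i in range(len(x) - length)])
                total = 0
                for i in range(len(t)):
                    dist = np.max(np.abs(t - t[i]), axis=1)  # Chebyshev
                    total += np.sum(dist <= r) - 1           # drop self-match
                return total
            return -np.log(matches(m + 1) / matches(m))

        # A regular signal scores lower than white noise
        rng = np.random.default_rng(0)
        print(sample_entropy(np.sin(np.linspace(0, 8 * np.pi, 500))))
        print(sample_entropy(rng.standard_normal(500)))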

  16. Variation of Water Quality Parameters with Siltation Depth for River Ichamati Along International Border with Bangladesh Using Multivariate Statistical Techniques

    NASA Astrophysics Data System (ADS)

    Roy, P. K.; Pal, S.; Banerjee, G.; Biswas Roy, M.; Ray, D.; Majumder, A.

    2014-12-01

    Rivers are considered one of the main sources of freshwater all over the world; hence the analysis and maintenance of this water resource is globally considered a matter of major concern. This paper deals with the assessment of the surface water quality of the Ichamati river using multivariate statistical techniques. Eight distinct surface water quality observation stations were located and samples were collected. Statistical techniques were then applied to the physico-chemical parameters and siltation depth of the collected samples. Cluster analysis was done to determine the relations between surface water quality and the siltation depth of the river Ichamati. Multiple regression and mathematical equation modeling were used to characterize the surface water quality of the Ichamati river on the basis of physico-chemical parameters. It was found that the water quality of the downstream river was different from the water quality of the upstream. The analysis of the water quality parameters of the Ichamati river clearly indicates a high pollution load on the river water, which can be attributed to agricultural discharge, tidal effect and soil erosion. The results further reveal that water quality degrades as siltation depth increases.

  17. Using Social Network Analysis to Better Understand Compulsive Exercise Behavior Among a Sample of Sorority Members.

    PubMed

    Patterson, Megan S; Goodson, Patricia

    2017-05-01

    Compulsive exercise, a form of unhealthy exercise often associated with prioritizing exercise and feeling guilty when exercise is missed, is a common precursor to and symptom of eating disorders. College-aged women are at high risk of exercising compulsively compared with other groups. Social network analysis (SNA) is a theoretical perspective and methodology that allows researchers to observe the effects of relational dynamics on the behaviors of people. SNA was used to assess the relationship between compulsive exercise and body dissatisfaction, physical activity, and network variables. Descriptive statistics were computed using SPSS, and quadratic assignment procedure (QAP) analyses were conducted using UCINET. QAP regression analysis revealed a statistically significant model (R2 = .375, P < .0001) predicting compulsive exercise behavior. Physical activity, body dissatisfaction, and network variables were statistically significant predictors in the QAP regression model. In our sample, women who are connected to "important" or "powerful" people in their network are likely to have higher compulsive exercise scores. This result provides healthcare practitioners with key target points for intervention within similar groups of women. For scholars researching eating disorders and associated behaviors, this study supports looking into group dynamics and network structure in conjunction with body dissatisfaction and exercise frequency.
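
    QAP inference handles the non-independence of dyadic observations by permuting node labels, i.e. shuffling the rows and columns of one matrix together, and recomputing the test statistic. The UCINET analyses in the study build on this idea; the sketch below shows only a basic QAP correlation test on hypothetical matrices, not the full regression model reported above:

        import numpy as np

        def qap_correlation(x, y, n_perm=5000, seed=0):
            # Correlate off-diagonal cells, then build a null distribution
            # by permuting y's node labels (rows and columns together).
            rng = np.random.default_rng(seed)
            mask = ~np.eye(x.shape[0], dtype=bool)
            obs = np.corrcoef(x[mask], y[mask])[0, 1]
            hits = 0
            for _ in range(n_perm):
                p = rng.permutation(x.shape[0])
                yp = y[np.ix_(p, p)]
                if abs(np.corrcoef(x[mask], yp[mask])[0, 1]) >= abs(obs):
                    hits += 1
            return obs, hits / n_perm  # correlation, permutation p-value

        # Hypothetical 20-node adjacency and similarity matrices
        rng = np.random.default_rng(1)
        a = (rng.random((20, 20)) < 0.2).astype(float)
        b = a * 0.5 + rng.random((20, 20))
        print(qap_correlation(a, b))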

  18. A Tool for Estimating Variability in Wood Preservative Treatment Retention

    Treesearch

    Patricia K. Lebow; Adam M. Taylor; Timothy M. Young

    2015-01-01

    Composite sampling is standard practice for evaluation of preservative retention levels in preservative-treated wood. Current protocols provide an average retention value but no estimate of uncertainty. Here we describe a statistical method for calculating uncertainty estimates using the standard sampling regime with minimal additional chemical analysis. This tool can...

  19. Time Delay Embedding Increases Estimation Precision of Models of Intraindividual Variability

    ERIC Educational Resources Information Center

    von Oertzen, Timo; Boker, Steven M.

    2010-01-01

    This paper investigates the precision of parameters estimated from local samples of time dependent functions. We find that "time delay embedding," i.e., structuring data prior to analysis by constructing a data matrix of overlapping samples, increases the precision of parameter estimates and in turn statistical power compared to standard…

  20. A multicenter study of viable PCR using propidium monoazide to detect Legionella in water samples.

    PubMed

    Scaturro, Maria; Fontana, Stefano; Dell'eva, Italo; Helfer, Fabrizia; Marchio, Michele; Stefanetti, Maria Vittoria; Cavallaro, Mario; Miglietta, Marilena; Montagna, Maria Teresa; De Giglio, Osvalda; Cuna, Teresa; Chetti, Leonarda; Sabattini, Maria Antonietta Bucci; Carlotti, Michela; Viggiani, Mariagabriella; Stenico, Alberta; Romanin, Elisa; Bonanni, Emma; Ottaviano, Claudio; Franzin, Laura; Avanzini, Claudio; Demarie, Valerio; Corbella, Marta; Cambieri, Patrizia; Marone, Piero; Rota, Maria Cristina; Bella, Antonino; Ricci, Maria Luisa

    2016-07-01

    Legionella quantification in environmental samples is overestimated by qPCR. Combination with a viability dye such as propidium monoazide (PMA) could make qPCR (then termed vPCR) more reliable. In this multicentre study, 717 artificial water samples, spiked with fixed concentrations of Legionella and interfering bacterial flora, were analysed by qPCR, vPCR and culture, and the data were compared by statistical analysis. A heat treatment at 55 °C for 10 minutes was also performed to obtain viable and non-viable bacteria. When the vPCR data were compared with those of culture and qPCR, statistical analysis showed significant differences (P < 0.001). However, although the heat treatment caused an abatement in CFU/mL of up to 1 log10 unit, the comparison between untreated and heat-treated samples analysed by vPCR highlighted non-significant differences (P > 0.05). Overall, this study demonstrated good experimental reproducibility of vPCR but also highlighted the limits of PMA in discriminating dead from live bacteria, making vPCR not completely reliable. Copyright © 2016 Elsevier Inc. All rights reserved.

  1. Determination of variables in the prediction of strontium distribution coefficients for selected sediments

    USGS Publications Warehouse

    Pace, M.N.; Rosentreter, J.J.; Bartholomay, R.C.

    2001-01-01

    Idaho State University and the US Geological Survey, in cooperation with the US Department of Energy, conducted a study to determine and evaluate strontium distribution coefficients (Kds) of subsurface materials at the Idaho National Engineering and Environmental Laboratory (INEEL). The Kds were determined to aid in assessing the variability of strontium Kds and their effects on chemical transport of strontium-90 in the Snake River Plain aquifer system. Data from batch experiments done to determine strontium Kds of five sediment-infill samples and six standard reference material samples were analyzed by using multiple linear regression analysis and the stepwise variable-selection method in the statistical program Statistical Product and Service Solutions to derive an equation of variables that can be used to predict strontium Kds of sediment-infill samples. The sediment-infill samples were from basalt vesicles and fractures from a selected core at the INEEL; strontium Kds ranged from ~201 to 356 ml g-1. The standard material samples consisted of clay minerals and calcite. The statistical analyses of the batch-experiment results showed that the amount of strontium in the initial solution, the amount of manganese oxide in the sample material, and the amount of potassium in the initial solution are the most important variables in predicting strontium Kds of sediment-infill samples.

  2. On the Use of Biomineral Oxygen Isotope Data to Identify Human Migrants in the Archaeological Record: Intra-Sample Variation, Statistical Methods and Geographical Considerations

    PubMed Central

    Lightfoot, Emma; O’Connell, Tamsin C.

    2016-01-01

    Oxygen isotope analysis of archaeological skeletal remains is an increasingly popular tool to study past human migrations. It is based on the assumption that human body chemistry preserves the δ18O of precipitation in such a way as to be a useful technique for identifying migrants and, potentially, their homelands. In this study, the first such global survey, we draw on published human tooth enamel and bone bioapatite data to explore the validity of using oxygen isotope analyses to identify migrants in the archaeological record. We use human δ18O results to show that there are large variations in human oxygen isotope values within a population sample. This may relate to physiological factors influencing the preservation of the primary isotope signal, or to human activities (such as brewing, boiling, stewing, differential access to water sources and so on) causing variation in ingested water and food isotope values. We compare the number of outliers identified using various statistical methods. We determine that the most appropriate method for identifying migrants is dependent on the data but is likely to be the IQR or the median absolute deviation from the median under most archaeological circumstances. Finally, through a spatial assessment of the dataset, we show that the degree of overlap in human isotope values from different locations across Europe is such that identifying individuals’ homelands on the basis of oxygen isotope analysis alone is not possible for the regions analysed to date. Oxygen isotope analysis is a valid method for identifying first-generation migrants from an archaeological site when used appropriately; however, it is difficult to identify migrants using statistical methods for a sample size of less than c. 25 individuals. In the absence of local previous analyses, each sample should be treated as an individual dataset and statistical techniques can be used to identify migrants, but in most cases pinpointing a specific homeland should not be attempted. PMID:27124001
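
    The IQR and MAD rules mentioned for outlier detection are easy to state: flag values more than 1.5 interquartile ranges outside the quartiles, or more than a few scaled MADs from the median. A minimal sketch of the MAD variant; the δ18O values are hypothetical:

        import numpy as np

        def mad_outliers(values, threshold=3.0):
            # Flag values farther from the median than `threshold` robust
            # SDs (MAD scaled by 1.4826 to match the normal distribution).
            v = np.asarray(values, dtype=float)
            med = np.median(v)
            mad = 1.4826 * np.median(np.abs(v - med))
            return np.abs(v - med) > threshold * mad

        # Hypothetical enamel d18O values (per mil) from one site
        print(mad_outliers([26.1, 26.4, 25.9, 26.7, 26.2, 28.9, 26.0]))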

  3. Statistics Clinic

    NASA Technical Reports Server (NTRS)

    Feiveson, Alan H.; Foy, Millennia; Ploutz-Snyder, Robert; Fiedler, James

    2014-01-01

    Do you have elevated p-values? Is the data analysis process getting you down? Do you experience anxiety when you need to respond to criticism of statistical methods in your manuscript? You may be suffering from Insufficient Statistical Support Syndrome (ISSS). For symptomatic relief of ISSS, come for a free consultation with JSC biostatisticians at our help desk during the poster sessions at the HRP Investigators Workshop. Get answers to common questions about sample size, missing data, multiple testing, when to trust the results of your analyses and more. Side effects may include sudden loss of statistics anxiety, improved interpretation of your data, and increased confidence in your results.

  4. 42 CFR 402.109 - Statistical sampling.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... 42 Public Health 2 2011-10-01 2011-10-01 false Statistical sampling. 402.109 Section 402.109... Statistical sampling. (a) Purpose. CMS or OIG may introduce the results of a statistical sampling study to... or caused to be presented. (b) Prima facie evidence. The results of the statistical sampling study...

  5. 42 CFR 402.109 - Statistical sampling.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... 42 Public Health 2 2010-10-01 2010-10-01 false Statistical sampling. 402.109 Section 402.109... Statistical sampling. (a) Purpose. CMS or OIG may introduce the results of a statistical sampling study to... or caused to be presented. (b) Prima facie evidence. The results of the statistical sampling study...

  6. Sampling designs for contaminant temporal trend analyses using sedentary species exemplified by the snails Bellamya aeruginosa and Viviparus viviparus.

    PubMed

    Yin, Ge; Danielsson, Sara; Dahlberg, Anna-Karin; Zhou, Yihui; Qiu, Yanling; Nyberg, Elisabeth; Bignert, Anders

    2017-10-01

    Environmental monitoring typically assumes samples and sampling activities to be representative of the population being studied. Given a limited budget, an appropriate sampling strategy is essential to support the detection of temporal trends of contaminants. In the present study, based on real chemical analysis data on polybrominated diphenyl ethers in snails collected from five subsites in Tianmu Lake, computer simulation is performed to evaluate three sampling strategies by estimating the sample size required to detect an annual change of 5% with a statistical power of 80% and 90% at a significance level of 5%. The results showed that sampling from an arbitrarily selected sampling spot is the worst strategy, requiring many more individual analyses to achieve the above-mentioned criteria compared with the other two approaches. A fixed sampling site requires the lowest sample size but may not be representative of the intended study object, e.g. a lake, and is also sensitive to changes at that particular sampling site. In contrast, sampling at multiple sites along the shore each year, and using pooled samples when the cost to collect and prepare individual specimens is much lower than the cost of chemical analysis, would be the most robust and cost-efficient strategy in the long run. Using statistical power as the criterion, the results demonstrated quantitatively the consequences of various sampling strategies, and could guide users with respect to the sample sizes required, depending on sampling design, for long-term monitoring programs. Copyright © 2017 Elsevier Ltd. All rights reserved.
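
    The paper's power criterion can be reproduced generically by Monte Carlo: simulate lognormal concentrations with a 5% annual change, test for trend, and count the fraction of significant results. A simplified sketch under those assumptions (a log-linear regression test stands in for the study's actual design):

        import numpy as np
        from scipy import stats

        def trend_power(n_years, n_per_year, annual_change=0.05, cv=0.5,
                        alpha=0.05, n_sim=2000, seed=0):
            # Fraction of simulations in which the regression of
            # log-concentration on year yields a significant slope.
            rng = np.random.default_rng(seed)
            years = np.repeat(np.arange(n_years), n_per_year)
            mu = np.log(100.0) + np.log(1 + annual_change) * years
            sigma = np.sqrt(np.log(1 + cv ** 2))  # lognormal sd from CV
            hits = 0
            for _ in range(n_sim):
                y = rng.normal(mu, sigma)
                hits += stats.linregress(years, y).pvalue < alpha
            return hits / n_sim

        # Power rises with the number of individuals analysed per year
        for n in (3, 6, 12):
            print(n, trend_power(10, n))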

  7. Assessment of sampling stability in ecological applications of discriminant analysis

    USGS Publications Warehouse

    Williams, B.K.; Titus, K.

    1988-01-01

    A simulation study was undertaken to assess the sampling stability of the variable loadings in linear discriminant function analysis. A factorial design was used for the factors of multivariate dimensionality, dispersion structure, configuration of group means, and sample size. A total of 32,400 discriminant analyses were conducted, based on data from simulated populations with appropriate underlying statistical distributions. The authors recommend that ecologists obtain group sample sizes that are at least three times as large as the number of variables measured. A review of 60 published studies and 142 individual analyses indicated that sample sizes in ecological studies often have met that requirement; however, individual group sample sizes frequently were very unequal, and checks of assumptions usually were not reported.

  8. Quantifying in situ Zooplankton Movement and Trophic Impacts on Thin Layers in East Sound, Washington

    DTIC Science & Technology

    2006-09-30

    strength of the combination is that the tracking system quantifies swimming behaviors of protists in natural seawater samples with large numbers of motile...Sound was to link observations of thin layers to behavioral analysis of protists resident above, within, and below these features. Analysis of our...cells and diatom chains. We are not yet able to make statistical statements about swimming characteristics of the motile protists in our video samples

  9. Partial Least Squares Regression Can Aid in Detecting Differential Abundance of Multiple Features in Sets of Metagenomic Samples

    PubMed Central

    Libiger, Ondrej; Schork, Nicholas J.

    2015-01-01

    It is now feasible to examine the composition and diversity of microbial communities (i.e., “microbiomes”) that populate different human organs and orifices using DNA sequencing and related technologies. To explore the potential links between changes in microbial communities and various diseases in the human body, it is essential to test associations involving different species within and across microbiomes, environmental settings and disease states. Although a number of statistical techniques exist for carrying out relevant analyses, it is unclear which of these techniques exhibit the greatest statistical power to detect associations given the complexity of most microbiome datasets. We compared the statistical power of principal component regression, partial least squares regression, regularized regression, distance-based regression, Hill's diversity measures, and a modified test implemented in the popular and widely used microbiome analysis methodology “Metastats” across a wide range of simulated scenarios involving changes in feature abundance between two sets of metagenomic samples. For this purpose, simulation studies were used to change the abundance of microbial species in a real dataset from a published study examining human hands. Each technique was applied to the same data, and its ability to detect the simulated change in abundance was assessed. We hypothesized that a small subset of methods would outperform the rest in terms of the statistical power. Indeed, we found that the Metastats technique modified to accommodate multivariate analysis and partial least squares regression yielded high power under the models and data sets we studied. The statistical power of diversity measure-based tests, distance-based regression and regularized regression was significantly lower. Our results provide insight into powerful analysis strategies that utilize information on species counts from large microbiome data sets exhibiting skewed frequency distributions obtained on a small to moderate number of samples. PMID:26734061
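
    For readers wanting to try the PLS approach on their own data, a minimal sketch with scikit-learn follows; the count matrix and group labels are simulated stand-ins, not the hand-microbiome data used in the study:

        import numpy as np
        from sklearn.cross_decomposition import PLSRegression

        rng = np.random.default_rng(0)
        # Hypothetical data: 40 samples x 200 species, log-transformed
        counts = rng.poisson(5, size=(40, 200)).astype(float)
        counts[20:, :10] *= 3  # simulate differential abundance
        X = np.log1p(counts)
        y = np.repeat([0.0, 1.0], 20)  # two sets of metagenomic samples

        pls = PLSRegression(n_components=2).fit(X, y)
        # Species with the largest absolute loadings on component 1 are
        # candidates for differential abundance between the two groups.
        print(np.argsort(-np.abs(pls.x_loadings_[:, 0]))[:10])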

  10. Analysis models for the estimation of oceanic fields

    NASA Technical Reports Server (NTRS)

    Carter, E. F.; Robinson, A. R.

    1987-01-01

    A general model for statistically optimal estimates is presented for dealing with scalar, vector and multivariate datasets. The method deals with anisotropic fields and treats space and time dependence equivalently. Problems addressed include analysis, i.e. the production of synoptic time series of regularly gridded fields from irregular and gappy datasets, and the estimation of fields by compositing observations from several different instruments and sampling schemes. Technical issues are discussed, including the convergence of statistical estimates, the choice of representation of the correlations, the influential domain of an observation, and the efficiency of numerical computations.

  11. Microarray R-based analysis of complex lysate experiments with MIRACLE

    PubMed Central

    List, Markus; Block, Ines; Pedersen, Marlene Lemvig; Christiansen, Helle; Schmidt, Steffen; Thomassen, Mads; Tan, Qihua; Baumbach, Jan; Mollenhauer, Jan

    2014-01-01

    Motivation: Reverse-phase protein arrays (RPPAs) allow sensitive quantification of relative protein abundance in thousands of samples in parallel. Typical challenges involved in this technology are antibody selection, sample preparation and optimization of staining conditions. The issue of combining effective sample management and data analysis, however, has been widely neglected. Results: This motivated us to develop MIRACLE, a comprehensive and user-friendly web application bridging the gap between spotting and array analysis by conveniently keeping track of sample information. Data processing includes correction of staining bias, estimation of protein concentration from response curves, normalization for total protein amount per sample and statistical evaluation. Established analysis methods have been integrated with MIRACLE, offering experimental scientists an end-to-end solution for sample management and for carrying out data analysis. In addition, experienced users have the possibility to export data to R for more complex analyses. MIRACLE thus has the potential to further spread utilization of RPPAs as an emerging technology for high-throughput protein analysis. Availability: Project URL: http://www.nanocan.org/miracle/ Contact: mlist@health.sdu.dk Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25161257

  12. Microarray R-based analysis of complex lysate experiments with MIRACLE.

    PubMed

    List, Markus; Block, Ines; Pedersen, Marlene Lemvig; Christiansen, Helle; Schmidt, Steffen; Thomassen, Mads; Tan, Qihua; Baumbach, Jan; Mollenhauer, Jan

    2014-09-01

    Reverse-phase protein arrays (RPPAs) allow sensitive quantification of relative protein abundance in thousands of samples in parallel. Typical challenges involved in this technology are antibody selection, sample preparation and optimization of staining conditions. The issue of combining effective sample management and data analysis, however, has been widely neglected. This motivated us to develop MIRACLE, a comprehensive and user-friendly web application bridging the gap between spotting and array analysis by conveniently keeping track of sample information. Data processing includes correction of staining bias, estimation of protein concentration from response curves, normalization for total protein amount per sample and statistical evaluation. Established analysis methods have been integrated with MIRACLE, offering experimental scientists an end-to-end solution for sample management and for carrying out data analysis. In addition, experienced users have the possibility to export data to R for more complex analyses. MIRACLE thus has the potential to further spread utilization of RPPAs as an emerging technology for high-throughput protein analysis. Project URL: http://www.nanocan.org/miracle/. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press.

  13. Overall voice and strain level analysis in rock singers.

    PubMed

    Gonsalves, Aline; Amin, Elisabeth; Behlau, Mara

    2010-01-01

    Overall voice and strain level analysis in rock singers. To analyze the voice of rock singers according to two specific parameters, overall level of vocal deviation (OLVD) and strain level (SL), and to compare these parameters in three different music samples. Participants were 26 male rock singers, ranging in age from 17 to 46 years (mean = 29.8 years). All of the participants answered a questionnaire for sample characterization and were submitted to the recording of three voice samples: the Brazilian National Anthem (BNA), Satisfaction, and a self-selected repertoire song (RS). Voice samples were analyzed by five speech-language pathologists according to OLVD and SL. Statistical analysis was done using the software SPSS, version 13.0. Statistically significant differences were observed for the mean values of OLVD and SL during the performance of Satisfaction (OLVD = 32.8 and SL = 0.024 / p = 0.024) and during the RS performance (OLVD = 38.4 and SL = 55.8 / p = 0.010). The values of OLVD and SL are directly proportional in the samples of the BNA* and RS**, i.e. the higher the strain the higher the OLVD (p < 0.001*; p = 0.010**). When the three song samples are analyzed individually, the OLVD does not vary significantly among them; however, the mean values show a tendency to increase from non-rock to rock performances (24.0 BNA / 32.8 Satisfaction / 38.4 RS). The level of strain found during the BNA performance presents a statistically significant difference when compared to the rock performances (Satisfaction and RS, p = 0.008 and p = 0.001). The obtained data suggest that rock style is related to greater use of vocal strain and that this strain does not necessarily impose a negative impression on the voice, but corresponds to a common interpretative factor related to this style of music.

  14. Improving the sampling strategy of the Joint Danube Survey 3 (2013) by means of multivariate statistical techniques applied on selected physico-chemical and biological data.

    PubMed

    Hamchevici, Carmen; Udrea, Ion

    2013-11-01

    The concept of the basin-wide Joint Danube Survey (JDS) was launched by the International Commission for the Protection of the Danube River (ICPDR) as a tool for investigative monitoring under the Water Framework Directive (WFD), with a frequency of 6 years. The first JDS was carried out in 2001, and its success in providing key information for characterisation of the Danube River Basin District as required by the WFD led to the organisation of the second JDS in 2007, which was the world's biggest river research expedition in that year. The present paper presents an approach for improving the survey strategy for the next planned survey, JDS3 (2013), by means of several multivariate statistical techniques. In order to design the optimum structure in terms of parameters and sampling sites, principal component analysis (PCA), factor analysis (FA) and cluster analysis were applied to JDS2 data for 13 selected physico-chemical elements and one biological element measured at 78 sampling sites located on the main course of the Danube. Results from PCA/FA showed that most of the dataset variance (above 75%) was explained by five varifactors loaded with 8 out of 14 variables: physical (transparency and total suspended solids), relevant nutrients (N-nitrates and P-orthophosphates), feedback effects of primary production (pH, alkalinity and dissolved oxygen) and algal biomass. Taking into account the representation of the factor scores given by FA versus sampling sites and the major groups generated by the clustering procedure, the spatial network of the next survey could be carefully tailored, reducing the number of sampling sites by more than 30%. A target-oriented sampling strategy based on the selected multivariate statistics can thus provide a strong reduction in the dimensionality of the original data, and in the corresponding costs as well, without any loss of information.
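
    The dimensionality-reduction step is conceptually straightforward: standardize the site-by-parameter matrix, run PCA, and keep the components explaining roughly 75% of the variance. A minimal sketch on random stand-in data of the same shape as the JDS2 table (78 sites by 14 variables):

        import numpy as np
        from sklearn.decomposition import PCA
        from sklearn.preprocessing import StandardScaler

        rng = np.random.default_rng(0)
        data = rng.normal(size=(78, 14))  # stand-in for JDS2 measurements

        pca = PCA().fit(StandardScaler().fit_transform(data))
        cum = np.cumsum(pca.explained_variance_ratio_)
        # Number of components needed to reach 75% explained variance
        print(int(np.searchsorted(cum, 0.75)) + 1)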

  15. Power analysis and trend detection for water quality monitoring data. An application for the Greater Yellowstone Inventory and Monitoring Network

    USGS Publications Warehouse

    Irvine, Kathryn M.; Manlove, Kezia; Hollimon, Cynthia

    2012-01-01

    An important consideration for long term monitoring programs is determining the required sampling effort to detect trends in specific ecological indicators of interest. To enhance the Greater Yellowstone Inventory and Monitoring Network’s water resources protocol(s) (O’Ney 2006 and O’Ney et al. 2009 [under review]), we developed a set of tools to: (1) determine the statistical power for detecting trends of varying magnitude in a specified water quality parameter over different lengths of sampling (years) and different within-year collection frequencies (monthly or seasonal sampling) at particular locations using historical data, and (2) perform periodic trend analyses for water quality parameters while addressing seasonality and flow weighting. A power analysis for trend detection is a statistical procedure used to estimate the probability of rejecting the hypothesis of no trend when in fact there is a trend, within a specific modeling framework. In this report, we base our power estimates on using the seasonal Kendall test (Helsel and Hirsch 2002) for detecting trend in water quality parameters measured at fixed locations over multiple years. We also present procedures (R-scripts) for conducting a periodic trend analysis using the seasonal Kendall test with and without flow adjustment. This report provides the R-scripts developed for power and trend analysis, tutorials, and the associated tables and graphs. The purpose of this report is to provide practical information for monitoring network staff on how to use these statistical tools for water quality monitoring data sets.
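
    The seasonal Kendall test underlying both tools computes Kendall's S separately within each season (e.g. month) and sums the results, so that seasonal cycles do not masquerade as trend. The report ships R-scripts; the sketch below restates the unadjusted test in Python under the usual normal approximation, with simulated monthly data:

        import numpy as np
        from scipy import stats

        def seasonal_kendall(values, seasons):
            # Sum Kendall's S and its variance over seasons, comparing
            # every pair of years within each season.
            s, var_s = 0.0, 0.0
            for season in np.unique(seasons):
                x = values[seasons == season]
                n = len(x)
                for i in range(n - 1):
                    s += np.sum(np.sign(x[i + 1:] - x[i]))
                var_s += n * (n - 1) * (2 * n + 5) / 18.0
            z = (s - np.sign(s)) / np.sqrt(var_s)  # continuity correction
            return z, 2 * stats.norm.sf(abs(z))    # two-sided p-value

        # Hypothetical data: 10 years x 12 months with a weak upward trend
        rng = np.random.default_rng(1)
        years, months = np.divmod(np.arange(120), 12)
        conc = 5 + 0.05 * years + rng.normal(0, 0.5, 120)
        print(seasonal_kendall(conc, months))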

  16. The multiple imputation method: a case study involving secondary data analysis.

    PubMed

    Walani, Salimah R; Cleland, Charles M

    2015-05-01

    To illustrate, with the example of a secondary data analysis study, the use of the multiple imputation method to replace missing data. Most large public datasets have missing data, which need to be handled by researchers conducting secondary data analysis studies. Multiple imputation is a technique widely used to replace missing values while preserving the sample size and sampling variability of the data. The 2004 National Sample Survey of Registered Nurses. The authors created a model to impute missing values using the chained equation method. They used imputation diagnostics procedures and conducted regression analysis of imputed data to determine the differences between the log hourly wages of internationally educated and US-educated registered nurses. The authors used multiple imputation procedures to replace missing values in a large dataset with 29,059 observations. Five multiply imputed datasets were created. Imputation diagnostics using time series and density plots showed that imputation was successful. The authors also present an example of the use of multiply imputed datasets to conduct regression analysis to answer a substantive research question. Multiple imputation is a powerful technique for imputing missing values in large datasets while preserving the sample size and variance of the data. Even though the chained equation method involves complex statistical computations, recent innovations in software and computation have made it possible for researchers to conduct this technique on large datasets. The authors recommend that nurse researchers use multiple imputation methods for handling missing data to improve the statistical power and external validity of their studies.
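
    The chained-equation workflow described here (create several imputed datasets, analyze each, pool the estimates) can be sketched with scikit-learn's IterativeImputer; the survey matrix below is a random stand-in, and Rubin's rules for pooling are noted but not shown:

        import numpy as np
        from sklearn.experimental import enable_iterative_imputer  # noqa: F401
        from sklearn.impute import IterativeImputer

        rng = np.random.default_rng(0)
        # Hypothetical survey matrix with ~10% values missing at random
        data = rng.normal(size=(500, 5))
        data[rng.random(data.shape) < 0.10] = np.nan

        # Five imputations, as in the study; sample_posterior=True draws
        # from the predictive distribution so the datasets differ.
        imputed = [IterativeImputer(sample_posterior=True,
                                    random_state=k).fit_transform(data)
                   for k in range(5)]
        # Each dataset would be analyzed separately and the regression
        # estimates combined with Rubin's rules.
        print(len(imputed), imputed[0].shape)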

  17. Measuring the statistical validity of summary meta-analysis and meta-regression results for use in clinical practice.

    PubMed

    Willis, Brian H; Riley, Richard D

    2017-09-20

    An important question for clinicians appraising a meta-analysis is: are the findings likely to be valid in their own practice? That is, does the reported effect accurately represent the effect that would occur in their own clinical population? To this end we advance the concept of statistical validity, where the parameter being estimated equals the corresponding parameter for a new independent study. Using a simple ('leave-one-out') cross-validation technique, we demonstrate how we may test meta-analysis estimates for statistical validity using a new validation statistic, Vn, and derive its distribution. We compare this with the usual approach of investigating heterogeneity in meta-analyses and demonstrate the link between statistical validity and homogeneity. Using a simulation study, the properties of Vn and the Q statistic are compared for univariate random effects meta-analysis and a tailored meta-regression model, where information from the setting (included as model covariates) is used to calibrate the summary estimate to the setting of application. Their properties are found to be similar when there are 50 studies or more, but for fewer studies Vn has greater power but a higher type 1 error rate than Q. The power and type 1 error rate of Vn are also shown to depend on the within-study variance, between-study variance, study sample size, and the number of studies in the meta-analysis. Finally, we apply Vn to two published meta-analyses and conclude that it usefully augments standard methods when deciding upon the likely validity of summary meta-analysis estimates in clinical practice. © 2017 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.
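
    The leave-one-out idea can be sketched independently of the authors' Vn statistic and its derived distribution: drop each study in turn, pool the rest, and standardize the omitted study's deviation from that pooled estimate. The sketch below uses a simple fixed-effect pooling and hypothetical effect sizes; it illustrates the cross-validation logic only, not the published method:

        import numpy as np

        def loo_deviations(effects, variances):
            # For each study, compare its effect with the inverse-variance
            # pooled estimate of the remaining studies.
            effects = np.asarray(effects, dtype=float)
            variances = np.asarray(variances, dtype=float)
            z = np.empty(len(effects))
            for i in range(len(effects)):
                keep = np.arange(len(effects)) != i
                w = 1.0 / variances[keep]
                pooled = np.sum(w * effects[keep]) / np.sum(w)
                z[i] = ((effects[i] - pooled)
                        / np.sqrt(variances[i] + 1.0 / np.sum(w)))
            return z

        # Hypothetical log odds ratios and within-study variances
        print(loo_deviations([0.2, 0.35, 0.1, 0.28],
                             [0.02, 0.05, 0.03, 0.04]))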

  18. Simulation of parametric model towards the fixed covariate of right censored lung cancer data

    NASA Astrophysics Data System (ADS)

    Afiqah Muhamad Jamil, Siti; Asrul Affendi Abdullah, M.; Kek, Sie Long; Ridwan Olaniran, Oyebayo; Enera Amran, Syahila

    2017-09-01

    In this study, a simulation procedure was applied to measure the effect of fixed covariates in right-censored data using a parametric survival model. The scale and shape parameters were varied to differentiate the analysis of the parametric regression survival model. Biases, mean biases and coverage probabilities were used as evaluation criteria. Different sample sizes (50, 100, 150 and 200) were employed to distinguish the impact of the parametric regression model on right-censored data. The R statistical software was used to code the simulation of right-censored data. The final right-censored simulation model was then compared with right-censored lung cancer data from Malaysia. It was found that different values of the shape and scale parameters with different sample sizes help to improve the simulation strategy for right-censored data, and that the Weibull regression survival model is a suitable fit for the survival data of lung cancer patients in Malaysia.
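
    Generating right-censored Weibull data of the kind described takes only a few lines: draw event times from a Weibull with chosen shape and scale, then censor at a fixed cutoff. A minimal sketch; the parameter values and censoring time are illustrative, not those of the study:

        import numpy as np

        def simulate_right_censored(n, shape, scale, censor_time, seed=0):
            # `status` is 1 for an observed event, 0 for a censored one
            rng = np.random.default_rng(seed)
            t = scale * rng.weibull(shape, size=n)
            status = (t <= censor_time).astype(int)
            return np.minimum(t, censor_time), status

        # Censoring rate at each of the sample sizes used in the study
        for n in (50, 100, 150, 200):
            time, status = simulate_right_censored(n, shape=1.5, scale=24.0,
                                                   censor_time=36.0, seed=n)
            print(n, round(1 - status.mean(), 3))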

  19. Effect of environment and genotype on commercial maize hybrids using LC/MS-based metabolomics.

    PubMed

    Baniasadi, Hamid; Vlahakis, Chris; Hazebroek, Jan; Zhong, Cathy; Asiago, Vincent

    2014-02-12

    We recently applied gas chromatography coupled to time-of-flight mass spectrometry (GC/TOF-MS) and multivariate statistical analysis to measure biological variation of many metabolites due to environment and genotype in forage and grain samples collected from 50 genetically diverse nongenetically modified (non-GM) DuPont Pioneer commercial maize hybrids grown at six North American locations. In the present study, the metabolome coverage was extended using a core subset of these grain and forage samples employing ultra high pressure liquid chromatography (uHPLC) mass spectrometry (LC/MS). A total of 286 and 857 metabolites were detected in grain and forage samples, respectively, using LC/MS. Multivariate statistical analysis was utilized to compare and correlate the metabolite profiles. Environment had a greater effect on the metabolome than genetic background. The results of this study support and extend previously published insights into the environmental and genetic associated perturbations to the metabolome that are not associated with transgenic modification.

  20. Comparative statistical analysis of carcinogenic and non-carcinogenic effects of uranium in groundwater samples from different regions of Punjab, India.

    PubMed

    Saini, Komal; Singh, Parminder; Bajwa, Bikramjit Singh

    2016-12-01

    An LED fluorimeter has been used for microanalysis of uranium concentration in groundwater samples collected from six districts of South West (SW), West (W) and North East (NE) Punjab, India. The average uranium content in water samples of SW Punjab is observed to be higher than the WHO and USEPA recommended safe limit of 30 µg l-1 as well as the AERB proposed limit of 60 µg l-1, whereas for the W and NE regions of Punjab, the average uranium concentration was within the AERB recommended limit of 60 µg l-1. The average value observed in SW Punjab is around 3-4 times the value observed in W Punjab and more than 17 times the average value observed in the NE region of Punjab. Statistical analyses of the carcinogenic as well as non-carcinogenic risks due to uranium have been carried out for each studied district. Copyright © 2016 Elsevier Ltd. All rights reserved.

  1. HPLC determination of caffeine in coffee beverage

    NASA Astrophysics Data System (ADS)

    Fajara, B. E. P.; Susanti, H.

    2017-11-01

    Coffee is the second most widely consumed beverage in the world after water. One of the compounds contained in coffee is caffeine. Caffeine has pharmacological effects such as stimulating the central nervous system. The purpose of this study was to determine the level of caffeine in coffee beverages by an HPLC method. Three branded coffee beverages, included in the top 3 of the Top Brand Index 2016 Phase 2, were used as samples. Qualitative analysis was performed by the Parry method, Dragendorff reagent, and comparison of the retention times of sample and caffeine standard. Quantitative analysis was done by HPLC with methanol-water (95:5 v/v) as the mobile phase, ODS as the stationary phase, a flow rate of 1 mL/min, and UV detection at 272 nm. Caffeine levels were statistically analyzed using ANOVA at the 95% confidence level. Qualitative analysis showed that the three samples contained caffeine. The average caffeine levels in coffee bottles X, Y, and Z were 138.048 mg/bottle, 109.699 mg/bottle, and 147.669 mg/bottle, respectively. The caffeine contents of the three coffee beverage samples are statistically different (p < 0.05). The caffeine levels of the X, Y, and Z coffee beverage samples did not meet the requirement of 50 mg/serving set by the Indonesian Standard Agency.
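
    The one-way ANOVA step described above is a one-liner in most environments. A minimal sketch with SciPy; the replicate values are hypothetical, chosen near the reported brand means:

        from scipy import stats

        # Hypothetical replicate determinations (mg/bottle) per brand
        x = [138.2, 137.9, 138.1]
        y = [109.5, 109.9, 109.7]
        z = [147.5, 147.8, 147.7]

        f, p = stats.f_oneway(x, y, z)
        print(f"F = {f:.1f}, p = {p:.4g}")  # p < 0.05: brands differ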

  2. Analysis of Statistical Methods Currently used in Toxicology Journals

    PubMed Central

    Na, Jihye; Yang, Hyeri

    2014-01-01

    Statistical methods are frequently used in toxicology, yet it is not clear whether the methods employed by the studies are used consistently and conducted based on sound statistical grounds. The purpose of this paper is to describe the statistical methods used in top toxicology journals. More specifically, we sampled 30 papers published in 2014 from Toxicology and Applied Pharmacology, Archives of Toxicology, and Toxicological Sciences and described the methodologies used to provide descriptive and inferential statistics. One hundred thirteen endpoints were observed in those 30 papers, and most studies had sample sizes less than 10, with the median and the mode being 6 and 3 & 6, respectively. The mean (105/113, 93%) was dominantly used to measure central tendency, and the standard error of the mean (64/113, 57%) and standard deviation (39/113, 34%) were used to measure dispersion, while few studies provided justifications for why the methods were selected. Inferential statistics were frequently conducted (93/113, 82%), with one-way ANOVA being the most popular (52/93, 56%), yet few studies conducted either normality or equal variance tests. These results suggest that more consistent and appropriate use of statistical methods is necessary, which may enhance the role of toxicology in public health. PMID:25343012

  3. Analysis of Statistical Methods Currently used in Toxicology Journals.

    PubMed

    Na, Jihye; Yang, Hyeri; Bae, SeungJin; Lim, Kyung-Min

    2014-09-01

    Statistical methods are frequently used in toxicology, yet it is not clear whether the methods employed by the studies are used consistently and conducted based on sound statistical grounds. The purpose of this paper is to describe the statistical methods used in top toxicology journals. More specifically, we sampled 30 papers published in 2014 from Toxicology and Applied Pharmacology, Archives of Toxicology, and Toxicological Sciences and described the methodologies used to provide descriptive and inferential statistics. One hundred thirteen endpoints were observed in those 30 papers, and most studies had sample sizes less than 10, with the median and the mode being 6 and 3 & 6, respectively. The mean (105/113, 93%) was dominantly used to measure central tendency, and the standard error of the mean (64/113, 57%) and standard deviation (39/113, 34%) were used to measure dispersion, while few studies provided justifications for why the methods were selected. Inferential statistics were frequently conducted (93/113, 82%), with one-way ANOVA being the most popular (52/93, 56%), yet few studies conducted either normality or equal variance tests. These results suggest that more consistent and appropriate use of statistical methods is necessary, which may enhance the role of toxicology in public health.

  4. Analysis and meta-analysis of single-case designs: an introduction.

    PubMed

    Shadish, William R

    2014-04-01

    The last 10 years have seen great progress in the analysis and meta-analysis of single-case designs (SCDs). This special issue includes five articles that provide an overview of current work on that topic, including standardized mean difference statistics, multilevel models, Bayesian statistics, and generalized additive models. Each article analyzes a common example across articles and presents syntax or macros for how to do them. These articles are followed by commentaries from single-case design researchers and journal editors. This introduction briefly describes each article and then discusses several issues that must be addressed before we can know what analyses will eventually be best to use in SCD research. These issues include modeling trend, modeling error covariances, computing standardized effect size estimates, assessing statistical power, incorporating more accurate models of outcome distributions, exploring whether Bayesian statistics can improve estimation given the small samples common in SCDs, and the need for annotated syntax and graphical user interfaces that make complex statistics accessible to SCD researchers. The article then discusses reasons why SCD researchers are likely to incorporate statistical analyses into their research more often in the future, including changing expectations and contingencies regarding SCD research from outside SCD communities, changes and diversity within SCD communities, corrections of erroneous beliefs about the relationship between SCD research and statistics, and demonstrations of how statistics can help SCD researchers better meet their goals. Copyright © 2013 Society for the Study of School Psychology. Published by Elsevier Ltd. All rights reserved.

  5. Data-optimized source modeling with the Backwards Liouville Test–Kinetic method

    DOE PAGES

    Woodroffe, J. R.; Brito, T. V.; Jordanova, V. K.; ...

    2017-09-14

    In the standard practice of neutron multiplicity counting, the first three sampled factorial moments of the event-triggered neutron count distribution are used to quantify the three main neutron source terms: the spontaneous fissile material effective mass, the relative (α,n) production and the induced fission source responsible for multiplication. Our study compares three methods to quantify the statistical uncertainty of the estimated mass: the bootstrap method, propagation of variance through moments, and the statistical analysis of cycle data method. Each of the three methods was implemented on a set of four different NMC measurements, held at the JRC laboratory in Ispra, Italy, sampling four different Pu samples in a standard Plutonium Scrap Multiplicity Counter (PSMC) well counter.

  6. Estimating the mass variance in neutron multiplicity counting-A comparison of approaches

    NASA Astrophysics Data System (ADS)

    Dubi, C.; Croft, S.; Favalli, A.; Ocherashvili, A.; Pedersen, B.

    2017-12-01

    In the standard practice of neutron multiplicity counting, the first three sampled factorial moments of the event-triggered neutron count distribution are used to quantify the three main neutron source terms: the spontaneous fissile material effective mass, the relative (α, n) production and the induced fission source responsible for multiplication. This study compares three methods to quantify the statistical uncertainty of the estimated mass: the bootstrap method, propagation of variance through moments, and the statistical analysis of cycle data method. Each of the three methods was implemented on a set of four different NMC measurements, held at the JRC laboratory in Ispra, Italy, sampling four different Pu samples in a standard Plutonium Scrap Multiplicity Counter (PSMC) well counter.
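
    Of the three uncertainty methods compared, the bootstrap is the most generic: resample the measurement cycles with replacement and take the spread of the recomputed estimator as its standard error. A simplified sketch, with Poisson counts and a bare factorial moment standing in for real cycle data and the full mass estimator:

        import numpy as np

        def bootstrap_se(cycle_counts, statistic, n_boot=2000, seed=0):
            # Resample cycles with replacement; the standard deviation of
            # the recomputed statistic estimates its standard error.
            rng = np.random.default_rng(seed)
            cycle_counts = np.asarray(cycle_counts)
            reps = [statistic(rng.choice(cycle_counts, len(cycle_counts)))
                    for _ in range(n_boot)]
            return np.std(reps, ddof=1)

        # Hypothetical neutron counts per cycle; the statistic is the
        # second factorial moment of the count distribution.
        counts = np.random.default_rng(42).poisson(3.0, size=500)
        m2 = lambda c: np.mean(c * (c - 1))
        print(round(bootstrap_se(counts, m2), 4))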

  7. Estimating the mass variance in neutron multiplicity counting $-$ A comparison of approaches

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dubi, C.; Croft, S.; Favalli, A.

    In the standard practice of neutron multiplicity counting, the first three sampled factorial moments of the event-triggered neutron count distribution are used to quantify the three main neutron source terms: the spontaneous fissile material effective mass, the relative (α,n) production and the induced fission source responsible for multiplication. This study compares three methods to quantify the statistical uncertainty of the estimated mass: the bootstrap method, propagation of variance through moments, and the statistical analysis of cycle data method. Each of the three methods was implemented on a set of four different NMC measurements, held at the JRC laboratory in Ispra, Italy, sampling four different Pu samples in a standard Plutonium Scrap Multiplicity Counter (PSMC) well counter.

  8. Estimating the mass variance in neutron multiplicity counting $-$ A comparison of approaches

    DOE PAGES

    Dubi, C.; Croft, S.; Favalli, A.; ...

    2017-09-14

    In the standard practice of neutron multiplicity counting, the first three sampled factorial moments of the event-triggered neutron count distribution are used to quantify the three main neutron source terms: the spontaneous fissile material effective mass, the relative (α,n) production and the induced fission source responsible for multiplication. This study compares three methods to quantify the statistical uncertainty of the estimated mass: the bootstrap method, propagation of variance through moments, and the statistical analysis of cycle data method. Each of the three methods was implemented on a set of four different NMC measurements, held at the JRC laboratory in Ispra, Italy, sampling four different Pu samples in a standard Plutonium Scrap Multiplicity Counter (PSMC) well counter.

  9. Software for Data Analysis with Graphical Models

    NASA Technical Reports Server (NTRS)

    Buntine, Wray L.; Roy, H. Scott

    1994-01-01

    Probabilistic graphical models are being used widely in artificial intelligence and statistics, for instance, in diagnosis and expert systems, as a framework for representing and reasoning with probabilities and independencies. They come with corresponding algorithms for performing statistical inference. This offers a unifying framework for prototyping and/or generating data analysis algorithms from graphical specifications. This paper illustrates the framework with an example and then presents some basic techniques for the task: problem decomposition and the calculation of exact Bayes factors. Other tools already developed, such as automatic differentiation, Gibbs sampling, and use of the EM algorithm, make this a broad basis for the generation of data analysis software.

  10. Impact of Satellite Viewing-Swath Width on Global and Regional Aerosol Optical Thickness Statistics and Trends

    NASA Technical Reports Server (NTRS)

    Colarco, P. R.; Kahn, R. A.; Remer, L. A.; Levy, R. C.

    2014-01-01

    We use the Moderate Resolution Imaging Spectroradiometer (MODIS) satellite aerosol optical thickness (AOT) product to assess the impact of reduced swath width on global and regional AOT statistics and trends. Along-track and across-track sampling strategies are employed, in which the full MODIS data set is sub-sampled with various narrow-swath (approximately 400-800 km) and single-pixel-width (approximately 10 km) configurations. Although view-angle artifacts in the MODIS AOT retrieval confound direct comparisons between averages derived from different sub-samples, careful analysis shows that with many portions of the Earth essentially unobserved, spatial sampling introduces uncertainty in the derived seasonal-regional mean AOT. These AOT spatial sampling artifacts comprise up to 60% of the full-swath AOT value under moderate aerosol loading, and can be as large as 0.1 in some regions under high aerosol loading. Compared to full-swath observations, narrower-swath and single-pixel-width sampling exhibits a reduced ability to detect AOT trends with statistical significance. On the other hand, estimates of the global, annual mean AOT do not vary significantly from the full-swath values as spatial sampling is reduced. Aggregation of the MODIS data at coarse grid scales (10 deg) shows consistency in the aerosol trends across sampling strategies, with increased statistical confidence, but quantitative errors in the derived trends are found even for the full-swath data when compared to high spatial resolution (0.5 deg) aggregations. Using results of a model-derived aerosol reanalysis, we find consistency in our conclusions about a seasonal-regional spatial sampling artifact in AOT. Furthermore, the model shows that reduced spatial sampling can amount to uncertainty in computed shortwave top-of-atmosphere aerosol radiative forcing of 2-3 W m-2. These artifacts are lower bounds, as other unconsidered sampling strategies would possibly perform less well. These results suggest that future aerosol satellite missions having significantly less than full-swath viewing are unlikely to sample the true AOT distribution well enough to obtain the statistics needed to reduce uncertainty in aerosol direct forcing of climate.

  11. Supercritical Fluid Extraction and Analysis of Tropospheric Aerosol Particles

    NASA Astrophysics Data System (ADS)

    Hansen, Kristen J.

    An integrated sampling and supercritical fluid extraction (SFE) cell has been designed for whole-sample analysis of organic compounds on tropospheric aerosol particles. The low-volume extraction cell has been interfaced with a sampling manifold for aerosol particle collection in the field. After sample collection, the entire SFE cell was coupled to a gas chromatograph; after on-line extraction, the cryogenically focused sample was separated and the volatile compounds detected with either a mass spectrometer or a flame ionization detector. A 20-minute extraction at 450 atm and 90 °C with pure supercritical CO2 is sufficient for quantitative extraction of most volatile compounds in aerosol particle samples. A comparison between SFE and thermal desorption, the traditional whole-sample technique for analyses of this type, was performed using ambient aerosol particle samples, as well as samples containing known amounts of standard analytes. The results of these studies indicate that SFE of atmospheric aerosol particles provides quantitative measurement of several classes of organic compounds. SFE provides information that is complementary to that gained by the thermal desorption analysis. The results also indicate that SFE with CO2 can be validated as an alternative to thermal desorption for quantitative recovery of several organic compounds. In 1989, the organic constituents of atmospheric aerosol particles collected at Niwot Ridge, Colorado, along with various physical and meteorological data, were measured during a collaborative field study. Temporal changes in the composition of samples collected during summertime at the rural site were studied. Thermal desorption-GC/FID was used to quantify selected compounds in samples collected during the field study. The statistical analysis of the 1989 Niwot Ridge data set is presented in this work. Principal component analysis was performed on thirty-one variables selected from the data set in order to ascertain different source and process components, and to examine concentration changes in groups of variables with respect to time of day and meteorological conditions. Seven orthogonal groups of variables resulted from the statistical analysis; the groups serve as molecular markers for different biologic and anthropogenic emission sources. In addition, the results of the statistical analysis were used to investigate how several emission source contributions vary with respect to local atmospheric dynamics. Field studies were conducted in the urban environment in and around Boulder, CO, to characterize the dynamics, chemistry, and emission sources which affect the composition and concentration of different size-fractions of aerosol particles in the Boulder air mass. Relationships between different size fractions of particles and some gas-phase pollutants were elucidated. These field studies included an investigation of seasonal variations in the organic content and concentration of aerosol particles, and how these characteristics are related to local meteorology and to the concentration of some gas-phase pollutants. The elemental and organic composition of aerosol particles was investigated according to particle size in preliminary studies of size-differentiated samples of aerosol particles. In order to aid in future studies of urban aerosol particles, samples were collected at a forest fire near Boulder. Molecular markers specific to wood burning processes will be useful indicators of residential wood burning activities in future field studies.

  12. Ichthyoplankton abundance and variance in a large river system: concerns for long-term monitoring

    USGS Publications Warehouse

    Holland-Bartels, Leslie E.; Dewey, Michael R.; Zigler, Steven J.

    1995-01-01

    System-wide spatial patterns of ichthyoplankton abundance and variability were assessed in the upper Mississippi and lower Illinois rivers to address the experimental design and statistical confidence of density estimates. Ichthyoplankton was sampled from June to August 1989 in primary milieus (vegetated and non-vegetated backwaters and impounded areas, main channels, and main channel borders) in three navigation pools (8, 13, and 26) of the upper Mississippi River and in a downstream reach of the Illinois River. Ichthyoplankton densities varied more among stations of similar aquatic landscapes (milieus) than among subsamples within a station. An analysis of sampling effort indicated that the collection of single samples at many stations in a given milieu type is statistically and economically preferable to the collection of multiple subsamples at fewer stations. Cluster analyses also revealed that stations grouped only loosely by their preassigned milieu types. Pilot studies such as this can define station groupings and sources of variation beyond an a priori habitat classification. Thus the minimum intensity of sampling required to achieve a desired statistical confidence can be identified before implementing monitoring efforts.
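    The single-samples-at-many-stations recommendation follows from the variance components: when the among-station component dominates, extra stations shrink the standard error of a milieu mean faster than extra subsamples do. A minimal sketch of the method-of-moments calculation for a balanced one-way layout (station data below are hypothetical, not from the study):

        import numpy as np

        def variance_components(groups):
            """Among-station and within-station variance for a balanced layout.
            groups: list of equal-length arrays, one per station (subsamples within)."""
            k, n = len(groups), len(groups[0])
            station_means = np.array([np.mean(g) for g in groups])
            ms_among = n * np.var(station_means, ddof=1)   # mean square among stations
            ms_within = np.mean([np.var(g, ddof=1) for g in groups])
            var_among = max((ms_among - ms_within) / n, 0.0)  # E[MS_among] = s2_w + n*s2_a
            return var_among, ms_within

        rng = np.random.default_rng(0)
        stations = [rng.lognormal(3.0 + rng.normal(0, 0.5), 0.2, size=3) for _ in range(10)]
        print(variance_components(stations))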

  13. An automated system for chromosome analysis. Volume 1: Goals, system design, and performance

    NASA Technical Reports Server (NTRS)

    Castleman, K. R.; Melnyk, J. H.

    1975-01-01

    The design, construction, and testing of a complete system to produce karyotypes and chromosome measurement data from human blood samples are described, together with a basis for statistical analysis of quantitative chromosome measurement data. The prototype was assembled, tested, and evaluated on clinical material and thoroughly documented.

  14. Effective Analysis of Reaction Time Data

    ERIC Educational Resources Information Center

    Whelan, Robert

    2008-01-01

    Most analyses of reaction time (RT) data are conducted by using the statistical techniques with which psychologists are most familiar, such as analysis of variance on the sample mean. Unfortunately, these methods are usually inappropriate for RT data, because they have little power to detect genuine differences in RT between conditions. In…
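    The abstract is truncated, so the paper's specific recommendations are not shown here; one commonly recommended robust alternative for skewed RT distributions is the trimmed mean, sketched below as a generic illustration (not necessarily the paper's method):

        import numpy as np
        from scipy.stats import trim_mean

        rng = np.random.default_rng(1)
        rt = rng.lognormal(mean=6.0, sigma=0.5, size=200)  # skewed, RT-like values in ms
        print(np.mean(rt))          # inflated by the long right tail
        print(trim_mean(rt, 0.2))   # 20% trimmed mean discounts extreme observations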

  15. A Comparison of Conjoint Analysis Response Formats

    Treesearch

    Kevin J. Boyle; Thomas P. Holmes; Mario F. Teisl; Brian Roe

    2001-01-01

    A split-sample design is used to evaluate the convergent validity of three response formats used in conjoint analysis experiments. We investigate whether recoding rating data to rankings and choose-one formats, and recoding ranking data to choose-one, result in structural models and welfare estimates that are statistically indistinguishable from...

  16. Determining Sample Sizes for Precise Contrast Analysis with Heterogeneous Variances

    ERIC Educational Resources Information Center

    Jan, Show-Li; Shieh, Gwowen

    2014-01-01

    The analysis of variance (ANOVA) is one of the most frequently used statistical analyses in practical applications. Accordingly, the single and multiple comparison procedures are frequently applied to assess the differences among mean effects. However, the underlying assumption of homogeneous variances may not always be tenable. This study…
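    The abstract is truncated, but a standard building block for contrast analysis under heterogeneous variances is the Welch-Satterthwaite approximation. A generic sketch of a contrast confidence interval (illustrative groups and weights, not the authors' sample-size procedure):

        import numpy as np
        from scipy.stats import t

        def welch_contrast_ci(means, sds, ns, c, conf=0.95):
            """CI for a contrast of group means without assuming equal variances,
            using Welch-Satterthwaite degrees of freedom."""
            means, sds, ns, c = map(np.asarray, (means, sds, ns, c))
            v = c**2 * sds**2 / ns                 # per-group variance contributions
            est, se = c @ means, np.sqrt(v.sum())
            df = v.sum()**2 / np.sum(v**2 / (ns - 1))
            half = t.ppf(0.5 + conf / 2, df) * se
            return est - half, est + half

        print(welch_contrast_ci([10.0, 12.0, 15.0], [2.0, 5.0, 3.0], [20, 25, 18], [1, -0.5, -0.5]))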

  17. Analysis of vector wind change with respect to time for Vandenberg Air Force Base, California

    NASA Technical Reports Server (NTRS)

    Adelfang, S. I.

    1978-01-01

    A statistical analysis of the temporal variability of wind vectors at 1 km altitude intervals from 0 to 27 km altitude, taken from a 10-year data sample of twice-daily rawinsonde wind measurements over Vandenberg Air Force Base, California, is presented.

  18. Thirty Years of Vegetation Change in the Coastal Santa Cruz Mountains of Northern California Detected Using Landsat Satellite Image Analysis

    NASA Technical Reports Server (NTRS)

    Potter, Christopher

    2015-01-01

    Results from Landsat satellite image time series analysis of this study area since 1983 showed gradual, statistically significant increases in the normalized difference vegetation index (NDVI) in more than 90% of the (predominantly second-growth) evergreen forest locations sampled.
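    For reference, the NDVI used above is the standard normalized band ratio, computed per pixel from the near-infrared and red reflectance bands; values range from -1 to 1 and increase with green vegetation:

        def ndvi(nir, red):
            """Normalized difference vegetation index from near-infrared and red reflectance."""
            return (nir - red) / (nir + red)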

  19. Instrumental and statistical methods for the comparison of class evidence

    NASA Astrophysics Data System (ADS)

    Liszewski, Elisa Anne

    Trace evidence is a major field within forensic science. Association of trace evidence samples can be problematic due to sample heterogeneity and a lack of quantitative criteria for comparing spectra or chromatograms. The aim of this study is to evaluate different types of instrumentation for their ability to discriminate among samples of various types of trace evidence. Chemometric analysis, including techniques such as agglomerative hierarchical clustering, principal components analysis, and discriminant analysis, was employed to evaluate instrumental data. First, automotive clear coats were analyzed by using microspectrophotometry to collect UV absorption data. In total, 71 samples were analyzed with a classification accuracy of 91.61%. An external validation was performed, resulting in a prediction accuracy of 81.11%. Next, fiber dyes were analyzed using UV-visible microspectrophotometry. While several physical characteristics of cotton fiber can be identified and compared, fiber color is considered to be an excellent source of variation, and thus was examined in this study. Twelve dyes were employed, some being visually indistinguishable. Several different analyses and comparisons were done, including an inter-laboratory comparison and external validations. Lastly, common plastic samples and other polymers were analyzed using pyrolysis-gas chromatography/mass spectrometry, and their pyrolysis products were then analyzed using multivariate statistics. The classification accuracy varied depending on the number of classes chosen, but the plastics were grouped based on composition. The polymers were used as an external validation, and misclassifications occurred with chlorinated samples, all of which were placed into the category containing PVC.
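    Two of the chemometric steps named above, PCA followed by agglomerative hierarchical clustering, can be sketched with scikit-learn and SciPy. The spectra, component count, linkage method, and group count below are illustrative assumptions, not the study's settings:

        import numpy as np
        from sklearn.decomposition import PCA
        from scipy.cluster.hierarchy import linkage, fcluster

        # Random stand-ins for UV absorbance spectra: rows = samples, columns = wavelengths.
        rng = np.random.default_rng(2)
        spectra = rng.normal(size=(71, 200))

        scores = PCA(n_components=5).fit_transform(spectra)  # compress before clustering
        Z = linkage(scores, method="ward")                   # agglomerative hierarchical clustering
        labels = fcluster(Z, t=4, criterion="maxclust")      # cut the dendrogram into 4 groups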

  20. Metabolomic fingerprinting employing DART-TOFMS for authentication of tomatoes and peppers from organic and conventional farming.

    PubMed

    Novotná, H; Kmiecik, O; Gałązka, M; Krtková, V; Hurajová, A; Schulzová, V; Hallmann, E; Rembiałkowska, E; Hajšlová, J

    2012-01-01

    The rapidly growing demand for organic food requires the availability of analytical tools enabling its authentication. Recently, metabolomic fingerprinting/profiling has been demonstrated to be a promising option for a comprehensive characterisation of the small molecules occurring in plants, since their pattern may reflect the impact of various external factors. In a two-year pilot study concerned with the classification of organic versus conventional crops, ambient mass spectrometry employing a direct analysis in real time (DART) ion source coupled to a time-of-flight mass spectrometer (TOFMS) was used. This novel methodology was tested on 40 tomato and 24 pepper samples grown under specified conditions. To calculate statistical models, the obtained data (mass spectra) were processed by principal component analysis (PCA) followed by linear discriminant analysis (LDA). The results from the positive ionisation mode enabled better differentiation between organic and conventional samples than the results from the negative mode. In this case, the recognition ability obtained by LDA was 97.5% for tomato and 100% for pepper samples, and the prediction abilities were above 80% for both sample sets. The results suggest that the year of production had a stronger influence on the metabolomic fingerprints than the type of farming (organic versus conventional). In any case, DART-TOFMS is a promising tool for rapid screening of samples. Establishing comprehensive (multi-sample) long-term databases may further help to improve the quality of statistical classification models.

  1. Manual vs. computer-assisted sperm analysis: can CASA replace manual assessment of human semen in clinical practice?

    PubMed

    Talarczyk-Desole, Joanna; Berger, Anna; Taszarek-Hauke, Grażyna; Hauke, Jan; Pawelczyk, Leszek; Jedrzejczak, Piotr

    2017-01-01

    The aim of the study was to assess the quality of a computer-assisted sperm analysis (CASA) system in comparison to the reference manual method, as well as the standardization of computer-assisted semen assessment. The study was conducted between January and June 2015 at the Andrology Laboratory of the Division of Infertility and Reproductive Endocrinology, Poznań University of Medical Sciences, Poland. The study group consisted of 230 men who gave sperm samples for the first time in our center as part of an infertility investigation. The samples underwent manual and computer-assisted assessment of concentration, motility, and morphology. A total of 184 samples were examined twice: manually, according to the 2010 WHO recommendations, and with CASA, using the program settings provided by the manufacturer. Additionally, 46 samples underwent two manual analyses and two computer-assisted analyses. A p-value < 0.05 was considered statistically significant. Statistically significant differences were found between all of the investigated sperm parameters, except for non-progressive motility, measured with CASA and manually. In the group of patients where all analyses with each method were performed twice on the same sample, we found no significant differences between the two assessments, either analyzed manually or with CASA, although the standard deviation was higher in the CASA group. Our results suggest that computer-assisted sperm analysis requires further improvement for wider application in clinical practice.

  2. Standard reference water samples for rare earth element determinations

    USGS Publications Warehouse

    Verplanck, P.L.; Antweiler, Ronald C.; Nordstrom, D. Kirk; Taylor, Howard E.

    2001-01-01

    Standard reference water samples (SRWS) were collected from two mine sites, one near Ophir, CO, USA and the other near Redding, CA, USA. The samples were filtered, preserved, and analyzed for rare earth element (REE) concentrations (La, Ce, Pr, Nd, Sm, Eu, Gd, Tb, Dy, Ho, Er, Tm, Yb, and Lu) by inductively coupled plasma-mass spectrometry (ICP-MS). These two samples were acid mine waters with elevated concentrations of REEs (0.45-161 µg/L). Seventeen international laboratories participated in a 'round-robin' chemical analysis program, which made it possible to evaluate the data by robust statistical procedures that are insensitive to outliers. The resulting most probable values are reported. Ten to 15 of the participants also reported values for Ba, Y, and Sc. Field parameters, major ion, and other trace element concentrations, not subject to statistical evaluation, are provided.
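    The simplest outlier-insensitive summary of round-robin results is a median with a MAD-based spread; the sketch below illustrates the idea on hypothetical inter-laboratory values, and is not the specific robust procedure used for the SRWS program:

        import numpy as np
        from scipy.stats import median_abs_deviation

        # Hypothetical La concentrations (ug/L) reported by participating laboratories.
        lab_values = np.array([12.1, 11.8, 12.4, 12.0, 35.0, 11.9, 12.2])
        most_probable = np.median(lab_values)  # barely moved by the outlying laboratory
        spread = median_abs_deviation(lab_values, scale="normal")  # robust sigma analogue
        print(most_probable, spread)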

  3. Evaluation of asbestos levels in two schools before and after asbestos removal. Final report

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Karaffa, M.A.; Chesson, J.; Russell, J.

    This report presents a statistical evaluation of airborne asbestos data collected at two schools before and after removal of asbestos-containing material (ACM). Although the monitoring data are not totally consistent with new Asbestos Hazard Emergency Response Act (AHERA) requirements and recent EPA guidelines, the study evaluates these historical data by standard statistical methods to determine if abated work areas meet proposed clearance criteria. The objectives of this statistical analysis were to compare (1) airborne asbestos levels indoors after removal with levels outdoors, (2) airborne asbestos levels before and after removal of asbestos, and (3) static sampling and aggressive sampling of airborne asbestos. The results of this evaluation indicated the following: the effect of asbestos removal on indoor air quality is unpredictable; the variability in fiber concentrations among different sampling sites within the same building indicates the need to treat different sites as separate areas for the purpose of clearance; and aggressive sampling is appropriate for clearance testing because it captures more entrainable asbestos structures. Aggressive sampling lowers the chance of declaring a worksite clean when entrainable asbestos is still present.

  4. Objective Diagnosis of Cervical Cancer by Tissue Protein Profile Analysis

    NASA Astrophysics Data System (ADS)

    Patil, Ajeetkumar; Bhat, Sujatha; Rai, Lavanya; Kartha, V. B.; Chidangil, Santhosh

    2011-07-01

    Protein profiles of homogenized normal cervical tissue samples from hysterectomy subjects, and of cancerous cervical tissues from biopsy samples collected from patients with different stages of cervical cancer, were recorded using high performance liquid chromatography coupled with laser induced fluorescence (HPLC-LIF). The protein profiles were subjected to principal component analysis to derive statistically significant parameters. Diagnosis of sample type was carried out by matching three parameters: factor scores, squared residuals, and Mahalanobis distance. ROC and Youden's index curves for calibration standards were used for objective estimation of the optimum threshold for decision making and performance.
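    One of the three matching parameters, the Mahalanobis distance of a test profile from a calibration class in score space, can be sketched with SciPy; the scores and class below are random stand-ins, not the study's data:

        import numpy as np
        from scipy.spatial.distance import mahalanobis

        # Random stand-ins for PCA scores of calibration (normal) profiles and one test profile.
        rng = np.random.default_rng(3)
        normal_scores = rng.normal(size=(40, 3))
        test_scores = rng.normal(size=3)

        VI = np.linalg.inv(np.cov(normal_scores, rowvar=False))  # inverse covariance
        d = mahalanobis(test_scores, normal_scores.mean(axis=0), VI)
        print(d)  # large distances flag profiles unlike the calibration class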

  5. Quasi-closed phase forward-backward linear prediction analysis of speech for accurate formant detection and estimation.

    PubMed

    Gowda, Dhananjaya; Airaksinen, Manu; Alku, Paavo

    2017-09-01

    Recently, a quasi-closed phase (QCP) analysis of speech signals for accurate glottal inverse filtering was proposed. However, QCP analysis, which belongs to the family of temporally weighted linear prediction (WLP) methods, uses the conventional forward type of sample prediction. This may not be the best choice, especially in computing WLP models with a hard-limiting weighting function, because a sample-selective minimization of the prediction error in WLP reduces the effective number of samples available within a given window frame. To counter this problem, a modified quasi-closed phase forward-backward (QCP-FB) analysis is proposed, wherein each sample is predicted from its past as well as its future samples, thereby utilizing the available samples more effectively. Formant detection and estimation experiments on synthetic vowels generated using a physical modeling approach, as well as on natural speech utterances, show that the proposed QCP-FB method yields statistically significant improvements over the conventional linear prediction and QCP methods.
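    The forward-backward idea itself, independent of QCP's temporal weighting, can be sketched as a joint least-squares fit in which every inner sample serves as both a forward and a backward prediction target. This is plain unweighted forward-backward linear prediction, not the proposed QCP-FB method:

        import numpy as np

        def fb_lpc(x, p):
            """Least-squares LP coefficients from combined forward and backward prediction."""
            x = np.asarray(x, float)
            N = len(x)
            Xf = np.column_stack([x[p - k - 1 : N - k - 1] for k in range(p)])  # past samples
            yf = x[p:]                                                          # forward targets
            Xb = np.column_stack([x[k + 1 : N - p + k + 1] for k in range(p)])  # future samples
            yb = x[: N - p]                                                     # backward targets
            a, *_ = np.linalg.lstsq(np.vstack([Xf, Xb]), np.concatenate([yf, yb]), rcond=None)
            return a

        rng = np.random.default_rng(4)
        sig = rng.normal(size=2000)
        for n in range(2, 2000):                 # synthesize a stable AR(2) test signal
            sig[n] += 1.3 * sig[n - 1] - 0.6 * sig[n - 2]
        print(fb_lpc(sig, 2))                    # approximately [1.3, -0.6]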

  6. FabricS: A user-friendly, complete and robust software for particle shape-fabric analysis

    NASA Astrophysics Data System (ADS)

    Moreno Chávez, G.; Castillo Rivera, F.; Sarocchi, D.; Borselli, L.; Rodríguez-Sedano, L. A.

    2018-06-01

    Shape-fabric is a textural parameter related to the spatial arrangement of elongated particles in geological samples. Its usefulness spans a range from sedimentary petrology to igneous and metamorphic petrology. Independently of the process being studied, when a material flows, elongated particles are oriented with their major axis in the direction of flow. In sedimentary petrology this information has been used for studies of the paleo-flow direction of turbidites, the origin of quartz sediments, and the location of ignimbrite vents, among others. In addition to flow direction and its polarity, the method enables flow rheology to be inferred. The use of shape-fabric was long limited by the difficulty of automatically measuring particles and analyzing them with reliable circular statistics programs, which dampened interest in the method. Shape-fabric measurement has increased in popularity since the 1980s thanks to the development of new image analysis techniques and circular statistics software. However, the programs currently available are unreliable, outdated, incompatible with newer operating systems, or require programming skills. The goal of our work is to develop a user-friendly program, in the MATLAB environment, with a graphical user interface, that can process images, includes editing functions and thresholds (elongation and size) for selecting a particle population, and analyzes that population with reliable circular statistics algorithms. Moreover, the method also has to produce rose diagrams, orientation vectors, and a complete series of statistical parameters. All these requirements are met by our new software. In this paper, we briefly explain the methodology, from the collection of oriented samples in the field to the minimum number of particles needed to obtain reliable fabric data. We obtained the data using specific statistical tests, taking into account the degree of iso-orientation of the samples and the required degree of reliability. The program has been verified by means of several simulations performed using appropriately designed features and by analyzing real samples.
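    The core circular statistic behind shape-fabric, the mean orientation and resultant length of axial data via the standard angle-doubling trick, fits in a few lines. This is a generic Python illustration, not FabricS's MATLAB implementation:

        import numpy as np

        def axial_mean_orientation(theta_deg):
            """Mean orientation and resultant length R for axial data (0-180 deg).
            Angles are doubled so opposite-pointing axes (e.g. 10 and 190 deg) coincide."""
            t = np.deg2rad(2.0 * np.asarray(theta_deg, float))
            C, S = np.cos(t).mean(), np.sin(t).mean()
            R = np.hypot(C, S)                          # 0 = uniform, 1 = perfectly aligned
            mean_dir = (np.rad2deg(np.arctan2(S, C)) / 2.0) % 180.0
            return mean_dir, R

        print(axial_mean_orientation([40, 45, 50, 42, 48]))  # ~45 deg, R close to 1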

  7. Biostatistics primer: part I.

    PubMed

    Overholser, Brian R; Sowinski, Kevin M

    2007-12-01

    Biostatistics is the application of statistics to biologic data. The field of statistics can be broken down into 2 fundamental parts: descriptive and inferential. Descriptive statistics are commonly used to categorize, display, and summarize data. Inferential statistics can be used to make predictions based on a sample obtained from a population or some large body of information. It is these inferences that are used to test specific research hypotheses. This 2-part review will outline important features of descriptive and inferential statistics as they apply to commonly conducted research studies in the biomedical literature. Part 1 in this issue will discuss fundamental topics of statistics and data analysis. Additionally, some of the most commonly used statistical tests found in the biomedical literature will be reviewed in Part 2 in the February 2008 issue.

  8. Identifying currents in the gene pool for bacterial populations using an integrative approach.

    PubMed

    Tang, Jing; Hanage, William P; Fraser, Christophe; Corander, Jukka

    2009-08-01

    The evolution of bacterial populations has recently become considerably better understood due to large-scale sequencing of population samples. It has become clear that DNA sequences from a multitude of genes, as well as a broad sample coverage of a target population, are needed to obtain a relatively unbiased view of its genetic structure and the patterns of ancestry connected to the strains. However, the traditional statistical methods for evolutionary inference, such as phylogenetic analysis, are associated with several difficulties under such an extensive sampling scenario, in particular when a considerable amount of recombination is anticipated to have taken place. To meet the needs of large-scale analyses of population structure for bacteria, we introduce here several statistical tools for the detection and representation of recombination between populations. Also, we introduce a model-based description of the shape of a population in sequence space, in terms of its molecular variability and affinity towards other populations. Extensive real data from the genus Neisseria are utilized to demonstrate the potential of an approach where these population genetic tools are combined with a phylogenetic analysis. The statistical tools introduced here are freely available in the BAPS 5.2 software, which can be downloaded from http://web.abo.fi/fak/mnf/mate/jc/software/baps.html.

  9. Risk management in inpatient units in the Czech Republic from the point of view of nurses in leadership positions.

    PubMed

    Prokešová, Radka; Brabcová, Iva; Pokojová, Radka; Bártlová, Sylva

    2016-12-01

    The goal of this study was to assess specific features of risk management from the point of view of nurses in leadership positions in inpatient units in Czech hospitals. The study was performed using a quantitative research strategy, i.e., a questionnaire. The data sample was analyzed using SPSS v. 23.0. Pearson's chi-square test and analysis of adjusted residuals were used to identify associations between nominal and/or ordinal variables. 315 nurses in leadership positions working in inpatient units of Czech hospitals were included in the sample, which was created using random selection by means of quotas. Based on the study results, statistically significant relationships were identified between the respondents' education and the utilization of methods to identify risks. Furthermore, statistically significant relationships were found between a nurse's functional role within the system and regular analysis and evaluation of risks, and between the type of healthcare facility and the degree of patient involvement in risk management. The study found statistically significant correlations that can be used to increase the effectiveness of risk management in inpatient units of Czech hospitals. From this perspective, the fact that patient involvement in risk management was reported by only 37.8% of respondents seems to be the most notable problem.
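    The two tools named above, Pearson's chi-square and adjusted residuals, can be sketched with SciPy (the study itself used SPSS; the table counts below are hypothetical):

        import numpy as np
        from scipy.stats import chi2_contingency

        # Hypothetical 2x2 table: education level (rows) x use of risk-identification methods (columns).
        table = np.array([[30, 15],
                          [20, 40]])
        chi2, p, dof, expected = chi2_contingency(table)

        # Haberman's adjusted residuals show which cells drive a significant chi-square.
        n = table.sum()
        row = table.sum(axis=1, keepdims=True) / n
        col = table.sum(axis=0, keepdims=True) / n
        adj_resid = (table - expected) / np.sqrt(expected * (1 - row) * (1 - col))
        print(p, adj_resid)  # |adjusted residual| > 2 flags a noteworthy cell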

  10. Water-quality, bed-sediment, and biological data (October 2015 through September 2016) and statistical summaries of data for streams in the Clark Fork Basin, Montana

    USGS Publications Warehouse

    Dodge, Kent A.; Hornberger, Michelle I.; Turner, Matthew A.

    2018-03-30

    Water, bed sediment, and biota were sampled in selected streams from Butte to near Missoula, Montana, as part of a monitoring program in the upper Clark Fork Basin of western Montana. The sampling program was led by the U.S. Geological Survey, in cooperation with the U.S. Environmental Protection Agency, to characterize aquatic resources in the Clark Fork Basin, with emphasis on trace elements associated with historic mining and smelting activities. Sampling sites were on the Clark Fork and selected tributaries. Water samples were collected periodically at 20 sites from October 2015 through September 2016. Bed-sediment and biota samples were collected once at 13 sites during August 2016. This report presents the analytical results and quality-assurance data for water-quality, bed-sediment, and biota samples collected at sites from October 2015 through September 2016. Water-quality data include concentrations of selected major ions, trace elements, and suspended sediment. Samples for analysis of turbidity were collected at 13 sites, whereas samples for analysis of dissolved organic carbon were collected at 10 sites. In addition, samples for analysis of nitrogen (nitrate plus nitrite) were collected at two sites. Daily values of mean suspended-sediment concentration and suspended-sediment discharge were determined for three sites. Seasonal daily values of turbidity were determined for five sites. Bed-sediment data include trace-element concentrations in the fine-grained (less than 0.063 millimeter) fraction. Biological data include trace-element concentrations in whole-body tissue of aquatic benthic insects. Statistical summaries of water-quality, bed-sediment, and biological data for sites in the upper Clark Fork Basin are provided for the period of record.

  11. metaCCA: summary statistics-based multivariate meta-analysis of genome-wide association studies using canonical correlation analysis.

    PubMed

    Cichonska, Anna; Rousu, Juho; Marttinen, Pekka; Kangas, Antti J; Soininen, Pasi; Lehtimäki, Terho; Raitakari, Olli T; Järvelin, Marjo-Riitta; Salomaa, Veikko; Ala-Korpela, Mika; Ripatti, Samuli; Pirinen, Matti

    2016-07-01

    A dominant approach to genetic association studies is to perform univariate tests between genotype-phenotype pairs. However, analyzing related traits together increases statistical power, and certain complex associations become detectable only when several variants are tested jointly. Currently, modest sample sizes of individual cohorts, and restricted availability of individual-level genotype-phenotype data across the cohorts, limit conducting multivariate tests. We introduce metaCCA, a computational framework for summary statistics-based analysis of a single or multiple studies that allows multivariate representation of both genotype and phenotype. It extends the statistical technique of canonical correlation analysis to the setting where original individual-level records are not available, and employs a covariance shrinkage algorithm to achieve robustness. Multivariate meta-analysis of two Finnish studies of nuclear magnetic resonance metabolomics by metaCCA, using standard univariate output from the program SNPTEST, shows an excellent agreement with the pooled individual-level analysis of original data. Motivated by strong multivariate signals in the lipid genes tested, we envision that multivariate association testing using metaCCA has great potential to provide novel insights from already published summary statistics from high-throughput phenotyping technologies. Code is available at https://github.com/aalto-ics-kepaco. Contact: anna.cichonska@helsinki.fi or matti.pirinen@helsinki.fi. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
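    For orientation, classical individual-level canonical correlation analysis, which metaCCA extends to the summary-statistics setting, can be sketched with scikit-learn on synthetic stand-in data; this sketch does not reproduce metaCCA's summary-statistics machinery:

        import numpy as np
        from sklearn.cross_decomposition import CCA

        rng = np.random.default_rng(5)
        X = rng.normal(size=(500, 5))                                       # stand-in genotypes
        Y = X[:, :2] @ rng.normal(size=(2, 3)) + rng.normal(size=(500, 3))  # correlated phenotypes

        cca = CCA(n_components=2).fit(X, Y)
        U, V = cca.transform(X, Y)
        print([float(np.corrcoef(U[:, i], V[:, i])[0, 1]) for i in range(2)])  # canonical correlations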

  12. metaCCA: summary statistics-based multivariate meta-analysis of genome-wide association studies using canonical correlation analysis

    PubMed Central

    Cichonska, Anna; Rousu, Juho; Marttinen, Pekka; Kangas, Antti J.; Soininen, Pasi; Lehtimäki, Terho; Raitakari, Olli T.; Järvelin, Marjo-Riitta; Salomaa, Veikko; Ala-Korpela, Mika; Ripatti, Samuli; Pirinen, Matti

    2016-01-01

    Motivation: A dominant approach to genetic association studies is to perform univariate tests between genotype-phenotype pairs. However, analyzing related traits together increases statistical power, and certain complex associations become detectable only when several variants are tested jointly. Currently, modest sample sizes of individual cohorts, and restricted availability of individual-level genotype-phenotype data across the cohorts limit conducting multivariate tests. Results: We introduce metaCCA, a computational framework for summary statistics-based analysis of a single or multiple studies that allows multivariate representation of both genotype and phenotype. It extends the statistical technique of canonical correlation analysis to the setting where original individual-level records are not available, and employs a covariance shrinkage algorithm to achieve robustness. Multivariate meta-analysis of two Finnish studies of nuclear magnetic resonance metabolomics by metaCCA, using standard univariate output from the program SNPTEST, shows an excellent agreement with the pooled individual-level analysis of original data. Motivated by strong multivariate signals in the lipid genes tested, we envision that multivariate association testing using metaCCA has a great potential to provide novel insights from already published summary statistics from high-throughput phenotyping technologies. Availability and implementation: Code is available at https://github.com/aalto-ics-kepaco Contacts: anna.cichonska@helsinki.fi or matti.pirinen@helsinki.fi Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27153689

  13. Correlating tephras and cryptotephras using glass compositional analyses and numerical and statistical methods: Review and evaluation

    NASA Astrophysics Data System (ADS)

    Lowe, David J.; Pearce, Nicholas J. G.; Jorgensen, Murray A.; Kuehn, Stephen C.; Tryon, Christian A.; Hayward, Chris L.

    2017-11-01

    We define tephras and cryptotephras and their components (mainly ash-sized particles of glass ± crystals in distal deposits) and summarize the basis of tephrochronology as a chronostratigraphic correlational and dating tool for palaeoenvironmental, geological, and archaeological research. We then document and appraise recent advances in analytical methods used to determine the major, minor, and trace elements of individual glass shards from tephra or cryptotephra deposits to aid their correlation and application. Protocols developed recently for the electron probe microanalysis of major elements in individual glass shards help to improve data quality and standardize reporting procedures. A narrow electron beam (diameter ∼3-5 μm) can now be used to analyze smaller glass shards than previously attainable. Reliable analyses of 'microshards' (defined here as glass shards <32 μm in diameter) using narrow beams are useful for fine-grained samples from distal or ultra-distal geographic locations, and for vesicular or microlite-rich glass shards or small melt inclusions. Caveats apply, however, in the microprobe analysis of very small microshards (≤∼5 μm in diameter), where particle geometry becomes important, and of microlite-rich glass shards where the potential problem of secondary fluorescence across phase boundaries needs to be recognised. Trace element analyses of individual glass shards using laser ablation inductively coupled plasma-mass spectrometry (LA-ICP-MS), with crater diameters of 20 μm and 10 μm, are now effectively routine, giving detection limits well below 1 ppm. Smaller ablation craters (<10 μm) can be subject to significant element fractionation during analysis, but the systematic relationship of such fractionation with glass composition suggests that analyses for some elements at these resolutions may be quantifiable. In undertaking analyses, either by microprobe or LA-ICP-MS, reference material data acquired using the same procedure, and preferably from the same analytical session, should be presented alongside new analytical data. In part 2 of the review, we describe, critically assess, and recommend ways in which tephras or cryptotephras can be correlated (in conjunction with other information) using numerical or statistical analyses of compositional data. Statistical methods provide a less subjective means of dealing with analytical data pertaining to tephra components (usually glass or crystals/phenocrysts) than heuristic alternatives. They enable a better understanding of relationships among the data from multiple viewpoints to be developed and help quantify the degree of uncertainty in establishing correlations. In common with other scientific hypothesis testing, it is easier to infer using such analysis that two or more tephras are different rather than the same. Adding stratigraphic, chronological, spatial, or palaeoenvironmental data (i.e. multiple criteria) is usually necessary and allows for more robust correlations to be made. A two-stage approach is useful, the first focussed on differences in the mean composition of samples, or their range, which can be visualised graphically via scatterplot matrices or bivariate plots coupled with the use of statistical tools such as distance measures, similarity coefficients, hierarchical cluster analysis (informed by distance measures or similarity or cophenetic coefficients), and principal components analysis (PCA). 
Some statistical methods (cluster analysis, discriminant analysis) are referred to as 'machine learning' in the computing literature. The second stage examines sample variance and the degree of compositional similarity so that sample equivalence or otherwise can be established on a statistical basis. This stage may involve discriminant function analysis (DFA), support vector machines (SVMs), canonical variates analysis (CVA), and ANOVA or MANOVA (or its two-sample special case, the Hotelling two-sample T2 test). Randomization tests can be used where distributional assumptions such as multivariate normality underlying parametric tests are doubtful. Compositional data may be transformed and scaled before being subjected to multivariate statistical procedures including calculation of distance matrices, hierarchical cluster analysis, and PCA. Such transformations may make the assumption of multivariate normality more appropriate. A sequential procedure using Mahalanobis distance and the Hotelling two-sample T2 test is illustrated using glass major element data from trachytic to phonolitic Kenyan tephras. All these methods require a broad range of high-quality compositional data which can be used to compare 'unknowns' with reference (training) sets that are sufficiently complete to account for all possible correlatives, including tephras with heterogeneous glasses that contain multiple compositional groups. Currently, incomplete databases are tending to limit correlation efficacy. The development of an open, online global database to facilitate progress towards integrated, high-quality tephrostratigraphic frameworks for different regions is encouraged.
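    The core of the second-stage sequential procedure described above can be sketched generically: a two-sample Hotelling T² test on multivariate compositions, whose quadratic form is the squared Mahalanobis distance between group means. The data below are random stand-ins; real glass analyses would first be transformed and screened as the review recommends:

        import numpy as np
        from scipy.stats import f

        def hotelling_t2(X1, X2):
            """Two-sample Hotelling T^2 test of equal mean composition vectors."""
            (n1, p), (n2, _) = X1.shape, X2.shape
            d = X1.mean(axis=0) - X2.mean(axis=0)
            S = ((n1 - 1) * np.cov(X1, rowvar=False) +
                 (n2 - 1) * np.cov(X2, rowvar=False)) / (n1 + n2 - 2)  # pooled covariance
            t2 = (n1 * n2) / (n1 + n2) * d @ np.linalg.solve(S, d)
            F = (n1 + n2 - p - 1) / ((n1 + n2 - 2) * p) * t2
            return t2, f.sf(F, p, n1 + n2 - p - 1)

        rng = np.random.default_rng(6)
        A = rng.normal(0.0, 1.0, size=(30, 4))   # stand-ins for transformed glass analyses
        B = rng.normal(0.3, 1.0, size=(25, 4))
        print(hotelling_t2(A, B))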

  14. Heterogenic Solid Biofuel Sampling Methodology and Uncertainty Associated with Prompt Analysis

    PubMed Central

    Pazó, Jose A.; Granada, Enrique; Saavedra, Ángeles; Patiño, David; Collazo, Joaquín

    2010-01-01

    Accurate determination of the properties of biomass is of particular interest in studies on biomass combustion or cofiring. The aim of this paper is to develop a methodology for prompt analysis of heterogeneous solid fuels with an acceptable degree of accuracy. Special care must be taken with the sampling procedure to achieve an acceptable degree of error and low statistical uncertainty. A sampling and error determination methodology for prompt analysis is presented and validated. Two approaches for the propagation of errors are also given and some comparisons are made in order to determine which may be better in this context. Results show in general low, acceptable levels of uncertainty, demonstrating that the samples obtained in the process are representative of the overall fuel composition. PMID:20559506
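    The abstract does not spell out the two propagation approaches compared, but the standard first-order (quadrature) rule for independent inputs is one common baseline; a minimal sketch:

        import numpy as np

        def propagate_quadrature(partials, sigmas):
            """First-order error propagation for independent inputs:
            sigma_f = sqrt(sum((df/dx_i)^2 * sigma_i^2))."""
            partials = np.asarray(partials, float)
            sigmas = np.asarray(sigmas, float)
            return float(np.sqrt(np.sum((partials * sigmas) ** 2)))

        # e.g. f = x * y with x = 2.0 +/- 0.1 and y = 5.0 +/- 0.2: df/dx = y, df/dy = x
        print(propagate_quadrature([5.0, 2.0], [0.1, 0.2]))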

  15. Radioactivity Registered With a Small Number of Events

    NASA Astrophysics Data System (ADS)

    Zlokazov, Victor; Utyonkov, Vladimir

    2018-02-01

    The synthesis of superheavy elements calls for the analysis of low-statistics experimental data, presumably obeying an unknown exponential distribution, and for deciding whether they originate from one source or contain admixtures. Here we analyze predictions following from non-parametric methods, employing only such fundamental sample properties as the sample mean, the median, and the mode.
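    One simple consistency check that follows directly from these sample properties: for an exponential law the median equals ln 2 times the mean, so a marked deviation of their ratio from about 0.693 hints at an admixture of lifetimes. The sketch below illustrates the idea only, and is not the authors' procedure:

        import numpy as np

        def exponential_ratio_check(times):
            """For a single exponential source, median/mean converges to ln 2 (about 0.693)."""
            times = np.asarray(times, float)
            return np.median(times) / times.mean(), np.log(2.0)

        rng = np.random.default_rng(7)
        print(exponential_ratio_check(rng.exponential(scale=1.0, size=12)))  # noisy at low counts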

  16. Using Sieving and Unknown Sand Samples for a Sedimentation-Stratigraphy Class Project with Linkage to Introductory Courses

    ERIC Educational Resources Information Center

    Videtich, Patricia E.; Neal, William J.

    2012-01-01

    Using sieving and sample "unknowns" for instructional grain-size analysis and interpretation of sands in undergraduate sedimentology courses has advantages over other techniques. Students (1) learn to calculate and use statistics; (2) visually observe differences in the grain-size fractions, thereby developing a sense of specific size…

  17. DNA analysis in Disaster Victim Identification.

    PubMed

    Montelius, Kerstin; Lindblom, Bertil

    2012-06-01

    DNA profiling and matching is one of the primary methods to identify missing persons in a disaster, as defined by the Interpol Disaster Victim Identification Guide. The process to identify a victim by DNA includes: the collection of the best possible ante-mortem (AM) samples, the choice of post-mortem (PM) samples, DNA-analysis, matching and statistical weighting of the genetic relationship or match. Each disaster has its own scenario, and each scenario defines its own methods for identification of the deceased.

  18. Charm dimuon production in neutrino-nucleon interactions in the NOMAD experiment

    NASA Astrophysics Data System (ADS)

    Petti, Roberto; Samoylov, Oleg

    2012-09-01

    We present our new measurement of charm dimuon production in neutrino-iron interactions based upon the full statistics collected by the NOMAD experiment. After background subtraction we observe 15,340 charm dimuon events, providing the largest sample currently available. The analysis exploits the large inclusive charged current sample (about 9 million events after all analysis cuts) to constrain the total systematic uncertainty to about 2%. The extraction of strange sea and charm production parameters is also discussed.

  19. Charm dimuon production in neutrino-nucleon interactions in the NOMAD experiment

    NASA Astrophysics Data System (ADS)

    Petti, R.; Samoylov, O. B.

    2011-12-01

    We present our new measurement of charm dimuon production in neutrino-iron interactions based upon the full statistics collected by the NOMAD experiment. After background subtraction we observe 15,340 charm dimuon events, providing the largest sample currently available. The analysis exploits the large inclusive charged current sample (about 9 million events after all analysis cuts) to constrain the total systematic uncertainty to ~2%. The extraction of strange sea and charm production parameters is also discussed.

  20. Evaluation of Primary Immunization Coverage of Infants Under Universal Immunization Programme in an Urban Area of Bangalore City Using Cluster Sampling and Lot Quality Assurance Sampling Techniques

    PubMed Central

    K, Punith; K, Lalitha; G, Suman; BS, Pradeep; Kumar K, Jayanth

    2008-01-01

    Research Question: Is the LQAS technique better than the cluster sampling technique in terms of resources required to evaluate immunization coverage in an urban area? Objective: To assess and compare lot quality assurance sampling against cluster sampling in the evaluation of primary immunization coverage. Study Design: Population-based cross-sectional study. Study Setting: Areas under Mathikere Urban Health Center. Study Subjects: Children aged 12 months to 23 months. Sample Size: 220 in cluster sampling, 76 in lot quality assurance sampling. Statistical Analysis: Percentages and proportions, chi-square test. Results: (1) Using cluster sampling, the percentages of completely immunized, partially immunized and unimmunized children were 84.09%, 14.09% and 1.82%, respectively. With lot quality assurance sampling, they were 92.11%, 6.58% and 1.31%, respectively. (2) Immunization coverage levels as evaluated by the cluster sampling technique were not statistically different from the coverage values obtained by the lot quality assurance sampling technique. Considering the time and resources required, lot quality assurance sampling was found to be the better technique for evaluating primary immunization coverage in an urban area. PMID:19876474
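    The resource advantage of LQAS comes from its simple binomial decision rule: sample a small fixed number of children per lot and accept or reject based on a count threshold. A sketch with hypothetical design parameters (n and d below are illustrative, not the study's actual thresholds):

        from scipy.stats import binom

        # Hypothetical LQAS design: examine n children per lot, accept the lot as adequately
        # covered if at most d unvaccinated children are found.
        n, d = 19, 3
        p_acceptable, p_deficient = 0.20, 0.50   # assumed unvaccinated fractions
        alpha = binom.sf(d, n, p_acceptable)     # risk of failing an acceptable lot
        beta = binom.cdf(d, n, p_deficient)      # risk of passing a deficient lot
        print(alpha, beta)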

  1. Optimally estimating the sample mean from the sample size, median, mid-range, and/or mid-quartile range.

    PubMed

    Luo, Dehui; Wan, Xiang; Liu, Jiming; Tong, Tiejun

    2018-06-01

    The era of big data is coming, and evidence-based medicine is attracting increasing attention for improving decision making in medical practice by integrating evidence from well designed and conducted clinical research. Meta-analysis is a statistical technique widely used in evidence-based medicine for analytically combining the findings from independent clinical trials to provide an overall estimate of a treatment's effectiveness. The sample mean and standard deviation are two commonly used statistics in meta-analysis, but some trials report the median, the minimum and maximum values, or sometimes the first and third quartiles instead. Thus, to pool results in a consistent format, researchers need to transform this information back to the sample mean and standard deviation. In this article, we investigate the optimal estimation of the sample mean for meta-analysis from both theoretical and empirical perspectives. A major drawback in the literature is that the sample size, despite its importance, is either ignored or used in a stepwise but somewhat arbitrary manner, e.g. in the well-known method proposed by Hozo et al. We solve this issue by incorporating the sample size in a smoothly changing weight in the estimators to reach the optimal estimation. Our proposed estimators not only improve on the existing ones significantly but also retain the virtue of simplicity. The real data application indicates that our proposed estimators are capable of serving as "rules of thumb" and should find wide application in evidence-based medicine.
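    For context, the classical back-transformations this paper improves on can be sketched as below. The formulas follow the commonly cited Hozo-type mean and Wan-type SD estimators, stated here from memory as an assumption; the paper's own smoothly weighted estimators are not reproduced:

        import numpy as np
        from scipy.stats import norm

        def mean_sd_from_median_range(a, m, b, n):
            """Estimate the sample mean and SD from minimum a, median m, maximum b, size n."""
            mean = (a + 2.0 * m + b) / 4.0                   # Hozo-type mean estimate
            xi = 2.0 * norm.ppf((n - 0.375) / (n + 0.25))    # expected range in SD units
            return mean, (b - a) / xi                        # Wan-type SD estimate

        print(mean_sd_from_median_range(a=10.0, m=25.0, b=50.0, n=40))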

  2. Statistical analysis of environmental monitoring data: does a worst case time for monitoring clean rooms exist?

    PubMed

    Cundell, A M; Bean, R; Massimore, L; Maier, C

    1998-01-01

    To determine the relationship between the sampling time of environmental monitoring, i.e., viable counts, in aseptic filling areas and the microbial count and frequency of alerts for air, surface, and personnel microbial monitoring, statistical analyses were conducted on 1) the frequency of alerts versus the time of day for routine environmental sampling conducted in calendar year 1994, and 2) environmental monitoring data collected at 30-minute intervals during routine aseptic filling operations over two separate days in four different clean rooms with multiple shifts and equipment set-ups at a parenteral manufacturing facility. Statistical analyses showed that, except for one floor location that had a significantly higher number of counts but no alert- or action-level samplings in the first two hours of operation, there was no relationship between the number of counts and the time of sampling. Further studies over a 30-day period at that floor location showed no relationship between time of sampling and microbial counts. The conclusion reached in the study was that there is no worst case time for environmental monitoring at that facility and that sampling at any time during the aseptic filling operation will give a satisfactory measure of the microbial cleanliness in the clean room during the set-up and aseptic filling operation.

  3. Assessment of variations in thermal cycle life data of thermal barrier coated rods

    NASA Astrophysics Data System (ADS)

    Hendricks, R. C.; McDonald, G.

    An analysis of thermal cycle life data for 22 thermal barrier coated (TBC) specimens was conducted. The ZrO2-8Y2O3/NiCrAlY plasma spray coated Rene 41 rods were tested in a Mach 0.3 Jet A/air burner flame. All specimens were subjected to the same coating and subsequent test procedures in an effort to control three parametric groups: material properties, geometry, and heat flux. Statistically, the data sample space had a mean of 1330 cycles with a standard deviation of 520 cycles. The data were described by normal or log-normal distributions, but other models could also apply; the sample size must be increased to clearly delineate a statistical failure model. The statistical methods were also applied to adhesive/cohesive strength data for 20 TBC discs of the same composition, with similar results. The sample space had a mean of 9 MPa with a standard deviation of 4.2 MPa.

  4. Assessment of variations in thermal cycle life data of thermal barrier coated rods

    NASA Technical Reports Server (NTRS)

    Hendricks, R. C.; Mcdonald, G.

    1981-01-01

    An analysis of thermal cycle life data for 22 thermal barrier coated (TBC) specimens was conducted. The ZrO2-8Y2O3/NiCrAlY plasma spray coated Rene 41 rods were tested in a Mach 0.3 Jet A/air burner flame. All specimens were subjected to the same coating and subsequent test procedures in an effort to control three parametric groups: material properties, geometry, and heat flux. Statistically, the data sample space had a mean of 1330 cycles with a standard deviation of 520 cycles. The data were described by normal or log-normal distributions, but other models could also apply; the sample size must be increased to clearly delineate a statistical failure model. The statistical methods were also applied to adhesive/cohesive strength data for 20 TBC discs of the same composition, with similar results. The sample space had a mean of 9 MPa with a standard deviation of 4.2 MPa.

  5. Thirteen years of the fight against doping in figures.

    PubMed

    Aguilar, Millán; Muñoz-Guerra, Jesús; Plata, María Del Mar; Del Coso, Juan

    2017-06-01

    Every year, the World Anti-Doping Agency (WADA) publishes the main statistics reported by the accredited laboratories, which provide very valuable information for assessing changes in the patterns of doping in sports over time. Using the information provided since 2003 as the basis for the analysis, the evolution of doping/anti-doping figures over the last decade can be examined in reasonable detail, at least in reference to samples analyzed and categories of substances more commonly found in athletes' samples. This brief analysis of the WADA statistical reports leads us to the following outcomes: the increase in anti-doping pressure from 2003 to 2015, as evidenced by increased numbers of samples analyzed and banned substances, has not directly produced a higher frequency of adverse/atypical findings. Although this could be interpreted as steady state in the capacity to detect doping through this whole period, it also resulted in a significant increase in the absolute number of samples catalogued as doping (from 2247 in 2003 to 5912 in 2015). Anabolic agents have been the most common doping substances detected in all statistics reports while the remaining groups of substances are much less frequently found in doping control samples. Given that one might have expected the enhancement of the anti-doping programme led by WADA over this last decade to have increased the percentage of adverse/atypical findings, the fact that it did not might indicate the need to take another step in sampling strategies, such as 'more intelligent testing' based on the differences in the prevalence of doping substances among sports. Copyright © 2017 John Wiley & Sons, Ltd.

  6. [Effect of green tea or green tea extract consumption on body weight and body composition; systematic review and meta-analysis].

    PubMed

    Baladia, Eduard; Basulto, Julio; Manera, María; Martínez, Rodrigo; Calbet, David

    2014-03-01

    Caffeine and catechins contained in green tea may have a thermogenic effect favoring weight and body fat loss. The aim of this study is to evaluate the magnitude of the effect of green tea or its extracts (caffeine and catechins) on body weight and body composition. A systematic review and meta-analysis was conducted to determine the magnitude of the effect of green tea or its extracts on body weight (kg), body mass index (BMI) (kg/m2), fat mass (%), and waist and hip circumference (cm). We included studies published between 2000 and 2013, retrieved from PubMed/Medline, with the following characteristics: randomized controlled trials (RCTs) with parallel groups (intervention and placebo), double-blind, and a minimum 12-week follow-up, in healthy individuals of either gender, 18 years or older, with a BMI of 25-40 kg/m2. Quality and risk of bias were assessed for every included study, and the statistical analysis was performed with the Cochrane Collaboration RevMan 5.1.6 software, according to the random effects model with a 95% confidence interval (95% CI). The effect was considered statistically significant at p < 0.05, and the homogeneity of the studies was assessed using the I2 index. The search strategy retrieved 154 studies, of which only five could be included in the quantitative analysis. The analysis revealed a not statistically significant mean difference (MD) in weight loss in the analyzed sample and subgroups: Asian individuals -0.81 kg (95% CI: -2.76 to 1.13; P = 0.41; I2 = 0%, n = 210), Caucasians -0.73 kg (95% CI: -3.22 to 1.75; P = 0.45; I2 = 0%; n = 91), as well as in the sample as a whole: -0.78 kg (95% CI: -2.31 to 0.75; P = 0.32; I2 = 0%; n = 301). No statistically significant decrease was revealed in BMI in the analyzed sample and subgroups: Asian individuals -0.65 (95% CI: -1.85 to 0.54; P = 0.29; I2 = 0%; n = 71), Caucasians -0.21 (95% CI: -0.96 to 0.53; P = 0.58; I2 = 22%; n = 91), as well as in the sample as a whole: -0.31 kg (95% CI: -0.88 to 0.27; P = 0.30; I2 = 0%; n = 162), nor for the waist circumference 0.08 cm (95% CI: -0.39 to 0.55; P = 0.73; I2 = 3%; n = 301) or hip circumference (95% CI: -1.14 to 0.93; P = 0.85; I2 = 0%; n = 210). In the evaluation of the effect on the percentage of fat mass (FM%), the MD was not statistically significant for the population subgroups: Asian individuals -0.76 (95% CI: -1.59 to 0.08; P = 0.08; I2 = 0%; n = 169), Caucasians -0.76 (95% CI: -2.22 to 0.70; P = 0.31; I2 = 36%; n = 93), but a small, although statistically significant, decrease in the overall effect was found: -0.76 (95% CI: -1.44 to -0.09; P = 0.03; I2 = 0%; n = 260). The statistically significant effect of green tea on the FM% of the entire sample was not clinically relevant, a fact also highlighted in the results of other meta-analyses. AUTHORS' CONCLUSION: Intake of green tea or green tea extracts exerts no statistically significant effect on the weight of overweight or obese adults. There is a small effect on the decrease in the percentage of fat mass, but it is not clinically relevant. Copyright AULA MEDICA EDICIONES 2014. Published by AULA MEDICA. All rights reserved.

  7. Graphical augmentations to the funnel plot assess the impact of additional evidence on a meta-analysis.

    PubMed

    Langan, Dean; Higgins, Julian P T; Gregory, Walter; Sutton, Alexander J

    2012-05-01

    We aim to illustrate the potential impact of a new study on a meta-analysis, which gives an indication of the robustness of the meta-analysis. A number of augmentations are proposed to one of the most widely used of graphical displays, the funnel plot. Namely, 1) statistical significance contours, which define regions of the funnel plot in which a new study would have to be located to change the statistical significance of the meta-analysis; and 2) heterogeneity contours, which show how a new study would affect the extent of heterogeneity in a given meta-analysis. Several other features are also described, and the use of multiple features simultaneously is considered. The statistical significance contours suggest that one additional study, no matter how large, may have a very limited impact on the statistical significance of a meta-analysis. The heterogeneity contours illustrate that one outlying study can increase the level of heterogeneity dramatically. The additional features of the funnel plot have applications including 1) informing sample size calculations for the design of future studies eligible for inclusion in the meta-analysis; and 2) informing the updating prioritization of a portfolio of meta-analyses such as those prepared by the Cochrane Collaboration. Copyright © 2012 Elsevier Inc. All rights reserved.
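    As a simpler cousin of the proposed augmentations, a contour-enhanced funnel plot marks where an individual study would itself be significant at |z| = 1.96. The paper's contours, which show where a new study would flip the significance of the meta-analysis, depend on the meta-analytic model and are not reproduced in this sketch; the study data below are hypothetical:

        import numpy as np
        import matplotlib.pyplot as plt

        # Hypothetical study effects (e.g. log odds ratios) and standard errors.
        effects = np.array([0.12, 0.35, -0.05, 0.40, 0.22, 0.18])
        ses = np.array([0.10, 0.18, 0.22, 0.30, 0.12, 0.15])

        se_grid = np.linspace(0.005, 0.4, 100)
        plt.scatter(effects, ses)
        plt.plot(1.96 * se_grid, se_grid, "k--", label="|z| = 1.96")  # single-study significance
        plt.plot(-1.96 * se_grid, se_grid, "k--")
        plt.gca().invert_yaxis()   # funnel convention: most precise studies at the top
        plt.xlabel("effect size"); plt.ylabel("standard error"); plt.legend(); plt.show()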

  8. Analysis and interpretation of cost data in randomised controlled trials: review of published studies

    PubMed Central

    Barber, Julie A; Thompson, Simon G

    1998-01-01

    Objective To review critically the statistical methods used for health economic evaluations in randomised controlled trials where an estimate of cost is available for each patient in the study. Design Survey of published randomised trials including an economic evaluation with cost values suitable for statistical analysis; 45 such trials published in 1995 were identified from Medline. Main outcome measures The use of statistical methods for cost data was assessed in terms of the descriptive statistics reported, use of statistical inference, and whether the reported conclusions were justified. Results Although all 45 trials reviewed apparently had cost data for each patient, only 9 (20%) reported adequate measures of variability for these data and only 25 (56%) gave results of statistical tests or a measure of precision for the comparison of costs between the randomised groups. Only 16 (36%) of the articles gave conclusions which were justified on the basis of results presented in the paper. No paper reported sample size calculations for costs. Conclusions The analysis and interpretation of cost data from published trials reveal a lack of statistical awareness. Strong and potentially misleading conclusions about the relative costs of alternative therapies have often been reported in the absence of supporting statistical evidence. Improvements in the analysis and reporting of health economic assessments are urgently required. Health economic guidelines need to be revised to incorporate more detailed statistical advice. Key messages: Health economic evaluations required for important healthcare policy decisions are often carried out in randomised controlled trials. A review of such published economic evaluations assessed whether statistical methods for cost outcomes have been appropriately used and interpreted. Few publications presented adequate descriptive information for costs or performed appropriate statistical analyses. In at least two thirds of the papers, the main conclusions regarding costs were not justified. The analysis and reporting of health economic assessments within randomised controlled trials urgently need improving. PMID:9794854

  9. Reliable and More Powerful Methods for Power Analysis in Structural Equation Modeling

    ERIC Educational Resources Information Center

    Yuan, Ke-Hai; Zhang, Zhiyong; Zhao, Yanyun

    2017-01-01

    The normal-distribution-based likelihood ratio statistic T_ml = nF_ml is widely used for power analysis in structural equation modeling (SEM). In such an analysis, power and sample size are computed by assuming that T_ml follows a central chi-square distribution under H_0 and a noncentral chi-square…
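    The computation the abstract describes reduces to one noncentral chi-square tail probability (the Satorra-Saris style setup this paper critiques, not its improved methods). A sketch with hypothetical inputs:

        from scipy.stats import chi2, ncx2

        # Power of the likelihood ratio test T = n*F_ml: central chi-square under H0,
        # noncentral with lambda = n*F0 under the alternative (df, n, F0 hypothetical).
        df, alpha = 10, 0.05
        n, F0 = 200, 0.05
        crit = chi2.ppf(1 - alpha, df)
        print(ncx2.sf(crit, df, n * F0))   # probability of rejecting H0 at this sample size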

  10. Statistics and quality assurance for the Northern Research Station Forest Inventory and Analysis Program, 2016

    Treesearch

    Dale D. Gormanson; Scott A. Pugh; Charles J. Barnett; Patrick D. Miles; Randall S. Morin; Paul A. Sowers; Jim Westfall

    2017-01-01

    The U.S. Forest Service Forest Inventory and Analysis (FIA) program collects sample plot data on all forest ownerships across the United States. FIA's primary objective is to determine the extent, condition, volume, growth, and use of trees on the Nation's forest land through a comprehensive inventory and analysis of the Nation's forest resources. The...

  11. The space of ultrametric phylogenetic trees.

    PubMed

    Gavryushkin, Alex; Drummond, Alexei J

    2016-08-21

    The reliability of a phylogenetic inference method from genomic sequence data is ensured by its statistical consistency. Bayesian inference methods produce a sample of phylogenetic trees from the posterior distribution given sequence data. Hence the question of statistical consistency of such methods is equivalent to the consistency of the summary of the sample. More generally, statistical consistency is ensured by the tree space used to analyse the sample. In this paper, we consider two standard parameterisations of phylogenetic time-trees used in evolutionary models: inter-coalescent interval lengths and absolute times of divergence events. For each of these parameterisations we introduce a natural metric space on ultrametric phylogenetic trees. We compare the introduced spaces with existing models of tree space, formulate several formal requirements that a metric space on phylogenetic trees must possess in order to be a satisfactory space for statistical analysis, and justify them. We show that only a few known constructions of the space of phylogenetic trees satisfy these requirements. However, our results suggest that these basic requirements are not enough to distinguish between the two metric spaces we introduce, and that the choice between them requires additional properties to be considered. In particular, the summary tree minimising the squared distance to the trees in the sample may differ between parameterisations. This suggests that further fundamental insight is needed into the problem of statistical consistency of phylogenetic inference methods. Copyright © 2016 The Authors. Published by Elsevier Ltd. All rights reserved.

  12. A study of personal and area airborne asbestos concentrations during asbestos abatement: a statistical evaluation of fibre concentration data.

    PubMed

    Lange, J H; Lange, P R; Reinhard, T K; Thomulka, K W

    1996-08-01

    Data were collected and analysed on airborne concentrations of asbestos generated by abatement of different asbestos-containing materials using various removal practices. Airborne asbestos concentrations vary dramatically with the type of asbestos-containing material being abated. Abatement practices evaluated in this study were removal of boiler/pipe insulation in a crawl space, ceiling tile, transite, floor tile/mastic with traditional methods, and mastic removal with a high-efficiency particulate air filter blast track (shot-blast) machine. In general, abatement of boiler and pipe insulation produced the highest airborne fibre levels, while abatement of floor tile and mastic produced the lowest. Matched personal and area samples did not differ significantly and were well correlated in regression analysis. After adjusting the data for outliers, personal sample fibre concentrations were greater than area sample fibre concentrations. The distribution of airborne asbestos concentrations appears to be best represented in logarithmic form. Area sample fibre concentrations showed larger variability than personal measurements in this study. The evaluation of outliers in fibre concentration data, and the ability of these values to skew sample populations, is presented. The use of personal and area samples in determining exposure and selecting personal protective equipment, and its historical relevance to future abatement projects, is discussed.

  13. [Respondent-Driven Sampling: a new sampling method to study visible and hidden populations].

    PubMed

    Mantecón, Alejandro; Juan, Montse; Calafat, Amador; Becoña, Elisardo; Román, Encarna

    2008-01-01

    The paper introduces a variant of chain-referral sampling: respondent-driven sampling (RDS). This sampling method shows that approaches based on network analysis can be combined with the statistical validity of standard probability sampling methods. In this sense, RDS appears to be a mathematical improvement of snowball sampling oriented to the study of hidden populations. Here, however, we test its validity with populations that lack a sampling frame but can nonetheless be contacted without difficulty. The basics of RDS are explained through our research on young people (aged 14 to 25) who go clubbing, consume alcohol and other drugs, and have sex. Fieldwork was carried out between May and July 2007 in three Spanish regions: Baleares, Galicia and Comunidad Valenciana. The study shows the utility of this type of sampling when the population is accessible but no sampling frame exists. However, the resulting sample is not a statistically representative random sample of the target population. It must be acknowledged that the final sample is representative of a 'pseudo-population' that approximates the target population but is not identical to it.

  14. Tools based on multivariate statistical analysis for classification of soil and groundwater in Apulian agricultural sites.

    PubMed

    Ielpo, Pierina; Leardi, Riccardo; Pappagallo, Giuseppe; Uricchio, Vito Felice

    2017-06-01

    In this paper, the results obtained from multivariate statistical techniques such as PCA (Principal component analysis) and LDA (Linear discriminant analysis) applied to a wide soil data set are presented. The results have been compared with those obtained on a groundwater data set, whose samples were collected together with the soil ones within the project "Improvement of the Regional Agro-meteorological Monitoring Network (2004-2007)". LDA applied to the soil data distinguished the geographical origin of a sample between two macroareas, Bari and Foggia provinces vs Brindisi, Lecce and Taranto provinces, with 87% correct prediction in cross-validation. In the case of the groundwater data set, the best classification was obtained when the samples were grouped into three macroareas (Foggia province; Bari province; and Brindisi, Lecce and Taranto provinces), reaching 84% correct prediction in cross-validation. The information obtained can be very useful in supporting soil and water resource management, such as reducing water consumption and reducing energy and chemical (nutrient and pesticide) inputs in agriculture.
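
    A brief sketch of the classification step described above, with hypothetical features and labels (the study's soil variables are not reproduced here):

        import numpy as np
        from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(0)
        X = rng.normal(size=(120, 6))      # stand-in for soil chemistry variables
        y = rng.integers(0, 2, size=120)   # stand-in for the two macroareas

        lda = LinearDiscriminantAnalysis()
        scores = cross_val_score(lda, X, y, cv=10)  # 10-fold cross-validation
        # near 50% for these random stand-ins; the study reports 87% on real soil data
        print(f"correct prediction in cross-validation: {scores.mean():.0%}")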

  15. SPICE: exploration and analysis of post-cytometric complex multivariate datasets.

    PubMed

    Roederer, Mario; Nozzi, Joshua L; Nason, Martha C

    2011-02-01

    Polychromatic flow cytometry results in complex, multivariate datasets. To date, tools for the aggregate analysis of these datasets across multiple specimens grouped by different categorical variables, such as demographic information, have not been optimized. Often, the exploration of such datasets is accomplished by visualization of patterns with pie charts or bar charts, without easy access to statistical comparisons of measurements that comprise multiple components. Here we report on algorithms and a graphical interface we developed for these purposes. In particular, we discuss the thresholding necessary for accurate representation of data in pie charts, the implications for display and comparison of normalized versus unnormalized data, and the effects of averaging when samples with significant background noise are present. Finally, we define a statistic for the nonparametric comparison of complex distributions to test for differences between groups of samples based on multi-component measurements. While originally developed to support the analysis of T cell functional profiles, these techniques are amenable to a broad range of data types. Published 2011 Wiley-Liss, Inc.
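
    The SPICE statistic itself is defined in the paper; as a hedged illustration of the general idea of nonparametric comparison of multi-component measurements, a permutation test on the distance between group means might look like this:

        import numpy as np

        def perm_test(a, b, n_perm=10000, seed=0):
            """p-value for a difference between two groups of component vectors."""
            a, b = np.asarray(a), np.asarray(b)
            rng = np.random.default_rng(seed)
            pooled = np.vstack([a, b])
            obs = np.linalg.norm(a.mean(axis=0) - b.mean(axis=0))
            hits = 0
            for _ in range(n_perm):
                rng.shuffle(pooled)  # permute group membership
                pa, pb = pooled[:len(a)], pooled[len(a):]
                if np.linalg.norm(pa.mean(axis=0) - pb.mean(axis=0)) >= obs:
                    hits += 1
            return hits / n_perm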

  16. Adaptation and validation of the Spanish version of the Dieting Peer Competitiveness Scale to adolescents of both genders.

    PubMed

    Pamies-Aubalat, Lidia; Quiles-Marcos, Yolanda; Núñez-Núñez, Rosa M

    2013-12-01

    This study examined the Dieting Peer Competitiveness Scale, an instrument for evaluating dieting-related social comparison among young people. This instrumental study had two aims. The first was to present preliminary psychometric data for the Spanish version of the Dieting Peer Competitiveness Scale, including statistical item analysis, an examination of the instrument's internal structure, and a reliability analysis, based on a sample of 1067 secondary school adolescents. The second was to conduct a confirmatory factor analysis of the scale's internal structure and to analyse evidence of validity, based on a sample of 1075 adolescents.

  17. Scaled test statistics and robust standard errors for non-normal data in covariance structure analysis: a Monte Carlo study.

    PubMed

    Chou, C P; Bentler, P M; Satorra, A

    1991-11-01

    Research studying robustness of maximum likelihood (ML) statistics in covariance structure analysis has concluded that test statistics and standard errors are biased under severe non-normality. An estimation procedure known as asymptotic distribution free (ADF), making no distributional assumption, has been suggested to avoid these biases. Corrections to the normal theory statistics to yield more adequate performance have also been proposed. This study compares the performance of a scaled test statistic and robust standard errors for two models under several non-normal conditions and also compares these with the results from ML and ADF methods. Both ML and ADF test statistics performed rather well in one model and considerably worse in the other. In general, the scaled test statistic seemed to behave better than the ML test statistic and the ADF statistic performed the worst. The robust and ADF standard errors yielded more appropriate estimates of sampling variability than the ML standard errors, which were usually downward biased, in both models under most of the non-normal conditions. ML test statistics and standard errors were found to be quite robust to the violation of the normality assumption when data had either symmetric and platykurtic distributions or non-symmetric distributions with zero kurtosis.

  18. Evaluating Composite Sampling Methods of Bacillus Spores at Low Concentrations

    PubMed Central

    Hess, Becky M.; Amidan, Brett G.; Anderson, Kevin K.; Hutchison, Janine R.

    2016-01-01

    Restoring all facility operations after the 2001 Amerithrax attacks took years to complete, highlighting the need to reduce remediation time. Some of the most time-intensive tasks were environmental sampling and sample analyses. Composite sampling allows disparate samples to be combined, with only a single analysis needed, making it a promising method to reduce response times. We developed a statistical experimental design to test three different composite sampling methods: 1) single medium single pass composite (SM-SPC): a single cellulose sponge samples multiple coupons with a single pass across each coupon; 2) single medium multi-pass composite (SM-MPC): a single cellulose sponge samples multiple coupons with multiple passes across each coupon; and 3) multi-medium post-sample composite (MM-MPC): a single cellulose sponge samples a single surface, and then multiple sponges are combined during sample extraction. Five spore concentrations of Bacillus atrophaeus Nakamura spores were tested; concentrations ranged from 5 to 100 CFU/coupon (0.00775 to 0.155 CFU/cm2). Study variables included four clean surface materials (stainless steel, vinyl tile, ceramic tile, and painted dry wallboard) and three grime-coated/dirty materials (stainless steel, vinyl tile, and ceramic tile). Analysis of variance for the clean study showed two significant factors: composite method (p < 0.0001) and coupon material (p = 0.0006). Recovery efficiency (RE) was higher overall using the MM-MPC method compared to the SM-SPC and SM-MPC methods. RE with the MM-MPC method for the concentrations tested (10 to 100 CFU/coupon) was similar for ceramic tile, dry wall, and stainless steel for clean materials. RE was lowest for vinyl tile with both composite methods. Statistical tests for the dirty study showed RE was significantly higher for vinyl and stainless steel materials, but lower for ceramic tile. These results suggest post-sample compositing can be used to reduce sample analysis time when responding to a Bacillus anthracis contamination event of clean or dirty surfaces. PMID:27736999
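
    A sketch of the two-factor analysis of variance reported above, with a hypothetical input file and column names (RE, method, material):

        import pandas as pd
        import statsmodels.api as sm
        from statsmodels.formula.api import ols

        df = pd.read_csv("recovery.csv")  # columns: RE, method, material (assumed)
        model = ols("RE ~ C(method) + C(material)", data=df).fit()
        print(sm.stats.anova_lm(model, typ=2))  # F-test and p-value per factor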

  20. Evaluating Composite Sampling Methods of Bacillus spores at Low Concentrations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hess, Becky M.; Amidan, Brett G.; Anderson, Kevin K.

    Restoring facility operations after the 2001 Amerithrax attacks took over three months to complete, highlighting the need to reduce remediation time. The most time-intensive tasks were environmental sampling and sample analyses. Composite sampling allows disparate samples to be combined, with only a single analysis needed, making it a promising method to reduce response times. We developed a statistical experimental design to test three different composite sampling methods: 1) single medium single pass composite: a single cellulose sponge samples multiple coupons with a single pass across each coupon; 2) single medium multi-pass composite: a single cellulose sponge samples multiple coupons with multiple passes across each coupon; and 3) multi-medium post-sample composite: a single cellulose sponge samples a single surface, and then multiple sponges are combined during sample extraction. Five spore concentrations of Bacillus atrophaeus Nakamura spores were tested; concentrations ranged from 5 to 100 CFU/coupon (0.00775 to 0.155 CFU/cm2, respectively). Study variables included four clean surface materials (stainless steel, vinyl tile, ceramic tile, and painted wallboard) and three grime-coated/dirty materials (stainless steel, vinyl tile, and ceramic tile). Analysis of variance for the clean study showed two significant factors: composite method (p-value < 0.0001) and coupon material (p-value = 0.0008). Recovery efficiency (RE) was higher overall using the post-sample composite (PSC) method compared to single medium composites from both clean and grime-coated materials. RE with the PSC method for the concentrations tested (10 to 100 CFU/coupon) was similar for ceramic tile, painted wallboard, and stainless steel for clean materials. RE was lowest for vinyl tile with both composite methods. Statistical tests for the dirty study showed RE was significantly higher for vinyl and stainless steel materials, but significantly lower for ceramic tile. These results suggest post-sample compositing can be used to reduce sample analysis time when responding to a Bacillus anthracis contamination event of clean or dirty surfaces.

  1. Evaluation of errors in quantitative determination of asbestos in rock

    NASA Astrophysics Data System (ADS)

    Baietto, Oliviero; Marini, Paola; Vitaliti, Martina

    2016-04-01

    The quantitative determination of the asbestos content of rock matrices is a complex operation that is susceptible to important errors. The principal methodologies for the analysis are Scanning Electron Microscopy (SEM) and Phase Contrast Optical Microscopy (PCOM). Although the resolution of PCOM is inferior to that of SEM, PCOM analysis has several advantages, including greater representativity of the analysed sample, more effective recognition of chrysotile, and lower cost. The DIATI LAA internal methodology for PCOM analysis is based on mild grinding of a rock sample, its subdivision into 5-6 grain-size classes smaller than 2 mm, and subsequent microscopic analysis of a portion of each class. PCOM relies on the optical properties of asbestos and of the liquids of known refractive index in which the particles under analysis are immersed. Error evaluation in the analysis of rock samples, unlike the analysis of airborne filters, cannot be based on a statistical distribution. For airborne filters, a Poisson distribution can be applied, which theoretically describes the variation in fibre counts over analysis fields chosen randomly on the filter. Analysis of rock matrices cannot rely on any such distribution, because the most important object of the analysis is the size of the asbestiform fibres and fibre bundles observed, and the resulting ratio between the weight of the fibrous component and that of the granular one. The error estimates generally provided by public and private institutions vary between 50 and 150 percent, but no specific studies discuss the origin of this error or link it to the asbestos content. Our work aims to provide a reliable estimate of the error in relation to the applied methodologies and to the total asbestos content, especially for values close to the legal limits. Error assessment must be based on repetition of the same analysis on the same sample, so as to estimate both the error on the representativeness of the sample and the error related to the sensitivity of the operator, and thereby provide a sufficiently reliable uncertainty for the method. We used about 30 natural rock samples with different asbestos contents, performing 3 analyses on each sample to obtain a sufficiently representative trend of the percentage. Furthermore, on one chosen sample we performed 10 repetitions of the analysis to define the error of the methodology more specifically.

  2. Evaluation of pesticide residues of organochlorine in vegetables and fruits in Qatar: statistical analysis.

    PubMed

    Al-Shamary, Noora M; Al-Ghouti, Mohammad A; Al-Shaikh, Ismail; Al-Meer, Saeed H; Ahmad, Talaat A

    2016-03-01

    The study aimed to examine the residues of organochlorine pesticides (OCPs) in vegetables and fruits in Qatar. A total of 127 samples were studied. Ninety percent of the imported samples had residues above the maximum residue levels (MRLs). The most frequently detected OCP in the samples was heptachlor (found in 75 samples). In comparisons between washed and unwashed samples, no significant differences were observed (P > 0.05). However, the effect of washing with tap water depended on the type of vegetable or fruit.

  3. Statistical Analysis of Adaptive Beam-Forming Methods

    DTIC Science & Technology

    1988-05-01

    minimum amount of computing resources? * What are the tradeoffs being made when a system design selects block averaging over exponential averaging? Will... understood by many signal processing practitioners, however, is how system parameters and the number of sensors affect the distribution of the... system performance improve and if so by how much? It is well known that the noise sampled at adjacent sensors is not statistically independent

  4. Spatial Statistical Models and Optimal Survey Design for Rapid Geophysical characterization of UXO Sites

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    G. Ostrouchov; W.E.Doll; D.A.Wolf

    2003-07-01

    Unexploded ordnance (UXO) surveys encompass large areas, and the cost of surveying these areas can be high. The enactment of earlier protocols for sampling UXO sites has shown the shortcomings of these procedures and led to a call for the development of scientifically defensible statistical procedures for survey design and analysis. This project is one of three funded by SERDP to address this need.

  5. Surviving and Thriving: The Adaptive Responses of U.S. Four-Year Colleges and Universities during the Great Recession

    ERIC Educational Resources Information Center

    Brint, Steven; Yoshikawa, Sarah R. K.; Rotondi, Matthew B.; Viggiano, Tiffany; Maldonado, John

    2016-01-01

    Press reports and industry statistics both give incomplete pictures of the outcomes of the Great Recession for U.S. four-year colleges and universities. To address these gaps, we conducted a statistical analysis of all articles that appeared in Lexis-Nexis on a sample of more than 300 U.S. colleges and universities during the Recession years. We…

  6. Where statistics and molecular microarray experiments biology meet.

    PubMed

    Kelmansky, Diana M

    2013-01-01

    This review chapter presents a statistical point of view on microarray experiments, with the purpose of understanding the apparent contradictions that often appear in relation to their results. We give a brief introduction to molecular biology for nonspecialists. We describe microarray experiments from their construction and the biological principles the experiments rely on, to data acquisition and analysis. The role of epidemiological approaches and sample size considerations is also discussed.

  7. Challenges of Big Data Analysis.

    PubMed

    Fan, Jianqing; Han, Fang; Liu, Han

    2014-06-01

    Big Data bring new opportunities to modern society and challenges to data scientists. On one hand, Big Data hold great promise for discovering subtle population patterns and heterogeneities that are not possible with small-scale data. On the other hand, the massive sample size and high dimensionality of Big Data introduce unique computational and statistical challenges, including scalability and storage bottlenecks, noise accumulation, spurious correlation, incidental endogeneity, and measurement errors. These challenges are distinctive and require a new computational and statistical paradigm. This article gives an overview of the salient features of Big Data and of how these features affect statistical and computational methods as well as computing architectures. We also provide various new perspectives on Big Data analysis and computation. In particular, we emphasize the viability of the sparsest solution in a high-confidence set and point out that the exogeneity assumptions in most statistical methods for Big Data cannot be validated due to incidental endogeneity. They can lead to wrong statistical inferences and consequently wrong scientific conclusions.

  8. Detecting subtle hydrochemical anomalies with multivariate statistics: an example from homogeneous groundwaters in the Great Artesian Basin, Australia

    NASA Astrophysics Data System (ADS)

    O'Shea, Bethany; Jankowski, Jerzy

    2006-12-01

    The major ion composition of Great Artesian Basin groundwater in the lower Namoi River valley is relatively homogeneous in chemical composition. Traditional graphical techniques have been combined with multivariate statistical methods to determine whether subtle differences in the chemical composition of these waters can be delineated. Hierarchical cluster analysis and principal components analysis were successful in delineating minor variations within the groundwaters of the study area that were not visually identified in the graphical techniques applied. Hydrochemical interpretation allowed geochemical processes to be identified in each statistically defined water type and illustrated how these groundwaters differ from one another. Three main geochemical processes were identified in the groundwaters: ion exchange, precipitation, and mixing between waters from different sources. Both statistical methods delineated an anomalous sample suspected of being influenced by magmatic CO2 input. The use of statistical methods to complement traditional graphical techniques for waters appearing homogeneous is emphasized for all investigations of this type.
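
    A sketch of the clustering step in such a workflow: standardise the major-ion data, then apply Ward's hierarchical clustering. The file name and the choice of three clusters are assumptions for illustration only:

        import numpy as np
        from scipy.cluster.hierarchy import linkage, fcluster
        from scipy.stats import zscore

        X = np.loadtxt("major_ions.csv", delimiter=",")      # hypothetical data file
        Z = linkage(zscore(X, axis=0), method="ward")        # Ward's hierarchical clustering
        water_type = fcluster(Z, t=3, criterion="maxclust")  # cut tree into 3 water types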

  10. Statistical benchmark for BosonSampling

    NASA Astrophysics Data System (ADS)

    Walschaers, Mattia; Kuipers, Jack; Urbina, Juan-Diego; Mayer, Klaus; Tichy, Malte Christopher; Richter, Klaus; Buchleitner, Andreas

    2016-03-01

    Boson samplers—set-ups that generate complex many-particle output states through the transmission of elementary many-particle input states across a multitude of mutually coupled modes—promise the efficient quantum simulation of a classically intractable computational task, and challenge the extended Church-Turing thesis, one of the fundamental dogmas of computer science. However, as in all experimental quantum simulations of truly complex systems, one crucial problem remains: how to certify that a given experimental measurement record unambiguously results from enforcing the claimed dynamics, on bosons, fermions or distinguishable particles? Here we offer a statistical solution to the certification problem, identifying an unambiguous statistical signature of many-body quantum interference upon transmission across a multimode, random scattering device. We show that statistical analysis of only partial information on the output state allows one to characterise the imparted dynamics through particle type-specific features of the emerging interference patterns. The relevant statistical quantifiers are classically computable, define a falsifiable benchmark for BosonSampling, and reveal distinctive features of many-particle quantum dynamics, which go much beyond mere bunching or anti-bunching effects.

  11. Meta-Analytic Derivation

    ERIC Educational Resources Information Center

    Snell, Joel C.; Marsh, Mitchell

    2011-01-01

    The authors have over the years tried to revise meta-analysis, because its basic premise is to add apples and oranges together and analyze them. In other words, various data on the same subject are chosen using different samples, research strategies, and number properties. The findings are then homogenized and a statistical analysis is used (Snell, J.…

  12. Content Analysis of Papers Submitted to "Communications in Information Literacy," 2007-2013

    ERIC Educational Resources Information Center

    Hollister, Christopher V.

    2014-01-01

    The author conducted a content analysis of papers submitted to the journal, "Communications in Information Literacy," from the years 2007-2013. The purpose was to investigate and report on the overall quality characteristics of a statistically significant sample of papers submitted to a single-topic, open access, library and information…

  13. Analysis Tools (AT)

    Treesearch

    Larry J. Gangi

    2006-01-01

    The FIREMON Analysis Tools program is designed to let the user perform grouped or ungrouped summary calculations of single measurement plot data, or statistical comparisons of grouped or ungrouped plot data taken at different sampling periods. The program allows the user to create reports and graphs, save and print them, or cut and paste them into a word processor....

  14. Estimating annual bole biomass production using uncertainty analysis

    Treesearch

    Travis J. Woolley; Mark E. Harmon; Kari B. O' Connell

    2007-01-01

    Two common sampling methodologies coupled with a simple statistical model were evaluated to determine the accuracy and precision of annual bole biomass production (BBP) and inter-annual variability estimates using this type of approach. We performed an uncertainty analysis using Monte Carlo methods in conjunction with radial growth core data from trees in three Douglas...
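
    A minimal sketch of Monte Carlo uncertainty propagation of this kind; the allometric model and parameter values below are placeholders, not the study's:

        import numpy as np

        rng = np.random.default_rng(1)
        n = 100_000
        diameter = rng.normal(40.0, 2.0, n)   # cm, with measurement uncertainty
        growth = rng.normal(0.4, 0.1, n)      # cm/yr radial increment
        bbp = 0.05 * diameter**1.8 * growth   # hypothetical allometric model

        print(bbp.mean(), np.percentile(bbp, [2.5, 97.5]))  # estimate and 95% interval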

  15. Private forest owners and invasive plants: risk perception and management

    Treesearch

    A. Paige Fischer; Susan Charnley

    2012-01-01

    We investigated nonindustrial private forest (NIPF) owners' invasive plant risk perceptions and mitigation practices using statistical analysis of mail survey data and qualitative analysis of interview data collected in Oregon's ponderosa pine zone. We found that 52% of the survey sample was aware of invasive plant species considered problematic by local...

  16. Measuring the statistical validity of summary meta‐analysis and meta‐regression results for use in clinical practice

    PubMed Central

    Riley, Richard D.

    2017-01-01

    An important question for clinicians appraising a meta‐analysis is: are the findings likely to be valid in their own practice—does the reported effect accurately represent the effect that would occur in their own clinical population? To this end we advance the concept of statistical validity—where the parameter being estimated equals the corresponding parameter for a new independent study. Using a simple (‘leave‐one‐out’) cross‐validation technique, we demonstrate how we may test meta‐analysis estimates for statistical validity using a new validation statistic, Vn, and derive its distribution. We compare this with the usual approach of investigating heterogeneity in meta‐analyses and demonstrate the link between statistical validity and homogeneity. Using a simulation study, the properties of Vn and the Q statistic are compared for univariate random effects meta‐analysis and a tailored meta‐regression model, where information from the setting (included as model covariates) is used to calibrate the summary estimate to the setting of application. Their properties are found to be similar when there are 50 studies or more, but for fewer studies Vn has greater power but a higher type 1 error rate than Q. The power and type 1 error rate of Vn are also shown to depend on the within‐study variance, between‐study variance, study sample size, and the number of studies in the meta‐analysis. Finally, we apply Vn to two published meta‐analyses and conclude that it usefully augments standard methods when deciding upon the likely validity of summary meta‐analysis estimates in clinical practice. © 2017 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd. PMID:28620945
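
    The Vn statistic and its distribution are derived in the article; as a hedged sketch of the underlying leave-one-out idea, one can re-estimate the summary effect with each study removed and compare it with the omitted study (fixed-effect weighting shown here for simplicity):

        import numpy as np

        def loo_pairs(effects, variances):
            """(omitted study estimate, pooled estimate without it) for each study."""
            effects = np.asarray(effects, dtype=float)
            w = 1.0 / np.asarray(variances, dtype=float)
            pairs = []
            for i in range(len(effects)):
                keep = np.arange(len(effects)) != i
                pooled = np.sum(w[keep] * effects[keep]) / np.sum(w[keep])
                pairs.append((effects[i], pooled))
            return pairs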

  17. Assessment of statistical methods used in library-based approaches to microbial source tracking.

    PubMed

    Ritter, Kerry J; Carruthers, Ethan; Carson, C Andrew; Ellender, R D; Harwood, Valerie J; Kingsley, Kyle; Nakatsu, Cindy; Sadowsky, Michael; Shear, Brian; West, Brian; Whitlock, John E; Wiggins, Bruce A; Wilbur, Jayson D

    2003-12-01

    Several commonly used statistical methods for fingerprint identification in microbial source tracking (MST) were examined to assess the effectiveness of pattern-matching algorithms to correctly identify sources. Although numerous statistical methods have been employed for source identification, no widespread consensus exists as to which is most appropriate. A large-scale comparison of several MST methods, using identical fecal sources, presented a unique opportunity to assess the utility of several popular statistical methods. These included discriminant analysis, nearest neighbour analysis, maximum similarity and average similarity, along with several measures of distance or similarity. Threshold criteria for excluding uncertain or poorly matched isolates from final analysis were also examined for their ability to reduce false positives and increase prediction success. Six independent libraries used in the study were constructed from indicator bacteria isolated from fecal materials of humans, seagulls, cows and dogs. Three of these libraries were constructed using the rep-PCR technique and three relied on antibiotic resistance analysis (ARA). Five of the libraries were constructed using Escherichia coli and one using Enterococcus spp. (ARA). Overall, the outcome of this study suggests a high degree of variability across statistical methods. Despite large differences in correct classification rates among the statistical methods, no single statistical approach emerged as superior. Thresholds failed to consistently increase rates of correct classification and improvement was often associated with substantial effective sample size reduction. Recommendations are provided to aid in selecting appropriate analyses for these types of data.

  18. Multivariate analysis for stormwater quality characteristics identification from different urban surface types in Macau.

    PubMed

    Huang, J; Du, P; Ao, C; Ho, M; Lei, M; Zhao, D; Wang, Z

    2007-12-01

    Statistical analysis of stormwater runoff data enables general identification of runoff characteristics. Six catchments with different urban surface types, including roofs, roadway, park, and residential/commercial areas in Macau, were selected for sampling and study during the period from June 2005 to September 2006. Based on univariate statistical analysis of the sampled data, the major pollutants discharged from each urban surface type were identified. For iron roof runoff, Zn is the most significant pollutant. The major pollutants in urban roadway runoff are TSS and COD. Stormwater runoff from the commercial/residential and park catchments showed high levels of COD, TN, and TP. Principal component analysis was further used to identify linkages between stormwater quality and urban surface types. Two potential pollution sources were identified for the study catchments: the first is associated with nutrient losses, soil losses, and organic pollutant discharges; the second with heavy metal losses. PCA proved to be a viable tool for explaining the types of pollution sources and their mechanisms across catchments with different urban surface types.

  19. Assessing the statistical significance of the achieved classification error of classifiers constructed using serum peptide profiles, and a prescription for random sampling repeated studies for massive high-throughput genomic and proteomic studies.

    PubMed

    Lyons-Weiler, James; Pelikan, Richard; Zeh, Herbert J; Whitcomb, David C; Malehorn, David E; Bigbee, William L; Hauskrecht, Milos

    2005-01-01

    Peptide profiles generated using SELDI/MALDI time of flight mass spectrometry provide a promising source of patient-specific information with high potential impact on the early detection and classification of cancer and other diseases. The new profiling technology comes, however, with numerous challenges and concerns. Particularly important are concerns of reproducibility of classification results and their significance. In this work we describe a computational validation framework, called PACE (Permutation-Achieved Classification Error), that lets us assess, for a given classification model, the significance of the Achieved Classification Error (ACE) on the profile data. The framework compares the performance statistic of the classifier on the true data samples and checks whether it is consistent with the behavior of the classifier on the same data with randomly reassigned class labels. A statistically significant ACE increases our belief that a discriminative signal was found in the data. The advantage of PACE analysis is that it can be easily combined with any classification model and is relatively easy to interpret. PACE analysis does not protect researchers against confounding in the experimental design, or other sources of systematic or random error. We use PACE analysis to assess the significance of classification results we have achieved on a number of published data sets. The results show that many of these datasets indeed possess a signal that leads to a statistically significant ACE.
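
    A sketch of the permutation scheme described; the classifier choice and fold count here are illustrative, not those of the PACE framework:

        import numpy as np
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import cross_val_score

        def pace_p_value(X, y, n_perm=200, seed=0):
            rng = np.random.default_rng(seed)
            clf = LogisticRegression(max_iter=1000)
            ace = 1 - cross_val_score(clf, X, y, cv=5).mean()  # achieved error
            null = [1 - cross_val_score(clf, X, rng.permutation(y), cv=5).mean()
                    for _ in range(n_perm)]  # error under randomly reassigned labels
            return float(np.mean(np.array(null) <= ace))  # P(random labels do as well)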

  20. Experimental design, power and sample size for animal reproduction experiments.

    PubMed

    Chapman, Phillip L; Seidel, George E

    2008-01-01

    The present paper concerns statistical issues in the design of animal reproduction experiments, with emphasis on the problems of sample size determination and power calculations. We include examples and non-technical discussions aimed at helping researchers avoid serious errors that may invalidate or seriously impair the validity of conclusions from experiments. Screen shots from interactive power calculation programs and basic SAS power calculation programs are presented to aid in understanding statistical power and computing power in some common experimental situations. Practical issues that are common to most statistical design problems are briefly discussed. These include one-sided hypothesis tests, power level criteria, equality of within-group variances, transformations of response variables to achieve variance equality, optimal specification of treatment group sizes, 'post hoc' power analysis and arguments for the increased use of confidence intervals in place of hypothesis tests.
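
    For instance, a basic sample-size calculation for a two-group comparison, analogous to the interactive programs mentioned above (values are illustrative):

        from statsmodels.stats.power import TTestIndPower

        analysis = TTestIndPower()
        n_per_group = analysis.solve_power(effect_size=0.8, power=0.8, alpha=0.05)
        print(round(n_per_group))  # about 26 animals per group for a large effect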

  1. An analysis of the cognitive deficit of schizophrenia based on the Piaget developmental theory.

    PubMed

    Torres, Alejandro; Olivares, Jose M; Rodriguez, Angel; Vaamonde, Antonio; Berrios, German E

    2007-01-01

    The objective of the study was to evaluate from the perspective of the Piaget developmental model the cognitive functioning of a sample of patients diagnosed with schizophrenia. Fifty patients with schizophrenia (Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition) and 40 healthy matched controls were evaluated by means of the Longeot Logical Thought Evaluation Scale. Only 6% of the subjects with schizophrenia reached the "formal period," and 70% remained at the "concrete operations" stage. The corresponding figures for the control sample were 25% and 15%, respectively. These differences were statistically significant. The samples were specifically differentiable on the permutation, probabilities, and pendulum tests of the scale. The Longeot Logical Thought Evaluation Scale can discriminate between subjects with schizophrenia and healthy controls.

  2. Cloud-based solution to identify statistically significant MS peaks differentiating sample categories.

    PubMed

    Ji, Jun; Ling, Jeffrey; Jiang, Helen; Wen, Qiaojun; Whitin, John C; Tian, Lu; Cohen, Harvey J; Ling, Xuefeng B

    2013-03-23

    Mass spectrometry (MS) has evolved to become the primary high-throughput tool for proteomics-based biomarker discovery. Multiple challenges in protein MS data analysis remain: large-scale and complex data set management; MS peak identification and indexing; and high-dimensional peak differential analysis with concurrent statistical testing based on the false discovery rate (FDR). "Turnkey" solutions are needed for biomarker investigations to rapidly process MS data sets and identify statistically significant peaks for subsequent validation. Here we present an efficient and effective solution, which provides experimental biologists easy access to "cloud" computing capabilities to analyze MS data. The web portal can be accessed at http://transmed.stanford.edu/ssa/. The web application supports online uploading and analysis of large-scale MS data through a simple user interface. This bioinformatic tool will facilitate the discovery of potential protein biomarkers using MS.
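
    A sketch of the per-peak testing and FDR-correction step such a pipeline performs; the data shapes and the choice of the Mann-Whitney test are assumptions:

        import numpy as np
        from scipy.stats import mannwhitneyu
        from statsmodels.stats.multitest import multipletests

        def significant_peaks(cases, controls, alpha=0.05):
            """cases, controls: arrays of shape (n_samples, n_peaks)."""
            pvals = [mannwhitneyu(cases[:, j], controls[:, j]).pvalue
                     for j in range(cases.shape[1])]
            reject, qvals, _, _ = multipletests(pvals, alpha=alpha, method="fdr_bh")
            return np.flatnonzero(reject), qvals  # indices of significant peaks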

  3. Compositional Solution Space Quantification for Probabilistic Software Analysis

    NASA Technical Reports Server (NTRS)

    Borges, Mateus; Pasareanu, Corina S.; Filieri, Antonio; d'Amorim, Marcelo; Visser, Willem

    2014-01-01

    Probabilistic software analysis aims at quantifying how likely a target event is to occur during program execution. Current approaches rely on symbolic execution to identify the conditions to reach the target event and try to quantify the fraction of the input domain satisfying these conditions. Precise quantification is usually limited to linear constraints, while only approximate solutions can be provided in general through statistical approaches. However, statistical approaches may fail to converge to an acceptable accuracy within a reasonable time. We present a compositional statistical approach for the efficient quantification of solution spaces for arbitrarily complex constraints over bounded floating-point domains. The approach leverages interval constraint propagation to improve the accuracy of the estimation by focusing the sampling on the regions of the input domain containing the sought solutions. Preliminary experiments show significant improvement on previous approaches both in results accuracy and analysis time.

  4. Sample size in psychological research over the past 30 years.

    PubMed

    Marszalek, Jacob M; Barber, Carolyn; Kohlhart, Julie; Holmes, Cooper B

    2011-04-01

    The American Psychological Association (APA) Task Force on Statistical Inference was formed in 1996 in response to a growing body of research demonstrating methodological issues that threatened the credibility of psychological research, and made recommendations to address them. One issue was the small, even dramatically inadequate, size of samples used in studies published by leading journals. The present study assessed the progress made since the Task Force's final report in 1999. Sample sizes reported in four leading APA journals in 1955, 1977, 1995, and 2006 were compared using nonparametric statistics, while data from the last two waves were fit to a hierarchical generalized linear growth model for more in-depth analysis. Overall, results indicate that the recommendations for increasing sample sizes have not been integrated in core psychological research, although results slightly vary by field. This and other implications are discussed in the context of current methodological critique and practice.

  5. Optimizing Integrated Terminal Airspace Operations Under Uncertainty

    NASA Technical Reports Server (NTRS)

    Bosson, Christabelle; Xue, Min; Zelinski, Shannon

    2014-01-01

    In the terminal airspace, integrated departures and arrivals have the potential to increase operations efficiency. Recent research has developed genetic-algorithm-based schedulers for integrated arrival and departure operations under uncertainty. This paper presents an alternate method using a machine job-shop scheduling formulation to model the integrated airspace operations. A multistage stochastic programming approach is chosen to formulate the problem, and candidate solutions are obtained by solving sample average approximation problems with finite sample size. Because approximate solutions are computed, the proposed algorithm incorporates the computation of statistical bounds to estimate the optimality of the candidate solutions. A proof-of-concept study is conducted on a baseline implementation of a simple problem considering a fleet mix of 14 aircraft evolving in a model of the Los Angeles terminal airspace. A more thorough statistical analysis is also performed to evaluate the impact of the number of scenarios considered in the sampled problem. To handle extensive sampling computations, a multithreading technique is introduced.

  6. Classical Statistics and Statistical Learning in Imaging Neuroscience

    PubMed Central

    Bzdok, Danilo

    2017-01-01

    Brain-imaging research has predominantly generated insight by means of classical statistics, including regression-type analyses and null-hypothesis testing using t-test and ANOVA. Throughout recent years, statistical learning methods enjoy increasing popularity especially for applications in rich and complex data, including cross-validated out-of-sample prediction using pattern classification and sparsity-inducing regression. This concept paper discusses the implications of inferential justifications and algorithmic methodologies in common data analysis scenarios in neuroimaging. It is retraced how classical statistics and statistical learning originated from different historical contexts, build on different theoretical foundations, make different assumptions, and evaluate different outcome metrics to permit differently nuanced conclusions. The present considerations should help reduce current confusion between model-driven classical hypothesis testing and data-driven learning algorithms for investigating the brain with imaging techniques. PMID:29056896

  7. Statistical models for the analysis and design of digital polymerase chain (dPCR) experiments

    USGS Publications Warehouse

    Dorazio, Robert; Hunter, Margaret

    2015-01-01

    Statistical methods for the analysis and design of experiments using digital PCR (dPCR) have received only limited attention and have been misused in many instances. To address this issue and to provide a more general approach to the analysis of dPCR data, we describe a class of statistical models for the analysis and design of experiments that require quantification of nucleic acids. These models are mathematically equivalent to generalized linear models of binomial responses that include a complementary log-log link function and an offset that is dependent on the dPCR partition volume. These models are both versatile and easy to fit using conventional statistical software. Covariates can be used to specify different sources of variation in nucleic acid concentration, and a model's parameters can be used to quantify the effects of these covariates. For purposes of illustration, we analyzed dPCR data from different types of experiments, including serial dilution, evaluation of copy number variation, and quantification of gene expression. We also showed how these models can be used to help design dPCR experiments, as in selection of sample sizes needed to achieve desired levels of precision in estimates of nucleic acid concentration or to detect differences in concentration among treatments with prescribed levels of statistical power.
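
    A sketch of the model class described: a binomial GLM with complementary log-log link and a log(partition volume) offset, so that the exponentiated intercept estimates concentration. The data are simulated; the volume and concentration values are placeholders:

        import numpy as np
        import statsmodels.api as sm

        rng = np.random.default_rng(2)
        volume = 0.00085   # partition volume (assumed units: microlitres)
        lam = 500.0        # true concentration (copies per microlitre)
        n = 20000          # number of dPCR partitions
        y = rng.binomial(1, 1 - np.exp(-lam * volume), n)  # positive partitions

        X = np.ones((n, 1))                  # intercept-only model
        offset = np.full(n, np.log(volume))  # cloglog(p) = log(volume) + log(lam)
        # CLogLog in recent statsmodels; older releases expose links.cloglog instead
        family = sm.families.Binomial(link=sm.families.links.CLogLog())
        fit = sm.GLM(y, X, family=family, offset=offset).fit()
        print(np.exp(fit.params[0]))         # estimate of lam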

  9. Statistical analysis and interpolation of compositional data in materials science.

    PubMed

    Pesenson, Misha Z; Suram, Santosh K; Gregoire, John M

    2015-02-09

    Compositional data are ubiquitous in chemistry and materials science: analysis of elements in multicomponent systems, combinatorial problems, etc., lead to data that are non-negative and sum to a constant (for example, atomic concentrations). The constant sum constraint restricts the sampling space to a simplex instead of the usual Euclidean space. Since statistical measures such as mean and standard deviation are defined for the Euclidean space, traditional correlation studies, multivariate analysis, and hypothesis testing may lead to erroneous dependencies and incorrect inferences when applied to compositional data. Furthermore, composition measurements that are used for data analytics may not include all of the elements contained in the material; that is, the measurements may be subcompositions of a higher-dimensional parent composition. Physically meaningful statistical analysis must yield results that are invariant under the number of composition elements, requiring the application of specialized statistical tools. We present specifics and subtleties of compositional data processing through discussion of illustrative examples. We introduce basic concepts, terminology, and methods required for the analysis of compositional data and utilize them for the spatial interpolation of composition in a sputtered thin film. The results demonstrate the importance of this mathematical framework for compositional data analysis (CDA) in the fields of materials science and chemistry.
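
    As a small illustration of the specialized tools referred to, the centered log-ratio (clr) transform maps compositions from the simplex to Euclidean space, where ordinary statistics apply. This is a standard CDA device, sketched here rather than taken from the paper:

        import numpy as np

        def clr(composition):
            x = np.asarray(composition, dtype=float)
            x = x / x.sum(axis=-1, keepdims=True)  # closure to a constant sum
            g = np.exp(np.log(x).mean(axis=-1, keepdims=True))  # geometric mean
            return np.log(x / g)  # rows sum to zero in clr space

        print(clr([0.70, 0.20, 0.10]))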

  10. Environmental assessment of Al-Hammar Marsh, Southern Iraq.

    PubMed

    Al-Gburi, Hind Fadhil Abdullah; Al-Tawash, Balsam Salim; Al-Lafta, Hadi Salim

    2017-02-01

    (a) To determine the spatial distributions and levels of major and minor elements, as well as heavy metals, in water, sediment, and biota (plant and fish) in Al-Hammar Marsh, southern Iraq, and ultimately to supply more comprehensive information for policy-makers to manage contaminant inputs into the marsh so that their concentrations do not reach toxic levels. (b) To characterize the seasonal changes in the marsh surface water quality. (c) To address the potential environmental risk of these elements by comparison with historical levels and global quality guidelines (i.e., World Health Organization (WHO) standard limits). (d) To define the sources of these elements (i.e., natural and/or anthropogenic) using combined multivariate statistical techniques such as Principal Component Analysis (PCA) and Agglomerative Hierarchical Cluster Analysis (AHCA) along with pollution analysis (i.e., enrichment factor analysis). Water, sediment, plant, and fish samples were collected from the marsh, analyzed for major and minor ions as well as heavy metals, and then compared to historical levels and global quality guidelines (WHO guidelines). Multivariate statistical techniques, such as PCA and AHCA, were then used to determine element sourcing. Water analyses revealed unacceptable values for almost all physico-chemical and biological properties according to WHO standard limits for drinking water. Almost all major ion and heavy metal concentrations in water showed a distinct decreasing trend at the marsh outlet station compared to other stations. In general, major and minor ions, as well as heavy metals, exhibited higher concentrations in winter than in summer. Sediment analyses using multivariate statistical techniques revealed that Mg, Fe, S, P, V, Zn, As, Se, Mo, Co, Ni, Cu, Sr, Br, Cd, Ca, N, Mn, Cr, and Pb were derived from anthropogenic sources, while Al, Si, Ti, K, and Zr were primarily derived from natural sources. Enrichment factor analysis gave results compatible with the multivariate statistical findings. Analysis of heavy metals in plant samples revealed no pollution in plants in Al-Hammar Marsh. However, the concentrations of heavy metals in fish showed that all samples were contaminated by Pb, Mn, and Ni. Decreasing Tigris and Euphrates discharges during the past decades, due to drought conditions and upstream damming, together with the increasing stress of wastewater effluents from anthropogenic activities, have led to degradation of the downstream Al-Hammar Marsh water quality in terms of physical, chemical, and biological properties; such properties were found to consistently exceed the historical and global quality objectives. However, the decreasing trend in element concentrations at the marsh outlet station compared to other stations indicates that the marsh plays an important role as a natural filtration and bioremediation system. Higher element concentrations in winter were due to runoff from the washing of the surrounding sabkha during flooding by winter rainstorms. Finally, the high concentrations of heavy metals in fish samples can be attributed to bioaccumulation and biomagnification processes.
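
    A sketch of an enrichment factor calculation in the form commonly used for sediments; the values and the choice of Al as reference element are illustrative, not the study's:

        def enrichment_factor(c_elem, c_ref, bg_elem, bg_ref):
            """(element/reference) ratio in the sample divided by the same ratio
            in background material; EF >> 1 suggests anthropogenic enrichment."""
            return (c_elem / c_ref) / (bg_elem / bg_ref)

        # e.g. Pb against an Al reference, with an assumed background ratio
        print(enrichment_factor(c_elem=45.0, c_ref=7.1e4, bg_elem=20.0, bg_ref=8.0e4))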

  11. Pots, plates and provenance: sourcing Indian coarse wares from Mleiha using X-ray fluorescence (XRF) spectrometry analysis

    NASA Astrophysics Data System (ADS)

    Reddy, A.; Attaelmanan, A. G.; Mouton, M.

    2012-07-01

    The identification of more than 25% of the pottery sherds from the late PIR.D period (ca. 2nd - mid. 3rd c. AD) assemblage from the recently excavated building H at Mleiha as Indian is based on form and fabric, but only by visual assessment. Petrographic analysis of the fabrics can provide more precise indicators of the geographical origin of the wares. In this study, a total of 21 sherds from various key sites in Western India were compared with 7 different 'Indian' coarse-ware vessels sampled at Mleiha using X-ray fluorescence (XRF) spectrometry. The analyses were conducted on powdered samples collected from the core of each sherd. Each sample was irradiated for 1000 seconds using a 1.2 mm diameter X-ray beam. The resulting spectra were used for quantification of the X-ray intensity and elemental concentration. Levels of correlation in the elemental ratios of the sherds were statistically tested using an F-test as well as a chi-square test. Initial review of the XRF results indicates that the Maharashtra and Gujarat regions of India are probable source areas for at least two of the types of wares. Collection of additional samples from these areas and other regions of India, and further statistical analysis through methods such as Principal Component Analysis, will help to isolate groups of wares from India and correlate them with types of vessels imported into the Oman peninsula in antiquity.

  12. rMATS: robust and flexible detection of differential alternative splicing from replicate RNA-Seq data.

    PubMed

    Shen, Shihao; Park, Juw Won; Lu, Zhi-xiang; Lin, Lan; Henry, Michael D; Wu, Ying Nian; Zhou, Qing; Xing, Yi

    2014-12-23

    Ultra-deep RNA sequencing (RNA-Seq) has become a powerful approach for genome-wide analysis of pre-mRNA alternative splicing. We previously developed multivariate analysis of transcript splicing (MATS), a statistical method for detecting differential alternative splicing between two RNA-Seq samples. Here we describe a new statistical model and computer program, replicate MATS (rMATS), designed for detection of differential alternative splicing from replicate RNA-Seq data. rMATS uses a hierarchical model to simultaneously account for sampling uncertainty in individual replicates and variability among replicates. In addition to the analysis of unpaired replicates, rMATS also includes a model specifically designed for paired replicates between sample groups. The hypothesis-testing framework of rMATS is flexible and can assess the statistical significance over any user-defined magnitude of splicing change. The performance of rMATS is evaluated by the analysis of simulated and real RNA-Seq data. rMATS outperformed two existing methods for replicate RNA-Seq data in all simulation settings, and RT-PCR yielded a high validation rate (94%) in an RNA-Seq dataset of prostate cancer cell lines. Our data also provide guiding principles for designing RNA-Seq studies of alternative splicing. We demonstrate that it is essential to incorporate biological replicates in the study design. Of note, pooling RNAs or merging RNA-Seq data from multiple replicates is not an effective approach to account for variability, and the result is particularly sensitive to outliers. The rMATS source code is freely available at rnaseq-mats.sourceforge.net/. As the popularity of RNA-Seq continues to grow, we expect rMATS will be useful for studies of alternative splicing in diverse RNA-Seq projects.

  13. Use of multivariate statistics to identify unreliable data obtained using CASA.

    PubMed

    Martínez, Luis Becerril; Crispín, Rubén Huerta; Mendoza, Maximino Méndez; Gallegos, Oswaldo Hernández; Martínez, Andrés Aragón

    2013-06-01

    In order to identify unreliable data in a dataset of motility parameters obtained from a pilot study acquired by a veterinarian with experience in boar semen handling, but without experience in the operation of a computer assisted sperm analysis (CASA) system, a multivariate graphical and statistical analysis was performed. Sixteen boar semen samples were aliquoted then incubated with varying concentrations of progesterone from 0 to 3.33 µg/ml and analyzed in a CASA system. After standardization of the data, Chernoff faces were pictured for each measurement, and a principal component analysis (PCA) was used to reduce the dimensionality and pre-process the data before hierarchical clustering. The first twelve individual measurements showed abnormal features when Chernoff faces were drawn. PCA revealed that principal components 1 and 2 explained 63.08% of the variance in the dataset. Values of principal components for each individual measurement of semen samples were mapped to identify differences among treatment or among boars. Twelve individual measurements presented low values of principal component 1. Confidence ellipses on the map of principal components showed no statistically significant effects for treatment or boar. Hierarchical clustering realized on two first principal components produced three clusters. Cluster 1 contained evaluations of the two first samples in each treatment, each one of a different boar. With the exception of one individual measurement, all other measurements in cluster 1 were the same as observed in abnormal Chernoff faces. Unreliable data in cluster 1 are probably related to the operator inexperience with a CASA system. These findings could be used to objectively evaluate the skill level of an operator of a CASA system. This may be particularly useful in the quality control of semen analysis using CASA systems.
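
    A minimal sketch of the screening pipeline described above (standardize, reduce with PCA, cluster the first two principal components) might look as follows; the motility matrix is synthetic, with the first twelve rows shifted to mimic unreliable measurements.

        import numpy as np
        from sklearn.preprocessing import StandardScaler
        from sklearn.decomposition import PCA
        from scipy.cluster.hierarchy import linkage, fcluster

        rng = np.random.default_rng(0)
        motility = rng.normal(size=(40, 8))   # 40 measurements x 8 CASA parameters
        motility[:12] += 3.0                  # shift the first twelve (unreliable data)

        pcs = PCA(n_components=2).fit_transform(
            StandardScaler().fit_transform(motility))
        labels = fcluster(linkage(pcs, method='ward'), t=3, criterion='maxclust')
        print(labels[:12])   # the aberrant measurements should share one cluster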

  14. Uncertainty Analysis of Seebeck Coefficient and Electrical Resistivity Characterization

    NASA Technical Reports Server (NTRS)

    Mackey, Jon; Sehirlioglu, Alp; Dynys, Fred

    2014-01-01

    In order to provide a complete description of a material's thermoelectric power factor, an uncertainty interval is required in addition to the measured nominal value. The uncertainty may contain sources of measurement error including systematic bias error and precision error of a statistical nature. The work focuses specifically on the popular ZEM-3 (Ulvac Technologies) measurement system, but the methods apply to any measurement system. The analysis accounts for sources of systematic error including sample preparation tolerance, measurement probe placement, thermocouple cold-finger effect, and measurement parameters, in addition to uncertainty of a statistical nature. Complete uncertainty analysis of a measurement system allows for more reliable comparison of measurement data between laboratories.
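
    Since the power factor is S²/ρ (S: Seebeck coefficient, ρ: electrical resistivity), a first-order propagation that combines bias and precision components in quadrature might be sketched as below; the numerical values are illustrative only, not taken from the paper.

        import math

        # Combined standard uncertainty of each input: bias and precision in quadrature.
        S, b_S, s_S = 180e-6, 4e-6, 3e-6      # Seebeck (V/K), bias, precision
        rho, b_r, s_r = 1.2e-5, 3e-7, 2e-7    # resistivity (ohm*m), bias, precision
        u_S, u_rho = math.hypot(b_S, s_S), math.hypot(b_r, s_r)

        pf = S**2 / rho                       # power factor
        u_pf = pf * math.sqrt((2 * u_S / S)**2 + (u_rho / rho)**2)
        print(f"PF = {pf:.3e} +/- {u_pf:.3e} W m^-1 K^-2")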

  15. Got Power? A Systematic Review of Sample Size Adequacy in Health Professions Education Research

    ERIC Educational Resources Information Center

    Cook, David A.; Hatala, Rose

    2015-01-01

    Many education research studies employ small samples, which in turn lowers statistical power. We re-analyzed the results of a meta-analysis of simulation-based education to determine study power across a range of effect sizes, and the smallest effect that could be plausibly excluded. We systematically searched multiple databases through May 2011,…

  16. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kreuzer-Martin, Helen W.; Hegg, Eric L.

    The use of isotopic signatures for forensic analysis of biological materials is well-established, and the same general principles that apply to interpretation of stable isotope content of C, N, O, and H apply to the analysis of microorganisms. Heterotrophic microorganisms derive their isotopic content from their growth substrates, which are largely plant and animal products, and the water in their culture medium. Thus the isotope signatures of microbes are tied to their growth environment. The C, N, O, and H isotope ratios of spores have been demonstrated to constitute highly discriminating signatures for sample matching. They can rule out specific samples of media and/or water as possible production media, and can predict isotope ratio ranges of the culture media and water used to produce a given sample. These applications have been developed and tested through analyses of approximately 250 samples of Bacillus subtilis spores and over 500 samples of culture media, providing a strong statistical basis for data interpretation. A Bayesian statistical framework for integrating stable isotope data with other types of signatures derived from microorganisms has been able to characterize the culture medium used to produce spores of various Bacillus species, leveraging isotopic differences in different medium types and demonstrating the power of data integration for forensic investigations.

  17. Corrosion Analysis of an Experimental Noble Alloy on Commercially Pure Titanium Dental Implants

    PubMed Central

    Bortagaray, Manuel Alberto; Ibañez, Claudio Arturo Antonio; Ibañez, Maria Constanza; Ibañez, Juan Carlos

    2016-01-01

    Objective: To determine whether the Noble Bond® Argen® alloy was electrochemically suitable for the manufacturing of prosthetic superstructures over commercially pure titanium (c.p. Ti) implants. The electrolytic corrosion effects on three types of materials used in prosthetic superstructures coupled with titanium implants were also analysed: Noble Bond® (Argen®), Argelite 76sf+® (Argen®), and commercially pure titanium. Materials and Methods: 15 samples were studied, each consisting of one abutment and one c.p. titanium implant. They were divided into three groups: Control group: five c.p. titanium abutments (B&W®); Test group 1: five Noble Bond® (Argen®) cast abutments; and Test group 2: five Argelite 76sf+® (Argen®) abutments. In order to observe the corrosion effects, the surface topography was imaged using a confocal microscope. Three metric parameters (Sa: arithmetical mean height of the surface; Sp: maximum height of peaks; Sv: maximum height of valleys) were measured at three different areas: abutment neck, implant neck and implant body. The samples were immersed in artificial saliva for 3 months, after which the procedure was repeated. The metric parameters were compared by statistical analysis. Results: The analysis of the Sa at the level of the implant neck, abutment neck and implant body showed no statistically significant differences on combining c.p. Ti implants with the three studied alloys. The Sp showed no statistically significant differences between the three alloys. The Sv showed no statistically significant differences between the three alloys. Conclusion: The effects of electrogalvanic corrosion on each of the materials used when they were in contact with c.p. Ti showed no statistically significant differences. PMID:27733875

  18. Experimental testing of hot mix asphalt mixture made of recycled aggregates.

    PubMed

    Rafi, Muhammad Masood; Qadir, Adnan; Siddiqui, Salman Hameed

    2011-12-01

    The migration of population towards big cities generates rapid construction activities. These activities not only put pressure on natural resources but also produce construction, renovation and demolition waste. There is an urgent need to find ways to handle this waste owing to growing environmental concerns; doing so can also reduce pressure on natural resources. This paper presents the results of experimental studies which were carried out on hot mix asphalt mixture samples. These samples were manufactured by adding recycled aggregates (RA) to natural crushed stone aggregates (CSA). Three levels of RA addition were considered in the presented studies. RA were obtained both from the concrete waste of construction, renovation and demolition activities and from reclaimed asphalt pavement. Separate samples were manufactured with the coarse and fine aggregate fractions of both types of RA. Samples made with CSA were used as control specimens. The samples were prepared and tested using the Marshall method. The performance of the samples was investigated in terms of density-void and stability/flow analysis and was compared with the performance criteria given by the National Highway Authority for wearing course material in Pakistan. Based on these data, optimum asphalt contents were determined. All the samples made by adding up to 50% RA conform to the specification requirements of wearing course material as given by the National Highway Authority in terms of optimum asphalt contents, voids in mineral aggregates and stability/flow. An analysis of variance across these samples confirmed that the addition of RA is also statistically acceptable.
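
    The analysis of variance mentioned in the closing sentence can be sketched as a one-way ANOVA across RA replacement levels; the Marshall stability values below are invented for illustration, not the paper's measurements.

        from scipy import stats

        # Marshall stability (kN) for three replicates per mix (invented values).
        stab_0  = [11.2, 10.8, 11.5]   # control, CSA only
        stab_25 = [10.9, 11.1, 10.6]   # 25% RA
        stab_50 = [10.4, 10.7, 10.2]   # 50% RA

        F, p = stats.f_oneway(stab_0, stab_25, stab_50)
        print(F, p)   # a large p-value is consistent with RA addition being acceptable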

  19. Statistical analysis of radioimmunoassay. In comparison with bioassay (in Japanese)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nakano, R.

    1973-01-01

    Using the data of RIA (radioimmunoassay), statistical procedures for dealing with two problems, the linearization of the dose response curve and the calculation of relative potency, were described. There were three methods for linearization of the dose response curve of RIA. In each method, the following parameters were plotted on the horizontal and vertical axes, respectively: dose x vs. (B/T)⁻¹; c/(x + c) vs. B/T (c: the dose at which B/T is 50%); and log x vs. logit B/T. Among them, the last method seems to be most practical. The statistical procedures for bioassay were employed for calculating the relative potency of unknown samples compared to the standard samples from the dose response curves of standard and unknown samples using the regression coefficient. It is desirable that relative potency be calculated by plotting more than 5 points on the standard curve and more than 2 points for unknown samples. For examining the statistical limit of precision of measurement, LH activity of gonadotropin in urine was measured, and the relative potency, precision coefficient and the upper and lower limits of relative potency at the 95% confidence limit were calculated. On the other hand, bioassay (by the ovarian ascorbic acid reduction method and the anterior prostate lobe weighing method) was done on the same samples, and the precision was compared with that of RIA. In these examinations, the upper and lower limits of the relative potency at the 95% confidence limit were near each other, while in bioassay, a considerable difference was observed between the upper and lower limits. The necessity of standardization and systematization of the statistical procedures for increasing the precision of RIA was pointed out. (JA)

  20. Spectral analysis of groove spacing on Ganymede

    NASA Technical Reports Server (NTRS)

    Grimm, R. E.

    1984-01-01

    The technique used to analyze groove spacing on Ganymede is presented. Data from Voyager images are used to determine the surface topography and position of the grooves. Power spectral estimates are statistically analyzed, and sample data are included.

  1. Use of Whatman-41 filters in air quality sampling networks (with applications to elemental analysis)

    NASA Technical Reports Server (NTRS)

    Neustadter, H. E.; Sidik, S. M.; King, R. B.; Fordyce, J. S.; Burr, J. C.

    1974-01-01

    The operation of a 16-site parallel high volume air sampling network with glass fiber filters on one unit and Whatman-41 filters on the other is reported. The network data and data from several other experiments indicate that (1) sampler-to-sampler and filter-to-filter variabilities are small; (2) hygroscopic affinity of Whatman-41 filters need not introduce errors; and (3) suspended particulate samples from glass fiber filters averaged slightly, but not statistically significantly, higher than those from Whatman-41 filters. The results obtained demonstrate the practicability of Whatman-41 filters for air quality monitoring and elemental analysis.

  2. Difference between healthy children and ADHD based on wavelet spectral analysis of nuclear magnetic resonance images

    NASA Astrophysics Data System (ADS)

    González Gómez, Dulce I.; Moreno Barbosa, E.; Martínez Hernández, Mario Iván; Ramos Méndez, José; Hidalgo Tobón, Silvia; Dies Suarez, Pilar; Barragán Pérez, Eduardo; De Celis Alonso, Benito

    2014-11-01

    The main goal of this project was to create a computer algorithm, based on wavelet analysis of regional homogeneity images obtained during resting-state studies, that would ideally diagnose ADHD automatically. Because the cerebellum is an area known to be affected by ADHD, this study specifically analysed this region. Male right-handed volunteers (children aged between 7 and 11 years) were studied and compared with age-matched controls. The values of the absolute integrated wavelet spectrum showed statistically significant differences (p < 0.0015) between groups. This difference might help in the future to distinguish healthy subjects from ADHD patients and therefore diagnose ADHD. Even though the results were statistically significant, the small size of the sample limits the applicability of the method as presented here, and further work with larger samples and freely available datasets must be done.

  3. Anisotropic analysis of trabecular architecture in human femur bone radiographs using quaternion wavelet transforms.

    PubMed

    Sangeetha, S; Sujatha, C M; Manamalli, D

    2014-01-01

    In this work, the anisotropy of compressive and tensile strength regions of femur trabecular bone is analysed using quaternion wavelet transforms. Normal and abnormal femur trabecular bone radiographic images are considered for this study. The sub-anatomic regions, which include compressive and tensile regions, are delineated using pre-processing procedures. These delineated regions are subjected to quaternion wavelet transforms, and statistical parameters are derived from the transformed images. These parameters are correlated with apparent porosity, which is derived from the strength regions. Further, anisotropy is calculated from the transformed images and analysed. Results show that the anisotropy values derived from the second and third phase components of the quaternion wavelet transform are distinct for normal and abnormal samples, with high statistical significance for both compressive and tensile regions. These investigations demonstrate that architectural anisotropy derived from QWT analysis is able to differentiate normal and abnormal samples.

  4. Statistical theory and methodology for remote sensing data analysis

    NASA Technical Reports Server (NTRS)

    Odell, P. L.

    1974-01-01

    A model is developed for the evaluation of acreages (proportions) of different crop-types over a geographical area using a classification approach, and methods for estimating the crop acreages are given. In estimating the acreage of a specific crop-type such as wheat, it is suggested to treat the problem as a two-crop problem, wheat vs. nonwheat, since this simplifies the estimation problem considerably. The error analysis and the sample size problem are investigated for the two-crop approach. Certain numerical results for sample sizes are given for a JSC-ERTS-1 data example on wheat identification performance in Hill County, Montana and Burke County, North Dakota. Lastly, for a large-area crop acreage inventory, a sampling scheme is suggested for acquiring sample data, and the problem of crop acreage estimation and its error analysis is discussed.

  5. The Investigation of Construct Validity of Diagnostic and Statistical Manual of Mental Disorder-5 Personality Traits on Iranian sample with Antisocial and Borderline Personality Disorders.

    PubMed

    Amini, Mehdi; Pourshahbaz, Abbas; Mohammadkhani, Parvaneh; Ardakani, Mohammad-Reza Khodaie; Lotfi, Mozhgan

    2014-12-01

    The goal of this study was to examine the construct validity of the diagnostic and statistical manual of mental disorder-5 (DSM-5) conceptual model of antisocial and borderline personality disorders (PDs). More specifically, the aim was to determine whether the DSM-5 five-factor structure of pathological personality trait domains replicated in an independently collected sample that differs culturally from the derivation sample. This study was conducted on a sample of 346 individuals with antisocial (n = 122) or borderline PD (n = 130), and nonclinical subjects (n = 94). Participants were randomly selected from prisoners and out-patient and in-patient clients, and were recruited from Tehran prisoners and the clinical psychology and psychiatry clinics of Razi and Taleghani Hospital, Tehran, Iran. The SCID-II-PQ, SCID-II, and DSM-5 Personality Trait Rating Form (Clinician's PTRF) were used to diagnose PDs and to assess pathological traits. The data were analyzed by exploratory factor analysis, which revealed a 5-factor solution for DSM-5 personality traits. Results showed that the DSM-5 has adequate construct validity in an Iranian sample with antisocial and borderline PDs. The factors were similar in number to those of other studies, but differed in content. Exploratory factor analysis revealed five homogeneous components of antisocial and borderline PDs that may represent personality, behavioral, and affective features central to the disorders. Furthermore, the present study helps in understanding the adequacy of the DSM-5 dimensional approach to the evaluation of personality pathology, specifically in an Iranian sample.

  6. Proteomic Analysis of Matched Formalin-Fixed, Paraffin-Embedded Specimens in Patients with Advanced Serous Ovarian Carcinoma

    PubMed Central

    Smith, Ashlee L.; Sun, Mai; Bhargava, Rohit; Stewart, Nicolas A.; Flint, Melanie S.; Bigbee, William L.; Krivak, Thomas C.; Strange, Mary A.; Cooper, Kristine L.; Zorn, Kristin K.

    2013-01-01

    Objective: The biology of high grade serous ovarian carcinoma (HGSOC) is poorly understood. Little has been reported on intratumoral homogeneity or heterogeneity of primary HGSOC tumors and their metastases. We evaluated the global protein expression profiles of paired primary and metastatic HGSOC from formalin-fixed, paraffin-embedded (FFPE) tissue samples. Methods: After IRB approval, six patients with advanced HGSOC were identified with tumor in both ovaries at initial surgery. Laser capture microdissection (LCM) was used to extract tumor for protein digestion. Peptides were extracted and analyzed by reversed-phase liquid chromatography coupled to a linear ion trap mass spectrometer. Tandem mass spectra were searched against the UniProt human protein database. Differences in protein abundance between samples were assessed and analyzed by Ingenuity Pathway Analysis software. Immunohistochemistry (IHC) for select proteins from the original and an additional validation set of five patients was performed. Results: Unsupervised clustering of the abundance profiles placed the paired specimens adjacent to each other. IHC H-score analysis of the validation set revealed a strong correlation between paired samples for all proteins. For the similarly expressed proteins, the estimated correlation coefficients in two of three experimental samples and all validation samples were statistically significant (p < 0.05). The estimated correlation coefficients in the experimental sample proteins classified as differentially expressed were not statistically significant. Conclusion: A global proteomic screen of primary HGSOC tumors and their metastatic lesions identifies tumoral homogeneity and heterogeneity and provides preliminary insight into these protein profiles and the cellular pathways they constitute. PMID:28250404

  7. Comparative Analysis of Academic Grades in Compulsory Secondary Education in Spain Using Statistical Techniques

    ERIC Educational Resources Information Center

    Veas, Alejandro; Gilar, Raquel; Miñano, Pablo; Castejón, Juan Luis

    2017-01-01

    The present study, based on the construct comparability approach, performs a comparative analysis of general points average for seven courses, using exploratory factor analysis (EFA) and the Partial Credit model (PCM) with a sample of 1398 student subjects (M = 12.5, SD = 0.67) from 8 schools in the province of Alicante (Spain). EFA confirmed a…

  8. A Preliminary Bayesian Analysis of Incomplete Longitudinal Data from a Small Sample: Methodological Advances in an International Comparative Study of Educational Inequality

    ERIC Educational Resources Information Center

    Hsieh, Chueh-An; Maier, Kimberly S.

    2009-01-01

    The capacity of Bayesian methods in estimating complex statistical models is undeniable. Bayesian data analysis is seen as having a range of advantages, such as an intuitive probabilistic interpretation of the parameters of interest, the efficient incorporation of prior information to empirical data analysis, model averaging and model selection.…

  9. Comparison of Sample Size by Bootstrap and by Formulas Based on Normal Distribution Assumption.

    PubMed

    Wang, Zuozhen

    2018-01-01

    The bootstrapping technique is distribution-independent, which provides an indirect way to estimate the sample size for a clinical trial based on a relatively smaller sample. In this paper, sample size estimation to compare two parallel-design arms for continuous data by the bootstrap procedure is presented for various test types (inequality, non-inferiority, superiority, and equivalence). Meanwhile, sample size calculation by mathematical formulas (normal distribution assumption) for the identical data is also carried out. Consequently, the power difference between the two calculation methods is acceptably small for all the test types, showing that the bootstrap procedure is a credible technique for sample size estimation. After that, we compared the powers determined using the two methods based on data that violate the normal distribution assumption. To accommodate the feature of the data, the nonparametric Wilcoxon test was applied to compare the two groups during the process of bootstrap power estimation. As a result, the power estimated by the normal distribution-based formula is far larger than that estimated by bootstrap for each specific sample size per group. Hence, for this type of data, it is preferable that the bootstrap method be applied for sample size calculation from the beginning, and that the same statistical method as used in the subsequent statistical analysis be employed for each bootstrap sample during bootstrap sample size estimation, provided historical data are available that are well representative of the population to which the proposed trial plans to extrapolate.
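
    A minimal sketch of the recommended procedure, resampling a pilot dataset and applying the same rank-based test planned for the final analysis (the Wilcoxon rank-sum test, i.e. Mann-Whitney), might look as follows; the pilot data are simulated stand-ins, not trial data.

        import numpy as np
        from scipy.stats import mannwhitneyu

        rng = np.random.default_rng(1)
        pilot_a = rng.lognormal(0.0, 1.0, 30)   # non-normal pilot data, arm A
        pilot_b = rng.lognormal(0.5, 1.0, 30)   # arm B

        def bootstrap_power(a, b, n_per_arm, alpha=0.05, n_boot=2000):
            hits = 0
            for _ in range(n_boot):
                ra = rng.choice(a, n_per_arm, replace=True)   # resample each arm
                rb = rng.choice(b, n_per_arm, replace=True)
                if mannwhitneyu(ra, rb).pvalue < alpha:
                    hits += 1
            return hits / n_boot

        for n in (20, 40, 80):   # grow n until the target power (say 0.80) is reached
            print(n, bootstrap_power(pilot_a, pilot_b, n))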

  10. Quality of statistical reporting in developmental disability journals.

    PubMed

    Namasivayam, Aravind K; Yan, Tina; Wong, Wing Yiu Stephanie; van Lieshout, Pascal

    2015-12-01

    Null hypothesis significance testing (NHST) dominates quantitative data analysis, but its use is controversial and has been heavily criticized. The American Psychological Association has advocated the reporting of effect sizes (ES), confidence intervals (CIs), and statistical power analysis to complement NHST results to provide a more comprehensive understanding of research findings. The aim of this paper is to carry out a sample survey of statistical reporting practices in two journals with the highest h5-index scores in the areas of developmental disability and rehabilitation. Using a checklist that includes critical recommendations by the American Psychological Association, we examined 100 randomly selected articles out of 456 articles reporting inferential statistics in the year 2013 in the Journal of Autism and Developmental Disorders (JADD) and Research in Developmental Disabilities (RDD). The results showed that for both journals, ES were reported only about half the time (JADD 59.3%; RDD 55.87%). These findings are similar to psychology journals, but are in stark contrast to ES reporting in educational journals (73%). Furthermore, a priori power and sample size determination (JADD 10%; RDD 6%), along with reporting and interpreting precision measures (CI: JADD 13.33%; RDD 16.67%), were the least reported metrics in these journals, but not dissimilar to journals in other disciplines. To advance the science in developmental disability and rehabilitation and to bridge the research-to-practice divide, reforms in statistical reporting, such as providing supplemental measures to NHST, are clearly needed.

  11. What influences the choice of assessment methods in health technology assessments? Statistical analysis of international health technology assessments from 1989 to 2002.

    PubMed

    Draborg, Eva; Andersen, Christian Kronborg

    2006-01-01

    Health technology assessment (HTA) has been used as input in decision making worldwide for more than 25 years. However, no uniform definition of HTA or agreement on assessment methods exists, leaving open the question of what influences the choice of assessment methods in HTAs. The objective of this study is to analyze statistically a possible relationship between the methods of assessment used in practical HTAs, the type of assessed technology, the type of assessors, and the year of publication. A sample of 433 HTAs published by eleven leading institutions or agencies in nine countries was reviewed and analyzed by multiple logistic regression. The study shows that outsourcing of HTA reports to external partners is associated with a higher likelihood of using assessment methods such as meta-analysis, surveys, economic evaluations, and randomized controlled trials, and with a lower likelihood of using assessment methods such as literature reviews and "other methods". The year of publication was statistically related to the inclusion of economic evaluations, with a decreasing likelihood over the year span. The type of assessed technology was related to the use of economic evaluations, surveys, and "other methods", each with a decreasing likelihood when pharmaceuticals were the assessed type of technology. During the period from 1989 to 2002, no major developments in the assessment methods used in practical HTAs were shown statistically in a sample of 433 HTAs worldwide. Outsourcing to external assessors has a statistically significant influence on the choice of assessment methods.

  12. A Non-Intrusive Algorithm for Sensitivity Analysis of Chaotic Flow Simulations

    NASA Technical Reports Server (NTRS)

    Blonigan, Patrick J.; Wang, Qiqi; Nielsen, Eric J.; Diskin, Boris

    2017-01-01

    We demonstrate a novel algorithm for computing the sensitivity of statistics in chaotic flow simulations to parameter perturbations. The algorithm is non-intrusive but requires the flow solver to expose an interface. Based on the principle of shadowing in dynamical systems, this algorithm is designed to reduce the effect of the sampling error in computing the sensitivity of statistics in chaotic simulations. We compare the effectiveness of this method to that of the conventional finite difference method.
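
    The sampling error that shadowing methods target is easy to reproduce: a plain finite-difference estimate of the sensitivity of a long-time average fluctuates with the averaging window. The sketch below (not the shadowing algorithm itself, and unrelated to the paper's solver) illustrates this for the Lorenz system's mean z with respect to the parameter rho.

        import numpy as np
        from scipy.integrate import solve_ivp

        def lorenz(t, u, rho, sigma=10.0, beta=8.0 / 3.0):
            x, y, z = u
            return [sigma * (y - x), x * (rho - z) - y, x * y - beta * z]

        def mean_z(rho, T):
            sol = solve_ivp(lorenz, (0.0, T), [1.0, 1.0, 1.0],
                            args=(rho,), max_step=0.02)
            return sol.y[2][sol.t > T / 4].mean()   # discard the transient

        drho = 1.0
        for T in (50.0, 200.0, 800.0):
            grad = (mean_z(28.0 + drho, T) - mean_z(28.0 - drho, T)) / (2 * drho)
            print(T, grad)   # the estimate settles down only slowly as T grows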

  13. Seven ways to increase power without increasing N.

    PubMed

    Hansen, W B; Collins, L M

    1994-01-01

    Many readers of this monograph may wonder why a chapter on statistical power was included. After all, by now the issue of statistical power is in many respects mundane. Everyone knows that statistical power is a central research consideration, and certainly most National Institute on Drug Abuse grantees or prospective grantees understand the importance of including a power analysis in research proposals. However, there is ample evidence that, in practice, prevention researchers are not paying sufficient attention to statistical power. If they were, the findings observed by Hansen (1992) in a recent review of the prevention literature would not have emerged. Hansen (1992) examined statistical power based on 46 cohorts followed longitudinally, using nonparametric assumptions given the subjects' age at posttest and the numbers of subjects. Results of this analysis indicated that, in order for a study to attain 80-percent power for detecting differences between treatment and control groups, the difference between groups at posttest would need to be at least 8 percent (in the best studies) and as much as 16 percent (in the weakest studies). In order for a study to attain 80-percent power for detecting group differences in pre-post change, 22 of the 46 cohorts would have needed relative pre-post reductions of greater than 100 percent. Thirty-three of the 46 cohorts had less than 50-percent power to detect a 50-percent relative reduction in substance use. These results are consistent with other review findings (e.g., Lipsey 1990) that have shown a similar lack of power in a broad range of research topics. Thus, it seems that, although researchers are aware of the importance of statistical power (particularly of the necessity for calculating it when proposing research), they somehow are failing to end up with adequate power in their completed studies. This chapter argues that the failure of many prevention studies to maintain adequate statistical power is due to an overemphasis on sample size (N) as the only, or even the best, way to increase statistical power. It is easy to see how this overemphasis has come about. Sample size is easy to manipulate, has the advantage of being related to power in a straightforward way, and usually is under the direct control of the researcher, except for limitations imposed by finances or subject availability. Another option for increasing power is to increase the alpha used for hypothesis-testing but, as very few researchers seriously consider significance levels much larger than the traditional .05, this strategy is seldom used. Of course, sample size is important, and the authors of this chapter are not recommending that researchers cease choosing sample sizes carefully. Rather, they argue that researchers should not confine themselves to increasing N to enhance power. It is important to take additional measures to maintain and improve power over and above making sure the initial sample size is sufficient. The authors recommend two general strategies. One strategy involves attempting to maintain the effective initial sample size so that power is not lost needlessly. The other strategy is to take measures to maximize the third factor that determines statistical power: effect size.
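
    The baseline trade-off the chapter starts from, power as a function of N, alpha, and effect size, can be sketched with standard power routines; the numbers below are generic illustrations, not figures from the chapter.

        from statsmodels.stats.power import TTestIndPower

        analysis = TTestIndPower()

        # Sample size needed per group for a smallish effect at conventional settings.
        n = analysis.solve_power(effect_size=0.3, alpha=0.05, power=0.8)
        print(round(n))   # roughly 175 per group

        # Same N, larger effect size: power rises without touching N,
        # which is the chapter's central point.
        print(analysis.power(effect_size=0.3, nobs1=100, alpha=0.05))
        print(analysis.power(effect_size=0.5, nobs1=100, alpha=0.05))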

  14. QNB: differential RNA methylation analysis for count-based small-sample sequencing data with a quad-negative binomial model.

    PubMed

    Liu, Lian; Zhang, Shao-Wu; Huang, Yufei; Meng, Jia

    2017-08-31

    As a newly emerged research area, RNA epigenetics has drawn increasing attention recently for the participation of RNA methylation and other modifications in a number of crucial biological processes. Thanks to high-throughput sequencing techniques such as MeRIP-Seq, transcriptome-wide RNA methylation profiles are now available in the form of count-based data, with which it is often of interest to study the dynamics at the epitranscriptomic layer. However, the sample size of an RNA methylation experiment is usually very small due to its costs; additionally, there usually exist a large number of genes whose methylation level cannot be accurately estimated due to their low expression level, making differential RNA methylation analysis a difficult task. We present QNB, a statistical approach for differential RNA methylation analysis with count-based small-sample sequencing data. Compared with previous approaches such as the DRME model, which is based on a statistical test covering the IP samples only with 2 negative binomial distributions, QNB is based on 4 independent negative binomial distributions with their variances and means linked by local regressions, and in this way the input control samples are also properly taken care of. In addition, unlike the DRME approach, which relies on the input control sample alone for estimating the background, QNB uses a more robust estimator for gene expression by combining information from both input and IP samples, which could largely improve the testing performance for very lowly expressed genes. QNB showed improved performance on both simulated and real MeRIP-Seq datasets when compared with competing algorithms. The QNB model is also applicable to other datasets related to RNA modifications, including but not limited to RNA bisulfite sequencing, m1A-Seq, Par-CLIP, RIP-Seq, etc.

  15. Assessment of groundwater quality of Ballia district, Uttar Pradesh, India, with reference to arsenic contamination using multivariate statistical analysis

    NASA Astrophysics Data System (ADS)

    Singh, Asha Lata; Singh, Vipin Kumar

    2018-06-01

    A total of 22 water quality parameters were selected for the analysis of groundwater samples with reference to arsenic contamination. Samples were collected in the pre-monsoon and monsoon seasons of the year 2013. The maximum arsenic concentration was approximately the same in both seasons: 75.60 µg/L in the pre-monsoon and 74.46 µg/L in the monsoon season. Out of 72 collected samples, only three were below the WHO guideline value of 10 µg/L for arsenic; in the remaining 95.83% of the groundwater samples, the arsenic concentration was above the permissible limit. Nickel, manganese, and chromium concentrations were above the permissible limits in nearly all samples, except for chromium in a few pre-monsoon samples. The total iron concentrations in 23 samples (31.94%) were above the permissible limit. A total of six and seven principal components (PCs) were extracted using principal component analysis during the pre-monsoon and monsoon seasons, respectively, accounting for 76.25 and 78.52% of the total variation during the two consecutive seasons. Correlation statistics revealed that the arsenic concentration was positively correlated with phosphate, iron, ammonium, bicarbonate, and manganese concentrations but negatively correlated with oxidation reduction potential (ORP), sulfate concentration, electrical conductivity, and total dissolved solids concentration. The negative correlation of arsenic with ORP suggested reducing conditions prevailing in the groundwater. The trilinear Piper diagram revealed calcium and magnesium enrichment of groundwater with an abundance of chloride ions but no predominance of bicarbonate ions. Thus, the groundwater fell into the Ca²⁺-Mg²⁺-Cl⁻-SO₄²⁻ category.

  16. Trends in groundwater quality in principal aquifers of the United States, 1988-2012

    USGS Publications Warehouse

    Lindsey, Bruce D.; Rupert, Michael G.

    2014-01-01

    The U.S. Geological Survey (USGS) National Water-Quality Assessment (NAWQA) Program analyzed trends in groundwater quality throughout the nation for the sampling period of 1988-2012. Trends were determined for networks (sets of wells routinely monitored by the USGS) for a subset of constituents by statistical analysis of paired water-quality measurements collected on a near-decadal time scale. The data set for chloride, dissolved solids, and nitrate consisted of 1,511 wells in 67 networks, whereas the data set for methyl tert-butyl ether (MTBE) consisted of 1,013 wells in 46 networks. The 25 principal aquifers represented by these networks account for about 75 percent of withdrawals of groundwater used for drinking-water supply for the nation. Statistically significant changes in chloride, dissolved-solids, or nitrate concentrations were found in many well networks over a decadal period. Concentrations increased significantly in 48 percent of networks for chloride, 42 percent of networks for dissolved solids, and 21 percent of networks for nitrate. Chloride, dissolved solids, and nitrate concentrations decreased significantly in 3, 3, and 10 percent of the networks, respectively. The magnitude of change in concentrations was typically small in most networks; however, the magnitude of change in networks with statistically significant increases was typically much larger than in networks with statistically significant decreases. The largest increases of chloride concentrations were in urban areas in the northeastern and north central United States. The largest increases of nitrate concentrations were in networks in agricultural areas. Statistical analysis showed that 42 of the 46 networks had no statistically significant changes in MTBE concentrations. The four networks with statistically significant changes in MTBE concentrations were in the northeastern United States, where MTBE was widely used. Two networks had increasing concentrations, and two networks had decreasing concentrations. Production and use of MTBE peaked in about 2000, and MTBE has been effectively banned in many areas since about 2006. The two networks that had increasing concentrations were sampled for the second time close to the peak of MTBE production, whereas the two networks that had decreasing concentrations were sampled for the second time 10 years after the peak of MTBE production.

  17. 45 CFR 160.536 - Statistical sampling.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... 45 Public Welfare 1 2010-10-01 2010-10-01 false Statistical sampling. 160.536 Section 160.536... REQUIREMENTS GENERAL ADMINISTRATIVE REQUIREMENTS Procedures for Hearings § 160.536 Statistical sampling. (a) In... statistical sampling study as evidence of the number of violations under § 160.406 of this part, or the...

  18. 42 CFR 1003.133 - Statistical sampling.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... 42 Public Health 5 2011-10-01 2011-10-01 false Statistical sampling. 1003.133 Section 1003.133... AUTHORITIES CIVIL MONEY PENALTIES, ASSESSMENTS AND EXCLUSIONS § 1003.133 Statistical sampling. (a) In meeting... statistical sampling study as evidence of the number and amount of claims and/or requests for payment as...

  19. 45 CFR 160.536 - Statistical sampling.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... 45 Public Welfare 1 2011-10-01 2011-10-01 false Statistical sampling. 160.536 Section 160.536... REQUIREMENTS GENERAL ADMINISTRATIVE REQUIREMENTS Procedures for Hearings § 160.536 Statistical sampling. (a) In... statistical sampling study as evidence of the number of violations under § 160.406 of this part, or the...

  20. 42 CFR 1003.133 - Statistical sampling.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... 42 Public Health 5 2010-10-01 2010-10-01 false Statistical sampling. 1003.133 Section 1003.133... AUTHORITIES CIVIL MONEY PENALTIES, ASSESSMENTS AND EXCLUSIONS § 1003.133 Statistical sampling. (a) In meeting... statistical sampling study as evidence of the number and amount of claims and/or requests for payment as...

  1. 42 CFR 405.1064 - ALJ decisions involving statistical samples.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... 42 Public Health 2 2011-10-01 2011-10-01 false ALJ decisions involving statistical samples. 405... Medicare Coverage Policies § 405.1064 ALJ decisions involving statistical samples. When an appeal from the QIC involves an overpayment issue and the QIC used a statistical sample in reaching its...

  2. 42 CFR 405.1064 - ALJ decisions involving statistical samples.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... 42 Public Health 2 2010-10-01 2010-10-01 false ALJ decisions involving statistical samples. 405... Medicare Coverage Policies § 405.1064 ALJ decisions involving statistical samples. When an appeal from the QIC involves an overpayment issue and the QIC used a statistical sample in reaching its...

  3. Accuracy of metric sex analysis of skeletal remains using Fordisc based on a recent skull collection.

    PubMed

    Ramsthaler, F; Kreutz, K; Verhoff, M A

    2007-11-01

    It has been generally accepted in skeletal sex determination that the use of metric methods is limited due to the population dependence of the multivariate algorithms. The aim of the study was to verify the applicability of software-based sex estimation outside the reference population group for which the discriminant equations were developed. We examined 98 skulls from recent forensic cases of known age, sex, and Caucasian ancestry from cranium collections in Frankfurt and Mainz (Germany) to determine the accuracy of sex determination using the statistical software Fordisc, which derives its database and discriminant functions from the US Forensic Database. In a comparison between metric analysis using Fordisc and morphological determination of sex, average accuracy for both sexes was 86% vs. 94%, respectively, and males were identified more accurately than females. The ratio of the true test result rate to the false test result rate was not statistically different for the two methodological approaches at a significance level of 0.05, but was statistically different at a level of 0.10 (p = 0.06). Possible explanations for this difference include differences in ancestry, age distribution, and socio-economic status relative to the Fordisc reference sample. It is likely that a discriminant function analysis on the basis of more similar European reference samples would lead to more valid and reliable sexing results. The use of Fordisc as the sole method for estimating the sex of recent skeletal remains in Europe cannot be recommended without additional morphological assessment and without a built-in software update based on modern European reference samples.

  4. A comprehensive review of arsenic levels in the semiconductor manufacturing industry.

    PubMed

    Park, Donguk; Yang, Haengsun; Jeong, Jeeyeon; Ha, Kwonchul; Choi, Sangjun; Kim, Chinyon; Yoon, Chungsik; Park, Dooyong; Paek, Domyung

    2010-11-01

    This paper presents a summary of arsenic level statistics from air and wipe samples taken from studies conducted in fabrication operations. The main objectives of this study were not only to describe arsenic measurement data but also, through a literature review, to categorize fabrication workers in accordance with observed arsenic levels. All airborne arsenic measurements reported were included in the summary statistics for analysis of the measurement data. The arithmetic mean was estimated assuming a lognormal distribution from the geometric mean and the geometric standard deviation or the range. In addition, weighted arithmetic means (WAMs) were calculated based on the number of measurements reported for each mean. Analysis of variance (ANOVA) was employed to compare arsenic levels classified according to several categories such as the year, sampling type, location sampled, operation type, and cleaning technique. Nine papers were found reporting airborne arsenic measurement data from maintenance workers or maintenance areas in semiconductor chip-making plants. A total of 40 statistical summaries from seven articles were identified that represented a total of 423 airborne arsenic measurements. Arsenic exposure levels taken during normal operating activities in implantation operations (WAM = 1.6 μg m⁻³, no. of samples = 77, no. of statistical summaries = 2) were found to be lower than exposure levels of engineers who were involved in maintenance works (7.7 μg m⁻³, no. of samples = 181, no. of statistical summaries = 19). The highest level (WAM = 218.6 μg m⁻³) was associated with various maintenance works performed inside an ion implantation chamber. ANOVA revealed no significant differences in the WAM arsenic levels among the categorizations based on operation and sampling characteristics. Arsenic levels (56.4 μg m⁻³) recorded during maintenance works performed in dry conditions were found to be much higher than those from maintenance works in wet conditions (0.6 μg m⁻³). Arsenic levels from wipe samples in process areas after maintenance activities ranged from non-detectable to 146 μg cm⁻², indicating the potential for dispersion into the air and hence inhalation. We conclude that workers who are regularly or occasionally involved in maintenance work have higher potential for occupational exposure than other employees who are in charge of routine production work. In addition, fabrication workers can be classified into two groups based on the reviewed arsenic exposure levels: operators with potential for low levels of exposure and maintenance engineers with high levels of exposure. These classifications could be used as a basis for a qualitative ordinal ranking of exposure in an epidemiological study.
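
    For a lognormal distribution, the arithmetic mean can be recovered from the geometric mean and geometric standard deviation as AM = GM x exp((ln GSD)² / 2), and the weighted arithmetic means are then weighted by measurement counts. A short sketch with invented summary values (not the review's data):

        import math

        def am_from_gm_gsd(gm, gsd):
            # Lognormal identity: AM = GM * exp((ln GSD)^2 / 2)
            return gm * math.exp(math.log(gsd) ** 2 / 2)

        # (GM in ug/m3, GSD, number of measurements) -- invented summaries.
        summaries = [(1.1, 2.5, 77), (4.8, 3.1, 181)]

        ams = [(am_from_gm_gsd(gm, gsd), n) for gm, gsd, n in summaries]
        wam = sum(am * n for am, n in ams) / sum(n for _, n in ams)
        print(wam)   # weighted arithmetic mean across the reported statistics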

  5. Identification of the country of growth of Sophora flavescens using direct analysis in real time mass spectrometry (DART-MS).

    PubMed

    Fukuda, Eriko; Uesawa, Yoshihiro; Baba, Masaki; Suzuki, Ryuichiro; Fukuda, Tatsuo; Shirataki, Yoshiaki; Okada, Yoshihito

    2014-11-01

    In order to identify the country of growth of Sophora flavescens by chemical fingerprinting, extracts of plants grown in China and Japan were analyzed using direct analysis in real time mass spectrometry (DART-MS). The peaks characteristic of each country of growth were statistically analyzed using a volcano plot to summarize the relationship between the p-values of a statistical test and the magnitude of the difference in the peak intensities of the samples in the two groups. Peaks with a p-value < 0.05 in the t-test and an absolute difference ≥ 2 were defined as characteristic. Peaks characteristic of Chinese S. flavescens were found at m/z 439 and 440. In contrast, peaks characteristic of Japanese S. flavescens were found at m/z 313, 423, 437 and 441. The intensities of the selected peaks were similar in Japanese samples, whereas the m/z 439 peak had a significantly higher intensity than the other peaks in Chinese samples. Therefore, differences in the selected peak patterns may allow identification of the country of growth of S. flavescens.
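
    The volcano-plot selection rule described above (t-test p-value below 0.05 and an absolute intensity difference of at least 2) can be sketched as follows on synthetic peak intensities; none of the values are from the study.

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(2)
        china = rng.normal(5.0, 1.0, (6, 200))   # 6 samples x 200 peaks (synthetic)
        japan = rng.normal(5.0, 1.0, (6, 200))
        japan[:, :10] += 3.0                     # make ten peaks "characteristic"

        t, p = stats.ttest_ind(china, japan, axis=0, equal_var=False)
        diff = china.mean(axis=0) - japan.mean(axis=0)
        characteristic = (p < 0.05) & (np.abs(diff) >= 2)
        print(np.flatnonzero(characteristic))    # indices of discriminating peaks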

  6. National mandatory motorcycle helmet laws may save $2.2 billion annually: An inpatient and value of statistical life analysis.

    PubMed

    Dua, Anahita; Wei, Shuyan; Safarik, Justin; Furlough, Courtney; Desai, Sapan S

    2015-06-01

    While statistics exist regarding the overall rate of fatalities in motorcyclists with and without helmets, a combined inpatient and value of statistical life (VSL) analysis has not previously been reported. Statistical data of motorcycle collisions were obtained from the Centers for Disease Control, National Highway Transportation Safety Board, and Governors Highway Safety Association. The VSL estimate was obtained from the 2002 Department of Transportation calculation. Statistics on helmeted versus nonhelmeted motorcyclists, death at the scene, and inpatient death were obtained using the 2010 National Trauma Data Bank. Inpatient costs were obtained from the 2010 National Inpatient Sample. Population estimates were generated using weighted samples, and all costs are reported using 2010 US dollars using the Consumer Price Index. A total of 3,951 fatal motorcycle collisions were reported in 2010, of which 77% of patients died at the scene, 10% in the emergency department, and 13% as inpatients. Thirty-seven percent of all riders did not wear a helmet but accounted for 69% of all deaths. Of those motorcyclists who survived to the hospital, the odds ratio of surviving with a helmet was 1.51 compared with those without a helmet (p < 0.001). Total costs for nonhelmeted motorcyclists were 66% greater at $5.5 billion, compared with $3.3 billion for helmeted motorcyclists (p < 0.001). Direct inpatient costs were 16% greater for helmeted riders ($203,248 vs. $175,006) but led to more than 50% greater VSL generated (absolute benefit, $602,519 per helmeted survivor). A cost analysis of inpatient care and indirect costs of motorcycle riders who do not wear helmets leads to nearly $2.2 billion in losses per year, with almost 1.9 times as many deaths compared with helmeted motorcyclists. The per capita cost per fatality is more than $800,000. Institution of a mandatory helmet law could lead to an annual cost savings of almost $2.2 billion. Economic analysis, level III.

  7. Forensic Comparison of Soil Samples Using Nondestructive Elemental Analysis.

    PubMed

    Uitdehaag, Stefan; Wiarda, Wim; Donders, Timme; Kuiper, Irene

    2017-07-01

    Soil can play an important role in forensic cases in linking suspects or objects to a crime scene by comparing samples from the crime scene with samples derived from items. This study uses an adapted ED-XRF analysis (sieving instead of grinding, to prevent destruction of microfossils) to produce elemental composition data for 20 elements. Different data processing techniques and statistical distances were evaluated using data from 50 samples and the log-LR cost (Cllr). The best performing combination, Canberra distance on relative data with square-root values, is used to construct a discriminative model. Examples of the spatial resolution of the method in crime scenes are shown for three locations, and sampling strategy is discussed. Twelve test cases were analyzed, and the results showed that the method is applicable. The study shows how the combination of an analysis technique, a database, and a discriminative model can be used to compare multiple soil samples quickly. © 2016 American Academy of Forensic Sciences.
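
    The best-performing comparison reported above, relative data with square-root values and the Canberra distance, can be sketched directly; the element counts below are invented, and only 6 of the 20 elements are shown.

        import numpy as np
        from scipy.spatial.distance import canberra

        def profile(counts):
            rel = np.asarray(counts, dtype=float)
            rel /= rel.sum()        # relative data
            return np.sqrt(rel)     # square-root values

        scene = profile([520, 88, 14, 301, 9, 45])   # crime-scene sample (invented)
        item  = profile([505, 92, 12, 320, 8, 50])   # sample from an item (invented)
        other = profile([120, 400, 90, 60, 70, 30])  # unrelated location (invented)

        print(canberra(scene, item), canberra(scene, other))  # smaller = more similar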

  8. Comparison of two headspace sampling techniques for the analysis of off-flavour volatiles from oat based products.

    PubMed

    Cognat, Claudine; Shepherd, Tom; Verrall, Susan R; Stewart, Derek

    2012-10-01

    Two different headspace sampling techniques were compared for analysis of aroma volatiles from freshly produced and aged plain oatcakes. Solid phase microextraction (SPME) using a Carboxen-Polydimethylsiloxane (PDMS) fibre and entrainment on Tenax TA within an adsorbent tube were used for collection of volatiles. The effects of variation in the sampling method were also considered using SPME. The data obtained using both techniques were processed by multivariate statistical analysis (PCA). Both techniques showed similar capacities to discriminate between the samples at different ages. Discrimination between fresh and rancid samples could be made on the basis of changes in the relative abundances of 14-15 of the constituents in the volatile profiles. A significant effect on the detection level of volatile compounds was observed when samples were crushed and analysed by SPME-GC-MS, in comparison to undisturbed product. The applicability and cost effectiveness of both methods were considered. Copyright © 2012 Elsevier Ltd. All rights reserved.

  9. Just add water: Accuracy of analysis of diluted human milk samples using mid-infrared spectroscopy.

    PubMed

    Smith, R W; Adamkin, D H; Farris, A; Radmacher, P G

    2017-01-01

    To determine the maximum dilution of human milk (HM) that yields reliable results for protein, fat and lactose when analyzed by mid-infrared spectroscopy. De-identified samples of frozen HM were obtained. Milk was thawed and warmed (40°C) prior to analysis. Undiluted (native) HM was analyzed by mid-infrared spectroscopy for macronutrient composition: total protein (P), fat (F), and carbohydrate (C); energy (E) was calculated from the macronutrient results. Subsequent analyses were done with 1:2, 1:3, 1:5 and 1:10 dilutions of each sample with distilled water. Additional samples were sent to a certified lab for external validation. Quantitatively, F and P showed statistically significant but clinically non-critical differences at 1:2 and 1:3 dilutions. Differences at higher dilutions were statistically significant and deviated from native values enough to render those dilutions unreliable. External validation studies also showed statistically significant but clinically unimportant differences at 1:2 and 1:3 dilutions. The Calais Human Milk Analyzer can be used with HM samples diluted 1:2 and 1:3 and return results within 5% of the values from undiluted HM. At a 1:5 or 1:10 dilution, however, results vary by as much as 10%, especially for P and F. At the 1:2 and 1:3 dilutions these differences appear to be insignificant in the context of nutritional management; the accuracy and reliability of the 1:5 and 1:10 dilutions, however, are questionable.

  10. Statistical analysis of hydrological response in urbanising catchments based on adaptive sampling using inter-amount times

    NASA Astrophysics Data System (ADS)

    ten Veldhuis, Marie-Claire; Schleiss, Marc

    2017-04-01

    Urban catchments are typically characterised by a flashier hydrological response than natural catchments. Predicting flow changes associated with urbanisation is not straightforward, as they are influenced by interactions between impervious cover, basin size, drainage connectivity and stormwater management infrastructure. In this study, we present an alternative approach to the statistical analysis of hydrological response variability and basin flashiness, based on the distribution of inter-amount times. We analyse inter-amount time distributions of high-resolution streamflow time series for 17 (semi-)urbanised basins in North Carolina, USA, ranging from 13 to 238 km² in size. We show that in the inter-amount-time framework, sampling frequency is tuned to the local variability of the flow pattern, resulting in a different representation and weighting of high and low flow periods in the statistical distribution. This leads to important differences in the way the distribution quantiles, mean, coefficient of variation and skewness vary across scales, and results in lower mean intermittency and improved scaling. Moreover, we show that inter-amount-time distributions can be used to detect regulation effects on flow patterns, identify critical sampling scales and characterise the flashiness of the hydrological response. The possibility to use both the classical approach and the inter-amount-time framework to identify minimum observable scales and analyse flow data opens up interesting areas for future research.
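
    The sampling idea can be made concrete in a few lines: instead of reading the flow at fixed time steps, record how long it takes to accumulate each fixed amount of flow. A minimal sketch on an invented, flashy flow record (the helper function and values are illustrative, not from the study):

        import numpy as np

        def inter_amount_times(flow, dt, amount):
            """Times (in units of dt) needed to accumulate successive fixed amounts."""
            cum = np.cumsum(flow) * dt                 # cumulative flow volume
            targets = np.arange(amount, cum[-1], amount)
            crossings = np.searchsorted(cum, targets)  # step where each target is met
            return np.diff(np.concatenate(([0], crossings))) * dt

        flow = np.array([0.1, 0.1, 5.0, 8.0, 0.2, 0.1, 0.1, 3.0])  # m3/s, invented
        print(inter_amount_times(flow, dt=1.0, amount=2.0))
        # Short (here zero, i.e. sub-step) inter-amount times mark flashy periods;
        # long ones mark low-flow periods.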

  11. A Class of Population Covariance Matrices in the Bootstrap Approach to Covariance Structure Analysis

    ERIC Educational Resources Information Center

    Yuan, Ke-Hai; Hayashi, Kentaro; Yanagihara, Hirokazu

    2007-01-01

    Model evaluation in covariance structure analysis is critical before the results can be trusted. Due to finite sample sizes and unknown distributions of real data, existing conclusions regarding a particular statistic may not be applicable in practice. The bootstrap procedure automatically takes care of the unknown distribution and, for a given…

  12. An Analysis of Construction Contractor Performance Evaluation System

    DTIC Science & Technology

    2009-03-01

    A key principal component analysis output is the KMO and Bartlett's test. The KMO (Kaiser-Meyer-Olkin) measure of sampling adequacy is used to identify whether a set of variables, when factored together, yields distinct and reliable factors (Field, 2005). KMO statistics vary between values of 0 and 1.
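
    The overall KMO statistic can be computed from the correlation matrix and the partial correlations derived from its inverse; a self-contained sketch on synthetic data (not the report's questionnaire variables):

        import numpy as np

        def kmo(data):
            r = np.corrcoef(data, rowvar=False)       # correlation matrix
            inv = np.linalg.inv(r)
            d = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
            partial = -inv / d                        # anti-image (partial) correlations
            np.fill_diagonal(r, 0)
            np.fill_diagonal(partial, 0)
            return (r ** 2).sum() / ((r ** 2).sum() + (partial ** 2).sum())

        rng = np.random.default_rng(3)
        latent = rng.normal(size=(100, 1))
        items = latent + 0.5 * rng.normal(size=(100, 6))   # six correlated items
        print(kmo(items))   # values nearer 1 indicate factorable data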

  13. Water-quality trends and constituent-transport analysis for selected sampling sites in the Milltown Reservoir/Clark Fork River Superfund Site in the upper Clark Fork Basin, Montana, water years 1996–2015

    USGS Publications Warehouse

    Sando, Steven K.; Vecchia, Aldo V.

    2016-07-20

    During the extended history of mining in the upper Clark Fork Basin in Montana, large amounts of waste materials enriched with metallic contaminants (cadmium, copper, lead, and zinc) and the metalloid trace element arsenic were generated from mining operations near Butte and milling and smelting operations near Anaconda. Extensive deposition of mining wastes in the Silver Bow Creek and Clark Fork channels and flood plains had substantial effects on water quality. Federal Superfund remediation activities in the upper Clark Fork Basin began in 1983 and have included substantial remediation near Butte and removal of the former Milltown Dam near Missoula. To aid in evaluating the effects of remediation activities on water quality, the U.S. Geological Survey began collecting streamflow and water-quality data in the upper Clark Fork Basin in the 1980s.
Trend analysis was done on specific conductance, selected trace elements (arsenic, copper, and zinc), and suspended sediment for seven sampling sites in the Milltown Reservoir/Clark Fork River Superfund Site for water years 1996–2015. The most upstream site included in trend analysis is Silver Bow Creek at Warm Springs, Montana (sampling site 8), and the most downstream site is Clark Fork above Missoula, Montana (sampling site 22), which is just downstream from the former Milltown Dam. Water year is the 12-month period from October 1 through September 30 and is designated by the year in which it ends. Trend analysis was done by using a joint time-series model for concentration and streamflow. To provide temporal resolution of changes in water quality, trend analysis was conducted for four sequential 5-year periods: period 1 (water years 1996–2000), period 2 (water years 2001–5), period 3 (water years 2006–10), and period 4 (water years 2011–15). Because of the substantial effect of the intentional breach of Milltown Dam on March 28, 2008, period 3 was subdivided into period 3A (October 1, 2005–March 27, 2008) and period 3B (March 28, 2008–September 30, 2010) for the Clark Fork above Missoula (sampling site 22). Trend results were considered statistically significant when the statistical probability level was less than 0.01.
In conjunction with the trend analysis, estimated normalized constituent loads (hereinafter referred to as “loads”) were calculated and presented within the framework of a constituent-transport analysis to assess the temporal trends in flow-adjusted concentrations (FACs) in the context of sources and transport. The transport analysis allows assessment of temporal changes in relative contributions from upstream source areas to loads transported past each reach outflow.
Trend results indicate that FACs of unfiltered-recoverable copper decreased at the sampling sites from the start of period 1 through the end of period 4; the decreases ranged from large for one sampling site (Silver Bow Creek at Warm Springs [sampling site 8]) to moderate for two sampling sites (Clark Fork near Galen, Montana [sampling site 11] and Clark Fork above Missoula [sampling site 22]) to small for four sampling sites (Clark Fork at Deer Lodge, Montana [sampling site 14], Clark Fork at Goldcreek, Montana [sampling site 16], Clark Fork near Drummond, Montana [sampling site 18], and Clark Fork at Turah Bridge near Bonner, Montana [sampling site 20]).
    For period 4 (water years 2011–15), the most notable changes indicated for the Milltown Reservoir/Clark Fork River Superfund Site were statistically significant decreases in FACs and loads of unfiltered-recoverable copper for sampling sites 8 and 22. The period 4 changes in FACs of unfiltered-recoverable copper for all other sampling sites were not statistically significant.

    Trend results indicate that FACs of unfiltered-recoverable arsenic decreased at the sampling sites from period 1 through period 4 (water years 1996–2015); the decreases ranged from minor (sampling sites 8–20) to small (sampling site 22). For period 4 (water years 2011–15), the most notable changes indicated for the Milltown Reservoir/Clark Fork River Superfund Site were statistically significant decreases in FACs and loads of unfiltered-recoverable arsenic for sampling site 8 and nearly statistically significant decreases for sampling site 22. The period 4 changes in FACs of unfiltered-recoverable arsenic for all other sampling sites were not statistically significant.

    Trend results indicate that FACs of suspended sediment decreased at the sampling sites from period 1 through period 4 (water years 1996–2015); the decreases ranged from moderate (sampling site 8) to small (sampling sites 11–22). For period 4 (water years 2011–15), the changes in FACs of suspended sediment were not statistically significant for any of the sampling sites.

    The reach of the Clark Fork from Galen to Deer Lodge is a large source of metallic contaminants and suspended sediment, which strongly affects downstream transport of those constituents. Mobilization of copper and suspended sediment from flood-plain tailings and the streambed of the Clark Fork and its tributaries within the reach results in a contribution of those constituents that is proportionally much larger than the contribution of streamflow from within the reach. Within the reach from Galen to Deer Lodge, unfiltered-recoverable copper loads increased by a factor of about 4 and suspended-sediment loads increased by a factor of about 5, whereas streamflow increased by a factor of slightly less than 2. For period 4 (water years 2011–15), unfiltered-recoverable copper and suspended-sediment loads sourced from within the reach accounted for about 41 and 14 percent, respectively, of the loads at Clark Fork above Missoula (sampling site 22), whereas streamflow sourced from within the reach accounted for about 4 percent of the streamflow at sampling site 22. During water years 1996–2015, decreases in FACs and loads of unfiltered-recoverable copper and suspended sediment for the reach generally were proportionally smaller than for most other reaches.

    Unfiltered-recoverable copper loads sourced within the reaches of the Clark Fork between Deer Lodge and Turah Bridge near Bonner (just upstream from the former Milltown Dam) were proportionally smaller than contributions of streamflow sourced from within the reaches; these reaches contributed proportionally much less to copper loading in the Clark Fork than the reach between Galen and Deer Lodge. Although substantial decreases in FACs and loads of unfiltered-recoverable copper and suspended sediment were indicated for Silver Bow Creek at Warm Springs (sampling site 8), those substantial decreases were not translated to downstream reaches between Deer Lodge and Turah Bridge near Bonner.
    The effect of the reach of the Clark Fork from Galen to Deer Lodge as a large source of copper and suspended sediment, in combination with little temporal change in those constituents for the reach, contributes to this pattern.

    With the removal of the former Milltown Dam in 2008, substantial amounts of contaminated sediments that remained in the Clark Fork channel and flood plain in reach 9 (downstream from Turah Bridge near Bonner) became more available for mobilization and transport than before the dam removal. After the removal of the former Milltown Dam, the Clark Fork above Missoula (sampling site 22) had statistically significant decreases in FACs of unfiltered-recoverable copper in period 3B (March 28, 2008, through water year 2010) that continued in period 4 (water years 2011–15). Also, decreases in FACs of unfiltered-recoverable arsenic and suspended sediment were indicated for period 4 at this site. The decrease in FACs of unfiltered-recoverable copper for sampling site 22 during period 4 was proportionally much larger than the decrease for the Clark Fork at Turah Bridge near Bonner (sampling site 20). Net mobilization of unfiltered-recoverable copper and arsenic from sources within reach 9 is smaller for period 4 than for period 1, when the former Milltown Dam was in place, providing evidence that contaminant source materials have been substantially reduced in reach 9.
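
    The report's joint time-series model for concentration and streamflow is not reproduced here, but the core idea behind a flow-adjusted concentration (FAC) trend test can be sketched in a few lines of Python: regress ln(concentration) on ln(streamflow) to remove flow-related variability, then test the residuals for a time trend. All data below are synthetic and every parameter is invented for illustration; this is a minimal stand-in, not the report's method.

      import numpy as np
      from scipy import stats

      def flow_adjusted_trend(conc, flow, years):
          # Regress ln(C) on ln(Q); the residuals serve as a simple
          # stand-in for flow-adjusted concentrations (FACs).
          log_c, log_q = np.log(conc), np.log(flow)
          slope, intercept, *_ = stats.linregress(log_q, log_c)
          fac = log_c - (intercept + slope * log_q)
          # Test the FAC residuals for a linear trend over time.
          trend = stats.linregress(years, fac)
          return trend.slope, trend.pvalue

      # Hypothetical data for one sampling site (not from the report).
      rng = np.random.default_rng(0)
      years = np.linspace(1996, 2015, 200)
      flow = rng.lognormal(mean=3.0, sigma=0.6, size=years.size)
      conc = (5.0 * flow**0.4 * np.exp(-0.02 * (years - 1996))
              * rng.lognormal(0.0, 0.2, size=years.size))
      slope, p = flow_adjusted_trend(conc, flow, years)
      print(f"trend in ln(FAC): {slope:+.4f}/yr (p = {p:.3g})")

    Under the report's convention, such a trend would be called statistically significant only when the probability level is less than 0.01.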

  14. Preservation of Mercury in Polyethylene Containers.

    ERIC Educational Resources Information Center

    Piccolino, Samuel Paul

    1983-01-01

    Reports results of experiments favoring use of 0.5 percent nitric acid with an oxidant (potassium dichromate or potassium permanganate) to preserve samples in polyethylene containers for mercury analysis. Includes procedures used and statistical data obtained from the experiments. (JN)

  15. Spectral and cross-spectral analysis of uneven time series with the smoothed Lomb-Scargle periodogram and Monte Carlo evaluation of statistical significance

    NASA Astrophysics Data System (ADS)

    Pardo-Igúzquiza, Eulogio; Rodríguez-Tovar, Francisco J.

    2012-12-01

    Many spectral analysis techniques have been designed assuming sequences taken with a constant sampling interval. However, there are empirical time series in the geosciences (sediment cores, fossil abundance data, isotope analysis, …) that do not follow regular sampling because of missing data, gapped data, random sampling or incomplete sequences, among other reasons. In general, interpolating an uneven series in order to obtain a succession with a constant sampling interval alters the spectral content of the series. In such cases it is preferable to follow an approach that works with the uneven data directly, avoiding the need for an explicit interpolation step. The Lomb-Scargle periodogram is a popular choice in such circumstances, as there are programs available in the public domain for its computation. One new computer program for spectral analysis improves the standard Lomb-Scargle periodogram approach in two ways: (1) it explicitly adjusts the statistical significance for any bias introduced by variance-reduction smoothing, and (2) it uses a permutation test to evaluate confidence levels, which is better suited than parametric methods when neighbouring frequencies are highly correlated. Another novel program for cross-spectral analysis offers the advantage of estimating the Lomb-Scargle cross-periodogram of two uneven time series defined on the same interval, and it evaluates the confidence levels of the estimated cross-spectra by a non-parametric, computer-intensive permutation test. Thus, the cross-spectrum, the squared-coherence spectrum, the phase spectrum, and the Monte Carlo statistical significance of the cross-spectrum and the squared-coherence spectrum can be obtained. Both programs are written in ANSI Fortran 77, chosen for its simplicity and compatibility. The program code is in the public domain and is provided on the website of the journal (http://www.iamg.org/index.php/publisher/articleview/frmArticleID/112/). Several examples (with simulated and real data) are described in this paper to validate the methodology and the implementation of these two new programs.
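
    The authors' Fortran 77 programs are not reproduced here, but the combination they describe, a Lomb-Scargle periodogram computed directly on uneven sampling times plus a permutation test for significance, can be illustrated with SciPy. The series, frequency grid and permutation count below are all invented for the example.

      import numpy as np
      from scipy.signal import lombscargle

      # Unevenly sampled synthetic series: a 0.1-cycle/unit sinusoid
      # plus noise, observed at random times (no interpolation needed).
      rng = np.random.default_rng(1)
      t = np.sort(rng.uniform(0.0, 100.0, size=120))
      y = np.sin(2 * np.pi * 0.1 * t) + 0.8 * rng.standard_normal(t.size)
      y -= y.mean()

      freqs = np.linspace(0.01, 0.5, 400)           # cycles per unit time
      pgram = lombscargle(t, y, 2 * np.pi * freqs)  # SciPy wants angular freqs

      # Permutation test: shuffling y against the fixed times destroys
      # any periodicity, so permuted periodograms sample the null.
      null_max = np.array([
          lombscargle(t, rng.permutation(y), 2 * np.pi * freqs).max()
          for _ in range(500)
      ])
      conf95 = np.quantile(null_max, 0.95)  # global 95% confidence level

      peak = freqs[pgram.argmax()]
      print(f"peak at {peak:.3f} cycles/unit; exceeds 95% level: "
            f"{pgram.max() > conf95}")

    Taking the maximum over all frequencies for the null distribution gives a global confidence level, which guards against the multiple-testing problem across the frequency grid.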

  16. PIXE analysis of cystic fibrosis sweat samples with an external proton beam

    NASA Astrophysics Data System (ADS)

    Sommer, F.; Massonnet, B.

    1987-03-01

    PIXE analysis with an external proton beam is used to study the elemental composition of sweat samples collected from different parts of the body during whole-body hyperthermia in four control children and five children with cystic fibrosis. We observe no significant difference in sweat rates or in temperature variations between the two groups during the sweat test. Statistical study of the results obtained by PIXE analysis allows us to pick out, among the 8 elements studied, 6 elements (Na, Cl, Ca, Mn, Cu, Br) that differ significantly between the two groups of subjects. Using regression analysis, Na, Cl and Br concentrations could be used in a predictive equation for the state of health.
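
    The paper's predictive equation is not given in the abstract, so the sketch below only illustrates the general idea of building a predictor from Na, Cl and Br sweat concentrations. The data, group means and the choice of logistic regression are all assumptions made for this example, not the paper's actual regression.

      import numpy as np
      from sklearn.linear_model import LogisticRegression

      # Hypothetical sweat concentrations (columns: Na, Cl, Br);
      # the values are invented, not taken from the paper.
      rng = np.random.default_rng(2)
      n_ctrl, n_cf = 4, 5
      X = np.vstack([
          rng.normal([30.0, 25.0, 0.5], [8.0, 6.0, 0.15], size=(n_ctrl, 3)),
          rng.normal([80.0, 70.0, 1.2], [12.0, 10.0, 0.3], size=(n_cf, 3)),
      ])
      y = np.array([0] * n_ctrl + [1] * n_cf)  # 0 = control, 1 = CF

      # Fit a predictive equation from the three element concentrations.
      model = LogisticRegression().fit(X, y)
      print("coefficients (Na, Cl, Br):", model.coef_[0])
      print("prediction for [Na=60, Cl=50, Br=0.9]:",
            model.predict([[60.0, 50.0, 0.9]])[0])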

  17. The skeletal maturation status estimated by statistical shape analysis: axial images of Japanese cervical vertebra.

    PubMed

    Shin, S M; Kim, Y-I; Choi, Y-S; Yamaguchi, T; Maki, K; Cho, B-H; Park, S-B

    2015-01-01

    To evaluate axial cervical vertebral (ACV) shape quantitatively and to build a prediction model for skeletal maturation level using statistical shape analysis for Japanese individuals. The sample included 24 female and 19 male patients with hand-wrist radiographs and CBCT images. Through generalized Procrustes analysis and principal components (PCs) analysis, the meaningful PCs were extracted from each ACV shape and used to fit the estimation regression model. Each ACV shape had meaningful PCs, except for the second axial cervical vertebra. Based on these models, the smallest prediction intervals (PIs) were obtained from the combination of the shape-space PCs, age and gender. Overall, the PIs of the male group were smaller than those of the female group. There was no significant correlation between centroid size as a size factor and skeletal maturation level. Our findings suggest that the ACV maturation method, applied through statistical shape analysis, can provide a quantitative indicator of skeletal maturation in Japanese individuals and could be as useful a quantitative method as the skeletal maturation index.

  18. The skeletal maturation status estimated by statistical shape analysis: axial images of Japanese cervical vertebra

    PubMed Central

    Shin, S M; Choi, Y-S; Yamaguchi, T; Maki, K; Cho, B-H; Park, S-B

    2015-01-01

    Objectives: To evaluate axial cervical vertebral (ACV) shape quantitatively and to build a prediction model for skeletal maturation level using statistical shape analysis for Japanese individuals. Methods: The sample included 24 female and 19 male patients with hand–wrist radiographs and CBCT images. Through generalized Procrustes analysis and principal components (PCs) analysis, the meaningful PCs were extracted from each ACV shape and used to fit the estimation regression model. Results: Each ACV shape had meaningful PCs, except for the second axial cervical vertebra. Based on these models, the smallest prediction intervals (PIs) were obtained from the combination of the shape-space PCs, age and gender. Overall, the PIs of the male group were smaller than those of the female group. There was no significant correlation between centroid size as a size factor and skeletal maturation level. Conclusions: Our findings suggest that the ACV maturation method, applied through statistical shape analysis, can provide a quantitative indicator of skeletal maturation in Japanese individuals and could be as useful a quantitative method as the skeletal maturation index. PMID:25411713
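
    Neither record includes the authors' code, but the generalized Procrustes plus PC analysis pipeline they describe can be sketched with NumPy. Landmark counts and noise levels below are hypothetical, and the Procrustes step is simplified (reflection handling is omitted).

      import numpy as np

      def procrustes_align(shapes, n_iter=10):
          # Simplified generalized Procrustes analysis for 2-D landmarks:
          # remove translation and scale, then iteratively rotate each
          # shape onto the running mean (Kabsch rotation, no reflections).
          shapes = shapes - shapes.mean(axis=1, keepdims=True)
          shapes /= np.linalg.norm(shapes, axis=(1, 2), keepdims=True)
          mean = shapes[0]
          for _ in range(n_iter):
              for i, s in enumerate(shapes):
                  u, _, vt = np.linalg.svd(s.T @ mean)
                  shapes[i] = s @ (u @ vt)
              mean = shapes.mean(axis=0)
              mean /= np.linalg.norm(mean)
          return shapes

      # Hypothetical landmarks: 43 subjects x 12 landmarks x (x, y),
      # echoing the study's 24 + 19 = 43 subjects.
      rng = np.random.default_rng(3)
      base = rng.uniform(-1, 1, size=(12, 2))
      aligned = procrustes_align(base + 0.05 * rng.standard_normal((43, 12, 2)))

      # PCA on the aligned coordinates yields the shape-space PCs that
      # would feed a maturation regression model.
      flat = aligned.reshape(43, -1)
      flat -= flat.mean(axis=0)
      _, sing, vt = np.linalg.svd(flat, full_matrices=False)
      pcs = flat @ vt[:3].T  # per-subject scores on the first three PCs
      print("PC score matrix:", pcs.shape)
      print("variance explained by PC1-3:",
            np.round(sing[:3]**2 / np.sum(sing**2), 3))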

  19. Creation of a virtual cutaneous tissue bank

    NASA Astrophysics Data System (ADS)

    LaFramboise, William A.; Shah, Sujal; Hoy, R. W.; Letbetter, D.; Petrosko, P.; Vennare, R.; Johnson, Peter C.

    2000-04-01

    Cellular and non-cellular constituents of skin contain fundamental morphometric features and structural patterns that correlate with tissue function. High-resolution digital image acquisition is performed using an automated system and proprietary software to assemble adjacent images and create a contiguous, lossless digital representation of individual microscope slide specimens. Serial extraction, evaluation, and statistical analysis of cutaneous features are performed using an automated analysis system to derive normal cutaneous parameters for essential structural skin components. Automated digital cutaneous analysis allows fast extraction of microanatomic data with accuracy approximating manual measurement. The process provides rapid assessment of features both within individual specimens and across sample populations. The images, component data, and statistical analyses comprise a bioinformatics database that serves as an architectural blueprint for skin tissue engineering and as a diagnostic standard of comparison for pathologic specimens.
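
    The tissue-bank pipeline and its software are proprietary, so the sketch below only gestures at the kind of automated feature extraction described: segment discrete structures in a digitized image and accumulate morphometric statistics across them. scikit-image's bundled "coins" sample image stands in for a slide scan, and the area cutoff is an arbitrary assumption.

      import numpy as np
      from skimage import data, filters, measure

      image = data.coins()  # stand-in for a digitized slide specimen

      # Segment objects from background and label connected regions,
      # mimicking automated extraction of discrete microanatomic features.
      binary = image > filters.threshold_otsu(image)
      regions = [r for r in measure.regionprops(measure.label(binary))
                 if r.area > 50]  # drop segmentation speckle

      # Per-feature morphometry of the sort a normative database would store.
      areas = np.array([r.area for r in regions])
      print(f"{len(regions)} features; mean area {areas.mean():.1f} px, "
            f"sd {areas.std():.1f} px")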

  20. Expected values and variances of Bragg peak intensities measured in a nanocrystalline powder diffraction experiment

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Öztürk, Hande; Noyan, I. Cevdet

    A rigorous study of sampling and intensity statistics applicable to a powder diffraction experiment as a function of crystallite size is presented. Our analysis yields approximate equations for the expected value, variance and standard deviation of both the number of diffracting grains and the corresponding diffracted intensity for a given Bragg peak. The classical formalism published in 1948 by Alexander, Klug & Kummer [J. Appl. Phys. (1948), 19, 742–753] appears here as a special case limited to large crystallite sizes. It is observed that both the Lorentz probability expression and the statistics equations used in the classical formalism are inapplicable for nanocrystalline powder samples.
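
    The paper's approximate equations are not reproduced in the record, but the underlying sampling statistics are easy to probe by Monte Carlo: for an ideal powder, a crystallite contributes to a Bragg peak only if its plane normal falls within a narrow angular band, so the number of diffracting grains is binomial. The grain count, Bragg angle and band width below are all assumed values for illustration.

      import numpy as np

      rng = np.random.default_rng(4)
      n_grains = 100_000        # grains in the illuminated volume (assumed)
      theta = np.radians(20.0)  # Bragg angle (assumed)
      delta = np.radians(0.05)  # half-width of diffracting band (assumed)

      # For isotropic orientations the plane-normal polar angle alpha has
      # density sin(alpha)/2, so the probability of landing in the band
      # around alpha0 = 90 deg - theta is:
      alpha0 = np.pi / 2 - theta
      p = 0.5 * (np.cos(alpha0 - delta) - np.cos(alpha0 + delta))

      # Simulate many independent powder packings and count diffracting grains.
      counts = rng.binomial(n_grains, p, size=2000)
      print(f"p = {p:.3e}")
      print(f"mean = {counts.mean():.1f} (theory {n_grains * p:.1f}), "
            f"var = {counts.var():.1f} "
            f"(theory {n_grains * p * (1 - p):.1f})")

    This reproduces the elementary counting statistics (mean Np, variance Np(1-p)) from which such analyses start; the paper's contribution is showing where the classical expressions stop applying as crystallite size shrinks to the nanocrystalline regime.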
