Sample records for statistical analysis package

  1. General specifications for the development of a USL NASA PC R and D statistical analysis support package

    NASA Technical Reports Server (NTRS)

    Dominick, Wayne D. (Editor); Bassari, Jinous; Triantafyllopoulos, Spiros

    1984-01-01

    The University of Southwestern Louisiana (USL) NASA PC R and D statistical analysis support package is designed to be a three-level package to allow statistical analysis for a variety of applications within the USL Data Base Management System (DBMS) contract work. The design addresses usage of the statistical facilities as a library package, as an interactive statistical analysis system, and as a batch processing package.

  2. Statistical principle and methodology in the NISAN system.

    PubMed Central

    Asano, C

    1979-01-01

    The NISAN system is a new interactive statistical analysis program package constructed by an organization of Japanese statisticians. The package is widely applicable to both statistical situations, confirmatory analysis and exploratory analysis, and is designed to capture statistical wisdom and to help senior statisticians choose an optimal process of statistical analysis. PMID:540594

  3. Analysis of Variance: What Is Your Statistical Software Actually Doing?

    ERIC Educational Resources Information Center

    Li, Jian; Lomax, Richard G.

    2011-01-01

    Users assume statistical software packages produce accurate results. In this article, the authors systematically examined Statistical Package for the Social Sciences (SPSS) and Statistical Analysis System (SAS) for three analysis of variance (ANOVA) designs: mixed-effects ANOVA, fixed-effects analysis of covariance (ANCOVA), and nested ANOVA. For each…
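The designs examined above all build on the same between-group versus within-group variance comparison. A minimal sketch of the simplest case (one-way, fixed effects), written by hand to show the quantity packages like SPSS and SAS report; the data below are invented, not taken from the study:

```python
# One-way fixed-effects ANOVA F statistic, computed from scratch.
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    k = len(groups)                          # number of groups
    n = sum(len(g) for g in groups)          # total observations
    grand_mean = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    df_b, df_w = k - 1, n - k
    return (ss_between / df_b) / (ss_within / df_w), df_b, df_w

f_stat, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
```

Cross-checking a hand calculation like this against two packages is essentially the validation exercise the article performs at scale.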

  4. A comparison of InVivoStat with other statistical software packages for analysis of data generated from animal experiments.

    PubMed

    Clark, Robin A; Shoaib, Mohammed; Hewitt, Katherine N; Stanford, S Clare; Bate, Simon T

    2012-08-01

    InVivoStat is a free-to-use statistical software package for analysis of data generated from animal experiments. The package is designed specifically for researchers in the behavioural sciences, where exploiting the experimental design is crucial for reliable statistical analyses. This paper compares the analysis of three experiments conducted using InVivoStat with other widely used statistical packages: SPSS (V19), PRISM (V5), UniStat (V5.6) and Statistica (V9). We show that InVivoStat provides results that are similar to those from the other packages and, in some cases, are more advanced. This investigation provides evidence of further validation of InVivoStat and should strengthen users' confidence in this new software package.

  5. Resilience Among Students at the Basic Enlisted Submarine School

    DTIC Science & Technology

    2016-12-01

    …reported resilience. The Hayes’ macro in the Statistical Package for the Social Sciences (SPSS) was used to uncover factors relevant to mediation analysis. Findings suggest that the encouragement of…

  6. Interfaces between statistical analysis packages and the ESRI geographic information system

    NASA Technical Reports Server (NTRS)

    Masuoka, E.

    1980-01-01

    Interfaces between ESRI's geographic information system (GIS) data files and real valued data files written to facilitate statistical analysis and display of spatially referenced multivariable data are described. An example of data analysis which utilized the GIS and the statistical analysis system is presented to illustrate the utility of combining the analytic capability of a statistical package with the data management and display features of the GIS.

  7. Network Meta-Analysis Using R: A Review of Currently Available Automated Packages

    PubMed Central

    Neupane, Binod; Richer, Danielle; Bonner, Ashley Joel; Kibret, Taddele; Beyene, Joseph

    2014-01-01

    Network meta-analysis (NMA) – a statistical technique that allows comparison of multiple treatments in the same meta-analysis simultaneously – has become increasingly popular in the medical literature in recent years. The statistical methodology underpinning this technique and software tools for implementing the methods are evolving. Both commercial and freely available statistical software packages have been developed to facilitate the statistical computations using NMA with varying degrees of functionality and ease of use. This paper aims to introduce the reader to three R packages, namely, gemtc, pcnetmeta, and netmeta, which are freely available software tools implemented in R. Each automates the process of performing NMA so that users can perform the analysis with minimal computational effort. We present, compare and contrast the availability and functionality of different important features of NMA in these three packages so that clinical investigators and researchers can determine which R packages to implement depending on their analysis needs. Four summary tables detailing (i) data input and network plotting, (ii) modeling options, (iii) assumption checking and diagnostic testing, and (iv) inference and reporting tools, are provided, along with an analysis of a previously published dataset to illustrate the outputs available from each package. We demonstrate that each of the three packages provides a useful set of tools, and combined they provide users with nearly all functionality that might be desired when conducting an NMA. PMID:25541687

  8. Network meta-analysis using R: a review of currently available automated packages.

    PubMed

    Neupane, Binod; Richer, Danielle; Bonner, Ashley Joel; Kibret, Taddele; Beyene, Joseph

    2014-01-01

    Network meta-analysis (NMA)--a statistical technique that allows comparison of multiple treatments in the same meta-analysis simultaneously--has become increasingly popular in the medical literature in recent years. The statistical methodology underpinning this technique and software tools for implementing the methods are evolving. Both commercial and freely available statistical software packages have been developed to facilitate the statistical computations using NMA with varying degrees of functionality and ease of use. This paper aims to introduce the reader to three R packages, namely, gemtc, pcnetmeta, and netmeta, which are freely available software tools implemented in R. Each automates the process of performing NMA so that users can perform the analysis with minimal computational effort. We present, compare and contrast the availability and functionality of different important features of NMA in these three packages so that clinical investigators and researchers can determine which R packages to implement depending on their analysis needs. Four summary tables detailing (i) data input and network plotting, (ii) modeling options, (iii) assumption checking and diagnostic testing, and (iv) inference and reporting tools, are provided, along with an analysis of a previously published dataset to illustrate the outputs available from each package. We demonstrate that each of the three packages provides a useful set of tools, and combined they provide users with nearly all functionality that might be desired when conducting an NMA.
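The elementary building block that NMA packages such as gemtc, pcnetmeta and netmeta generalize to whole treatment networks is the Bucher indirect comparison: an effect of B versus C is inferred from A-versus-B and A-versus-C trials. A pure-Python sketch with invented effect sizes and variances, not the packages' actual internals:

```python
import math

def indirect_comparison(d_ab, var_ab, d_ac, var_ac):
    """Infer the B-vs-C effect from A-vs-B and A-vs-C trial estimates."""
    d_bc = d_ac - d_ab                # common comparator A cancels out
    var_bc = var_ab + var_ac          # variances add for independent estimates
    se = math.sqrt(var_bc)
    return d_bc, (d_bc - 1.96 * se, d_bc + 1.96 * se)

effect, ci = indirect_comparison(d_ab=-0.5, var_ab=0.04, d_ac=-0.8, var_ac=0.05)
```

Note how the indirect estimate carries the summed uncertainty of both direct comparisons, which is why the diagnostic and assumption-checking tools the review tabulates matter in practice.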

  9. A Review of Meta-Analysis Packages in R

    ERIC Educational Resources Information Center

    Polanin, Joshua R.; Hennessy, Emily A.; Tanner-Smith, Emily E.

    2017-01-01

    Meta-analysis is a statistical technique that allows an analyst to synthesize effect sizes from multiple primary studies. To estimate meta-analysis models, the open-source statistical environment R is quickly becoming a popular choice. The meta-analytic community has contributed to this growth by developing numerous packages specific to…

  10. INTERFACING SAS TO ORACLE IN THE UNIX ENVIRONMENT

    EPA Science Inventory

    SAS is an EPA standard data and statistical analysis software package, while ORACLE is EPA's standard database management system software package. ORACLE has the advantage over SAS in data retrieval and storage capabilities but has limited data and statistical analysis capability....

  11. Powerlaw: a Python package for analysis of heavy-tailed distributions.

    PubMed

    Alstott, Jeff; Bullmore, Ed; Plenz, Dietmar

    2014-01-01

    Power laws are theoretically interesting probability distributions that are also frequently used to describe empirical data. In recent years, effective statistical methods for fitting power laws have been developed, but appropriate use of these techniques requires significant programming and statistical insight. In order to greatly decrease the barriers to using good statistical methods for fitting power law distributions, we developed the powerlaw Python package. This software package provides easy commands for basic fitting and statistical analysis of distributions. Notably, it also seeks to support a variety of user needs by being exhaustive in the options available to the user. The source code is publicly available and easily extensible.
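The continuous maximum-likelihood estimator of the power-law exponent (Clauset, Shalizi and Newman), which the powerlaw package implements, is simple enough to sketch directly. This standalone version fixes xmin by hand and uses invented data rather than calling the package itself:

```python
import math

def fit_alpha(data, xmin):
    """alpha_hat = 1 + n / sum(ln(x / xmin)) over the tail x >= xmin."""
    tail = [x for x in data if x >= xmin]
    return 1 + len(tail) / sum(math.log(x / xmin) for x in tail)

alpha = fit_alpha([1.0, 1.5, 2.0, 4.0, 8.0, 16.0], xmin=1.0)
```

The package adds the parts that are genuinely hard to get right by hand, such as estimating xmin itself and comparing candidate distributions via likelihood ratios.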

  12. MWASTools: an R/bioconductor package for metabolome-wide association studies.

    PubMed

    Rodriguez-Martinez, Andrea; Posma, Joram M; Ayala, Rafael; Neves, Ana L; Anwar, Maryam; Petretto, Enrico; Emanueli, Costanza; Gauguier, Dominique; Nicholson, Jeremy K; Dumas, Marc-Emmanuel

    2018-03-01

    MWASTools is an R package designed to provide an integrated pipeline to analyse metabonomic data in large-scale epidemiological studies. Key functionalities of our package include: quality control analysis; metabolome-wide association analysis using various models (partial correlations, generalized linear models); visualization of statistical outcomes; metabolite assignment using statistical total correlation spectroscopy (STOCSY); and biological interpretation of metabolome-wide association study results. The MWASTools R package is implemented in R (version >= 3.4) and is available from Bioconductor: https://bioconductor.org/packages/MWASTools/. Contact: m.dumas@imperial.ac.uk. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.

  13. Analysis of reference transactions using packaged computer programs.

    PubMed

    Calabretta, N; Ross, R

    1984-01-01

    Motivated by a continuing education class attended by the authors on the measurement of reference desk activities, the reference department at Scott Memorial Library initiated a project to gather data on reference desk transactions and to analyze the data by using packaged computer programs. The programs utilized for the project were SPSS (Statistical Package for the Social Sciences) and SAS (Statistical Analysis System). The planning, implementation and development of the project are described.

  14. MethVisual - visualization and exploratory statistical analysis of DNA methylation profiles from bisulfite sequencing.

    PubMed

    Zackay, Arie; Steinhoff, Christine

    2010-12-15

    Exploration of DNA methylation and its impact on various regulatory mechanisms has become a very active field of research. Simultaneously there is an arising need for tools to process and analyse the data together with statistical investigation and visualisation. MethVisual is a new application that enables exploratory analysis and intuitive visualization of DNA methylation data as typically generated by bisulfite sequencing. The package allows the import of DNA methylation sequences, aligns them and performs quality control comparison. It comprises basic analysis steps such as lollipop visualization, co-occurrence display of methylation of neighbouring and distant CpG sites, summary statistics on methylation status, clustering and correspondence analysis. The package has been developed for methylation data but can also be used for other data types for which binary coding can be inferred. The application of the package, as well as a comparison to existing DNA methylation analysis tools and its workflow based on two datasets, is presented in this paper. The R package MethVisual offers various analysis procedures for data that can be binarized, in particular for bisulfite sequenced methylation data. R/Bioconductor has become one of the most important environments for statistical analysis of various types of biological and medical data. Therefore, any data analysis within R that allows the integration of various data types as provided from different technological platforms is convenient. It is the first and so far the only package specifically for DNA methylation analysis, in particular for bisulfite sequenced data, available in the R/Bioconductor environment. The package is available for free at http://methvisual.molgen.mpg.de/ and from the Bioconductor Consortium http://www.bioconductor.org.

  15. MethVisual - visualization and exploratory statistical analysis of DNA methylation profiles from bisulfite sequencing

    PubMed Central

    2010-01-01

    Background Exploration of DNA methylation and its impact on various regulatory mechanisms has become a very active field of research. Simultaneously there is an arising need for tools to process and analyse the data together with statistical investigation and visualisation. Findings MethVisual is a new application that enables exploratory analysis and intuitive visualization of DNA methylation data as typically generated by bisulfite sequencing. The package allows the import of DNA methylation sequences, aligns them and performs quality control comparison. It comprises basic analysis steps such as lollipop visualization, co-occurrence display of methylation of neighbouring and distant CpG sites, summary statistics on methylation status, clustering and correspondence analysis. The package has been developed for methylation data but can also be used for other data types for which binary coding can be inferred. The application of the package, as well as a comparison to existing DNA methylation analysis tools and its workflow based on two datasets, is presented in this paper. Conclusions The R package MethVisual offers various analysis procedures for data that can be binarized, in particular for bisulfite sequenced methylation data. R/Bioconductor has become one of the most important environments for statistical analysis of various types of biological and medical data. Therefore, any data analysis within R that allows the integration of various data types as provided from different technological platforms is convenient. It is the first and so far the only package specifically for DNA methylation analysis, in particular for bisulfite sequenced data, available in the R/Bioconductor environment. The package is available for free at http://methvisual.molgen.mpg.de/ and from the Bioconductor Consortium http://www.bioconductor.org. PMID:21159174

  16. Image-analysis library

    NASA Technical Reports Server (NTRS)

    1980-01-01

    The MATHPAC image-analysis library is a collection of general-purpose mathematical and statistical routines and special-purpose data-analysis and pattern-recognition routines for image analysis. The MATHPAC library consists of Linear Algebra, Optimization, Statistical-Summary, Densities and Distribution, Regression, and Statistical-Test packages.

  17. MSUSTAT.

    ERIC Educational Resources Information Center

    Mauriello, David

    1984-01-01

    Reviews an interactive statistical analysis package (designed to run on 8- and 16-bit machines that utilize CP/M 80 and MS-DOS operating systems), considering its features and uses, documentation, operation, and performance. The package consists of 40 general purpose statistical procedures derived from the classic textbook "Statistical…

  18. Analysis of half diallel mating designs I: a practical analysis procedure for ANOVA approximation.

    Treesearch

    G.R. Johnson; J.N. King

    1998-01-01

    Procedures to analyze half-diallel mating designs using the SAS statistical package are presented. The procedure requires two runs of PROC VARCOMP and results in estimates of additive and non-additive genetic variation. The procedures described can be modified to work on most statistical software packages that can compute variance component estimates. The...

  19. Analysis of counting data: Development of the SATLAS Python package

    NASA Astrophysics Data System (ADS)

    Gins, W.; de Groote, R. P.; Bissell, M. L.; Granados Buitrago, C.; Ferrer, R.; Lynch, K. M.; Neyens, G.; Sels, S.

    2018-01-01

    For the analysis of low-statistics counting experiments, a traditional nonlinear least squares minimization routine may not always provide correct parameter and uncertainty estimates due to the assumptions inherent in the algorithm(s). In response to this, a user-friendly Python package (SATLAS) was written to provide an easy interface between the data and a variety of minimization algorithms which are suited for analyzing low- as well as high-statistics data. The advantage of this package is that it allows the user to define their own model function and then compare different minimization routines to determine the optimal parameter values and their respective (correlated) errors. Experimental validation of the different approaches in the package is done through analysis of hyperfine structure data of 203Fr gathered by the CRIS experiment at ISOLDE, CERN.
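The pitfall the abstract alludes to can be shown in a few lines: fitting a constant rate to counts by Neyman chi-square (weights 1/y) yields the harmonic mean, which is biased low at small counts, while the Poisson maximum-likelihood estimate is the plain mean. A pure-Python sketch with invented counts, not SATLAS itself:

```python
counts = [1, 2, 1, 5, 1, 2]

# Poisson MLE of a constant rate: d/dmu sum(y*ln(mu) - mu) = 0  ->  mean
poisson_mle = sum(counts) / len(counts)

# Neyman chi-square: d/dmu sum((y - mu)**2 / y) = 0  ->  harmonic mean
neyman_fit = len(counts) / sum(1 / y for y in counts)
```

The least-squares fit lands well below the likelihood-based one on the same data, which is exactly why a package offering likelihood-aware minimizers is useful for low-count spectra.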

  20. SimHap GUI: an intuitive graphical user interface for genetic association analysis.

    PubMed

    Carter, Kim W; McCaskie, Pamela A; Palmer, Lyle J

    2008-12-25

    Researchers wishing to conduct genetic association analysis involving single nucleotide polymorphisms (SNPs) or haplotypes are often confronted with the lack of user-friendly graphical analysis tools, requiring sophisticated statistical and informatics expertise to perform relatively straightforward tasks. Tools such as the SimHap package for the R statistics language provide the necessary statistical operations to conduct sophisticated genetic analysis, but lack a graphical user interface that would allow anyone but a professional statistician to utilise the tool effectively. We have developed SimHap GUI, a cross-platform integrated graphical analysis tool for conducting epidemiological, single SNP and haplotype-based association analysis. SimHap GUI features a novel workflow interface that guides the user through each logical step of the analysis process, making it accessible to both novice and advanced users. This tool provides a seamless interface to the SimHap R package, while providing enhanced functionality such as sophisticated data checking, automated data conversion, and real-time estimations of haplotype simulation progress. SimHap GUI provides a novel, easy-to-use, cross-platform solution for conducting a range of genetic and non-genetic association analyses. This provides a free alternative to commercial statistics packages that is specifically designed for genetic association analysis.

  1. TRAPR: R Package for Statistical Analysis and Visualization of RNA-Seq Data.

    PubMed

    Lim, Jae Hyun; Lee, Soo Youn; Kim, Ju Han

    2017-03-01

    High-throughput transcriptome sequencing, also known as RNA sequencing (RNA-Seq), is a standard technology for measuring gene expression with unprecedented accuracy. Numerous Bioconductor packages have been developed for the statistical analysis of RNA-Seq data. However, these tools focus on specific aspects of the data analysis pipeline, and are difficult to appropriately integrate with one another due to their disparate data structures and processing methods. They also lack visualization methods to confirm the integrity of the data and the process. In this paper, we propose an R-based RNA-Seq analysis pipeline called TRAPR, an integrated tool that facilitates the statistical analysis and visualization of RNA-Seq expression data. TRAPR provides various functions for data management, the filtering of low-quality data, normalization, transformation, statistical analysis, data visualization, and result visualization that allow researchers to build customized analysis pipelines.

  2. An R package for analyzing and modeling ranking data

    PubMed Central

    2013-01-01

    Background In medical informatics, psychology, market research and many other fields, researchers often need to analyze and model ranking data. However, there is no statistical software that provides tools for the comprehensive analysis of ranking data. Here, we present pmr, an R package for analyzing and modeling ranking data with a bundle of tools. The pmr package enables descriptive statistics (mean rank, pairwise frequencies, and marginal matrix), Analytic Hierarchy Process models (with Saaty’s and Koczkodaj’s inconsistencies), probability models (Luce model, distance-based model, and rank-ordered logit model), and the visualization of ranking data with multidimensional preference analysis. Results Examples of the use of package pmr are given using a real ranking dataset from medical informatics, in which 566 Hong Kong physicians ranked the top five of seven incentives (1: competitive pressures; 2: increased savings; 3: government regulation; 4: improved efficiency; 5: improved quality care; 6: patient demand; 7: financial incentives) for the computerization of clinical practice. The mean rank showed that item 4 was the most preferred and item 3 the least preferred, and a significant difference was found between physicians’ preferences with respect to their monthly income. A multidimensional preference analysis identified two dimensions that explain 42% of the total variance. The first can be interpreted as the overall preference for the seven items (labeled as “internal/external”), and the second as their overall variance (labeled as “push/pull factors”). Various statistical models were fitted, and the best were found to be weighted distance-based models with Spearman’s footrule distance. Conclusions In this paper, we presented the R package pmr, the first package for analyzing and modeling ranking data. The package provides insight to users through descriptive statistics of ranking data. Users can also visualize ranking data through multidimensional preference analysis. Various probability models for ranking data are also included, allowing users to choose the one most suitable to their specific situation. PMID:23672645

  3. An R package for analyzing and modeling ranking data.

    PubMed

    Lee, Paul H; Yu, Philip L H

    2013-05-14

    In medical informatics, psychology, market research and many other fields, researchers often need to analyze and model ranking data. However, there is no statistical software that provides tools for the comprehensive analysis of ranking data. Here, we present pmr, an R package for analyzing and modeling ranking data with a bundle of tools. The pmr package enables descriptive statistics (mean rank, pairwise frequencies, and marginal matrix), Analytic Hierarchy Process models (with Saaty's and Koczkodaj's inconsistencies), probability models (Luce model, distance-based model, and rank-ordered logit model), and the visualization of ranking data with multidimensional preference analysis. Examples of the use of package pmr are given using a real ranking dataset from medical informatics, in which 566 Hong Kong physicians ranked the top five of seven incentives (1: competitive pressures; 2: increased savings; 3: government regulation; 4: improved efficiency; 5: improved quality care; 6: patient demand; 7: financial incentives) for the computerization of clinical practice. The mean rank showed that item 4 was the most preferred and item 3 the least preferred, and a significant difference was found between physicians' preferences with respect to their monthly income. A multidimensional preference analysis identified two dimensions that explain 42% of the total variance. The first can be interpreted as the overall preference for the seven items (labeled as "internal/external"), and the second as their overall variance (labeled as "push/pull factors"). Various statistical models were fitted, and the best were found to be weighted distance-based models with Spearman's footrule distance. In this paper, we presented the R package pmr, the first package for analyzing and modeling ranking data. The package provides insight to users through descriptive statistics of ranking data. Users can also visualize ranking data through multidimensional preference analysis. Various probability models for ranking data are also included, allowing users to choose the one most suitable to their specific situation.
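Two of the descriptive tools described above, the item-wise mean rank and Spearman's footrule distance, are easy to re-derive in Python for illustration (pmr itself is an R package); the rankings below are invented, not the Hong Kong physician dataset:

```python
def mean_ranks(rankings):
    """rankings[j][i] is the rank judge j assigns item i."""
    n = len(rankings)
    return [sum(r[i] for r in rankings) / n for i in range(len(rankings[0]))]

def footrule(r1, r2):
    """Spearman's footrule: total absolute rank displacement."""
    return sum(abs(a - b) for a, b in zip(r1, r2))

means = mean_ranks([[1, 2, 3], [2, 1, 3], [1, 3, 2]])  # item-wise averages
dist = footrule([1, 2, 3], [3, 2, 1])                  # distance between two judges
```

The footrule distance is the quantity the paper's best-fitting weighted distance-based models are built on.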

  4. SimHap GUI: An intuitive graphical user interface for genetic association analysis

    PubMed Central

    Carter, Kim W; McCaskie, Pamela A; Palmer, Lyle J

    2008-01-01

    Background Researchers wishing to conduct genetic association analysis involving single nucleotide polymorphisms (SNPs) or haplotypes are often confronted with the lack of user-friendly graphical analysis tools, requiring sophisticated statistical and informatics expertise to perform relatively straightforward tasks. Tools such as the SimHap package for the R statistics language provide the necessary statistical operations to conduct sophisticated genetic analysis, but lack a graphical user interface that would allow anyone but a professional statistician to utilise the tool effectively. Results We have developed SimHap GUI, a cross-platform integrated graphical analysis tool for conducting epidemiological, single SNP and haplotype-based association analysis. SimHap GUI features a novel workflow interface that guides the user through each logical step of the analysis process, making it accessible to both novice and advanced users. This tool provides a seamless interface to the SimHap R package, while providing enhanced functionality such as sophisticated data checking, automated data conversion, and real-time estimations of haplotype simulation progress. Conclusion SimHap GUI provides a novel, easy-to-use, cross-platform solution for conducting a range of genetic and non-genetic association analyses. This provides a free alternative to commercial statistics packages that is specifically designed for genetic association analysis. PMID:19109877

  5. Analysis of USAREUR Family Housing.

    DTIC Science & Technology

    1985-04-01

    …SPSS (Statistical Package for the Social Sciences)…Attempts to define USAREUR’s programmable family housing deficit based on the FHS have caused anguish…responses were analyzed using the Statistical Package for the Social Sciences (SPSS) computer program…

  6. A Computer Evolution in Teaching Undergraduate Time Series

    ERIC Educational Resources Information Center

    Hodgess, Erin M.

    2004-01-01

    In teaching undergraduate time series courses, we have used a mixture of various statistical packages. We have finally been able to teach all of the applied concepts within one statistical package: R. This article describes the process that we use to conduct a thorough analysis of a time series. An example with a data set is provided. We compare…

  7. dartr: An r package to facilitate analysis of SNP data generated from reduced representation genome sequencing.

    PubMed

    Gruber, Bernd; Unmack, Peter J; Berry, Oliver F; Georges, Arthur

    2018-05-01

    Although vast technological advances have been made and genetic software packages are growing in number, it is not a trivial task to analyse SNP data. We announce a new r package, dartr, enabling the analysis of single nucleotide polymorphism data for population genomic and phylogenomic applications. dartr provides user-friendly functions for data quality control and marker selection, and permits rigorous evaluations of conformation to Hardy-Weinberg equilibrium, gametic-phase disequilibrium and neutrality. The package reports standard descriptive statistics, permits exploration of patterns in the data through principal components analysis and conducts standard F-statistics, as well as basic phylogenetic analyses, population assignment, isolation by distance and exports data to a variety of commonly used downstream applications (e.g., newhybrids, faststructure and phylogeny applications) outside of the r environment. The package serves two main purposes: first, a user-friendly approach to lower the hurdle of analysing such data; therefore, the package comes with a detailed tutorial targeted to the r beginner to allow data analysis without requiring deep knowledge of r. Second, we use a single, well-established format-genlight from the adegenet package-as input for all our functions to avoid data reformatting. By strictly using the genlight format, we hope to establish this format as the de facto standard of future software developments and hence reduce the format jungle of genetic data sets. The dartr package is available via the r CRAN network and GitHub. © 2017 John Wiley & Sons Ltd.
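The Hardy-Weinberg equilibrium check mentioned above is one of the quality-control steps such packages automate. A from-scratch sketch in Python (dartr itself is an R package); the genotype counts are invented:

```python
def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square of observed genotype counts against HWE expectations."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)          # frequency of allele A
    q = 1 - p
    expected = [p * p * n, 2 * p * q * n, q * q * n]
    return sum((o - e) ** 2 / e
               for o, e in zip([n_aa, n_ab, n_bb], expected))

in_hwe = hwe_chi_square(25, 50, 25)    # exact HWE proportions at p = 0.5
deviates = hwe_chi_square(30, 40, 30)  # heterozygote deficit, same p
```

Markers with a large statistic (compared against a chi-square distribution with one degree of freedom) are the ones a quality-control pipeline would flag for removal.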

  8. Regional Morphology Analysis Package (RMAP): Empirical Orthogonal Function Analysis, Background and Examples

    DTIC Science & Technology

    2007-10-01

    …1984. Complex principal component analysis: Theory and examples. Journal of Climate and Applied Meteorology 23: 1660-1673. Hotelling, H. 1933…Sediments 99. ASCE: 2,566-2,581. Von Storch, H., and A. Navarra. 1995. Analysis of climate variability: Applications of statistical techniques. Berlin…ERDC TN-SWWRP-07-9, October 2007: Regional Morphology Analysis Package (RMAP): Empirical Orthogonal Function Analysis, Background and Examples.

  9. Orchestrating high-throughput genomic analysis with Bioconductor

    PubMed Central

    Huber, Wolfgang; Carey, Vincent J.; Gentleman, Robert; Anders, Simon; Carlson, Marc; Carvalho, Benilton S.; Bravo, Hector Corrada; Davis, Sean; Gatto, Laurent; Girke, Thomas; Gottardo, Raphael; Hahne, Florian; Hansen, Kasper D.; Irizarry, Rafael A.; Lawrence, Michael; Love, Michael I.; MacDonald, James; Obenchain, Valerie; Oleś, Andrzej K.; Pagès, Hervé; Reyes, Alejandro; Shannon, Paul; Smyth, Gordon K.; Tenenbaum, Dan; Waldron, Levi; Morgan, Martin

    2015-01-01

    Bioconductor is an open-source, open-development software project for the analysis and comprehension of high-throughput data in genomics and molecular biology. The project aims to enable interdisciplinary research, collaboration and rapid development of scientific software. Based on the statistical programming language R, Bioconductor comprises 934 interoperable packages contributed by a large, diverse community of scientists. Packages cover a range of bioinformatic and statistical applications. They undergo formal initial review and continuous automated testing. We present an overview for prospective users and contributors. PMID:25633503

  10. Application of the Linux cluster for exhaustive window haplotype analysis using the FBAT and Unphased programs.

    PubMed

    Mishima, Hiroyuki; Lidral, Andrew C; Ni, Jun

    2008-05-28

    Genetic association studies have been used to map disease-causing genes. A newly introduced statistical method, called exhaustive haplotype association study, analyzes genetic information consisting of different numbers and combinations of DNA sequence variations along a chromosome. Such studies involve a large number of statistical calculations and consequently require high computing power. It is possible to develop parallel algorithms and codes to perform the calculations on a high performance computing (HPC) system. However, most existing commonly-used statistical packages for genetic studies are non-parallel versions. Alternatively, one may use the cutting-edge technology of grid computing and its packages to conduct non-parallel genetic statistical packages on a centralized HPC system or distributed computing systems. In this paper, we report the utilization of a queuing scheduler built on the Grid Engine and run on a Rocks Linux cluster for our genetic statistical studies. Analysis of both consecutive and combinational window haplotypes was conducted by the FBAT (Laird et al., 2000) and Unphased (Dudbridge, 2003) programs. The dataset consisted of 26 loci from 277 extended families (1484 persons). Using the Rocks Linux cluster with 22 compute-nodes, FBAT jobs performed about 14.4-15.9 times faster, while Unphased jobs performed 1.1-18.6 times faster compared to the accumulated computation duration. Execution of exhaustive haplotype analysis using non-parallel software packages on a Linux-based system is an effective and efficient approach in terms of cost and performance.

  11. Application of the Linux cluster for exhaustive window haplotype analysis using the FBAT and Unphased programs

    PubMed Central

    Mishima, Hiroyuki; Lidral, Andrew C; Ni, Jun

    2008-01-01

    Background Genetic association studies have been used to map disease-causing genes. A newly introduced statistical method, called exhaustive haplotype association study, analyzes genetic information consisting of different numbers and combinations of DNA sequence variations along a chromosome. Such studies involve a large number of statistical calculations and consequently demand high computing power. It is possible to develop parallel algorithms and codes to perform the calculations on a high-performance computing (HPC) system. However, most commonly used statistical packages for genetic studies are non-parallel versions. Alternatively, one may use grid computing technology to run non-parallel genetic statistical packages on a centralized HPC system or on distributed computing systems. In this paper, we report the utilization of a queuing scheduler built on the Grid Engine and run on a Rocks Linux cluster for our genetic statistical studies. Results Analysis of both consecutive and combinational window haplotypes was conducted with the FBAT (Laird et al., 2000) and Unphased (Dudbridge, 2003) programs. The dataset consisted of 26 loci from 277 extended families (1484 persons). Using the Rocks Linux cluster with 22 compute nodes, FBAT jobs ran about 14.4–15.9 times faster, and Unphased jobs 1.1–18.6 times faster, than the accumulated single-node computation time. Conclusion Executing exhaustive haplotype analysis with non-parallel software packages on a Linux-based cluster is an effective and efficient approach in terms of both cost and performance. PMID:18541045

  12. pROC: an open-source package for R and S+ to analyze and compare ROC curves.

    PubMed

    Robin, Xavier; Turck, Natacha; Hainard, Alexandre; Tiberti, Natalia; Lisacek, Frédérique; Sanchez, Jean-Charles; Müller, Markus

    2011-03-17

    Receiver operating characteristic (ROC) curves are useful tools to evaluate classifiers in biomedical and bioinformatics applications. However, conclusions are often reached through inconsistent use or insufficient statistical analysis. To support researchers in their ROC curve analyses we developed pROC, a package for R and S+ that contains a set of tools for displaying, analyzing, smoothing and comparing ROC curves in a user-friendly, object-oriented and flexible interface. With data previously imported into the R or S+ environment, the pROC package builds ROC curves and includes functions for computing confidence intervals, statistical tests for comparing total or partial area under the curve or the operating points of different classifiers, and methods for smoothing ROC curves. Intermediary and final results are visualised in user-friendly interfaces. A case study based on published clinical and biomarker data shows how to perform a typical ROC analysis with pROC. pROC is a package for R and S+ specifically dedicated to ROC analysis. It provides multiple statistical tests to compare ROC curves, and in particular partial areas under the curve, allowing proper ROC interpretation. pROC is available in two versions: in the R programming language or with a graphical user interface in the S+ statistical software. It is accessible at http://expasy.org/tools/pROC/ under the GNU General Public License. It is also distributed through the CRAN and CSAN public repositories, facilitating its installation.
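
    The core quantities pROC works with can be illustrated with a minimal, dependency-free sketch (Python rather than R/S+, and not the pROC API; tied scores would need grouping and are ignored here for brevity):

```python
def roc_points(scores, labels):
    """ROC curve as (FPR, TPR) points, sweeping the decision threshold
    from the highest score down. Assumes no tied scores."""
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    pos = sum(labels)
    neg = len(labels) - pos
    tp = fp = 0
    pts = [(0.0, 0.0)]
    for _, label in pairs:
        if label:
            tp += 1
        else:
            fp += 1
        pts.append((fp / neg, tp / pos))
    return pts

def auc(pts):
    """Area under the curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(pts, pts[1:]))

scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]  # invented classifier outputs
labels = [1, 1, 0, 1, 0, 0]               # invented true classes
print(auc(roc_points(scores, labels)))
```

    pROC adds what this sketch lacks: confidence intervals, smoothing, and formal tests for comparing full or partial AUCs between classifiers.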

  13. Quantitative Methods for Analysing Joint Questionnaire Data: Exploring the Role of Joint in Force Design

    DTIC Science & Technology

    2015-08-01

    the nine questions. The Statistical Package for the Social Sciences (SPSS) [11] was used to conduct statistical analysis on the sample. Two types...constructs. SPSS was again used to conduct statistical analysis on the sample. This time factor analysis was conducted. Factor analysis attempts to...Business Research Methods and Statistics using SPSS. P432. 11 IBM SPSS Statistics. (2012) 12 Burns, R.B., Burns, R.A. (2008) 'Business Research

  14. Contrast Analysis: A Tutorial

    ERIC Educational Resources Information Center

    Haans, Antal

    2018-01-01

    Contrast analysis is a relatively simple but effective statistical method for testing theoretical predictions about differences between group means against the empirical data. Despite its advantages, contrast analysis is hardly used to date, perhaps because it is not implemented in a convenient manner in many statistical software packages. This…

  15. Comparison of requirements and capabilities of major multipurpose software packages.

    PubMed

    Igo, Robert P; Schnell, Audrey H

    2012-01-01

    The aim of this chapter is to introduce the reader to commonly used software packages and illustrate their input requirements, analysis options, strengths, and limitations. We focus on packages that perform more than one function and include a program for quality control, linkage, and association analyses. Additional inclusion criteria were that programs (1) are free to academic users and (2) are currently supported, maintained, and developed. Using those criteria, we chose to review three programs: Statistical Analysis for Genetic Epidemiology (S.A.G.E.), PLINK, and Merlin. We will describe the required input formats and analysis options. We will not go into detail about every possible program in the packages, but we will give an overview of the packages' requirements and capabilities.

  16. Logistic Regression with Multiple Random Effects: A Simulation Study of Estimation Methods and Statistical Packages.

    PubMed

    Kim, Yoonsang; Choi, Young-Ku; Emery, Sherry

    2013-08-01

    Several statistical packages are capable of estimating generalized linear mixed models and these packages provide one or more of three estimation methods: penalized quasi-likelihood, Laplace, and Gauss-Hermite. Many studies have investigated these methods' performance for the mixed-effects logistic regression model. However, the authors focused on models with one or two random effects and assumed a simple covariance structure between them, which may not be realistic. When there are multiple correlated random effects in a model, the computation becomes intensive, and often an algorithm fails to converge. Moreover, in our analysis of smoking status and exposure to anti-tobacco advertisements, we have observed that when a model included multiple random effects, parameter estimates varied considerably from one statistical package to another even when using the same estimation method. This article presents a comprehensive review of the advantages and disadvantages of each estimation method. In addition, we compare the performances of the three methods across statistical packages via simulation, which involves two- and three-level logistic regression models with at least three correlated random effects. We apply our findings to a real dataset. Our results suggest that two packages, SAS GLIMMIX Laplace and SuperMix Gaussian quadrature, perform well in terms of accuracy, precision, convergence rates, and computing speed. We also discuss the strengths and weaknesses of the two packages in regard to sample sizes.
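
    As an illustration of what the Gauss-Hermite method actually computes, the following NumPy sketch approximates the marginal log-likelihood of one cluster in a random-intercept logistic model (a pedagogical sketch, not the SAS GLIMMIX or SuperMix implementation; function names and data are invented):

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

def cluster_loglik(y, eta, sigma, n_nodes=20):
    """Gauss-Hermite approximation of one cluster's marginal log-likelihood:
    log integral of prod_i Bernoulli(y_i | invlogit(eta_i + b)) dN(b; 0, sigma^2).
    """
    x, w = hermgauss(n_nodes)        # nodes/weights for int e^{-x^2} f(x) dx
    b = np.sqrt(2.0) * sigma * x     # change of variables to N(0, sigma^2)
    logits = eta[:, None] + b[None, :]
    p = 1.0 / (1.0 + np.exp(-logits))
    # likelihood of the cluster's responses, evaluated at each node
    lik = np.prod(np.where(y[:, None] == 1, p, 1.0 - p), axis=0)
    return float(np.log(np.sum(w * lik) / np.sqrt(np.pi)))

y = np.array([1, 0, 1, 1])               # invented binary responses
eta = np.array([0.2, -0.1, 0.4, 0.0])    # invented fixed-effect predictor
print(cluster_loglik(y, eta, sigma=1.0))
```

    With several correlated random effects, this one-dimensional quadrature becomes a multi-dimensional one, which is exactly why computation grows intensive and packages start to disagree.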

  17. Constructing and Modifying Sequence Statistics for relevent Using informR in R

    PubMed Central

    Marcum, Christopher Steven; Butts, Carter T.

    2015-01-01

    The informR package greatly simplifies the analysis of complex event histories in R by providing user-friendly tools to build sufficient statistics for the relevent package. Historically, building sufficient statistics to model event sequences (of the form a→b) using the egocentric generalization of Butts’ (2008) relational event framework for modeling social action has been cumbersome. The informR package simplifies the construction of the complex list of arrays needed by the rem() model-fitting function for a variety of cases involving egocentric event data, multiple event types, and/or support constraints. This paper introduces these tools using examples from real data extracted from the American Time Use Survey. PMID:26185488

  18. Advanced statistical methods for improved data analysis of NASA astrophysics missions

    NASA Technical Reports Server (NTRS)

    Feigelson, Eric D.

    1992-01-01

    The investigators under this grant studied ways to improve the statistical analysis of astronomical data. They looked at existing techniques, the development of new techniques, and the production and distribution of specialized software to the astronomical community. Abstracts of nine papers that were produced are included, as well as brief descriptions of four software packages. The articles that are abstracted discuss analytical and Monte Carlo comparisons of six different linear least squares fits, a (second) paper on linear regression in astronomy, two reviews of public domain software for the astronomer, subsample and half-sample methods for estimating sampling distributions, a nonparametric estimation of survival functions under dependent competing risks, censoring in astronomical data due to nondetections, an astronomy survival analysis computer package called ASURV, and improving the statistical methodology of astronomical data analysis.

  19. The gputools package enables GPU computing in R.

    PubMed

    Buckner, Joshua; Wilson, Justin; Seligman, Mark; Athey, Brian; Watson, Stanley; Meng, Fan

    2010-01-01

    By default, the R statistical environment does not make use of parallelism. Researchers may resort to expensive solutions such as cluster hardware for large analysis tasks. Graphics processing units (GPUs) provide an inexpensive and computationally powerful alternative. Using R and the CUDA toolkit from Nvidia, we have implemented several functions commonly used in microarray gene expression analysis for GPU-equipped computers. R users can take advantage of the better performance provided by an Nvidia GPU. The package is available from CRAN, the R project's repository of packages, at http://cran.r-project.org/web/packages/gputools. More information about our gputools R package is available at http://brainarray.mbni.med.umich.edu/brainarray/Rgpgpu.

  20. A statistical package for computing time and frequency domain analysis

    NASA Technical Reports Server (NTRS)

    Brownlow, J.

    1978-01-01

    The spectrum analysis (SPA) program is a general purpose digital computer program designed to aid in data analysis. The program does time and frequency domain statistical analyses as well as some preanalysis data preparation. The capabilities of the SPA program include linear trend removal and/or digital filtering of data, plotting and/or listing of both filtered and unfiltered data, time domain statistical characterization of data, and frequency domain statistical characterization of data.
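
    The SPA workflow described above, linear trend removal followed by time- and frequency-domain characterization, can be sketched with NumPy (an illustrative reconstruction, not the original program; the signal is invented):

```python
import numpy as np

fs = 100.0                                   # sampling rate, Hz
t = np.arange(0, 10, 1 / fs)
x = 0.5 * t + np.sin(2 * np.pi * 5 * t)      # linear trend + 5 Hz tone

# preanalysis step: linear trend removal by least-squares fit
slope, intercept = np.polyfit(t, x, 1)
x_detr = x - (slope * t + intercept)

# time-domain statistical characterization
print("mean:", x_detr.mean(), "std:", x_detr.std())

# frequency-domain characterization: periodogram from the FFT
psd = np.abs(np.fft.rfft(x_detr)) ** 2 / len(x_detr)
freqs = np.fft.rfftfreq(len(x_detr), d=1 / fs)
print("peak frequency (Hz):", freqs[np.argmax(psd)])
```

    Digital filtering, the other preanalysis option SPA offers, would slot in between the detrending and the spectral step.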

  1. The Statistical Package for the Social Sciences (SPSS) as an adjunct to pharmacokinetic analysis.

    PubMed

    Mather, L E; Austin, K L

    1983-01-01

    Computer techniques for numerical analysis are well known to pharmacokineticists. Powerful techniques for data file management have been developed by social scientists but have, in general, been ignored by pharmacokineticists because of their apparent lack of ability to interface with pharmacokinetic programs. Extensive use has been made of the Statistical Package for the Social Sciences (SPSS) for its data handling capabilities, but at the same time, techniques have been developed within SPSS to interface with pharmacokinetic programs of the users' choice and to carry out a variety of user-defined pharmacokinetic tasks within SPSS commands, apart from the expected variety of statistical tasks. Because it is based on a ubiquitous package, this methodology has all of the benefits of excellent documentation, interchangeability between different types and sizes of machines and true portability of techniques and data files. An example is given of the total management of a pharmacokinetic study previously reported in the literature by the authors.

  2. Logistic Regression with Multiple Random Effects: A Simulation Study of Estimation Methods and Statistical Packages

    PubMed Central

    Kim, Yoonsang; Emery, Sherry

    2013-01-01

    Several statistical packages are capable of estimating generalized linear mixed models and these packages provide one or more of three estimation methods: penalized quasi-likelihood, Laplace, and Gauss-Hermite. Many studies have investigated these methods’ performance for the mixed-effects logistic regression model. However, the authors focused on models with one or two random effects and assumed a simple covariance structure between them, which may not be realistic. When there are multiple correlated random effects in a model, the computation becomes intensive, and often an algorithm fails to converge. Moreover, in our analysis of smoking status and exposure to anti-tobacco advertisements, we have observed that when a model included multiple random effects, parameter estimates varied considerably from one statistical package to another even when using the same estimation method. This article presents a comprehensive review of the advantages and disadvantages of each estimation method. In addition, we compare the performances of the three methods across statistical packages via simulation, which involves two- and three-level logistic regression models with at least three correlated random effects. We apply our findings to a real dataset. Our results suggest that two packages—SAS GLIMMIX Laplace and SuperMix Gaussian quadrature—perform well in terms of accuracy, precision, convergence rates, and computing speed. We also discuss the strengths and weaknesses of the two packages in regard to sample sizes. PMID:24288415

  3. Visualization and statistical comparisons of microbial communities using R packages on Phylochip data.

    PubMed

    Holmes, Susan; Alekseyenko, Alexander; Timme, Alden; Nelson, Tyrrell; Pasricha, Pankaj Jay; Spormann, Alfred

    2011-01-01

    This article explains the statistical and computational methodology used to analyze species abundances collected using the LBNL PhyloChip in a study of Irritable Bowel Syndrome (IBS) in rats. Some tools already available for the analysis of ordinary microarray data are useful in this type of statistical analysis. For instance, in correcting for multiple testing we use family-wise error rate (FWER) control and step-down tests (available in the multtest package). Once the most significant species are chosen, we use the hypergeometric tests familiar from testing GO categories to test specific phyla and families. We provide examples of normalization, multivariate projections, batch effect detection and integration of phylogenetic covariation, as well as tree equalization and robustification methods.
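
    The two testing ingredients mentioned above, step-down family-wise error control and hypergeometric category tests, can be sketched in plain Python (illustrative equivalents of what multtest provides in R; all numbers are invented):

```python
from math import comb

def holm_adjust(pvals):
    """Holm step-down adjusted p-values, controlling the family-wise error rate."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adj = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        running_max = max(running_max, (m - rank) * pvals[i])
        adj[i] = min(1.0, running_max)
    return adj

def hypergeom_sf(k, pop, successes, draws):
    """P(X >= k) for a hypergeometric draw: `draws` items from a population
    of size `pop` containing `successes` marked items (over-representation)."""
    return sum(comb(successes, x) * comb(pop - successes, draws - x)
               for x in range(k, min(successes, draws) + 1)) / comb(pop, draws)

print(holm_adjust([0.01, 0.04, 0.03, 0.005]))
# e.g., is a phylum with 8 members, 5 of them flagged, over-represented
# among 30 significant species out of 200 total?
print(hypergeom_sf(5, 200, 8, 30))
```

    The step-down structure is visible in `holm_adjust`: the smallest p-value faces the most severe multiplier, and adjusted values are forced to be monotone.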

  4. ShortRead: a bioconductor package for input, quality assessment and exploration of high-throughput sequence data

    PubMed Central

    Morgan, Martin; Anders, Simon; Lawrence, Michael; Aboyoun, Patrick; Pagès, Hervé; Gentleman, Robert

    2009-01-01

    Summary: ShortRead is a package for input, quality assessment, manipulation and output of high-throughput sequencing data. ShortRead is provided in the R and Bioconductor environments, allowing ready access to additional facilities for advanced statistical analysis, data transformation, visualization and integration with diverse genomic resources. Availability and Implementation: This package is implemented in R and available at the Bioconductor web site; the package contains a ‘vignette’ outlining typical work flows. Contact: mtmorgan@fhcrc.org PMID:19654119

  5. Advanced functional network analysis in the geosciences: The pyunicorn package

    NASA Astrophysics Data System (ADS)

    Donges, Jonathan F.; Heitzig, Jobst; Runge, Jakob; Schultz, Hanna C. H.; Wiedermann, Marc; Zech, Alraune; Feldhoff, Jan; Rheinwalt, Aljoscha; Kutza, Hannes; Radebach, Alexander; Marwan, Norbert; Kurths, Jürgen

    2013-04-01

    Functional networks are a powerful tool for analyzing large geoscientific datasets such as global fields of climate time series originating from observations or model simulations. pyunicorn (pythonic unified complex network and recurrence analysis toolbox) is an open-source, fully object-oriented and easily parallelizable package written in the language Python. It allows for constructing functional networks (aka climate networks) representing the structure of statistical interrelationships in large datasets and, subsequently, investigating this structure using advanced methods of complex network theory such as measures for networks of interacting networks, node-weighted statistics or network surrogates. Additionally, pyunicorn can be used to study the complex dynamics of geoscientific systems, as recorded by time series, by means of recurrence networks and visibility graphs. The range of possible applications of the package is outlined, drawing on several examples from climatology.

  6. GenomeGraphs: integrated genomic data visualization with R.

    PubMed

    Durinck, Steffen; Bullard, James; Spellman, Paul T; Dudoit, Sandrine

    2009-01-06

    Biological studies involve a growing number of distinct high-throughput experiments to characterize samples of interest, yet methods to visualize these different genomic datasets in a versatile manner are lacking. In addition, genomic data analysis requires integrated visualization of experimental data along with constantly changing genomic annotation and statistical analyses. We developed GenomeGraphs as an add-on software package for the statistical programming environment R to facilitate integrated visualization of genomic datasets. GenomeGraphs uses the biomaRt package to perform on-line annotation queries to Ensembl and translates these to gene/transcript structures in viewports of the grid graphics package. This allows genomic annotation to be plotted together with experimental data. GenomeGraphs can also be used to plot custom annotation tracks in combination with different experimental data types together in one plot using the same genomic coordinate system. GenomeGraphs is a flexible and extensible software package which can be used to visualize a multitude of genomic datasets within the statistical programming environment R.

  7. Technological Tools in the Introductory Statistics Classroom: Effects on Student Understanding of Inferential Statistics

    ERIC Educational Resources Information Center

    Meletiou-Mavrotheris, Maria

    2004-01-01

    While technology has become an integral part of introductory statistics courses, the programs typically employed are professional packages designed primarily for data analysis rather than for learning. Findings from several studies suggest that use of such software in the introductory statistics classroom may not be very effective in helping…

  8. Effectiveness of Simulation in a Hybrid and Online Networking Course.

    ERIC Educational Resources Information Center

    Cameron, Brian H.

    2003-01-01

    Reports on a study that compares the performance of students enrolled in two sections of a Web-based computer networking course: one utilizing a simulation package and the second utilizing a static, graphical software package. Analysis shows statistically significant improvements in performance in the simulation group compared to the…

  9. Unified functional network and nonlinear time series analysis for complex systems science: The pyunicorn package

    NASA Astrophysics Data System (ADS)

    Donges, Jonathan F.; Heitzig, Jobst; Beronov, Boyan; Wiedermann, Marc; Runge, Jakob; Feng, Qing Yi; Tupikina, Liubov; Stolbova, Veronika; Donner, Reik V.; Marwan, Norbert; Dijkstra, Henk A.; Kurths, Jürgen

    2015-11-01

    We introduce the pyunicorn (Pythonic unified complex network and recurrence analysis toolbox) open source software package for applying and combining modern methods of data analysis and modeling from complex network theory and nonlinear time series analysis. pyunicorn is a fully object-oriented and easily parallelizable package written in the language Python. It allows for the construction of functional networks such as climate networks in climatology or functional brain networks in neuroscience representing the structure of statistical interrelationships in large data sets of time series and, subsequently, investigating this structure using advanced methods of complex network theory such as measures and models for spatial networks, networks of interacting networks, node-weighted statistics, or network surrogates. Additionally, pyunicorn provides insights into the nonlinear dynamics of complex systems as recorded in uni- and multivariate time series from a non-traditional perspective by means of recurrence quantification analysis, recurrence networks, visibility graphs, and construction of surrogate time series. The range of possible applications of the library is outlined, drawing on several examples mainly from the field of climatology.
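
    Of the methods listed, the visibility graph is especially easy to state: two samples of a time series are linked if the straight line between their (time, value) points clears every sample in between. A minimal Python sketch of this "natural visibility" criterion follows (pyunicorn's own implementation is optimized and far richer; this shows only the definition, on invented data):

```python
def visibility_graph(series):
    """Edge set of the natural visibility graph of a time series.
    Nodes are sample indices; O(n^2) pairwise check for clarity."""
    edges = set()
    n = len(series)
    for i in range(n):
        for j in range(i + 1, n):
            # point k blocks the view iff it lies on or above the chord i-j
            visible = all(
                series[k] < series[j]
                + (series[i] - series[j]) * (j - k) / (j - i)
                for k in range(i + 1, j)
            )
            if visible:
                edges.add((i, j))
    return edges

print(sorted(visibility_graph([3.0, 1.0, 2.0, 0.5, 4.0])))
```

    Network measures (degree distributions, clustering, etc.) applied to this graph then characterize the dynamics of the underlying series, which is the bridge between time series analysis and complex network theory that pyunicorn exploits.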

  10. Software for the Integration of Multiomics Experiments in Bioconductor.

    PubMed

    Ramos, Marcel; Schiffer, Lucas; Re, Angela; Azhar, Rimsha; Basunia, Azfar; Rodriguez, Carmen; Chan, Tiffany; Chapman, Phil; Davis, Sean R; Gomez-Cabrero, David; Culhane, Aedin C; Haibe-Kains, Benjamin; Hansen, Kasper D; Kodali, Hanish; Louis, Marie S; Mer, Arvind S; Riester, Markus; Morgan, Martin; Carey, Vince; Waldron, Levi

    2017-11-01

    Multiomics experiments are increasingly commonplace in biomedical research and add layers of complexity to experimental design, data integration, and analysis. R and Bioconductor provide a generic framework for statistical analysis and visualization, as well as specialized data classes for a variety of high-throughput data types, but methods are lacking for integrative analysis of multiomics experiments. The MultiAssayExperiment software package, implemented in R and leveraging Bioconductor software and design principles, provides for the coordinated representation of, storage of, and operation on multiple diverse genomics data. We provide the unrestricted multiple 'omics data for each cancer tissue in The Cancer Genome Atlas as ready-to-analyze MultiAssayExperiment objects and demonstrate in these and other datasets how the software simplifies data representation, statistical analysis, and visualization. The MultiAssayExperiment Bioconductor package reduces major obstacles to efficient, scalable, and reproducible statistical analysis of multiomics data and enhances data science applications of multiple omics datasets. Cancer Res; 77(21); e39-42. ©2017 American Association for Cancer Research.

  11. Visual Data Analysis for Satellites

    NASA Technical Reports Server (NTRS)

    Lau, Yee; Bhate, Sachin; Fitzpatrick, Patrick

    2008-01-01

    The Visual Data Analysis Package is a collection of programs and scripts that facilitate visual analysis of data available from NASA and NOAA satellites, as well as dropsonde, buoy, and conventional in-situ observations. The package features utilities for data extraction, data quality control, statistical analysis, and data visualization. The Hierarchical Data Format (HDF) satellite data extraction routines from NASA's Jet Propulsion Laboratory were customized for specific spatial coverage and file input/output. Statistical analysis includes the calculation of the relative error, the absolute error, and the root mean square error. Other capabilities include curve fitting through the data points to fill in missing data points between satellite passes or where clouds obscure satellite data. For data visualization, the software provides customizable Generic Mapping Tool (GMT) scripts to generate difference maps, scatter plots, line plots, vector plots, histograms, timeseries, and color fill images.
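
    The statistical layer of the package reduces to a few standard error measures; a small NumPy sketch (illustrative only, with made-up observation/estimate pairs, e.g. buoy measurements versus satellite retrievals):

```python
import numpy as np

def error_stats(obs, est):
    """Mean absolute error, mean relative error, and root mean square error
    between observations and estimates."""
    obs = np.asarray(obs, dtype=float)
    est = np.asarray(est, dtype=float)
    abs_err = np.abs(est - obs)
    rel_err = abs_err / np.abs(obs)          # assumes nonzero observations
    rmse = np.sqrt(np.mean((est - obs) ** 2))
    return abs_err.mean(), rel_err.mean(), rmse

# invented sea-surface temperatures: in-situ vs. satellite estimate
mae, mre, rmse = error_stats([20.0, 21.0, 19.5], [20.5, 20.0, 19.0])
print(mae, mre, rmse)
```

    Curve fitting through such paired points, as the package does to bridge gaps between satellite passes or cloud-obscured scenes, would build on the same arrays.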

  12. HydroApps: An R package for statistical simulation to use in regional analysis

    NASA Astrophysics Data System (ADS)

    Ganora, D.

    2013-12-01

    The HydroApps package is a newly developed R extension, initially written to support a recent model for flood frequency estimation in Northwestern Italy; it also contains general tools for regional analyses and can easily be extended to include other statistical models. The package is currently at an experimental stage of development. HydroApps is a corollary of the SSEM project for regional flood frequency analysis, although it was developed independently to support various kinds of regional analyses. Its aim is to provide a bridge between statistical simulation and practical operational use. In particular, the main module of the package builds confidence bands for flood frequency curves expressed by means of their L-moments. Other functions include pre-processing and visualization of hydrologic time series and analysis of the optimal design flood under uncertainty, as well as tools useful in water resources management for estimating flow duration curves and their sensitivity to water withdrawals. Particular attention is devoted to code granularity, i.e., the level of detail and aggregation of the code: greater detail means more low-level functions, which provides flexibility but reduces ease of practical use. A balance between detail and simplicity is necessary and can be achieved with appropriate wrapper functions and specific help pages for each working block. From a more general viewpoint, the package does not yet have a user-friendly interface, but it runs on multiple operating systems and is easy to update, like many other open-source projects. The HydroApps functions and their features are reported in order to share ideas and materials and to improve technology and information transfer between the scientific community and end users such as policy makers.
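
    One of the water-management tools mentioned, the flow duration curve, has a particularly compact definition; a NumPy sketch using the Weibull plotting position (the plotting position is my assumption, HydroApps may use a different one, and the discharges are invented):

```python
import numpy as np

def flow_duration_curve(flows):
    """Flow duration curve: each discharge paired with the fraction of time
    it is equalled or exceeded (Weibull plotting position i/(n+1))."""
    q = np.sort(np.asarray(flows, dtype=float))[::-1]     # descending
    exceedance = np.arange(1, len(q) + 1) / (len(q) + 1)  # P(Q >= q_i)
    return exceedance, q

# invented daily discharges (m^3/s)
flows = [12.0, 3.5, 7.2, 1.1, 25.0, 5.0, 9.8, 2.4]
p, q = flow_duration_curve(flows)
print(list(zip(np.round(p, 2), q)))
```

    Sensitivity to withdrawals can then be explored by subtracting an abstraction from `flows` and recomputing the curve, which is the kind of wrapper-level convenience the package aims to provide.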

  13. Twice random, once mixed: applying mixed models to simultaneously analyze random effects of language and participants.

    PubMed

    Janssen, Dirk P

    2012-03-01

    Psychologists, psycholinguists, and other researchers using language stimuli have been struggling for more than 30 years with the problem of how to analyze experimental data that contain two crossed random effects (items and participants). The classical analysis of variance does not apply; alternatives have been proposed but have failed to catch on, and a statistically unsatisfactory procedure of using two approximations (known as F(1) and F(2)) has become the standard. A simple and elegant solution using mixed-model analysis has been available for 15 years, and recent improvements in statistical software have made mixed-model analysis widely available. The aim of this article is to increase the use of mixed models by giving a concise practical introduction and by giving clear directions for undertaking the analysis in the most popular statistical packages. The article also introduces the DJMIXED add-on package for SPSS, which makes entering the models and reporting their results as straightforward as possible.
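
    The F(1)/F(2) procedure the article criticizes is simple to demonstrate: aggregate once over items and once over participants, then run two separate paired tests (a toy NumPy sketch with simulated reaction times; a mixed model would instead fit both random effects jointly):

```python
import numpy as np

def paired_t(a, b):
    """Paired t statistic for two matched samples."""
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(d.mean() / (d.std(ddof=1) / np.sqrt(len(d))))

# toy reaction times rt[participant, item] for two conditions
rng = np.random.default_rng(0)
cond_a = 500 + rng.normal(0, 20, size=(6, 8))
cond_b = cond_a + 15 + rng.normal(0, 10, size=(6, 8))  # +15 ms effect

# F1 analysis: average over items -> one mean per participant
t1 = paired_t(cond_b.mean(axis=1), cond_a.mean(axis=1))
# F2 analysis: average over participants -> one mean per item
t2 = paired_t(cond_b.mean(axis=0), cond_a.mean(axis=0))
print(t1, t2)
```

    Each aggregation discards one source of random variation, which is why the two approximations can disagree and why modeling items and participants simultaneously is statistically preferable.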

  14. WebArray: an online platform for microarray data analysis

    PubMed Central

    Xia, Xiaoqin; McClelland, Michael; Wang, Yipeng

    2005-01-01

    Background Many cutting-edge microarray analysis tools and algorithms, including the commonly used limma and affy packages in Bioconductor, require sophisticated knowledge of mathematics, statistics and computer skills to implement. Commercially available software can provide a user-friendly interface at considerable cost. To facilitate the use of these tools for microarray data analysis on an open platform, we developed an online microarray data analysis platform, WebArray, for bench biologists to use these tools to explore data from single/dual color microarray experiments. Results The currently implemented functions are based on the limma and affy packages from Bioconductor, the spacings LOESS histogram (SPLOSH) method, a PCA-assisted normalization method and a genome mapping method. WebArray incorporates these packages and provides a user-friendly interface for accessing a wide range of key functions of limma and others, such as spot quality weighting, background correction, graphical plotting, normalization, linear modeling, empirical Bayes statistical analysis, false discovery rate (FDR) estimation, and chromosomal mapping for genome comparison. Conclusion WebArray offers a convenient platform for bench biologists to access several cutting-edge microarray data analysis tools. The website is freely available at . It runs on a Linux server with Apache and MySQL. PMID:16371165
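
    One of the listed steps, false discovery rate (FDR) estimation, can be illustrated with a self-contained Benjamini-Hochberg sketch in Python (limma's R implementation differs in detail; the p-values are invented):

```python
def bh_fdr(pvals):
    """Benjamini-Hochberg adjusted p-values (step-up FDR control), as used
    for calling differentially expressed genes."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adj = [0.0] * m
    prev = 1.0
    for rank in range(m - 1, -1, -1):      # step up from the largest p-value
        i = order[rank]
        prev = min(prev, pvals[i] * m / (rank + 1))
        adj[i] = prev
    return adj

print(bh_fdr([0.001, 0.008, 0.039, 0.041, 0.6]))
```

    Genes whose adjusted value falls below the chosen FDR level (say 0.05) are reported as differentially expressed; unlike the family-wise corrections, the adjustment scales gracefully to the thousands of probes on an array.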

  15. pcr: an R package for quality assessment, analysis and testing of qPCR data

    PubMed Central

    Ahmed, Mahmoud

    2018-01-01

    Background Real-time quantitative PCR (qPCR) is a broadly used technique in biomedical research. Currently, a few different analysis models are used to determine the quality of data and to quantify the mRNA level across experimental conditions. Methods We developed an R package to implement methods for quality assessment, analysis and testing of qPCR data for statistical significance. Double Delta CT and standard curve models were implemented to quantify the relative expression of target genes from CT values in standard qPCR control-group experiments. In addition, calculation of amplification efficiency and curves from serial dilution qPCR experiments is used to assess the quality of the data. Finally, two-group testing and linear models were used to test for significance of the difference in expression between control groups and conditions of interest. Results Using two datasets from qPCR experiments, we applied different quality assessment, analysis and statistical testing in the pcr package and compared the results to the original published articles. The final relative expression values from the different models, as well as the intermediary outputs, were checked against the expected results in the original papers and were found to be accurate and reliable. Conclusion The pcr package provides an intuitive and unified interface for its main functions to allow biologists to perform all necessary steps of qPCR analysis and produce graphs in a uniform way. PMID:29576953
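
    The Double Delta CT model mentioned in the Methods is a one-line calculation; a Python sketch (function name and CT values are my own; the model assumes roughly 100% amplification efficiency):

```python
def delta_delta_ct(ct_target_treat, ct_ref_treat, ct_target_ctrl, ct_ref_ctrl):
    """Relative expression by the Double Delta CT model:
    2 ** -((CT_target - CT_ref)_treated - (CT_target - CT_ref)_control)."""
    ddct = (ct_target_treat - ct_ref_treat) - (ct_target_ctrl - ct_ref_ctrl)
    return 2.0 ** (-ddct)

# target drops by 2 cycles relative to the reference gene -> 4-fold up
print(delta_delta_ct(22.0, 16.0, 24.0, 16.0))
```

    The standard curve model, the package's alternative, replaces the assumed doubling per cycle with an efficiency estimated from a serial dilution series.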

  16. Clonality: an R package for testing clonal relatedness of two tumors from the same patient based on their genomic profiles.

    PubMed

    Ostrovnaya, Irina; Seshan, Venkatraman E; Olshen, Adam B; Begg, Colin B

    2011-06-15

    If a cancer patient develops multiple tumors, it is sometimes impossible to determine whether these tumors are independent or clonal based solely on pathological characteristics. Investigators have studied how to improve this diagnostic challenge by comparing the presence of loss of heterozygosity (LOH) at selected genetic locations of tumor samples, or by comparing genomewide copy number array profiles. We have previously developed statistical methodology to compare such genomic profiles for an evidence of clonality. We assembled the software for these tests in a new R package called 'Clonality'. For LOH profiles, the package contains significance tests. The analysis of copy number profiles includes a likelihood ratio statistic and reference distribution, as well as an option to produce various plots that summarize the results. Bioconductor (http://bioconductor.org/packages/release/bioc/html/Clonality.html) and http://www.mskcc.org/mskcc/html/13287.cfm.

  17. regioneR: an R/Bioconductor package for the association analysis of genomic regions based on permutation tests.

    PubMed

    Gel, Bernat; Díez-Villanueva, Anna; Serra, Eduard; Buschbeck, Marcus; Peinado, Miguel A; Malinverni, Roberto

    2016-01-15

    Statistically assessing the relation between a set of genomic regions and other genomic features is a common and challenging task in genomic and epigenomic analyses. Randomization-based approaches implicitly take into account the complexity of the genome without the need to assume an underlying statistical model. regioneR is an R package that implements a permutation test framework specifically designed to work with genomic regions. In addition to the predefined randomization and evaluation strategies, regioneR is fully customizable, allowing the use of custom strategies to adapt it to specific questions. Finally, it also implements a novel function to evaluate the local specificity of the detected association. regioneR is an R package released under the Artistic-2.0 License. The source code and documents are freely available through Bioconductor (http://www.bioconductor.org/packages/regioneR). rmalinverni@carrerasresearch.org. © The Author 2015. Published by Oxford University Press.
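
    As a rough illustration of the permutation framework the abstract describes (an illustrative stand-alone sketch, not regioneR's actual API or its genome-aware randomization strategies), a region-vs-feature association test can be written as: count how many query regions overlap the feature set, then compare against randomly re-placed regions of the same sizes.

```python
import random

def count_overlaps(regions, features):
    """Number of query regions overlapping at least one feature.
    Regions and features are (start, end) half-open intervals."""
    return sum(
        any(s < fe and fs < e for fs, fe in features)
        for s, e in regions
    )

def permutation_test(regions, features, genome_len, n_perm=1000, seed=1):
    """Empirical p-value: how often do randomly re-placed regions of
    the same sizes overlap the features at least as much as observed?"""
    rng = random.Random(seed)
    observed = count_overlaps(regions, features)
    hits = 0
    for _ in range(n_perm):
        shuffled = []
        for s, e in regions:
            start = rng.randrange(genome_len - (e - s))
            shuffled.append((start, start + (e - s)))
        if count_overlaps(shuffled, features) >= observed:
            hits += 1
    # add-one correction keeps the empirical p-value away from exactly zero
    return observed, (hits + 1) / (n_perm + 1)
```

    Real implementations additionally respect chromosome boundaries, masked regions, and custom randomization strategies, which is exactly the flexibility regioneR exposes.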

  18. Evaluation of Solid Rocket Motor Component Data Using a Commercially Available Statistical Software Package

    NASA Technical Reports Server (NTRS)

    Stefanski, Philip L.

    2015-01-01

    Commercially available software packages today allow users to quickly perform the routine evaluations of (1) descriptive statistics to numerically and graphically summarize both sample and population data, (2) inferential statistics that draw conclusions about a given population from samples taken of it, (3) probability determinations that can be used to generate estimates of reliability allowables, and finally (4) the setup of designed experiments and the analysis of their data to identify significant material and process characteristics for application in both product manufacturing and performance enhancement. This paper presents examples of analysis and experimental design work that has been conducted using Statgraphics (Registered Trademark) statistical software to obtain useful information with regard to solid rocket motor propellants and internal insulation material. Data were obtained from a number of programs (Shuttle, Constellation, and Space Launch System) and sources that include solid propellant burn rate strands, tensile specimens, sub-scale test motors, full-scale operational motors, rubber insulation specimens, and sub-scale rubber insulation analog samples. Besides facilitating the experimental design process to yield meaningful results, statistical software has demonstrated its ability to quickly perform complex data analyses and yield significant findings that might otherwise have gone unnoticed. One caveat to these successes is that useful results derive not only from the inherent power of the software package, but also from the skill and understanding of the data analyst.

  19. Clustangles: An Open Library for Clustering Angular Data.

    PubMed

    Sargsyan, Karen; Hua, Yun Hao; Lim, Carmay

    2015-08-24

    Dihedral angles are good descriptors of the numerous conformations visited by large, flexible systems, but their analysis requires directional statistics. A single package including the various multivariate statistical methods for angular data that accounts for the distinct topology of such data does not exist. Here, we present a lightweight standalone, operating-system independent package called Clustangles to fill this gap. Clustangles will be useful in analyzing the ever-increasing number of structures in the Protein Data Bank and clustering the copious conformations from increasingly long molecular dynamics simulations.

  20. PIVOT: platform for interactive analysis and visualization of transcriptomics data.

    PubMed

    Zhu, Qin; Fisher, Stephen A; Dueck, Hannah; Middleton, Sarah; Khaladkar, Mugdha; Kim, Junhyong

    2018-01-05

    Many R packages have been developed for transcriptome analysis, but their use often requires familiarity with R, and integrating the results of different packages requires scripts to wrangle the datatypes. Furthermore, exploratory data analyses often generate multiple derived datasets, such as data subsets or data transformations, which can be difficult to track. Here we present PIVOT, an R-based platform that wraps open source transcriptome analysis packages with a uniform user interface and graphical data management, allowing non-programmers to interactively explore transcriptomics data. PIVOT supports more than 40 popular open source packages for transcriptome analysis and provides an extensive set of tools for statistical data manipulations. A graph-based visual interface is used to represent the links between derived datasets, allowing easy tracking of data versions. PIVOT further supports automatic report generation, publication-quality plots, and program/data state saving, such that all analyses can be saved, shared and reproduced. PIVOT will allow researchers with broad backgrounds to easily access sophisticated transcriptome analysis tools and interactively explore transcriptome datasets.

  1. Unified functional network and nonlinear time series analysis for complex systems science: The pyunicorn package

    NASA Astrophysics Data System (ADS)

    Donges, Jonathan; Heitzig, Jobst; Beronov, Boyan; Wiedermann, Marc; Runge, Jakob; Feng, Qing Yi; Tupikina, Liubov; Stolbova, Veronika; Donner, Reik; Marwan, Norbert; Dijkstra, Henk; Kurths, Jürgen

    2016-04-01

    We introduce the pyunicorn (Pythonic unified complex network and recurrence analysis toolbox) open source software package for applying and combining modern methods of data analysis and modeling from complex network theory and nonlinear time series analysis. pyunicorn is a fully object-oriented and easily parallelizable package written in the language Python. It allows for the construction of functional networks, such as climate networks in climatology or functional brain networks in neuroscience, representing the structure of statistical interrelationships in large data sets of time series, and, subsequently, for the investigation of this structure using advanced methods of complex network theory such as measures and models for spatial networks, networks of interacting networks, node-weighted statistics, or network surrogates. Additionally, pyunicorn provides insights into the nonlinear dynamics of complex systems as recorded in uni- and multivariate time series from a non-traditional perspective by means of recurrence quantification analysis, recurrence networks, visibility graphs, and construction of surrogate time series. The range of possible applications of the library is outlined, drawing on several examples mainly from the field of climatology. pyunicorn is available online at https://github.com/pik-copan/pyunicorn. Reference: J.F. Donges, J. Heitzig, B. Beronov, M. Wiedermann, J. Runge, Q.-Y. Feng, L. Tupikina, V. Stolbova, R.V. Donner, N. Marwan, H.A. Dijkstra, and J. Kurths, Unified functional network and nonlinear time series analysis for complex systems science: The pyunicorn package, Chaos 25, 113101 (2015), DOI: 10.1063/1.4934554, Preprint: arxiv.org:1507.01571 [physics.data-an].
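
    One of the methods listed, the visibility graph, has a compact definition that can be sketched directly (an illustrative stand-alone implementation of the natural visibility criterion, not pyunicorn's own code): each time point becomes a node, and two points are linked if the straight line between them passes above every intermediate point.

```python
def visibility_graph(series):
    """Natural visibility graph of a scalar time series.

    Points (a, y[a]) and (b, y[b]) are connected iff every intermediate
    point c lies strictly below the line of sight joining them:
    y[c] < y[b] + (y[a] - y[b]) * (b - c) / (b - a)."""
    n = len(series)
    edges = set()
    for a in range(n):
        for b in range(a + 1, n):
            visible = all(
                series[c] < series[b]
                + (series[a] - series[b]) * (b - c) / (b - a)
                for c in range(a + 1, b)
            )
            if visible:
                edges.add((a, b))
    return edges

# In this short series with a shallow dip, every pair of points
# can "see" each other, so all 6 possible edges appear.
print(sorted(visibility_graph([3.0, 1.0, 2.0, 4.0])))
```

    High peaks block visibility between the points on either side of them, which is how the graph topology encodes the series' nonlinear structure.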

  2. Statistical assessment on a combined analysis of GRYN-ROMN-UCBN upland vegetation vital signs

    USGS Publications Warehouse

    Irvine, Kathryn M.; Rodhouse, Thomas J.

    2014-01-01

    As of 2013, the Rocky Mountain and Upper Columbia Basin Inventory and Monitoring Networks have multiple years of vegetation data, the Greater Yellowstone Network has three years of vegetation data, and monitoring is ongoing in all three networks. Our primary objective is to assess whether a combined analysis of these data aimed at exploring correlations with climate and weather data is feasible. We summarize the core survey design elements across protocols and point out the major statistical challenges for a combined analysis at present. The dissimilarity in response designs between the ROMN and UCBN-GRYN network protocols presents a statistical challenge that has not yet been resolved. However, the UCBN and GRYN data are compatible, as they implement a similar response design; therefore, a combined analysis is feasible and will be pursued in the future. When data collected by different networks are combined, the merged dataset is likely best described by a complex survey design, the result of combining datasets from different sampling designs, characterized by unequal probability sampling, varying stratification, and clustering (see Lohr 2010, Chapter 7, for a general overview). Statistical analysis of complex survey data requires modifications to standard methods, one of which is to include survey design weights within a statistical model. We focus on this issue for a combined analysis of upland vegetation from these networks, leaving other topics for future research. We conduct a simulation study on the possible effects of equal versus unequal probability selection of points on parameter estimates of temporal trend using available packages within the R statistical computing environment. We find that, as written, using lmer or lm for trend detection in a continuous response, and clm and clmm for visually estimated cover classes, with “raw” GRTS design weights specified for the weight argument leads to substantially different results and/or computational instability. However, when only fixed effects are of interest, the survey package (svyglm and svyolr) may be suitable for a model-assisted analysis of trend. We provide possible directions for future research into combined analysis for ordinal and continuous vital sign indicators.
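
    The idea of including survey design weights in a statistical model can be sketched with a design-weighted least-squares trend fit (an illustrative analogue in Python; the study itself used R packages such as lmer and the survey package's svyglm):

```python
def weighted_trend(x, y, w):
    """Design-weighted least-squares fit of y = a + b*x.

    Each observation is weighted by its survey design weight
    (conventionally the inverse of its inclusion probability), so
    points from under-sampled strata count proportionally more.
    Returns (intercept, slope)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    b = (sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
         / sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x)))
    return my - b * mx, b

# With equal weights this reduces to ordinary least squares.
a, b = weighted_trend([0, 1, 2, 3], [1.0, 3.0, 5.0, 7.0], [1, 1, 1, 1])
print(a, b)  # → 1.0 2.0
```

    With unequal weights the estimate shifts toward the up-weighted observations, which is the mechanism by which "raw" design weights can change trend estimates relative to an unweighted fit.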

  3. SSD for R: A Comprehensive Statistical Package to Analyze Single-System Data

    ERIC Educational Resources Information Center

    Auerbach, Charles; Schudrich, Wendy Zeitlin

    2013-01-01

    The need for statistical analysis in single-subject designs presents a challenge, as analytical methods that are applied to group comparison studies are often not appropriate in single-subject research. "SSD for R" is a robust set of statistical functions with wide applicability to single-subject research. It is a comprehensive package…

  4. MORTICIA, a statistical analysis software package for determining optical surveillance system effectiveness.

    NASA Astrophysics Data System (ADS)

    Ramkilowan, A.; Griffith, D. J.

    2017-10-01

    Surveillance modelling in terms of the standard Detect, Recognise and Identify (DRI) thresholds remains a key requirement for determining the effectiveness of surveillance sensors. With readily available computational resources it has become feasible to perform statistically representative evaluations of the effectiveness of these sensors. A new capability for performing this Monte-Carlo type analysis is demonstrated in the MORTICIA (Monte-Carlo Optical Rendering for Theatre Investigations of Capability under the Influence of the Atmosphere) software package developed at the Council for Scientific and Industrial Research (CSIR). This first-generation, Python-based, open-source integrated software package, currently in the alpha stage of development, aims to provide all the functionality required to perform statistical investigations of the effectiveness of optical surveillance systems in specific or generic deployment theatres. This includes modelling of the mathematical and physical processes that govern, amongst other components of a surveillance system, a sensor's detector and optical components, a target and its background, as well as the intervening atmospheric influences. In this paper we discuss integral aspects of the bespoke framework that are critical to the longevity of all subsequent modelling efforts. Additionally, some preliminary results are presented.

  5. Trend Analysis Using Microcomputers.

    ERIC Educational Resources Information Center

    Berger, Carl F.

    A trend analysis statistical package and additional programs for the Apple microcomputer are presented. They illustrate strategies of data analysis suitable to the graphics and processing capabilities of the microcomputer. The programs analyze data sets using examples of: (1) analysis of variance with multiple linear regression; (2) exponential…

  6. EvolQG - An R package for evolutionary quantitative genetics

    PubMed Central

    Melo, Diogo; Garcia, Guilherme; Hubbe, Alex; Assis, Ana Paula; Marroig, Gabriel

    2016-01-01

    We present an open source package for performing evolutionary quantitative genetics analyses in the R environment for statistical computing. Evolutionary theory shows that evolution depends critically on the available variation in a given population. When dealing with many quantitative traits this variation is expressed in the form of a covariance matrix, particularly the additive genetic covariance matrix or sometimes the phenotypic matrix, when the genetic matrix is unavailable and there is evidence the phenotypic matrix is sufficiently similar to the genetic matrix. Given this mathematical representation of available variation, the EvolQG package provides functions for calculation of relevant evolutionary statistics; estimation of sampling error; corrections for this error; matrix comparison via correlations, distances and matrix decomposition; analysis of modularity patterns; and functions for testing evolutionary hypotheses on taxa diversification. PMID:27785352

  7. Information Technology.

    ERIC Educational Resources Information Center

    Marcum, Deanna; Boss, Richard

    1983-01-01

    Relates office automation to its application in libraries, discussing computer software packages for microcomputers performing tasks involved in word processing, accounting, statistical analysis, electronic filing cabinets, and electronic mail systems. (EJS)

  8. Introduction to Statistics. Learning Packages in the Policy Sciences Series, PS-26. Revised Edition.

    ERIC Educational Resources Information Center

    Policy Studies Associates, Croton-on-Hudson, NY.

    The primary objective of this booklet is to introduce students to basic statistical skills that are useful in the analysis of public policy data. A few, selected statistical methods are presented, and theory is not emphasized. Chapter 1 provides instruction for using tables, bar graphs, bar graphs with grouped data, trend lines, pie diagrams,…

  9. smwrData—An R package of example hydrologic data, version 1.1.1

    USGS Publications Warehouse

    Lorenz, David L.

    2015-11-06

    A collection of 24 datasets, including streamflow, well characteristics, groundwater elevations, and discrete water-quality concentrations, is provided to produce a consistent set of example data to demonstrate typical data manipulations or statistical analysis of hydrologic data. These example data are provided in an R package called smwrData. The data in the package have been collected by the U.S. Geological Survey or published in its reports, for example Helsel and Hirsch (2002). The R package provides a convenient mechanism for distributing the data to users of R within the U.S. Geological Survey and other users in the R community.

  10. Wood Products Analysis

    NASA Technical Reports Server (NTRS)

    1990-01-01

    Structural Reliability Consultants' computer program creates graphic plots showing the statistical parameters of glue laminated timbers, or 'glulam.' The company president, Dr. Joseph Murphy, read in NASA Tech Briefs about work related to analysis of Space Shuttle surface tile strength performed for Johnson Space Center by Rockwell International Corporation. Analysis led to a theory of 'consistent tolerance bounds' for statistical distributions, applicable in industrial testing where statistical analysis can influence product development and use. Dr. Murphy then obtained the Tech Support Package that covers the subject in greater detail. The TSP became the basis for Dr. Murphy's computer program PC-DATA, which he is marketing commercially.

  11. CellTree: an R/bioconductor package to infer the hierarchical structure of cell populations from single-cell RNA-seq data.

    PubMed

    duVerle, David A; Yotsukura, Sohiya; Nomura, Seitaro; Aburatani, Hiroyuki; Tsuda, Koji

    2016-09-13

    Single-cell RNA sequencing is fast becoming one of the standard methods for gene expression measurement, providing unique insights into cellular processes. A number of methods, based on general dimensionality reduction techniques, have been suggested to help infer and visualise the underlying structure of cell populations from single-cell expression levels, yet their models generally lack proper biological grounding and struggle to identify complex differentiation paths. Here we introduce cellTree: an R/Bioconductor package that uses a novel statistical approach, based on document analysis techniques, to produce tree structures outlining the hierarchical relationship between single-cell samples, while identifying latent groups of genes that can provide biological insights. With cellTree, we provide experimentalists with an easy-to-use tool, based on statistically and biologically sound algorithms, to efficiently explore and visualise single-cell RNA data. The cellTree package is publicly available in the online Bioconductor repository at: http://bioconductor.org/packages/cellTree/.

  12. Using R to implement spatial analysis in open source environment

    NASA Astrophysics Data System (ADS)

    Shao, Yixi; Chen, Dong; Zhao, Bo

    2007-06-01

    R is an open source (GPL) language and environment for spatial analysis, statistical computing and graphics which provides a wide variety of statistical and graphical techniques and is highly extensible, and it plays an important role in spatial analysis within the Open Source environment. To implement spatial analysis in the Open Source environment, which we call Open Source geocomputation, we use the R data analysis language integrated with GRASS GIS and MySQL or PostgreSQL. This paper explains the architecture of the Open Source GIS environment and emphasizes the role R plays in spatial analysis. Furthermore, an illustration of the functions of R is given through the project of constructing CZPGIS (Cheng Zhou Population GIS), supported by the Changzhou Government, China. In this project we use R to implement geostatistics in the Open Source GIS environment, evaluating the spatial correlation of land price and estimating it by Kriging interpolation. We also use R integrated with MapServer and PHP to show how R and other Open Source software cooperate in a WebGIS environment, which demonstrates the advantages of using R for spatial analysis in an Open Source GIS environment. Finally, we point out that the packages for spatial analysis in R are still scattered and that limited memory remains a bottleneck when a large number of clients connect at the same time. Further work is therefore to organize the extensive packages or design normative packages, and to make R cooperate better with commercial software such as ArcIMS. We also look forward to developing packages for land price evaluation.

  13. MNE software for processing MEG and EEG data

    PubMed Central

    Gramfort, A.; Luessi, M.; Larson, E.; Engemann, D.; Strohmeier, D.; Brodbeck, C.; Parkkonen, L.; Hämäläinen, M.

    2013-01-01

    Magnetoencephalography and electroencephalography (M/EEG) measure the weak electromagnetic signals originating from neural currents in the brain. Using these signals to characterize and locate brain activity is a challenging task, as evidenced by several decades of methodological contributions. MNE, whose name stems from its capability to compute cortically-constrained minimum-norm current estimates from M/EEG data, is a software package that provides comprehensive analysis tools and workflows including preprocessing, source estimation, time–frequency analysis, statistical analysis, and several methods to estimate functional connectivity between distributed brain regions. The present paper gives detailed information about the MNE package and describes typical use cases while also warning about potential caveats in analysis. The MNE package is a collaborative effort of multiple institutes striving to implement and share best methods and to facilitate distribution of analysis pipelines to advance reproducibility of research. Full documentation is available at http://martinos.org/mne. PMID:24161808

  14. Report: Scientific Software.

    ERIC Educational Resources Information Center

    Borman, Stuart A.

    1985-01-01

    Discusses various aspects of scientific software, including evaluation and selection of commercial software products; program exchanges, catalogs, and other information sources; major data analysis packages; statistics and chemometrics software; and artificial intelligence. (JN)

  15. Integration of modern statistical tools for the analysis of climate extremes into the web-GIS “CLIMATE”

    NASA Astrophysics Data System (ADS)

    Ryazanova, A. A.; Okladnikov, I. G.; Gordov, E. P.

    2017-11-01

    The frequency of occurrence and magnitude of extreme precipitation and temperature events show positive trends in several geographical regions. These events must be analyzed and studied in order to better understand their impact on the environment, predict their occurrence, and mitigate their effects. For this purpose, we augmented the web-GIS called “CLIMATE” with a dedicated statistical package developed in the R language. The web-GIS “CLIMATE” is a software platform for cloud storage, processing, and visualization of distributed archives of spatial datasets. It is based on a combined use of web and GIS technologies with reliable procedures for searching, extracting, processing, and visualizing spatial data archives. The system provides a set of thematic online tools for the complex analysis of current and future climate changes and their effects on the environment. The package includes new powerful methods of time-dependent statistics of extremes, quantile regression and the copula approach for the detailed analysis of various climate extreme events. Specifically, the very promising copula approach allows one to obtain the structural connections between the extremes and various environmental characteristics. The new statistical methods integrated into the web-GIS “CLIMATE” can significantly facilitate and accelerate the complex analysis of climate extremes using only a desktop PC connected to the Internet.

  16. A Novel Method to Decontaminate Surgical Instruments for Operational and Austere Environments.

    PubMed

    Knox, Randy W; Demons, Samandra T; Cunningham, Cord W

    2015-12-01

    The purpose of this investigation was to test a field-expedient, cost-effective method to decontaminate, sterilize, and package surgical instruments in an operational (combat) or austere environment using chlorhexidine sponges, ultraviolet C (UVC) light, and commercially available vacuum sealing. This was a bench study of 4 experimental groups and 1 control group of 120 surgical instruments. Experimental groups were inoculated with a 10⁶ concentration of common wound bacteria. The control group was vacuum sealed without inoculum. Groups 1, 2, and 3 were first scrubbed with a chlorhexidine sponge, rinsed, and dried. Group 1 was then packaged; group 2 was irradiated with UVC light, then packaged; group 3 was packaged, then irradiated with UVC light through the bag; and group 4 was packaged without chlorhexidine scrubbing or UVC irradiation. UVC was not tested by itself, as it does not grossly clean. The instruments were stored overnight and tested for remaining colony forming units (CFU). Data analysis was conducted using analysis of variance, with group comparisons by the Tukey method. Group 4 CFU counts were statistically greater (P < .001) than those of the control group and groups 1 through 3. There was no statistically significant difference between the control group and groups 1 through 3. Vacuum sealing of chlorhexidine-scrubbed contaminated instruments, with or without handheld UVC irradiation, appears to be an acceptable method of field decontamination. Chlorhexidine scrubbing alone achieved a 99.9% reduction in CFU, whereas adding UVC before packaging achieved sterilization (100% reduction in CFU), and UVC through the bag achieved disinfection. Published by Elsevier Inc.
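
    The group comparison described (one-way ANOVA, followed by Tukey contrasts) rests on the F statistic, which can be sketched in a few lines (an illustrative implementation with made-up values, not the study's data or software):

```python
from statistics import mean

def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA: the between-group mean square
    divided by the within-group mean square. A large F indicates that
    group means differ by more than within-group scatter explains."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = mean(x for g in groups for x in g)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical CFU counts: two decontaminated groups near zero and one
# untreated group far above them produce a very large F statistic.
print(one_way_anova_f([[0, 1, 0], [1, 0, 1], [900, 950, 1000]]))
```

    In practice the F statistic is compared against the F distribution with (k-1, n-k) degrees of freedom, and Tukey's method then identifies which specific pairs of groups differ.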

  17. Some Experience with Interactive Computing in Teaching Introductory Statistics.

    ERIC Educational Resources Information Center

    Diegert, Carl

    Students in two biostatistics courses at the Cornell Medical College and in a course in applications of computer science given in Cornell's School of Industrial Engineering were given access to an interactive package of computer programs enabling them to perform statistical analysis without the burden of hand computation. After a general…

  18. Integrating Statistical Visualization Research into the Political Science Classroom

    ERIC Educational Resources Information Center

    Draper, Geoffrey M.; Liu, Baodong; Riesenfeld, Richard F.

    2011-01-01

    The use of computer software to facilitate learning in political science courses is well established. However, the statistical software packages used in many political science courses can be difficult to use and counter-intuitive. We describe the results of a preliminary user study suggesting that visually-oriented analysis software can help…

  19. Data Processing System (DPS) software with experimental design, statistical analysis and data mining developed for use in entomological research.

    PubMed

    Tang, Qi-Yi; Zhang, Chuan-Xi

    2013-04-01

    A comprehensive but simple-to-use software package called DPS (Data Processing System) has been developed to execute a range of standard numerical analyses and operations used in experimental design, statistics and data mining. This program runs on standard Windows computers. Many of the functions are specific to entomological and other biological research and are not found in standard statistical software. This paper presents applications of DPS to experimental design, statistical analysis and data mining in entomology. © 2012 The Authors Insect Science © 2012 Institute of Zoology, Chinese Academy of Sciences.

  20. Using SPSS to Analyze Book Collection Data.

    ERIC Educational Resources Information Center

    Townley, Charles T.

    1981-01-01

    Describes and illustrates Statistical Package for the Social Sciences (SPSS) procedures appropriate for book collection data analysis. Several different procedures for univariate, bivariate, and multivariate analysis are discussed, and applications of procedures for book collection studies are presented. Included are 24 tables illustrating output…

  1. Prototyping with Data Dictionaries for Requirements Analysis.

    DTIC Science & Technology

    1985-03-01

    statistical packages and software for screen layout. These items work at a higher level than another category of prototyping tool, program generators... Program generators are software packages which, when given specifications, produce source listings, usually in a high order language such as COBOL...with users, and this will not happen if he must stop to develop a detailed program. [Ref. 241] Hardware as well as software should be considered in

  2. esATAC: An Easy-to-use Systematic pipeline for ATAC-seq data analysis.

    PubMed

    Wei, Zheng; Zhang, Wei; Fang, Huan; Li, Yanda; Wang, Xiaowo

    2018-03-07

    ATAC-seq is rapidly emerging as one of the major experimental approaches to probe chromatin accessibility genome-wide. Here, we present "esATAC", a highly integrated, easy-to-use R/Bioconductor package for systematic ATAC-seq data analysis. It covers the essential steps of the full analysis procedure, including raw data processing, quality control and downstream statistical analysis such as peak calling, enrichment analysis and transcription factor footprinting. esATAC supports one-command-line execution of preset pipelines, and provides flexible interfaces for building customized pipelines. The esATAC package is open source under the GPL-3.0 license. It is implemented in R and C++. Source code and binaries for Linux, Mac OS X and Windows are available through Bioconductor (https://www.bioconductor.org/packages/release/bioc/html/esATAC.html). xwwang@tsinghua.edu.cn. Supplementary data are available at Bioinformatics online.

  3. deltaGseg: macrostate estimation via molecular dynamics simulations and multiscale time series analysis.

    PubMed

    Low, Diana H P; Motakis, Efthymios

    2013-10-01

    Binding free energy calculations obtained through molecular dynamics simulations reflect intermolecular interaction states through a series of independent snapshots. Typically, the free energies of multiple simulated series (each with slightly different starting conditions) need to be estimated. Previous approaches carry out this task by moving averages at certain decorrelation times, assuming that the system comes from a single-conformation description of binding events. Here, we discuss a more general approach that uses statistical modeling, wavelet denoising and hierarchical clustering to estimate the significance of multiple statistically distinct subpopulations, reflecting potential macrostates of the system. We present the deltaGseg R package that performs macrostate estimation from multiple replicated series and allows molecular biologists/chemists to gain physical insight into molecular details that are not easily accessible by experimental techniques. deltaGseg is a Bioconductor R package available at http://bioconductor.org/packages/release/bioc/html/deltaGseg.html.

  4. Human Deception Detection from Whole Body Motion Analysis

    DTIC Science & Technology

    2015-12-01

    9.3.2. Prediction Probability The output reports from SPSS detail the stepwise procedures for each series of analyses using Wald statistic values for... statistical significance in determining replication, but instead used a combination of significance and direction of means to determine partial or...and the independents need not be unbound. All data were analyzed utilizing the Statistical Package for the Social Sciences (SPSS, v.19.0, Chicago, IL

  5. An automated normative-based fluorodeoxyglucose positron emission tomography image-analysis procedure to aid Alzheimer disease diagnosis using statistical parametric mapping and interactive image display

    NASA Astrophysics Data System (ADS)

    Chen, Kewei; Ge, Xiaolin; Yao, Li; Bandy, Dan; Alexander, Gene E.; Prouty, Anita; Burns, Christine; Zhao, Xiaojie; Wen, Xiaotong; Korn, Ronald; Lawson, Michael; Reiman, Eric M.

    2006-03-01

    Having approved fluorodeoxyglucose positron emission tomography (FDG PET) for the diagnosis of Alzheimer's disease (AD) in some patients, the Centers for Medicare and Medicaid Services suggested the need to develop and test analysis techniques to optimize diagnostic accuracy. We developed an automated computer package comparing an individual's FDG PET image to those of a group of normal volunteers. The normal control group includes FDG PET images from 82 cognitively normal subjects, 61.89+/-5.67 years of age, who were characterized demographically, clinically, neuropsychologically, and by their apolipoprotein E genotype (known to be associated with a differential risk for AD). In addition, AD-affected brain regions, functionally defined based on a previous study (Alexander et al., Am J Psychiatr, 2002), were also incorporated. Our computer package permits the user to optionally select control subjects matching the individual patient for gender, age, and educational level. It is fully streamlined to require minimal user intervention. With one mouse click, the program runs automatically: normalizing the individual patient image, setting up a design matrix for comparing the single subject to a group of normal controls, performing the statistics, calculating the glucose reduction overlap index of the patient with the AD-affected brain regions, and displaying the findings in reference to the AD regions. In conclusion, the package automatically contrasts a single patient to a normal subject database using sound statistical procedures. With further validation, this computer package could be a valuable tool to assist physicians in decision making and communicating findings with patients and patient families.
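
    The core of such a normative single-subject comparison is a regionwise (or voxelwise) z score of the patient against the control database; a minimal sketch with hypothetical uptake values (not the package's actual procedure, which builds a full SPM design matrix):

```python
from statistics import mean, stdev

def single_subject_z(patient_value, control_values):
    """z score of one patient's regional FDG uptake against a normal
    database: how many control standard deviations the patient falls
    below (negative) or above (positive) the control mean."""
    m = mean(control_values)
    s = stdev(control_values)  # sample standard deviation of controls
    return (patient_value - m) / s

# Hypothetical normalized uptake in one region: a markedly reduced
# patient value against a tight normal range gives a strongly
# negative z score, flagging hypometabolism.
controls = [1.00, 1.02, 0.98, 1.01, 0.99]
print(round(single_subject_z(0.85, controls), 2))  # → -9.49
```

    Thresholding such z maps and intersecting them with the predefined AD-affected regions is the idea behind the glucose reduction overlap index mentioned above.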

  6. From reads to genes to pathways: differential expression analysis of RNA-Seq experiments using Rsubread and the edgeR quasi-likelihood pipeline.

    PubMed

    Chen, Yunshun; Lun, Aaron T L; Smyth, Gordon K

    2016-01-01

    In recent years, RNA sequencing (RNA-seq) has become a very widely used technology for profiling gene expression. One of the most common aims of RNA-seq profiling is to identify genes or molecular pathways that are differentially expressed (DE) between two or more biological conditions. This article demonstrates a computational workflow for the detection of DE genes and pathways from RNA-seq data by providing a complete analysis of an RNA-seq experiment profiling epithelial cell subsets in the mouse mammary gland. The workflow uses R software packages from the open-source Bioconductor project and covers all steps of the analysis pipeline, including alignment of read sequences, data exploration, differential expression analysis, visualization and pathway analysis. Read alignment and count quantification are conducted using the Rsubread package, and the statistical analyses are performed using the edgeR package. The differential expression analysis uses the quasi-likelihood functionality of edgeR.
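
    A minimal sketch of this pipeline in R; the FASTQ/BAM file names, sample labels, and use of the built-in mm10 annotation are illustrative placeholders, not taken from the article:

```r
library(Rsubread)
library(edgeR)

# Align reads to an existing index and count them over genes
# (FASTQ/BAM names are hypothetical)
align(index = "mm10_index",
      readfile1 = c("basal1.fastq", "basal2.fastq", "lum1.fastq", "lum2.fastq"))
fc <- featureCounts(files = c("basal1.BAM", "basal2.BAM", "lum1.BAM", "lum2.BAM"),
                    annot.inbuilt = "mm10")

# edgeR quasi-likelihood (QL) differential expression
group  <- factor(c("basal", "basal", "luminal", "luminal"))
y      <- DGEList(counts = fc$counts, group = group)
y      <- y[filterByExpr(y), , keep.lib.sizes = FALSE]  # drop lowly expressed genes
y      <- calcNormFactors(y)                            # TMM normalization
design <- model.matrix(~ group)
y      <- estimateDisp(y, design)
fit    <- glmQLFit(y, design)
qlf    <- glmQLFTest(fit, coef = 2)                     # luminal vs basal
topTags(qlf)
```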

  7. AOP: An R Package For Sufficient Causal Analysis in Pathway ...

    EPA Pesticide Factsheets

    Summary: How can I quickly find the key events in a pathway that I need to monitor to predict that a beneficial or adverse outcome will occur? This is a key question when using signaling pathways for drug/chemical screening in pharmacology, toxicology and risk assessment. By identifying these sufficient causal key events, we have fewer events to monitor for a pathway, thereby decreasing assay costs and time, while maximizing the value of the information. I have developed the “aop” package which uses backdoor analysis of causal networks to identify these minimal sets of key events that are sufficient for making causal predictions. Availability and Implementation: The source and binary are available online through the Bioconductor project (http://www.bioconductor.org/) as an R package titled “aop”. The R/Bioconductor package runs within the R statistical environment. The package has functions that can take pathways (as directed graphs) formatted as a Cytoscape JSON file as input, or pathways can be represented as directed graphs using the R/Bioconductor “graph” package. The “aop” package has functions that can perform backdoor analysis to identify the minimal set of key events for making causal predictions. Contact: burgoon.lyle@epa.gov This paper describes an R/Bioconductor package that was developed to facilitate the identification of key events within an AOP that are the minimal set of sufficient key events that need to be tested/monit

  8. A Survey of Popular R Packages for Cluster Analysis

    ERIC Educational Resources Information Center

    Flynt, Abby; Dean, Nema

    2016-01-01

    Cluster analysis is a set of statistical methods for discovering new group/class structure when exploring data sets. This article reviews the following popular libraries/commands in the R software language for applying different types of cluster analysis: the kmeans and hclust functions from the stats library; the mclust library; the poLCA…
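
    To give a flavour of the functions surveyed, here is a minimal base-R sketch (not from the article) applying one partitioning and one agglomerative method to the built-in iris data:

```r
# k-means (partitioning) and hclust (agglomerative) on the iris measurements
data(iris)
x <- scale(iris[, 1:4])                 # standardize before distance computations

set.seed(42)
km <- kmeans(x, centers = 3, nstart = 25)
table(km$cluster, iris$Species)         # compare clusters with known species

hc     <- hclust(dist(x), method = "average")
groups <- cutree(hc, k = 3)             # cut the dendrogram into 3 groups
table(groups, km$cluster)               # agreement between the two methods
```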

  9. micromap: A Package for Linked Micromaps

    EPA Science Inventory

    The R package micromap is used to create linked micromaps, which display statistical summaries associated with areal units, or polygons. Linked micromaps provide a means to simultaneously summarize and display both statistical and geographic distributions by linking statistical ...

  10. MDTraj: A Modern Open Library for the Analysis of Molecular Dynamics Trajectories.

    PubMed

    McGibbon, Robert T; Beauchamp, Kyle A; Harrigan, Matthew P; Klein, Christoph; Swails, Jason M; Hernández, Carlos X; Schwantes, Christian R; Wang, Lee-Ping; Lane, Thomas J; Pande, Vijay S

    2015-10-20

    As molecular dynamics (MD) simulations continue to evolve into powerful computational tools for studying complex biomolecular systems, the necessity of flexible and easy-to-use software tools for the analysis of these simulations is growing. We have developed MDTraj, a modern, lightweight, and fast software package for analyzing MD simulations. MDTraj reads and writes trajectory data in a wide variety of commonly used formats. It provides a large number of trajectory analysis capabilities including minimal root-mean-square-deviation calculations, secondary structure assignment, and the extraction of common order parameters. The package has a strong focus on interoperability with the wider scientific Python ecosystem, bridging the gap between MD data and the rapidly growing collection of industry-standard statistical analysis and visualization tools in Python. MDTraj is a powerful and user-friendly software package that simplifies the analysis of MD data and connects these datasets with the modern interactive data science software ecosystem in Python. Copyright © 2015 Biophysical Society. Published by Elsevier Inc. All rights reserved.

  11. SNPassoc: an R package to perform whole genome association studies.

    PubMed

    González, Juan R; Armengol, Lluís; Solé, Xavier; Guinó, Elisabet; Mercader, Josep M; Estivill, Xavier; Moreno, Víctor

    2007-03-01

    The popularization of large-scale genotyping projects has led to the widespread adoption of genetic association studies as the tool of choice in the search for single nucleotide polymorphisms (SNPs) underlying susceptibility to complex diseases. Although the analysis of individual SNPs is a relatively trivial task, when the number of SNPs is large and multiple genetic models need to be explored, a tool to automate the analyses becomes necessary. To address this issue, we developed SNPassoc, an R package to carry out the most common analyses in whole genome association studies. These analyses include descriptive statistics and exploratory analysis of missing values, calculation of Hardy-Weinberg equilibrium, analysis of association based on generalized linear models (for either quantitative or binary traits), and analysis of multiple SNPs (haplotype and epistasis analysis). Package SNPassoc is available at CRAN from http://cran.r-project.org. A tutorial is available on Bioinformatics online and at http://davinci.crg.es/estivill_lab/snpassoc.
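
    A hedged sketch using the example data bundled with the package; the data set, column range and SNP name follow the package's documented example and may vary by version:

```r
library(SNPassoc)

# 'SNPs' is an example data set bundled with the package; columns 6-40
# contain the genotype columns (per the package's documented example)
data(SNPs)
myData <- setupSNP(data = SNPs, colSNPs = 6:40, sep = "")

# Association between case-control status and one SNP under several
# genetic models (codominant, dominant, recessive, ...)
association(casco ~ snp10001, data = myData)

# Scan over all prepared SNPs
res <- WGassociation(casco, data = myData)
plot(res)
```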

  12. MDTraj: A Modern Open Library for the Analysis of Molecular Dynamics Trajectories

    PubMed Central

    McGibbon, Robert T.; Beauchamp, Kyle A.; Harrigan, Matthew P.; Klein, Christoph; Swails, Jason M.; Hernández, Carlos X.; Schwantes, Christian R.; Wang, Lee-Ping; Lane, Thomas J.; Pande, Vijay S.

    2015-01-01

    As molecular dynamics (MD) simulations continue to evolve into powerful computational tools for studying complex biomolecular systems, the necessity of flexible and easy-to-use software tools for the analysis of these simulations is growing. We have developed MDTraj, a modern, lightweight, and fast software package for analyzing MD simulations. MDTraj reads and writes trajectory data in a wide variety of commonly used formats. It provides a large number of trajectory analysis capabilities including minimal root-mean-square-deviation calculations, secondary structure assignment, and the extraction of common order parameters. The package has a strong focus on interoperability with the wider scientific Python ecosystem, bridging the gap between MD data and the rapidly growing collection of industry-standard statistical analysis and visualization tools in Python. MDTraj is a powerful and user-friendly software package that simplifies the analysis of MD data and connects these datasets with the modern interactive data science software ecosystem in Python. PMID:26488642

  13. R package to estimate intracluster correlation coefficient with confidence interval for binary data.

    PubMed

    Chakraborty, Hrishikesh; Hossain, Akhtar

    2018-03-01

    The Intracluster Correlation Coefficient (ICC) is a major parameter of interest in cluster randomized trials that measures the degree to which responses within the same cluster are correlated. Several types of ICC estimators and confidence intervals (CIs) have been suggested in the literature for binary data. Studies have compared the relative weaknesses and advantages of these estimators and CIs and identified situations where each is advantageous in practical research. However, commonly used statistical computing systems currently facilitate estimation of only a very few variants of the ICC and its CI. To address the limitations of current statistical packages, we developed an R package, ICCbin, to facilitate estimating the ICC and its CI for binary responses using different methods. The ICCbin package is designed to provide estimates of the ICC in 16 different ways, including analysis of variance methods, moments-based estimation, direct probabilistic methods, correlation-based estimation, and resampling methods. The CI of the ICC is estimated using 5 different methods. The package also generates clustered binary data using an exchangeable correlation structure. ICCbin provides two functions for users: rcbin() generates clustered binary data, and iccbin() estimates the ICC and its CI. Users can choose the appropriate ICC and CI estimates from the wide selection in the outputs. The R package ICCbin presents flexible and easy-to-use ways to generate clustered binary data and to estimate the ICC and its CI for binary responses using different methods. The package ICCbin is freely available for use with R from the CRAN repository (https://cran.r-project.org/package=ICCbin). We believe that this package can be a very useful tool for researchers designing cluster randomized trials with binary outcomes. Copyright © 2017 Elsevier B.V. All rights reserved.
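
    A hedged sketch of the two functions named above; the argument names are assumptions based on the package documentation and may differ between versions:

```r
library(ICCbin)

# Generate clustered binary data; argument names (cluster count, cluster
# size, prevalence, rho) are assumptions and should be checked in ?rcbin
dat <- rcbin(prop = 0.3, prvar = 0.05, noc = 30, csize = 20,
             csvar = 0.2, rho = 0.1)

# Estimate the ICC and its CI; 'method' selects the estimator
# (e.g. the ANOVA estimator) and 'ci.type' the CI method
est <- iccbin(cid = cid, y = y, data = dat, method = "aov", ci.type = "aov")
est
```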

  14. MPTinR: analysis of multinomial processing tree models in R.

    PubMed

    Singmann, Henrik; Kellen, David

    2013-06-01

    We introduce MPTinR, a software package developed for the analysis of multinomial processing tree (MPT) models. MPT models represent a prominent class of cognitive measurement models for categorical data with applications in a wide variety of fields. MPTinR is the first software for the analysis of MPT models in the statistical programming language R, providing a modeling framework that is more flexible than standalone software packages. MPTinR also introduces important features such as (1) the ability to calculate the Fisher information approximation measure of model complexity for MPT models, (2) the ability to fit models for categorical data outside the MPT model class, such as signal detection models, (3) a function for model selection across a set of nested and nonnested candidate models (using several model selection indices), and (4) multicore fitting. MPTinR is available from the Comprehensive R Archive Network at http://cran.r-project.org/web/packages/MPTinR/ .
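
    A sketch of fitting a one-high-threshold model with MPTinR; the inline model specification (one line per response category, trees separated by a blank line) follows the package's documented format, and the counts are invented for illustration:

```r
library(MPTinR)

# One-high-threshold recognition model: the two trees (old items / new
# items) are separated by a blank line, one line per response category
oht <- "Do + (1 - Do) * g
(1 - Do) * (1 - g)

g
1 - g"

# Invented counts: hits, misses, false alarms, correct rejections
fit.mpt(data = c(75, 25, 30, 70), model.filename = textConnection(oht))
```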

  15. LFSTAT - Low-Flow Analysis in R

    NASA Astrophysics Data System (ADS)

    Koffler, Daniel; Laaha, Gregor

    2013-04-01

    The calculation of characteristic stream flow during dry conditions is a basic requirement for many problems in hydrology, ecohydrology and water resources management. As opposed to floods, a number of different indices are used to characterise low flows and streamflow droughts. Although these indices and methods of calculation are well documented in the WMO Manual on Low-flow Estimation and Prediction [1], comprehensive software enabling fast and standardized calculation of low-flow statistics was missing. We present the new software package lfstat to fill this obvious gap. Our package is based on the statistical open-source software R and extends it to analyse daily stream flow records with a focus on low flows. As command-line based programs are not everyone's preference, we also offer a plug-in for the R Commander, an easy-to-use graphical user interface (GUI) for R based on tcl/tk. The functionality of lfstat includes estimation methods for low-flow indices, extreme value statistics, deficit characteristics, and additional graphical methods to control the computation of complex indices and to illustrate the data. Besides the basic low-flow indices, the baseflow index and recession constants can be computed. For extreme value statistics, state-of-the-art methods for L-moment based local and regional frequency analysis (RFA) are available. The tools for deficit characteristics include various pooling and threshold selection methods to support the calculation of drought duration and deficit indices. The most common graphics for low-flow analysis are available, and the plots can be modified according to user preferences. Graphics include hydrographs for different periods, flexible streamflow deficit plots, baseflow visualisation, recession diagnostics, flow duration curves as well as double mass curves, and many more. From a technical point of view, the package uses an S3 class called lfobj (low-flow objects). These objects are ordinary R data frames containing date, flow, hydrological year, and optionally baseflow information. Once these objects are created, analyses can be performed by mouse click, and a script can be saved to make the analysis easily reproducible. At the moment we offer implementations of all major methods proposed in the WMO Manual on Low-flow Estimation and Prediction [1]. Future plans include a dynamic low-flow report in odt format using odf-weave, which allows automatic updates if data or analyses change. We hope to offer a tool that eases and structures the analysis of stream flow data with a focus on low flows, and that makes analyses transparent and communicable. The package can also be used to teach students the first steps in low-flow hydrology. The package can be installed from CRAN (latest stable version) and R-Forge: http://r-forge.r-project.org (development version). References: [1] Gustard, Alan; Demuth, Siegfried (eds.), Manual on Low-flow Estimation and Prediction. Geneva, Switzerland, World Meteorological Organization (Operational Hydrology Report No. 50, WMO-No. 1029).
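
    A hedged sketch of typical lfstat usage; the example series and the index functions below follow the package documentation, but names and signatures may differ between versions:

```r
library(lfstat)

# 'ngaruroro' is a daily streamflow series shipped with the package
data(ngaruroro)

BFI(ngaruroro)          # baseflow index
MAM(ngaruroro, n = 7)   # mean annual 7-day minimum flow
Q95(ngaruroro)          # flow exceeded 95% of the time

hydrograph(ngaruroro)   # hydrograph plot with baseflow
```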

  16. An open-source software package for multivariate modeling and clustering: applications to air quality management.

    PubMed

    Wang, Xiuquan; Huang, Guohe; Zhao, Shan; Guo, Junhong

    2015-09-01

    This paper presents an open-source software package, rSCA, which is developed based upon a stepwise cluster analysis method and serves as a statistical tool for modeling the relationships between multiple dependent and independent variables. The rSCA package is efficient in dealing with both continuous and discrete variables, as well as nonlinear relationships between the variables. It divides the sample sets of dependent variables into different subsets (or subclusters) through a series of cutting and merging operations based upon the theory of multivariate analysis of variance (MANOVA). The modeling results are given by a cluster tree, which includes both intermediate and leaf subclusters as well as the flow paths from the root of the tree to each leaf subcluster specified by a series of cutting and merging actions. The rSCA package is a handy and easy-to-use tool and is freely available at http://cran.r-project.org/package=rSCA . By applying the developed package to air quality management in an urban environment, we demonstrate its effectiveness in dealing with the complicated relationships among multiple variables in real-world problems.

  17. The Equivalence of Three Statistical Packages for Performing Hierarchical Cluster Analysis

    ERIC Educational Resources Information Center

    Blashfield, Roger

    1977-01-01

    Three different software programs which contain hierarchical agglomerative cluster analysis procedures were shown to generate different solutions on the same data set using apparently the same options. The basis for the differences in the solutions was the formulae used to calculate Euclidean distance. (Author/JKS)
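
    The paper's finding is easy to reproduce in current R, where hclust itself exposes the same ambiguity: the "ward.D" option expects squared Euclidean distances, while "ward.D2" squares them internally, so the same nominal Ward method can yield different trees (a sketch, not from the article):

```r
set.seed(1)
x <- matrix(rnorm(40), ncol = 2)
d <- dist(x)                            # Euclidean distances

# Two readings of "Ward's method" that differ only in whether the
# distances are squared -- the historical source of such discrepancies
h1 <- hclust(d,   method = "ward.D")
h2 <- hclust(d^2, method = "ward.D")    # same tree topology as ward.D2 on d
h3 <- hclust(d,   method = "ward.D2")

# Merge heights differ even though the input data are identical
head(cbind(ward.D = h1$height, ward.D2 = h3$height))
```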

  18. Cellular Consequences of Telomere Shortening in Histologically Normal Breast Tissues

    DTIC Science & Technology

    2013-09-01

    using the open source, JAVA -based image analysis software package ImageJ (http://rsb.info.nih.gov/ij/) and a custom designed plugin (“Telometer...Tabulated data were stored in a MySQL (http://www.mysql.com) database and viewed through Microsoft Access (Microsoft Corp.). Statistical Analysis For

  19. Effect of Table Tennis Trainings on Biomotor Capacities in Boys

    ERIC Educational Resources Information Center

    Tas, Murat

    2017-01-01

    The aim of this study is to investigate whether the biomotor capacities of boys undergoing table tennis training are affected. A total of 40 students at an age range of 10-12, randomly assigned to a test group of 20 and a control group of 20, participated in the research. Statistical analysis of data was performed using the Statistical Package for the Social Sciences…

  20. rpsftm: An R Package for Rank Preserving Structural Failure Time Models

    PubMed Central

    Allison, Annabel; White, Ian R; Bond, Simon

    2018-01-01

    Treatment switching in a randomised controlled trial occurs when participants change from their randomised treatment to the other trial treatment during the study. Failure to account for treatment switching in the analysis (i.e. by performing a standard intention-to-treat analysis) can lead to biased estimates of treatment efficacy. The rank preserving structural failure time model (RPSFTM) is a method used to adjust for treatment switching in trials with survival outcomes. The RPSFTM is due to Robins and Tsiatis (1991) and has been developed by White et al. (1997, 1999). The method is randomisation based and uses only the randomised treatment group, observed event times, and treatment history in order to estimate a causal treatment effect. The treatment effect, ψ, is estimated by balancing counter-factual event times (that would be observed if no treatment were received) between treatment groups. G-estimation is used to find the value of ψ such that a test statistic Z(ψ) = 0. This is usually the test statistic used in the intention-to-treat analysis, for example, the log rank test statistic. We present an R package that implements the method of rpsftm. PMID:29564164
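
    A sketch of fitting the model with the package, using its bundled simulated trial immdef (as in the package vignette); the second argument of rand() gives the proportion of follow-up time spent on the randomised treatment:

```r
library(rpsftm)
library(survival)

# immdef: simulated trial in which control-arm patients may switch;
# 'xoyrs' is the time on control before switching, 'censyrs' the
# administrative censoring time
data(immdef)

fit <- rpsftm(Surv(progyrs, prog) ~ rand(imm, 1 - xoyrs / progyrs),
              data = immdef, censor_time = censyrs)
summary(fit)
plot(fit)    # counter-factual survival curves by arm
```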

  1. rpsftm: An R Package for Rank Preserving Structural Failure Time Models.

    PubMed

    Allison, Annabel; White, Ian R; Bond, Simon

    2017-12-04

    Treatment switching in a randomised controlled trial occurs when participants change from their randomised treatment to the other trial treatment during the study. Failure to account for treatment switching in the analysis (i.e. by performing a standard intention-to-treat analysis) can lead to biased estimates of treatment efficacy. The rank preserving structural failure time model (RPSFTM) is a method used to adjust for treatment switching in trials with survival outcomes. The RPSFTM is due to Robins and Tsiatis (1991) and has been developed by White et al. (1997, 1999). The method is randomisation based and uses only the randomised treatment group, observed event times, and treatment history in order to estimate a causal treatment effect. The treatment effect, ψ, is estimated by balancing counter-factual event times (that would be observed if no treatment were received) between treatment groups. G-estimation is used to find the value of ψ such that a test statistic Z(ψ) = 0. This is usually the test statistic used in the intention-to-treat analysis, for example, the log rank test statistic. We present an R package that implements the method of rpsftm.

  2. VoxelStats: A MATLAB Package for Multi-Modal Voxel-Wise Brain Image Analysis.

    PubMed

    Mathotaarachchi, Sulantha; Wang, Seqian; Shin, Monica; Pascoal, Tharick A; Benedet, Andrea L; Kang, Min Su; Beaudry, Thomas; Fonov, Vladimir S; Gauthier, Serge; Labbe, Aurélie; Rosa-Neto, Pedro

    2016-01-01

    In healthy individuals, behavioral outcomes are highly associated with the variability in brain regional structure or neurochemical phenotypes. Similarly, in the context of neurodegenerative conditions, neuroimaging reveals that cognitive decline is linked to the magnitude of atrophy, neurochemical declines, or concentrations of abnormal protein aggregates across brain regions. However, modeling the effects of multiple regional abnormalities as determinants of cognitive decline at the voxel level remains largely unexplored by multimodal imaging research, given the high computational cost of estimating regression models for every single voxel from various imaging modalities. VoxelStats is a voxel-wise computational framework developed to overcome these computational limitations and to perform statistical operations on multiple scalar variables and imaging modalities at the voxel level. The VoxelStats package has been developed in MATLAB® and supports imaging formats such as Nifti-1, ANALYZE, and MINC v2. Prebuilt functions in VoxelStats enable the user to perform voxel-wise general and generalized linear models and mixed effects models with multiple volumetric covariates. Importantly, VoxelStats can recognize scalar values or image volumes as response variables and can accommodate volumetric statistical covariates as well as their interaction effects with other variables. Furthermore, this package includes built-in functionality to perform voxel-wise receiver operating characteristic analysis and paired and unpaired group contrast analysis. Validation of VoxelStats was conducted by comparing the linear regression functionality with existing toolboxes such as glim_image and RMINC. The validation results were identical to those of existing methods, and the additional functionality was demonstrated by generating feature case assessments (t-statistics, odds ratio, and true positive rate maps). In summary, VoxelStats expands the current methods for multimodal imaging analysis by allowing the estimation of advanced regional association metrics at the voxel level.

  3. R-Based Software for the Integration of Pathway Data into Bioinformatic Algorithms

    PubMed Central

    Kramer, Frank; Bayerlová, Michaela; Beißbarth, Tim

    2014-01-01

    Putting new findings into the context of available literature knowledge is one approach to deal with the surge of high-throughput data results. Furthermore, prior knowledge can increase the performance and stability of bioinformatic algorithms, for example, methods for network reconstruction. In this review, we examine software packages for the statistical computing framework R, which enable the integration of pathway data for further bioinformatic analyses. Different approaches to integrate and visualize pathway data are identified and packages are stratified concerning their features according to a number of different aspects: data import strategies, the extent of available data, dependencies on external tools, integration with further analysis steps and visualization options are considered. A total of 12 packages integrating pathway data are reviewed in this manuscript. These are supplemented by five R-specific packages for visualization and six connector packages, which provide access to external tools. PMID:24833336

  4. Analyzing longitudinal data with the linear mixed models procedure in SPSS.

    PubMed

    West, Brady T

    2009-09-01

    Many applied researchers analyzing longitudinal data share a common misconception: that specialized statistical software is necessary to fit hierarchical linear models (also known as linear mixed models [LMMs], or multilevel models) to longitudinal data sets. Although several specialized statistical software programs of high quality are available that allow researchers to fit these models to longitudinal data sets (e.g., HLM), rapid advances in general purpose statistical software packages have recently enabled analysts to fit these same models when using preferred packages that also enable other more common analyses. One of these general purpose statistical packages is SPSS, which includes a very flexible and powerful procedure for fitting LMMs to longitudinal data sets with continuous outcomes. This article aims to present readers with a practical discussion of how to analyze longitudinal data using the LMMs procedure in the SPSS statistical software package.

  5. New generation of exploration tools: interactive modeling software and microcomputers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Krajewski, S.A.

    1986-08-01

    Software packages offering interactive modeling techniques are now available for use on microcomputer hardware systems. These packages are reasonably priced for both company and independent explorationists; they do not require users to have high levels of computer literacy; they are capable of rapidly completing complex ranges of sophisticated geologic and geophysical modeling tasks; and they can produce presentation-quality output for comparison with real-world data. For example, interactive packages are available for mapping, log analysis, seismic modeling, reservoir studies, and financial projects as well as for applying a variety of statistical and geostatistical techniques to analysis of exploration data. More importantly, these packages enable explorationists to directly apply their geologic expertise when developing and fine-tuning models for identifying new prospects and for extending producing fields. As a result of these features, microcomputers and interactive modeling software are becoming common tools in many exploration offices. Gravity and magnetics software programs illustrate some of the capabilities of such exploration tools.

  6. SuperSegger: robust image segmentation, analysis and lineage tracking of bacterial cells.

    PubMed

    Stylianidou, Stella; Brennan, Connor; Nissen, Silas B; Kuwada, Nathan J; Wiggins, Paul A

    2016-11-01

    Many quantitative cell biology questions require fast yet reliable automated image segmentation to identify and link cells from frame-to-frame, and characterize the cell morphology and fluorescence. We present SuperSegger, an automated MATLAB-based image processing package well-suited to quantitative analysis of high-throughput live-cell fluorescence microscopy of bacterial cells. SuperSegger incorporates machine-learning algorithms to optimize cellular boundaries and automated error resolution to reliably link cells from frame-to-frame. Unlike existing packages, it can reliably segment microcolonies with many cells, facilitating the analysis of cell-cycle dynamics in bacteria as well as cell-contact mediated phenomena. This package has a range of built-in capabilities for characterizing bacterial cells, including the identification of cell division events, mother, daughter and neighbouring cells, and computing statistics on cellular fluorescence, the location and intensity of fluorescent foci. SuperSegger provides a variety of postprocessing data visualization tools for single cell and population level analysis, such as histograms, kymographs, frame mosaics, movies and consensus images. Finally, we demonstrate the power of the package by analyzing lag phase growth with single cell resolution. © 2016 John Wiley & Sons Ltd.

  7. Lessons learned from IDeAl - 33 recommendations from the IDeAl-net about design and analysis of small population clinical trials.

    PubMed

    Hilgers, Ralf-Dieter; Bogdan, Malgorzata; Burman, Carl-Fredrik; Dette, Holger; Karlsson, Mats; König, Franz; Male, Christoph; Mentré, France; Molenberghs, Geert; Senn, Stephen

    2018-05-11

    IDeAl (Integrated designs and analysis of small population clinical trials) is an EU-funded project developing new statistical design and analysis methodologies for clinical trials in small population groups. Here we provide an overview of IDeAl findings and give recommendations to applied researchers. The description of the findings is broken down by the nine scientific IDeAl work packages and summarizes results from the project's more than 60 publications to date in peer-reviewed journals. In addition, we applied text mining to evaluate the publications and the IDeAl work packages' output in relation to the design and analysis terms derived from the IRDiRC task force report on small population clinical trials. The results are summarized, describing the developments from an applied viewpoint. The main result presented here is a set of 33 practical recommendations drawn from the work, giving researchers comprehensive guidance to the improved methodology. In particular, the findings will help design and analyse efficient clinical trials in rare diseases with limited numbers of patients available. We developed a network representation relating the hot topics identified by the IRDiRC task force on small population clinical trials to IDeAl's work, as well as relating the important methodologies that, by IDeAl's definition, are necessary to consider in the design and analysis of small-population clinical trials. These network representations establish a new perspective on the design and analysis of small-population clinical trials. IDeAl has provided a large number of options to refine the statistical methodology for small-population clinical trials from various perspectives. The 33 recommendations, related to the work packages, help researchers design small-population clinical trials. The route to improvement is displayed in the IDeAl network, which represents the important statistical methodological skills necessary for the design and analysis of small-population clinical trials. The methods are ready for use.

  8. EpiModel: An R Package for Mathematical Modeling of Infectious Disease over Networks.

    PubMed

    Jenness, Samuel M; Goodreau, Steven M; Morris, Martina

    2018-04-01

    Package EpiModel provides tools for building, simulating, and analyzing mathematical models for the population dynamics of infectious disease transmission in R. Several classes of models are included, but the unique contribution of this software package is a general stochastic framework for modeling the spread of epidemics on networks. EpiModel integrates recent advances in statistical methods for network analysis (temporal exponential random graph models) that allow the epidemic modeling to be grounded in empirical data on contacts that can spread infection. This article provides an overview of both the modeling tools built into EpiModel, designed to facilitate learning for students new to modeling, and the application programming interface for extending package EpiModel, designed to facilitate the exploration of novel research questions for advanced modelers.
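
    As a hedged sketch of the API, a minimal stochastic individual contact model; the network models described in the article are built analogously with netest() and netsim(). Parameter values below are illustrative only:

```r
library(EpiModel)

# Susceptible-Infected (SI) individual contact model
param   <- param.icm(inf.prob = 0.2, act.rate = 0.8)  # per-act risk, contact rate
init    <- init.icm(s.num = 990, i.num = 10)          # initial compartment sizes
control <- control.icm(type = "SI", nsims = 5, nsteps = 300)

mod <- icm(param, init, control)
plot(mod)                 # prevalence trajectories across simulations
summary(mod, at = 300)    # compartment summary at the final step
```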

  9. EpiModel: An R Package for Mathematical Modeling of Infectious Disease over Networks

    PubMed Central

    Jenness, Samuel M.; Goodreau, Steven M.; Morris, Martina

    2018-01-01

    Package EpiModel provides tools for building, simulating, and analyzing mathematical models for the population dynamics of infectious disease transmission in R. Several classes of models are included, but the unique contribution of this software package is a general stochastic framework for modeling the spread of epidemics on networks. EpiModel integrates recent advances in statistical methods for network analysis (temporal exponential random graph models) that allow the epidemic modeling to be grounded in empirical data on contacts that can spread infection. This article provides an overview of both the modeling tools built into EpiModel, designed to facilitate learning for students new to modeling, and the application programming interface for extending package EpiModel, designed to facilitate the exploration of novel research questions for advanced modelers. PMID:29731699

  10. 76 FR 77768 - Information Collection; Flathead and McKenzie Rivers and McKenzie National Recreational Trail...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-12-14

    ... completed and validated, the hardcopy questionnaires will be discarded. Data will be imported into SPSS (Statistical Package for the Social Sciences) for analysis. The database will be maintained at the respective...

  11. AnthropMMD: An R package with a graphical user interface for the mean measure of divergence.

    PubMed

    Santos, Frédéric

    2018-01-01

    The mean measure of divergence is a dissimilarity measure between groups of individuals described by dichotomous variables. It is well suited to datasets with many missing values, and it is generally used to compute distance matrices and represent phenograms. Although often used in biological anthropology and archaeozoology, this method suffers from a lack of implementation in common statistical software. A package for the R statistical software, AnthropMMD, is presented here. Offering a dynamic graphical user interface, it is the first one dedicated to Smith's mean measure of divergence. The package also provides facilities for graphical representations and the crucial step of trait selection, so that the entire analysis can be performed through the graphical user interface. Its use is demonstrated using an artificial dataset, and the impact of trait selection is discussed. Finally, AnthropMMD is compared to three other free tools available for calculating the mean measure of divergence, and is shown to be consistent with them. © 2017 Wiley Periodicals, Inc.
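
    The statistic itself is straightforward to compute. The sketch below is a hypothetical Python illustration (AnthropMMD is an R package) of one common formulation of Smith's mean measure of divergence, using the Freeman-Tukey angular transformation and a per-trait sample-size correction; the function names are assumptions.

```python
import math

def freeman_tukey(k, n):
    """Freeman-Tukey angular transformation of a trait frequency k/n."""
    return 0.5 * (math.asin(1 - 2 * k / (n + 1))
                  + math.asin(1 - 2 * (k + 1) / (n + 1)))

def mmd(counts_a, sizes_a, counts_b, sizes_b):
    """Smith's mean measure of divergence between two groups described
    by r dichotomous traits (one common formulation).

    counts_*: per-trait counts of individuals showing the trait;
    sizes_*: per-trait sample sizes (these may differ across traits
    when data are missing, which is why MMD tolerates missing values).
    """
    r = len(counts_a)
    total = 0.0
    for ka, na, kb, nb in zip(counts_a, sizes_a, counts_b, sizes_b):
        diff = freeman_tukey(ka, na) - freeman_tukey(kb, nb)
        # Squared angular distance minus the sample-size correction.
        total += diff ** 2 - (1 / (na + 0.5) + 1 / (nb + 0.5))
    return total / r
```

    Identical trait frequencies give a slightly negative value (the correction term dominates), while strongly divergent frequencies give a large positive value.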

  12. Monte Carlo based statistical power analysis for mediation models: methods and software.

    PubMed

    Zhang, Zhiyong

    2014-12-01

    The existing literature on statistical power analysis for mediation models often assumes data normality and is based on a less powerful Sobel test instead of the more powerful bootstrap test. This study proposes to estimate statistical power to detect mediation effects on the basis of the bootstrap method through Monte Carlo simulation. Nonnormal data with excessive skewness and kurtosis are allowed in the proposed method. A free R package called bmem is developed to conduct the power analysis discussed in this study. Four examples, including a simple mediation model, a multiple-mediator model with a latent mediator, a multiple-group mediation model, and a longitudinal mediation model, are provided to illustrate the proposed method.
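
    The proposed approach, estimating power for the indirect effect by Monte Carlo simulation of a percentile-bootstrap test, can be sketched as follows. This is a hypothetical Python illustration under simplifying assumptions (normal errors, the simple three-variable mediation model M = aX + e1, Y = bM + cX + e2), not the bmem package itself, which is written in R.

```python
import numpy as np

def mediation_power(a, b, c, n, nsim=200, nboot=200, alpha=0.05, seed=1):
    """Monte Carlo power of a percentile-bootstrap test of the
    indirect effect a*b in a simple mediation model.

    Power = fraction of simulated datasets whose bootstrap confidence
    interval for a_hat * b_hat excludes zero.
    """
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(nsim):
        x = rng.normal(size=n)
        m = a * x + rng.normal(size=n)
        y = b * m + c * x + rng.normal(size=n)
        ab = np.empty(nboot)
        for i in range(nboot):
            idx = rng.integers(0, n, n)
            xb, mb, yb = x[idx], m[idx], y[idx]
            # a-path: regress M on X; b-path: regress Y on M and X.
            a_hat = np.linalg.lstsq(
                np.column_stack([np.ones(n), xb]), mb, rcond=None)[0][1]
            b_hat = np.linalg.lstsq(
                np.column_stack([np.ones(n), mb, xb]), yb, rcond=None)[0][1]
            ab[i] = a_hat * b_hat
        lo, hi = np.quantile(ab, [alpha / 2, 1 - alpha / 2])
        hits += (lo > 0) or (hi < 0)
    return hits / nsim
```

    Nonnormal errors, as emphasized in the abstract, could be simulated by replacing the `rng.normal` draws with skewed or heavy-tailed distributions.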

  13. A Meta-Analysis of Referential Communication Studies: A Computer Readable Literature Review.

    ERIC Educational Resources Information Center

    Dickson, W. Patrick; Moskoff, Mary

    A computer-assisted analysis of studies on referential communication (giving directions/explanations) located 66 reports involving 80 experiments, 114 referential tasks, and over 6,200 individuals. The studies were entered into the Statistical Package for the Social Sciences (SPSS) and analyzed for characteristics of the subjects and experimental designs,…

  14. Principal Component Analysis with Incomplete Data: A simulation of R pcaMethods package in Constructing an Environmental Quality Index with Missing Data

    EPA Science Inventory

    Missing data is a common problem in the application of statistical techniques. In principal component analysis (PCA), a technique for dimensionality reduction, incomplete data points are either discarded or imputed using interpolation methods. Such approaches are less valid when ...

  15. Modular Open-Source Software for Item Factor Analysis

    ERIC Educational Resources Information Center

    Pritikin, Joshua N.; Hunter, Micheal D.; Boker, Steven M.

    2015-01-01

    This article introduces an item factor analysis (IFA) module for "OpenMx," a free, open-source, and modular statistical modeling package that runs within the R programming environment on GNU/Linux, Mac OS X, and Microsoft Windows. The IFA module offers a novel model specification language that is well suited to programmatic generation…

  16. MAVTgsa: An R Package for Gene Set (Enrichment) Analysis

    DOE PAGES

    Chien, Chih-Yi; Chang, Ching-Wei; Tsai, Chen-An; ...

    2014-01-01

    Gene set analysis methods aim to determine whether an a priori defined set of genes shows statistically significant differences in expression on either categorical or continuous outcomes. Although many methods for gene set analysis have been proposed, a systematic analysis tool for identification of different types of gene set significance modules has not been developed previously. This work presents an R package, called MAVTgsa, which includes three different methods for integrated gene set enrichment analysis. (1) The one-sided OLS (ordinary least squares) test detects coordinated changes of genes in a gene set in one direction, either up- or downregulation. (2) The two-sided MANOVA (multivariate analysis of variance) detects changes in both directions for studying two or more experimental conditions. (3) A random forests-based procedure identifies gene sets that can accurately predict samples from different experimental conditions or that are associated with continuous phenotypes. MAVTgsa computes the P values and FDR (false discovery rate) q-values for all gene sets in the study. Furthermore, MAVTgsa provides several visualization outputs to support and interpret the enrichment results. This package is available online.

  17. The High Cost of Complexity in Experimental Design and Data Analysis: Type I and Type II Error Rates in Multiway ANOVA.

    ERIC Educational Resources Information Center

    Smith, Rachel A.; Levine, Timothy R.; Lachlan, Kenneth A.; Fediuk, Thomas A.

    2002-01-01

    Notes that the availability of statistical software packages has led to a sharp increase in use of complex research designs and complex statistical analyses in communication research. Reports a series of Monte Carlo simulations which demonstrate that this complexity may come at a heavier cost than many communication researchers realize. Warns…

  18. Gene- and pathway-based association tests for multiple traits with GWAS summary statistics.

    PubMed

    Kwak, Il-Youp; Pan, Wei

    2017-01-01

    To identify novel genetic variants associated with complex traits and to shed new insights on underlying biology, in addition to the most popular single SNP-single trait association analysis, it would be useful to explore multiple correlated (intermediate) traits at the gene- or pathway-level by mining existing single GWAS or meta-analyzed GWAS data. For this purpose, we present an adaptive gene-based test and a pathway-based test for association analysis of multiple traits with GWAS summary statistics. The proposed tests are adaptive at both the SNP- and trait-levels; that is, they account for possibly varying association patterns (e.g. signal sparsity levels) across SNPs and traits, thus maintaining high power across a wide range of situations. Furthermore, the proposed methods are general: they can be applied to mixed types of traits, and to Z-statistics or P-values as summary statistics obtained from either a single GWAS or a meta-analysis of multiple GWAS. Our numerical studies with simulated and real data demonstrated the promising performance of the proposed methods. The methods are implemented in R package aSPU, freely and publicly available at: https://cran.r-project.org/web/packages/aSPU/ Contact: weip@biostat.umn.edu. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
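
    The adaptive test family underlying aSPU, sum of powered score (SPU) statistics combined by taking the minimum p-value over the power parameter, can be sketched as below. This hypothetical Python sketch assumes, for simplicity, independent Z-scores under the null; the real aSPU package accounts for their correlation, and the function name and defaults here are assumptions.

```python
import numpy as np

def aspu_pvalue(z, gammas=(1, 2, 4, 8), nsim=1000, seed=0):
    """Monte Carlo p-value of an adaptive sum of powered score (aSPU)
    test from a vector of SNP Z-statistics, assuming (in this sketch)
    independent Z-scores under the null.
    """
    rng = np.random.default_rng(seed)
    z = np.asarray(z, dtype=float)
    z0 = rng.normal(size=(nsim, len(z)))  # null score vectors
    # SPU(gamma) statistics for the data and for every null draw.
    t_obs = np.array([abs((z ** g).sum()) for g in gammas])
    t0 = np.abs(np.stack([(z0 ** g).sum(axis=1) for g in gammas],
                         axis=1))
    # Per-gamma Monte Carlo p-values for the observed statistics.
    p_obs = (1 + (t0 >= t_obs).sum(axis=0)) / (1 + nsim)
    # Null distribution of per-gamma p-values via ranks of null stats.
    ranks = np.argsort(np.argsort(-t0, axis=0), axis=0) + 1
    p0 = ranks / nsim
    # aSPU adapts across gamma by taking the minimum p-value.
    return (1 + (p0.min(axis=1) <= p_obs.min()).sum()) / (1 + nsim)
```

    Small gamma values favor dense weak signals; large gamma values favor a few strong signals. Taking the minimum p-value over gamma is what makes the test adaptive to the unknown sparsity level.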

  19. CoinCalc-A new R package for quantifying simultaneities of event series

    NASA Astrophysics Data System (ADS)

    Siegmund, Jonatan F.; Siegmund, Nicole; Donner, Reik V.

    2017-01-01

    We present the new R package CoinCalc for performing event coincidence analysis (ECA), a novel statistical method to quantify the simultaneity of events contained in two series of observations, either as simultaneous or lagged coincidences within a user-specified temporal tolerance window. The package also provides different analytical and surrogate-based significance tests (valid under different assumptions about the nature of the observed event series), as well as an intuitive visualization of the identified coincidences. We demonstrate the usage of CoinCalc based on two typical geoscientific example problems addressing the relationship between meteorological extremes and plant phenology as well as that between soil properties and land cover.
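
    The core quantity of event coincidence analysis can be written down compactly. The following is a hypothetical Python sketch (CoinCalc itself is an R package) of one common definition of the precursor coincidence rate: the fraction of events in series A that are preceded by at least one event in series B within a tolerance window delta_t after an offset tau.

```python
def precursor_coincidence_rate(a, b, delta_t, tau=0.0):
    """Precursor coincidence rate of event coincidence analysis: the
    fraction of event times t in series `a` for which series `b`
    contains at least one event in [t - tau - delta_t, t - tau].
    """
    if not a:
        return 0.0
    hits = sum(1 for t in a
               if any(t - tau - delta_t <= s <= t - tau for s in b))
    return hits / len(a)
```

    The significance of an observed rate would then be assessed analytically or against surrogate event series, as the package does.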

  20. Microcomputer Programs for Educational Statistics: A Review of Popular Programs. TME Report 89.

    ERIC Educational Resources Information Center

    Stemmer, Paul M.; Berger, Carl F.

    This publication acquaints the user with microcomputer statistical packages and offers a method for evaluation based on a set of criteria that can be adapted to the needs of the user. Several popular packages, typical of those available, are reviewed in detail: (1) Abstat, an easy to use command driven package compatible with the IBM PC or the…

  1. Comparison of four software packages for CT lung volumetry in healthy individuals.

    PubMed

    Nemec, Stefan F; Molinari, Francesco; Dufresne, Valerie; Gosset, Natacha; Silva, Mario; Bankier, Alexander A

    2015-06-01

    To compare CT lung volumetry (CTLV) measurements provided by different software packages, and to provide normative data for lung densitometric measurements in healthy individuals. This retrospective study included 51 chest CTs of 17 volunteers (eight men and nine women; mean age, 30 ± 6 years), who underwent spirometrically monitored CT at total lung capacity (TLC), functional residual capacity (FRC), and mean inspiratory capacity (MIC). Volumetric differences assessed by four commercial software packages were compared with analysis of variance (ANOVA) for repeated measurements and benchmarked against the threshold for acceptable variability between spirometric measurements. Mean lung density (MLD) and parenchymal heterogeneity (MLD-SD) were also compared with ANOVA. Volumetric differences ranged from 12 to 213 ml (0.20 % to 6.45 %). Although 16/18 comparisons (among four software packages at TLC, MIC, and FRC) were statistically significant (P < 0.001 to P = 0.004), only 3/18 comparisons, one at MIC and two at FRC, exceeded the spirometry variability threshold. MLD and MLD-SD significantly increased with decreasing volumes, and were significantly larger in lower compared to upper lobes (P < 0.001). Lung volumetric differences provided by different software packages are small. These differences should not be interpreted based on statistical significance alone, but together with absolute volumetric differences. • Volumetric differences, assessed by different CTLV software, are small but statistically significant. • Volumetric differences are smaller at TLC than at MIC and FRC. • Volumetric differences rarely exceed spirometric repeatability thresholds at MIC and FRC. • Differences between CTLV measurements should be interpreted based on comparison of absolute differences. • MLD increases with decreasing volumes, and is larger in lower compared to upper lobes.

  2. Statistical Approaches to Adjusting Weights for Dependent Arms in Network Meta-analysis.

    PubMed

    Su, Yu-Xuan; Tu, Yu-Kang

    2018-05-22

    Network meta-analysis compares multiple treatments in terms of their efficacy and harm by including evidence from randomized controlled trials. Most clinical trials use parallel design, where patients are randomly allocated to different treatments and receive only one treatment. However, some trials use within-person designs such as split-body, split-mouth and cross-over designs, where each patient may receive more than one treatment. Data from treatment arms within these trials are no longer independent, so the correlations between dependent arms need to be accounted for within the statistical analyses. Ignoring these correlations may result in incorrect conclusions. The main objective of this study is to develop statistical approaches to adjusting weights for dependent arms within special design trials. In this study, we demonstrate the following three approaches: the data augmentation approach, the adjusting variance approach, and the reducing weight approach. These three methods can be readily applied with current statistical tools such as R and STATA. An example of periodontal regeneration was used to demonstrate how these approaches can be undertaken and implemented within statistical software packages, and to compare results from different approaches. The adjusting variance approach can be implemented within the network package in STATA, while the reducing weight approach requires computer software programming to set up the within-study variance-covariance matrix. This article is protected by copyright. All rights reserved.

  3. Using R in Introductory Statistics Courses with the pmg Graphical User Interface

    ERIC Educational Resources Information Center

    Verzani, John

    2008-01-01

    The pmg add-on package for the open source statistics software R is described. This package provides a simple to use graphical user interface (GUI) that allows introductory statistics students, without advanced computing skills, to quickly create the graphical and numeric summaries expected of them. (Contains 9 figures.)

  4. Scientific computations section monthly report, November 1993

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Buckner, M.R.

    1993-12-30

    This progress report from the Savannah River Technology Center contains abstracts from papers from the computational modeling, applied statistics, applied physics, experimental thermal hydraulics, and packaging and transportation groups. Specific topics covered include engineering modeling and process simulation, criticality methods and analysis, and plutonium disposition.

  5. Confirmatory Factor Analysis of Persian Adaptation of Multidimensional Students' Life Satisfaction Scale (MSLSS)

    ERIC Educational Resources Information Center

    Hatami, Gissou; Motamed, Niloofar; Ashrafzadeh, Mahshid

    2010-01-01

    Validity and reliability of Persian adaptation of MSLSS in the 12-18 years, middle and high school students (430 students in grades 6-12 in Bushehr port, Iran) using confirmatory factor analysis by means of LISREL statistical package were checked. Internal consistency reliability estimates (Cronbach's coefficient [alpha]) were all above the…

  6. SWATH2stats: An R/Bioconductor Package to Process and Convert Quantitative SWATH-MS Proteomics Data for Downstream Analysis Tools.

    PubMed

    Blattmann, Peter; Heusel, Moritz; Aebersold, Ruedi

    2016-01-01

    SWATH-MS is an acquisition and analysis technique of targeted proteomics that enables measuring several thousand proteins with high reproducibility and accuracy across many samples. OpenSWATH is popular open-source software for peptide identification and quantification from SWATH-MS data. For downstream statistical and quantitative analysis there exist different tools such as MSstats, mapDIA and aLFQ. However, the transfer of data from OpenSWATH to the downstream statistical tools is currently technically challenging. Here we introduce the R/Bioconductor package SWATH2stats, which allows convenient processing of the data into a format directly readable by the downstream analysis tools. In addition, SWATH2stats allows annotation, analyzing the variation and the reproducibility of the measurements, FDR estimation, and advanced filtering before submitting the processed data to downstream tools. These functionalities are important to quickly analyze the quality of the SWATH-MS data. Hence, SWATH2stats is a new open-source tool that summarizes several practical functionalities for analyzing, processing, and converting SWATH-MS data and thus facilitates the efficient analysis of large-scale SWATH/DIA datasets.

  7. mcaGUI: microbial community analysis R-Graphical User Interface (GUI).

    PubMed

    Copeland, Wade K; Krishnan, Vandhana; Beck, Daniel; Settles, Matt; Foster, James A; Cho, Kyu-Chul; Day, Mitch; Hickey, Roxana; Schütte, Ursel M E; Zhou, Xia; Williams, Christopher J; Forney, Larry J; Abdo, Zaid

    2012-08-15

    Microbial communities have an important role in natural ecosystems and have an impact on animal and human health. Intuitive graphic and analytical tools that can facilitate the study of these communities are in short supply. This article introduces Microbial Community Analysis GUI, a graphical user interface (GUI) for the R-programming language (R Development Core Team, 2010). With this application, researchers can input aligned and clustered sequence data to create custom abundance tables and perform analyses specific to their needs. This GUI provides a flexible modular platform, expandable to include other statistical tools for microbial community analysis in the future. The mcaGUI package and source are freely available as part of Bioconductor at http://www.bioconductor.org/packages/release/bioc/html/mcaGUI.html

  8. SNP_tools: A compact tool package for analysis and conversion of genotype data for MS-Excel

    PubMed Central

    Chen, Bowang; Wilkening, Stefan; Drechsel, Marion; Hemminki, Kari

    2009-01-01

    Background Single nucleotide polymorphism (SNP) genotyping is a major activity in biomedical research. Scientists prefer to have easy access to the results, which may require conversions between data formats. First-hand SNP data are often entered in or saved in the MS-Excel format, but this software lacks genetic and epidemiological functions. A general tool to do basic genetic and epidemiological analysis and data conversion for MS-Excel is needed. Findings The SNP_tools package is prepared as an add-in for MS-Excel. The code is written in Visual Basic for Applications, embedded in the Microsoft Office package. This add-in is an easy-to-use tool for users with basic computer knowledge (and requirements for basic statistical analysis). Conclusion Our implementation for Microsoft Excel 2000-2007 in Microsoft Windows 2000, XP, Vista and Windows 7 beta can handle files in different formats and converts them into other formats. It is free software. PMID:19852806

  9. SNP_tools: A compact tool package for analysis and conversion of genotype data for MS-Excel.

    PubMed

    Chen, Bowang; Wilkening, Stefan; Drechsel, Marion; Hemminki, Kari

    2009-10-23

    Single nucleotide polymorphism (SNP) genotyping is a major activity in biomedical research. Scientists prefer to have easy access to the results, which may require conversions between data formats. First-hand SNP data are often entered in or saved in the MS-Excel format, but this software lacks genetic and epidemiological functions. A general tool to do basic genetic and epidemiological analysis and data conversion for MS-Excel is needed. The SNP_tools package is prepared as an add-in for MS-Excel. The code is written in Visual Basic for Applications, embedded in the Microsoft Office package. This add-in is an easy-to-use tool for users with basic computer knowledge (and requirements for basic statistical analysis). Our implementation for Microsoft Excel 2000-2007 in Microsoft Windows 2000, XP, Vista and Windows 7 beta can handle files in different formats and converts them into other formats. It is free software.

  10. Around and about an application of the GAMLSS package to non-stationary flood frequency analysis

    NASA Astrophysics Data System (ADS)

    Debele, S. E.; Bogdanowicz, E.; Strupczewski, W. G.

    2017-08-01

    The non-stationarity of hydrologic processes due to climate change or human activities is challenging for researchers and practitioners. However, the practical requirements for taking non-stationarity into account as a support in decision-making procedures exceed the current development of the theory and of the software. Currently, the most popular and freely available software package that allows for non-stationary statistical analysis is the GAMLSS (generalized additive models for location, scale and shape) package. GAMLSS has been used in a variety of fields. There are also several papers recommending GAMLSS for hydrological problems; however, there are still important issues which have not previously been discussed, concerning mainly GAMLSS applicability not only for research and academic purposes but also in design practice. In this paper, we present a summary of our experiences in the implementation of GAMLSS for non-stationary flood frequency analysis, highlighting its advantages and pointing out weaknesses with regard to methodological and practical topics.

  11. Drought: A comprehensive R package for drought monitoring, prediction and analysis

    NASA Astrophysics Data System (ADS)

    Hao, Zengchao; Hao, Fanghua; Singh, Vijay P.; Cheng, Hongguang

    2015-04-01

    Drought may impose serious challenges to human societies and ecosystems. Because of its complex causes and wide-ranging impacts, a universally accepted definition of drought does not exist. Drought indicators are commonly used to characterize drought properties such as duration or severity. Various drought indicators have been developed in the past few decades for monitoring particular aspects of drought conditions, along with multivariate drought indices that characterize drought from multiple sources or hydro-climatic variables. Reliable drought prediction with suitable drought indicators is critical to drought preparedness plans that reduce potential drought impacts. In addition, drought analysis that quantifies the risk of drought properties provides useful information for operational drought management. Drought monitoring, prediction and risk analysis are important components of drought modeling and assessment. In this study, a comprehensive R package, "drought", is developed to aid drought monitoring, prediction and risk analysis (available from R-Forge and CRAN soon). The drought monitoring component of the package computes a suite of univariate and multivariate drought indices that integrate drought information from various sources such as precipitation, temperature, soil moisture, and runoff. The drought prediction/forecasting component consists of statistical drought predictions to enhance early warning for decision making. Analysis of drought properties such as duration and severity is also provided for drought risk assessment. Based on this package, a drought monitoring and prediction/forecasting system is under development as a decision support tool. The package will be freely available to the public to aid drought modeling and assessment for researchers and practitioners.

  12. SOCR: Statistics Online Computational Resource

    PubMed Central

    Dinov, Ivo D.

    2011-01-01

    The need for hands-on computer laboratory experience in undergraduate and graduate statistics education has been firmly established in the past decade. As a result a number of attempts have been undertaken to develop novel approaches for problem-driven statistical thinking, data analysis and result interpretation. In this paper we describe an integrated educational web-based framework for: interactive distribution modeling, virtual online probability experimentation, statistical data analysis, visualization and integration. Following years of experience in statistical teaching at all college levels using established licensed statistical software packages, like STATA, S-PLUS, R, SPSS, SAS, Systat, etc., we have attempted to engineer a new statistics education environment, the Statistics Online Computational Resource (SOCR). This resource performs many of the standard types of statistical analysis, much like other classical tools. In addition, it is designed in a plug-in object-oriented architecture and is completely platform independent, web-based, interactive, extensible and secure. Over the past 4 years we have tested, fine-tuned and reanalyzed the SOCR framework in many of our undergraduate and graduate probability and statistics courses and have evidence that SOCR resources build students' intuition and enhance their learning. PMID:21451741

  13. PHAST: Protein-like heteropolymer analysis by statistical thermodynamics

    NASA Astrophysics Data System (ADS)

    Frigori, Rafael B.

    2017-06-01

    PHAST is a software package written in standard Fortran, with MPI and CUDA extensions, able to efficiently perform parallel multicanonical Monte Carlo simulations of single or multiple heteropolymeric chains, as coarse-grained models for proteins. The outcome data can be straightforwardly analyzed within its microcanonical Statistical Thermodynamics module, which allows for computing the entropy, caloric curve, specific heat and free energies. As a case study, we investigate the aggregation of heteropolymers bioinspired by Aβ25-33 fragments and their cross-seeding with IAPP20-29 isoforms. Excellent parallel scaling is observed, even under numerically difficult first-order-like phase transitions, which are properly described by the built-in fully reconfigurable force fields. Moreover, the package is free and open source, which should motivate users to readily adapt it to specific purposes.

  14. A New Paradigm to Analyze Data Completeness of Patient Data.

    PubMed

    Nasir, Ayan; Gurupur, Varadraj; Liu, Xinliang

    2016-08-03

    There is a need to develop a tool that will measure data completeness of patient records using sophisticated statistical metrics. Patient data integrity is important in providing timely and appropriate care. Completeness is an important step, with an emphasis on understanding the complex relationships between data fields and their relative importance in delivering care. This tool will not only help understand where data problems are but also help uncover the underlying issues behind them. Develop a tool that can be used alongside a variety of health care database software packages to determine the completeness of individual patient records as well as aggregate patient records across health care centers and subpopulations. The methodology of this project is encapsulated within the Data Completeness Analysis Package (DCAP) tool, with the major components including concept mapping, CSV parsing, and statistical analysis. The results from testing DCAP with Healthcare Cost and Utilization Project (HCUP) State Inpatient Database (SID) data show that this tool is successful in identifying relative data completeness at the patient, subpopulation, and database levels. These results also solidify a need for further analysis and call for hypothesis driven research to find underlying causes for data incompleteness. DCAP examines patient records and generates statistics that can be used to determine the completeness of individual patient data as well as the general thoroughness of record keeping in a medical database. DCAP uses a component that is customized to the settings of the software package used for storing patient data as well as a Comma Separated Values (CSV) file parser to determine the appropriate measurements. DCAP itself is assessed through a proof of concept exercise using hypothetical data as well as available HCUP SID patient data.
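
    A weighted completeness score of the kind DCAP computes might look like the following. This is a hypothetical Python sketch: the field weights, the treatment of None and empty strings as missing, and the function name are assumptions for illustration, not details taken from the DCAP implementation.

```python
def completeness(record, weights):
    """Weighted completeness of one patient record: the share of total
    field weight carried by fields with non-missing values.

    record: dict mapping field name -> value (None or '' = missing);
    weights: dict mapping field name -> relative importance in care.
    """
    total = sum(weights.values())
    if total == 0:
        return 0.0
    present = sum(w for field, w in weights.items()
                  if record.get(field) not in (None, ""))
    return present / total
```

    Averaging such scores over patients would give the subpopulation- and database-level completeness statistics the abstract describes, with the weights encoding the relative importance of fields in delivering care.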

  15. A New Paradigm to Analyze Data Completeness of Patient Data

    PubMed Central

    Nasir, Ayan; Liu, Xinliang

    2016-01-01

    Summary Background There is a need to develop a tool that will measure data completeness of patient records using sophisticated statistical metrics. Patient data integrity is important in providing timely and appropriate care. Completeness is an important step, with an emphasis on understanding the complex relationships between data fields and their relative importance in delivering care. This tool will not only help understand where data problems are but also help uncover the underlying issues behind them. Objectives Develop a tool that can be used alongside a variety of health care database software packages to determine the completeness of individual patient records as well as aggregate patient records across health care centers and subpopulations. Methods The methodology of this project is encapsulated within the Data Completeness Analysis Package (DCAP) tool, with the major components including concept mapping, CSV parsing, and statistical analysis. Results The results from testing DCAP with Healthcare Cost and Utilization Project (HCUP) State Inpatient Database (SID) data show that this tool is successful in identifying relative data completeness at the patient, subpopulation, and database levels. These results also solidify a need for further analysis and call for hypothesis driven research to find underlying causes for data incompleteness. Conclusion DCAP examines patient records and generates statistics that can be used to determine the completeness of individual patient data as well as the general thoroughness of record keeping in a medical database. DCAP uses a component that is customized to the settings of the software package used for storing patient data as well as a Comma Separated Values (CSV) file parser to determine the appropriate measurements. DCAP itself is assessed through a proof of concept exercise using hypothetical data as well as available HCUP SID patient data. PMID:27484918

  16. Use of statistical study methods for the analysis of the results of the imitation modeling of radiation transfer

    NASA Astrophysics Data System (ADS)

    Alekseenko, M. A.; Gendrina, I. Yu.

    2017-11-01

    Recently, due to the abundance of various types of observational data in systems of vision through the atmosphere and the need for their processing, the use of statistical research methods in the study of such systems, such as correlation-regression analysis, time-series analysis, and analysis of variance, has become relevant. We have attempted to apply elements of correlation-regression analysis to the study and subsequent prediction of the patterns of radiation transfer in these systems, as well as to the construction of radiation models of the atmosphere. In this paper, we present some results of statistical processing of the results of numerical simulation of the characteristics of vision systems through the atmosphere, obtained with the help of a special software package.

  17. The Data from Aeromechanics Test and Analytics -- Management and Analysis Package (DATAMAP). Volume I. User’s Manual.

    DTIC Science & Technology

    1980-12-01

    Excerpt (text and table-of-contents fragments): perceived noisiness values are derived from a formula applied to sound pressure level in decibels, assuming a frequency of 1000 Hz. Analyses covered include octave and third-octave band analysis, perceived noise level analysis, acoustic weighting networks, and basic statistical analyses (mean, variance, standard deviation).

  18. RankProd 2.0: a refactored bioconductor package for detecting differentially expressed features in molecular profiling datasets.

    PubMed

    Del Carratore, Francesco; Jankevics, Andris; Eisinga, Rob; Heskes, Tom; Hong, Fangxin; Breitling, Rainer

    2017-09-01

    The Rank Product (RP) is a statistical technique widely used to detect differentially expressed features in molecular profiling experiments such as transcriptomics, metabolomics and proteomics studies. An implementation of the RP and the closely related Rank Sum (RS) statistics has been available in the RankProd Bioconductor package for several years. However, several recent advances in the understanding of the statistical foundations of the method have made a complete refactoring of the existing package desirable. We implemented a completely refactored version of the RankProd package, which provides a more principled implementation of the statistics for unpaired datasets. Moreover, the permutation-based P-value estimation methods have been replaced by exact methods, providing faster and more accurate results. RankProd 2.0 is available at Bioconductor ( https://www.bioconductor.org/packages/devel/bioc/html/RankProd.html ) and as part of the mzMatch pipeline ( http://www.mzmatch.sourceforge.net ). rainer.breitling@manchester.ac.uk. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.
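
    The rank product statistic itself is simple: the geometric mean of a feature's ranks across replicate comparisons, with significance assessed against shuffled data. Below is a hypothetical Python sketch of that idea; note that RankProd 2.0, as described above, replaces this kind of permutation-based P-value with exact methods, so this sketch illustrates only the classic formulation.

```python
import numpy as np

def rank_product(data, nperm=2000, seed=0):
    """Per-feature rank product over k replicate comparisons, with
    pooled permutation p-values for up-regulation (rank 1 = largest).

    data: (n_features, k) array, e.g. log fold changes per replicate.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    n, k = data.shape

    def rp(x):
        # Rank each column in descending order, then take the
        # geometric mean of each feature's ranks across columns.
        ranks = np.argsort(np.argsort(-x, axis=0), axis=0) + 1
        return np.exp(np.log(ranks).mean(axis=1))

    rp_obs = rp(data)
    # Null: shuffle each column independently, pool all permuted RPs.
    null = np.concatenate([
        rp(np.column_stack([rng.permutation(data[:, j])
                            for j in range(k)]))
        for _ in range(nperm)])
    pvals = np.array([(1 + (null <= r).sum()) / (1 + null.size)
                      for r in rp_obs])
    return rp_obs, pvals
```

    A feature ranked first in every replicate gets a rank product of 1 and a small p-value; a consistently low-ranked feature gets a p-value near 1.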

  19. Multiscale analysis of river networks using the R package linbin

    USGS Publications Warehouse

    Welty, Ethan Z.; Torgersen, Christian E.; Brenkman, Samuel J.; Duda, Jeffrey J.; Armstrong, Jonathan B.

    2015-01-01

    Analytical tools are needed in riverine science and management to bridge the gap between GIS and statistical packages that were not designed for the directional and dendritic structure of streams. We introduce linbin, an R package developed for the analysis of riverscapes at multiple scales. With this software, riverine data on aquatic habitat and species distribution can be scaled and plotted automatically with respect to their position in the stream network or—in the case of temporal data—their position in time. The linbin package aggregates data into bins of different sizes as specified by the user. We provide case studies illustrating the use of the software for (1) exploring patterns at different scales by aggregating variables at a range of bin sizes, (2) comparing repeat observations by aggregating surveys into bins of common coverage, and (3) tailoring analysis to data with custom bin designs. Furthermore, we demonstrate the utility of linbin for summarizing patterns throughout an entire stream network, and we analyze the diel and seasonal movements of tagged fish past a stationary receiver to illustrate how linbin can be used with temporal data. In short, linbin enables more rapid analysis of complex data sets by fisheries managers and stream ecologists and can reveal underlying spatial and temporal patterns of fish distribution and habitat throughout a riverscape.
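The core binning idea can be sketched outside R: aggregate point observations along a stream axis into contiguous bins of a user-chosen size, at more than one size. A hypothetical Python illustration (not linbin's actual interface; positions and counts are invented):

```python
import numpy as np

def bin_summary(positions, values, bin_size):
    """Aggregate point observations along a stream axis into
    contiguous bins of width `bin_size`, returning (bin_start, mean)."""
    positions = np.asarray(positions, float)
    values = np.asarray(values, float)
    edges = np.arange(0.0, positions.max() + bin_size, bin_size)
    idx = np.digitize(positions, edges) - 1
    return [(edges[i], values[idx == i].mean() if np.any(idx == i) else np.nan)
            for i in range(len(edges) - 1)]

# hypothetical fish counts at river km 0.5 .. 3.5, summarized at two scales
pos = [0.5, 1.2, 1.8, 2.5, 3.5]
counts = [4, 0, 2, 6, 8]
coarse = bin_summary(pos, counts, 2.0)   # two 2-km bins
fine = bin_summary(pos, counts, 1.0)     # four 1-km bins
```

Comparing the same variable at several bin sizes, as here, is the "exploring patterns at different scales" use case from the first case study.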

  20. FieldTrip: Open source software for advanced analysis of MEG, EEG, and invasive electrophysiological data.

    PubMed

    Oostenveld, Robert; Fries, Pascal; Maris, Eric; Schoffelen, Jan-Mathijs

    2011-01-01

    This paper describes FieldTrip, an open source software package that we developed for the analysis of MEG, EEG, and other electrophysiological data. The software is implemented as a MATLAB toolbox and includes a complete set of consistent and user-friendly high-level functions that allow experimental neuroscientists to analyze experimental data. It includes algorithms for simple and advanced analysis, such as time-frequency analysis using multitapers, source reconstruction using dipoles, distributed sources and beamformers, connectivity analysis, and nonparametric statistical permutation tests at the channel and source level. The implementation as toolbox allows the user to perform elaborate and structured analyses of large data sets using the MATLAB command line and batch scripting. Furthermore, users and developers can easily extend the functionality and implement new algorithms. The modular design facilitates the reuse in other software packages.

  1. msap: a tool for the statistical analysis of methylation-sensitive amplified polymorphism data.

    PubMed

    Pérez-Figueroa, A

    2013-05-01

    In this study msap, an R package for the analysis of methylation-sensitive amplified polymorphism (MSAP or MS-AFLP) data, is presented. The program provides a deep analysis of epigenetic variation starting from a binary data matrix indicating the banding pattern between the isoschizomeric endonucleases HpaII and MspI, which have differential sensitivity to cytosine methylation. After comparing the restriction fragments, the program determines whether each fragment is susceptible to methylation (representative of epigenetic variation) or shows no evidence of methylation (representative of genetic variation). The package provides, in a user-friendly command line interface, a pipeline of different analyses of the variation (genetic and epigenetic) among user-defined groups of samples, as well as the classification of the methylation occurrences in those groups. Statistical testing provides support to the analyses. A comprehensive report of the analyses and several useful plots can help researchers to assess the epigenetic and genetic variation in their MSAP experiments. msap is downloadable from CRAN (http://cran.r-project.org/) and its own webpage (http://msap.r-forge.R-project.org/). The package is intended to be easy to use even for those unfamiliar with the R command line environment. Advanced users may take advantage of the available source code to adapt msap to more complex analyses. © 2013 Blackwell Publishing Ltd.
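The underlying MSAP scoring rests on comparing the HpaII and MspI banding patterns for each fragment and sample. A minimal sketch of the standard four-pattern classification (Python for illustration only; msap itself is an R package and applies further polymorphism criteria across sample groups, so treat the labels here as the conventional textbook interpretation):

```python
def classify_fragment(hpa, msp):
    """Classify one MSAP fragment for one sample from band presence (1)
    or absence (0) after digestion with HpaII and MspI.
    Conventional interpretation of the four possible patterns."""
    return {
        (1, 1): "unmethylated",
        (1, 0): "hemimethylated (HpaII band only)",
        (0, 1): "internal cytosine methylation (MspI band only)",
        (0, 0): "fully methylated or fragment absent (uninformative)",
    }[(hpa, msp)]
```

Fragments whose pattern varies in a methylation-attributable way across samples are the "methylation-susceptible" class the abstract describes; invariant (1, 1) fragments carry only genetic information.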

  2. A graphical user interface for RAId, a knowledge integrated proteomics analysis suite with accurate statistics.

    PubMed

    Joyce, Brendan; Lee, Danny; Rubio, Alex; Ogurtsov, Aleksey; Alves, Gelio; Yu, Yi-Kuo

    2018-03-15

    RAId is a software package that has been actively developed for the past 10 years for computationally and visually analyzing MS/MS data. Founded on rigorous statistical methods, RAId's core program computes accurate E-values for peptides and proteins identified during database searches. Our main goal here is to make this robust tool readily accessible to the proteomics community through a graphical user interface (GUI). We have constructed a graphical user interface to facilitate the use of RAId on users' local machines. Written in Java, RAId_GUI not only makes it easy to run RAId but also provides tools for data/spectra visualization, MS-product analysis, molecular isotopic distribution analysis, and graphing retrieval versus the proportion of false discoveries. The results viewer displays the analysis results and allows users to download them. Both the knowledge-integrated organismal databases and the code package (containing the source code, the graphical user interface, and a user manual) are available for download at https://www.ncbi.nlm.nih.gov/CBBresearch/Yu/downloads/raid.html.

  3. Computing and software

    USGS Publications Warehouse

    White, Gary C.; Hines, J.E.

    2004-01-01

    The reality is that the statistical methods used for analysis of data depend upon the availability of software. Analysis of marked animal data is no different from the rest of the statistical field: the methods used for analysis are those that are available in reliable software packages. Thus, the critical importance of having reliable, up-to-date software available to biologists is obvious. Statisticians have continued to develop more robust models, ever expanding the suite of potential analysis methods available. But without software to implement these newer methods, they will languish in the abstract and not be applied to the problems deserving them. In the Computers and Software Session, two new software packages are described, a comparison of implementations of methods for the estimation of nest survival is provided, and a more speculative paper about how the next generation of software might be structured is presented.

    Rotella et al. (2004) compare nest survival estimation with different software packages: SAS logistic regression, SAS non-linear mixed models, and Program MARK. Nests are assumed to be visited at various, possibly infrequent, intervals. All of the approaches described compute nest survival with the same likelihood and require that the age of the nest is known, to account for nests that eventually hatch. However, each approach offers advantages and disadvantages, explored by Rotella et al. (2004). Efford et al. (2004) present a new software package called DENSITY, which computes population abundance and density from trapping arrays and other detection methods with a new and unique approach. DENSITY represents the first major addition to the analysis of trapping arrays in 20 years. Barker & White (2004) discuss how existing software such as Program MARK requires that each new model's likelihood be programmed specifically for that model. They speculate that future software might allow the user to combine pieces of likelihood functions to generate estimates. The idea is interesting, and perhaps some bright young statistician can work out the specifics to implement the procedure. Choquet et al. (2004) describe MSURGE, a software package that implements multistate capture-recapture models. The unique feature of MSURGE is that the design matrix is constructed with an interpreted language called GEMACO. Because MSURGE is limited to multistate models, the special requirements of these likelihoods can be accommodated.

    The software and methods presented in these papers give biologists and wildlife managers an expanding range of possibilities for data analysis. Although ease of use is generally improving, it does not replace the need to understand the requirements and structure of the models being computed. The internet provides access to many free software packages as well as user-discussion groups for sharing knowledge and ideas. (A starting point for wildlife-related applications is http://www.phidot.org.)

  4. Computer Managed Instruction: An Application in Teaching Introductory Statistics.

    ERIC Educational Resources Information Center

    Hudson, Walter W.

    1985-01-01

    This paper describes a computer managed instruction package for teaching introductory or advanced statistics. The instructional package is described, and anecdotal information concerning its performance and student responses to its use over two semesters is given. (Author/BL)

  5. Bioconductor Workflow for Microbiome Data Analysis: from raw reads to community analyses

    PubMed Central

    Callahan, Ben J.; Sankaran, Kris; Fukuyama, Julia A.; McMurdie, Paul J.; Holmes, Susan P.

    2016-01-01

    High-throughput sequencing of PCR-amplified taxonomic markers (like the 16S rRNA gene) has enabled a new level of analysis of complex bacterial communities known as microbiomes. Many tools exist to quantify and compare abundance levels or OTU composition of communities in different conditions. The sequencing reads have to be denoised and assigned to the closest taxa from a reference database. Common approaches use a notion of 97% similarity and normalize the data by subsampling to equalize library sizes. In this paper, we show that statistical models allow more accurate abundance estimates. By providing a complete workflow in R, we enable the user to do sophisticated downstream statistical analyses, whether parametric or nonparametric. We provide examples of using the R packages dada2, phyloseq, DESeq2, ggplot2 and vegan to filter, visualize and test microbiome data. We also provide examples of supervised analyses using random forests and nonparametric testing using community networks and the ggnetwork package. PMID:27508062

  6. PopSc: Computing Toolkit for Basic Statistics of Molecular Population Genetics Simultaneously Implemented in Web-Based Calculator, Python and R

    PubMed Central

    Huang, Ying; Li, Cao; Liu, Linhai; Jia, Xianbo; Lai, Song-Jia

    2016-01-01

    Although various computer tools have been elaborately developed to calculate a series of statistics in molecular population genetics for both small- and large-scale DNA data, no efficient and easy-to-use toolkit is yet available that focuses exclusively on the steps of mathematical calculation. Here, we present PopSc, a bioinformatic toolkit for calculating 45 basic statistics in molecular population genetics, which can be categorized into three classes: (i) genetic diversity of DNA sequences, (ii) statistical tests for neutral evolution, and (iii) measures of genetic differentiation among populations. In contrast to existing computer tools, PopSc was designed to directly accept intermediate metadata, such as allele frequencies, rather than raw DNA sequences or genotyping results. PopSc is first implemented as a web-based calculator with a user-friendly interface, which greatly facilitates the teaching of population genetics in class and also promotes convenient and straightforward calculation of statistics in research. Additionally, we provide the Python library and R package of PopSc, which can be flexibly integrated into other advanced bioinformatic packages for population genetics analysis. PMID:27792763

  7. PopSc: Computing Toolkit for Basic Statistics of Molecular Population Genetics Simultaneously Implemented in Web-Based Calculator, Python and R.

    PubMed

    Chen, Shi-Yi; Deng, Feilong; Huang, Ying; Li, Cao; Liu, Linhai; Jia, Xianbo; Lai, Song-Jia

    2016-01-01

    Although various computer tools have been elaborately developed to calculate a series of statistics in molecular population genetics for both small- and large-scale DNA data, no efficient and easy-to-use toolkit is yet available that focuses exclusively on the steps of mathematical calculation. Here, we present PopSc, a bioinformatic toolkit for calculating 45 basic statistics in molecular population genetics, which can be categorized into three classes: (i) genetic diversity of DNA sequences, (ii) statistical tests for neutral evolution, and (iii) measures of genetic differentiation among populations. In contrast to existing computer tools, PopSc was designed to directly accept intermediate metadata, such as allele frequencies, rather than raw DNA sequences or genotyping results. PopSc is first implemented as a web-based calculator with a user-friendly interface, which greatly facilitates the teaching of population genetics in class and also promotes convenient and straightforward calculation of statistics in research. Additionally, we provide the Python library and R package of PopSc, which can be flexibly integrated into other advanced bioinformatic packages for population genetics analysis.
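Working from allele frequencies rather than raw sequences keeps the mathematical step small. A sketch of one such basic statistic, Nei's expected heterozygosity (gene diversity) at a single locus (Python for illustration; this is not PopSc's own interface):

```python
def expected_heterozygosity(freqs):
    """Nei's expected heterozygosity (gene diversity) at one locus
    from allele frequencies: He = 1 - sum(p_i ** 2)."""
    assert abs(sum(freqs) - 1.0) < 1e-9, "allele frequencies must sum to 1"
    return 1.0 - sum(p * p for p in freqs)

he = expected_heterozygosity([0.5, 0.5])   # two equally common alleles -> 0.5
he2 = expected_heterozygosity([0.9, 0.1])  # one rare allele -> about 0.18
```

This is exactly the kind of calculation the toolkit accepts "intermediate metadata" for: the allele frequencies are the input, not the sequences they were estimated from.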

  8. User’s guide for GcClust—An R package for clustering of regional geochemical data

    USGS Publications Warehouse

    Ellefsen, Karl J.; Smith, David B.

    2016-04-08

    GcClust is a software package developed by the U.S. Geological Survey for statistical clustering of regional geochemical data, and similar data such as regional mineralogical data. Functions within the software package are written in the R statistical programming language. These functions, their documentation, and a copy of the user’s guide are bundled together in R’s unit of sharable code, which is called a “package.” The user’s guide includes step-by-step instructions showing how the functions are used to cluster data and to evaluate the clustering results. These functions are demonstrated in this report using test data, which are included in the package.

  9. Classification Techniques for Multivariate Data Analysis.

    DTIC Science & Technology

    1980-03-28

    [Garbled extraction fragments; recoverable content:] ...analysis among biologists, botanists, and ecologists, while some social scientists may prefer the term "typology". Other frequently encountered terms are pattern... The eigenvalue problem reduces to the determinantal equation |B − λW| = 0 (42); the solutions λi are the eigenvalues of the matrix W⁻¹B, as in discriminant analysis. There are t non... The Statistical Package for the Social Sciences (SPSS) (14) subprogram FACTOR was used for the principal components analysis; it is designed both for factor...

  10. ontologyX: a suite of R packages for working with ontological data.

    PubMed

    Greene, Daniel; Richardson, Sylvia; Turro, Ernest

    2017-04-01

    Ontologies are widely used constructs for encoding and analyzing biomedical data, but the absence of simple and consistent tools has made exploratory and systematic analysis of such data unnecessarily difficult. Here we present three packages which aim to simplify such procedures. The ontologyIndex package enables arbitrary ontologies to be read into R, supports representation of ontological objects by native R types, and provides a parsimonious set of performant functions for querying ontologies. ontologySimilarity and ontologyPlot extend ontologyIndex with functionality for straightforward visualization and semantic similarity calculations, including statistical routines. ontologyIndex, ontologyPlot and ontologySimilarity are all available on the Comprehensive R Archive Network website under https://cran.r-project.org/web/packages/. Daniel Greene dg333@cam.ac.uk. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
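The kind of query such packages support can be illustrated with a transitive ancestor lookup over a child-to-parents map (a Python sketch with hypothetical term names, not the ontologyIndex R API):

```python
def ancestors(term, parents):
    """All ancestors of `term` in an ontology given a child -> parents
    map: the transitive closure over is_a relations."""
    seen = set()
    stack = [term]
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

# toy ontology with hypothetical term IDs
parents = {"leaf": ["mid1", "mid2"], "mid1": ["root"], "mid2": ["root"]}
anc = ancestors("leaf", parents)   # {'mid1', 'mid2', 'root'}
```

Propagating annotations up the DAG like this is the primitive behind both the visualization (ontologyPlot) and the semantic similarity measures (ontologySimilarity) described above.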

  11. SEDA: A software package for the Statistical Earthquake Data Analysis

    NASA Astrophysics Data System (ADS)

    Lombardi, A. M.

    2017-03-01

    In this paper, the first version of the software SEDA (SEDAv1.0), designed to help seismologists statistically analyze earthquake data, is presented. The package consists of a user-friendly Matlab-based interface, which allows the user to easily interact with the application, and a computational core of Fortran codes, to guarantee maximum speed. The primary factor driving the development of SEDA is to guarantee research reproducibility, a growing movement among scientists that is highly recommended by the most important scientific journals. SEDAv1.0 is mainly devoted to producing accurate and fast output; less care has been taken with graphical appeal, which will be improved in the future. The main part of SEDAv1.0 is devoted to ETAS modeling. SEDAv1.0 contains a set of consistent tools on ETAS, allowing the estimation of parameters, the testing of the model on data, the simulation of catalogs, the identification of sequences, and the calculation of forecasts. The peculiarities of the routines inside SEDAv1.0 are discussed in this paper. More specific details on the software are presented in the manual accompanying the program package.
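The ETAS model at the heart of SEDAv1.0 has a standard temporal conditional-intensity form: λ(t) = μ + Σ over past events (t_i, m_i) of K·exp(α(m_i − m0))/(t − t_i + c)^p. A sketch of that formula (Python for illustration; the parameter values are invented, and SEDA's own routines are Matlab/Fortran):

```python
import math

def etas_intensity(t, events, mu, K, alpha, c, p, m0):
    """Conditional intensity of the temporal ETAS model:
    background rate mu plus an Omori-law aftershock term for each
    past event (t_i, m_i), scaled by its magnitude above m0."""
    rate = mu
    for t_i, m_i in events:
        if t_i < t:
            rate += K * math.exp(alpha * (m_i - m0)) / (t - t_i + c) ** p
    return rate

events = [(0.0, 5.0), (1.0, 4.0)]   # (time in days, magnitude); illustrative
lam = etas_intensity(2.0, events, mu=0.1, K=0.02, alpha=1.0, c=0.01, p=1.1, m0=4.0)
```

Parameter estimation, catalog simulation, and forecasting in an ETAS package all reduce to evaluating (and integrating) this intensity.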

  12. Genome-wide regression and prediction with the BGLR statistical package.

    PubMed

    Pérez, Paulino; de los Campos, Gustavo

    2014-10-01

    Many modern genomic data analyses require implementing regressions where the number of parameters (p, e.g., the number of marker effects) exceeds sample size (n). Implementing these large-p-with-small-n regressions poses several statistical and computational challenges, some of which can be confronted using Bayesian methods. This approach allows integrating various parametric and nonparametric shrinkage and variable selection procedures in a unified and consistent manner. The BGLR R-package implements a large collection of Bayesian regression models, including parametric variable selection and shrinkage methods and semiparametric procedures (Bayesian reproducing kernel Hilbert spaces regressions, RKHS). The software was originally developed for genomic applications; however, the methods implemented are useful for many nongenomic applications as well. The response can be continuous (censored or not) or categorical (either binary or ordinal). The algorithm is based on a Gibbs sampler with scalar updates and the implementation takes advantage of efficient compiled C and Fortran routines. In this article we describe the methods implemented in BGLR, present examples of the use of the package, and discuss practical issues emerging in real-data analysis. Copyright © 2014 by the Genetics Society of America.
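The large-p-with-small-n setting can be made concrete with the simplest shrinkage estimator, ridge regression, which stays well-defined even though X'X is singular when p > n. This is a generic Python sketch of that idea with simulated data, not BGLR itself (BGLR fits richer Bayesian models via a Gibbs sampler):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 100                      # far more "markers" than samples (p >> n)
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:3] = [1.0, -1.0, 0.5]    # only a few effects are nonzero
y = X @ beta_true + rng.normal(scale=0.1, size=n)

lam = 1.0                           # shrinkage strength (illustrative)
# ridge estimate: (X'X + lam*I)^-1 X'y -- solvable despite singular X'X
beta_hat = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
```

The Bayesian methods in BGLR generalize this: different priors on the coefficients yield different shrinkage and variable-selection behavior, but all address the same p > n degeneracy.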

  13. SEDA: A software package for the Statistical Earthquake Data Analysis

    PubMed Central

    Lombardi, A. M.

    2017-01-01

    In this paper, the first version of the software SEDA (SEDAv1.0), designed to help seismologists statistically analyze earthquake data, is presented. The package consists of a user-friendly Matlab-based interface, which allows the user to easily interact with the application, and a computational core of Fortran codes, to guarantee maximum speed. The primary factor driving the development of SEDA is to guarantee research reproducibility, a growing movement among scientists that is highly recommended by the most important scientific journals. SEDAv1.0 is mainly devoted to producing accurate and fast output; less care has been taken with graphical appeal, which will be improved in the future. The main part of SEDAv1.0 is devoted to ETAS modeling. SEDAv1.0 contains a set of consistent tools on ETAS, allowing the estimation of parameters, the testing of the model on data, the simulation of catalogs, the identification of sequences, and the calculation of forecasts. The peculiarities of the routines inside SEDAv1.0 are discussed in this paper. More specific details on the software are presented in the manual accompanying the program package. PMID:28290482

  14. Association of Time between Surgery and Adjuvant Therapy with Survival in Oral Cavity Cancer.

    PubMed

    Chen, Michelle M; Harris, Jeremy P; Orosco, Ryan K; Sirjani, Davud; Hara, Wendy; Divi, Vasu

    2018-06-01

    Objective The National Cancer Center Network recommends starting radiation therapy within 6 weeks after surgery for oral cavity squamous cell carcinoma (OCSCC), but there is limited evidence of the importance of the total time from surgery to completion of radiation therapy (package time). We set out to determine if there was an association between package time and survival in OCSCC and to evaluate the impact of treatment location on outcomes. Study Design Retrospective cohort study. Setting Tertiary academic medical center. Subjects and Methods We reviewed the records of patients with OCSCC who completed postoperative radiation therapy at an academic medical center from 2008 to 2016. The primary endpoints were overall survival and recurrence-free survival. Statistical analysis included χ2 tests and Cox proportional hazards regressions. Results We identified 132 patients with an average package time of 12.6 weeks. On multivariate analysis, package time >11 weeks was independently associated with decreased overall survival (hazard ratio, 6.68; 95% CI, 1.42-31.44) and recurrence-free survival (hazard ratio, 2.94; 95% CI, 1.20-7.18). Patients who received radiation therapy at outside facilities were more likely to have treatment delays (90.2% vs 62.9%, P = .001). Conclusions Prolonged package times are associated with decreased overall and recurrence-free survival among patients with OCSCC. Patients who received radiation therapy at outside facilities are more likely to have prolonged package times.

  15. Attitudes and Achievement in Introductory Psychological Statistics Classes: Traditional versus Computer-Supported Instruction.

    ERIC Educational Resources Information Center

    Gratz, Zandra S.; And Others

    A study was conducted at a large, state-supported college in the Northeast to establish a mechanism by which a popular software package, Statistical Package for the Social Sciences (SPSS), could be used in psychology program statistics courses in such a way that no prior computer expertise would be needed on the part of the faculty or the…

  16. Motivation, values, and work design as drivers of participation in the R open source project for statistical computing

    PubMed Central

    Mair, Patrick; Hofmann, Eva; Gruber, Kathrin; Hatzinger, Reinhold; Zeileis, Achim; Hornik, Kurt

    2015-01-01

    One of the cornerstones of the R system for statistical computing is the multitude of packages contributed by numerous package authors. This wealth of packages makes an extremely broad range of statistical techniques and other quantitative methods freely available. Thus far, no empirical study has investigated psychological factors that drive authors to participate in the R project. This article presents a study of R package authors, collecting data on different types of participation (number of packages, participation in mailing lists, participation in conferences), three psychological scales (types of motivation, psychological values, and work design characteristics), and various socio-demographic factors. The data are analyzed using item response models and subsequent generalized linear models, showing that the most important determinants for participation are a hybrid form of motivation and the social characteristics of the work design. Other factors are found to have less impact or influence only specific aspects of participation. PMID:26554005

  17. Motivation, values, and work design as drivers of participation in the R open source project for statistical computing.

    PubMed

    Mair, Patrick; Hofmann, Eva; Gruber, Kathrin; Hatzinger, Reinhold; Zeileis, Achim; Hornik, Kurt

    2015-12-01

    One of the cornerstones of the R system for statistical computing is the multitude of packages contributed by numerous package authors. This wealth of packages makes an extremely broad range of statistical techniques and other quantitative methods freely available. Thus far, no empirical study has investigated psychological factors that drive authors to participate in the R project. This article presents a study of R package authors, collecting data on different types of participation (number of packages, participation in mailing lists, participation in conferences), three psychological scales (types of motivation, psychological values, and work design characteristics), and various socio-demographic factors. The data are analyzed using item response models and subsequent generalized linear models, showing that the most important determinants for participation are a hybrid form of motivation and the social characteristics of the work design. Other factors are found to have less impact or influence only specific aspects of participation.

  18. FieldTrip: Open Source Software for Advanced Analysis of MEG, EEG, and Invasive Electrophysiological Data

    PubMed Central

    Oostenveld, Robert; Fries, Pascal; Maris, Eric; Schoffelen, Jan-Mathijs

    2011-01-01

    This paper describes FieldTrip, an open source software package that we developed for the analysis of MEG, EEG, and other electrophysiological data. The software is implemented as a MATLAB toolbox and includes a complete set of consistent and user-friendly high-level functions that allow experimental neuroscientists to analyze experimental data. It includes algorithms for simple and advanced analysis, such as time-frequency analysis using multitapers, source reconstruction using dipoles, distributed sources and beamformers, connectivity analysis, and nonparametric statistical permutation tests at the channel and source level. The implementation as toolbox allows the user to perform elaborate and structured analyses of large data sets using the MATLAB command line and batch scripting. Furthermore, users and developers can easily extend the functionality and implement new algorithms. The modular design facilitates the reuse in other software packages. PMID:21253357

  19. Cost-effectiveness Analysis in R Using a Multi-state Modeling Survival Analysis Framework: A Tutorial.

    PubMed

    Williams, Claire; Lewsey, James D; Briggs, Andrew H; Mackay, Daniel F

    2017-05-01

    This tutorial provides a step-by-step guide to performing cost-effectiveness analysis using a multi-state modeling approach. Alongside the tutorial, we provide easy-to-use functions in the statistics package R. We argue that this multi-state modeling approach using a package such as R has advantages over approaches where models are built in a spreadsheet package. In particular, using a syntax-based approach means there is a written record of what was done and the calculations are transparent. Reproducing the analysis is straightforward as the syntax just needs to be run again. The approach can be thought of as an alternative way to build a Markov decision-analytic model, which also has the option to use a state-arrival extended approach. In the state-arrival extended multi-state model, a covariate that represents patients' history is included, allowing the Markov property to be tested. We illustrate the building of multi-state survival models, making predictions from the models and assessing fits. We then proceed to perform a cost-effectiveness analysis, including deterministic and probabilistic sensitivity analyses. Finally, we show how to create 2 common methods of visualizing the results-namely, cost-effectiveness planes and cost-effectiveness acceptability curves. The analysis is implemented entirely within R. It is based on adaptions to functions in the existing R package mstate to accommodate parametric multi-state modeling that facilitates extrapolation of survival curves.
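The spreadsheet-style calculation such a tutorial replaces can be sketched as a Markov cohort model: advance a state-occupancy vector through a transition matrix, accumulating discounted costs and QALYs per cycle (Python for illustration, with entirely hypothetical numbers; the tutorial's R functions instead derive transitions from fitted multi-state survival models):

```python
import numpy as np

# 3-state Markov cohort model; states = [well, ill, dead]; values hypothetical
P = np.array([[0.85, 0.10, 0.05],
              [0.00, 0.80, 0.20],
              [0.00, 0.00, 1.00]])        # yearly transition probabilities
cost = np.array([100.0, 2000.0, 0.0])     # cost per year in each state
qaly = np.array([0.95, 0.60, 0.0])        # utility per year in each state

state = np.array([1.0, 0.0, 0.0])         # cohort starts in the well state
disc = 0.035                              # annual discount rate
total_cost = total_qaly = 0.0
for year in range(30):
    df = 1.0 / (1.0 + disc) ** year       # discount factor for this cycle
    total_cost += df * state @ cost
    total_qaly += df * state @ qaly
    state = state @ P                     # advance the cohort one cycle
```

Running the same loop under two treatment scenarios and dividing the cost difference by the QALY difference gives the incremental cost-effectiveness ratio the tutorial's deterministic and probabilistic analyses are built around.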

  20. On Fitting Generalized Linear Mixed-effects Models for Binary Responses using Different Statistical Packages

    PubMed Central

    Zhang, Hui; Lu, Naiji; Feng, Changyong; Thurston, Sally W.; Xia, Yinglin; Tu, Xin M.

    2011-01-01

    Summary The generalized linear mixed-effects model (GLMM) is a popular paradigm to extend models for cross-sectional data to a longitudinal setting. When applied to modeling binary responses, different software packages and even different procedures within a package may give quite different results. In this report, we describe the statistical approaches that underlie these different procedures and discuss their strengths and weaknesses when applied to fit correlated binary responses. We then illustrate these considerations by applying these procedures implemented in some popular software packages to simulated and real study data. Our simulation results indicate a lack of reliability for most of the procedures considered, which carries significant implications for applying such popular software packages in practice. PMID:21671252

  1. qFeature

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    2015-09-14

    This package contains statistical routines for extracting features from multivariate time-series data, which can then be used for subsequent multivariate statistical analysis to identify patterns and anomalous behavior. It calculates local linear or quadratic regression model fits to moving windows for each series and then summarizes the model coefficients across user-defined time intervals for each series. These methods are domain agnostic, but they have been successfully applied to a variety of domains, including commercial aviation and electric power grid data.
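The moving-window regression step can be sketched directly: fit a least-squares line to each window of the series and keep the slope as a feature (a Python illustration of the idea, not the package's own interface):

```python
import numpy as np

def window_slopes(series, width):
    """Slope of a least-squares line fit to each moving window of the
    series -- the local-linear-fit feature described above."""
    x = np.arange(width)
    return np.array([np.polyfit(x, series[i:i + width], 1)[0]
                     for i in range(len(series) - width + 1)])

y = np.array([0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0])   # ramps up, then down
slopes = window_slopes(y, 3)
# rising windows give slope +1, falling windows slope -1
```

Summarizing these per-window coefficients (e.g. their mean or spread over longer intervals) yields the feature vectors used for downstream anomaly detection.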

  2. An econometric analysis of regional differences in household waste collection: the case of plastic packaging waste in Sweden.

    PubMed

    Hage, Olle; Söderholm, Patrik

    2008-01-01

    The Swedish producer responsibility ordinance mandates producers to collect and recycle packaging materials. This paper investigates the main determinants of collection rates of household plastic packaging waste in Swedish municipalities. This is done by the use of a regression analysis based on cross-sectional data for 252 Swedish municipalities. The results suggest that local policies, geographic/demographic variables, socio-economic factors and environmental preferences all help explain inter-municipality collection rates. For instance, the collection rate appears to be positively affected by increases in the unemployment rate, the share of private houses, and the presence of immigrants (unless newly arrived) in the municipality. The impacts of distance to recycling industry, urbanization rate and population density on collection outcomes turn out, though, to be both statistically and economically insignificant. A reasonable explanation for this is that the monetary compensation from the material companies to the collection entrepreneurs varies by region and is typically higher in high-cost regions. This implies that the plastic packaging collection in Sweden may be cost ineffective. Finally, the analysis also shows that municipalities that employ weight-based waste management fees generally experience higher collection rates than those municipalities in which flat and/or volume-based fees are used.

  3. An econometric analysis of regional differences in household waste collection: The case of plastic packaging waste in Sweden

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hage, Olle; Soederholm, Patrik

    2008-07-01

    The Swedish producer responsibility ordinance mandates producers to collect and recycle packaging materials. This paper investigates the main determinants of collection rates of household plastic packaging waste in Swedish municipalities. This is done by the use of a regression analysis based on cross-sectional data for 252 Swedish municipalities. The results suggest that local policies, geographic/demographic variables, socio-economic factors and environmental preferences all help explain inter-municipality collection rates. For instance, the collection rate appears to be positively affected by increases in the unemployment rate, the share of private houses, and the presence of immigrants (unless newly arrived) in the municipality. The impacts of distance to recycling industry, urbanization rate and population density on collection outcomes turn out, though, to be both statistically and economically insignificant. A reasonable explanation for this is that the monetary compensation from the material companies to the collection entrepreneurs varies by region and is typically higher in high-cost regions. This implies that the plastic packaging collection in Sweden may be cost ineffective. Finally, the analysis also shows that municipalities that employ weight-based waste management fees generally experience higher collection rates than those municipalities in which flat and/or volume-based fees are used.

  4. bcROCsurface: an R package for correcting verification bias in estimation of the ROC surface and its volume for continuous diagnostic tests.

    PubMed

    To Duc, Khanh

    2017-11-18

    Receiver operating characteristic (ROC) surface analysis is usually employed to assess the accuracy of a medical diagnostic test when there are three ordered disease statuses (e.g. non-diseased, intermediate, diseased). In practice, verification bias can occur due to missingness of the true disease status and can lead to a distorted conclusion on diagnostic accuracy. In such situations, bias-corrected inference tools are required. This paper introduces an R package, named bcROCsurface, which provides utility functions for verification bias-corrected ROC surface analysis. A Shiny web application implementing the verification-bias correction for ROC surface estimation has also been developed. bcROCsurface may become an important tool for the statistical evaluation of three-class diagnostic markers in the presence of verification bias. The R package, readme and example data are available on CRAN. The web interface enables users less familiar with R to evaluate the accuracy of diagnostic tests, and can be found at http://khanhtoduc.shinyapps.io/bcROCsurface_shiny/.
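
    In a fully verified sample, the volume under the ROC surface (VUS) that bcROCsurface estimates reduces to the fraction of (non-diseased, intermediate, diseased) triples that the marker orders correctly. Below is a minimal Python sketch of that empirical estimator on simulated data; the verification-bias correction, which is the package's actual contribution, is not reproduced here, and all data are illustrative.

```python
import numpy as np
from itertools import product

def empirical_vus(x1, x2, x3):
    """Empirical volume under the ROC surface: the fraction of
    (non-diseased, intermediate, diseased) triples that the
    marker orders correctly, i.e. x1 < x2 < x3."""
    n_correct = sum(a < b < c for a, b, c in product(x1, x2, x3))
    return n_correct / (len(x1) * len(x2) * len(x3))

# Simulated marker values for the three ordered disease statuses
rng = np.random.default_rng(0)
x1 = rng.normal(0.0, 1.0, 50)   # non-diseased
x2 = rng.normal(1.5, 1.0, 50)   # intermediate
x3 = rng.normal(3.0, 1.0, 50)   # diseased
vus = empirical_vus(x1, x2, x3)  # chance level for three classes is 1/6
```

    A useless marker gives a VUS near 1/6; a perfect one gives 1.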

  5. A Divergence Statistics Extension to VTK for Performance Analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pebay, Philippe Pierre; Bennett, Janine Camille

    This report follows the series of previous documents ([PT08, BPRT09b, PT09, BPT09, PT10, PB13]), where we presented the parallel descriptive, correlative, multi-correlative, principal component analysis, contingency, k-means, order and auto-correlative statistics engines which we developed within the Visualization Tool Kit (VTK) as a scalable, parallel and versatile statistics package. We now report on a new engine which we developed for the calculation of divergence statistics, a concept which we hereafter explain and whose main goal is to quantify the discrepancy, in a statistical manner akin to measuring a distance, between an observed empirical distribution and a theoretical, "ideal" one. The ease of use of the new divergence statistics engine is illustrated by means of C++ code snippets. Although this new engine does not yet have a parallel implementation, it has already been applied to HPC performance analysis, of which we provide an example.

  6. tscvh R Package: Computational of the two samples test on microarray-sequencing data

    NASA Astrophysics Data System (ADS)

    Fajriyah, Rohmatul; Rosadi, Dedi

    2017-12-01

    We present a new R package, tscvh (two samples cross-variance homogeneity). The package implements the cross-variance statistical test proposed and introduced by Fajriyah ([3] and [4]), based on the cross-variance concept. The test can be used as an alternative test of the significance of the difference between two means when the sample size is small, a situation that commonly arises in bioinformatics research. Based on the test's statistical distribution, the p-value can also be provided. The package assumes homogeneity of variance between the samples.
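
    The cross-variance statistic itself is defined in the papers cited above and is not reproduced here. As a generic, distribution-free point of comparison for small-sample two-mean problems, a permutation test can be sketched in a few lines of Python; all names and data below are illustrative.

```python
import numpy as np

def permutation_pvalue(x, y, n_perm=10000, seed=0):
    """Two-sided permutation test for a difference in means: a
    distribution-free alternative when sample sizes are small."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([x, y])
    observed = abs(np.mean(x) - np.mean(y))
    n_extreme = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        diff = abs(perm[:len(x)].mean() - perm[len(x):].mean())
        if diff >= observed:
            n_extreme += 1
    # add-one smoothing keeps the p-value strictly positive
    return (n_extreme + 1) / (n_perm + 1)

x = np.array([4.1, 3.9, 4.3, 4.0, 4.2])   # hypothetical sample 1
y = np.array([5.0, 5.2, 4.8, 5.1, 4.9])   # hypothetical sample 2
p = permutation_pvalue(x, y)              # well separated, so p is small
```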

  7. Bayesian Hierarchical Random Effects Models in Forensic Science.

    PubMed

    Aitken, Colin G G

    2018-01-01

    Statistical modeling of the evaluation of evidence with the use of the likelihood ratio has a long history. It dates from the Dreyfus case at the end of the nineteenth century through the work at Bletchley Park in the Second World War to the present day. The development received a significant boost in 1977 with a seminal work by Dennis Lindley which introduced a Bayesian hierarchical random effects model for the evaluation of evidence with an example of refractive index measurements on fragments of glass. Many models have been developed since then. The methods have now been sufficiently well-developed and have become so widespread that it is timely to try to provide a software package to assist in their implementation. With that in mind, a project (SAILR: Software for the Analysis and Implementation of Likelihood Ratios) was funded by the European Network of Forensic Science Institutes through their Monopoly programme to develop a software package for use by forensic scientists world-wide that would assist in the statistical analysis and implementation of the approach based on likelihood ratios. It is the purpose of this document to provide a short review of a small part of this history. The review also provides a background, or landscape, for the development of some of the models within the SAILR package, and references to SAILR are made as appropriate.
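
    At its core, the likelihood-ratio approach that SAILR implements compares the probability of the evidence under two competing propositions. A deliberately simplified, single-level sketch in Python follows, with hypothetical refractive-index numbers; Lindley's actual model is hierarchical, with separate between- and within-source variance components, which this toy version collapses into two plain normal densities.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def likelihood_ratio(y, mu_same, sd_same, mu_pop, sd_pop):
    """LR = f(y | same source) / f(y | different source), with each
    proposition modelled here as a single normal density."""
    return normal_pdf(y, mu_same, sd_same) / normal_pdf(y, mu_pop, sd_pop)

# A fragment whose refractive index lies close to the control window's
# mean (all values hypothetical) yields LR > 1, i.e. support for the
# same-source proposition.
lr = likelihood_ratio(y=1.5185, mu_same=1.5186, sd_same=4e-5,
                      mu_pop=1.5182, sd_pop=4e-3)
```

    The hierarchical extensions of this ratio, with random effects for the source means, are what the SAILR models provide.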

  8. 17 CFR Appendix A to Part 145 - Compilation of Commission Records Available to the Public

    Code of Federal Regulations, 2011 CFR

    2011-04-01

    ... photographs. (10) Statistical data concerning the Commission's budget. (11) Statistical data concerning...) Complaint packages, which contain the Reparation Rules, Brochure “Questions and Answers About How You Can... grain reports. (2) Weekly cotton or call reports. (f) Division of Enforcement. Complaint package...

  9. CSAM Metrology Software Tool

    NASA Technical Reports Server (NTRS)

    Vu, Duc; Sandor, Michael; Agarwal, Shri

    2005-01-01

    CSAM Metrology Software Tool (CMeST) is a computer program for analysis of false-color CSAM images of plastic-encapsulated microcircuits. (CSAM signifies C-mode scanning acoustic microscopy.) The colors in the images indicate areas of delamination within the plastic packages. Heretofore, the images have been interpreted by human examiners. Hence, interpretations have not been entirely consistent and objective. CMeST processes the color information in image-data files to detect areas of delamination without incurring inconsistencies of subjective judgement. CMeST can be used to create a database of baseline images of packages acquired at given times for comparison with images of the same packages acquired at later times. Any area within an image can be selected for analysis, which can include examination of different delamination types by location. CMeST can also be used to perform statistical analyses of image data. Results of analyses are available in a spreadsheet format for further processing. The results can be exported to any database-processing software.

  10. The IRAF Fabry-Perot analysis package: Ring fitting

    NASA Technical Reports Server (NTRS)

    Shopbell, P. L.; Bland-Hawthorn, J.; Cecil, G.

    1992-01-01

    As introduced at ADASS I, a Fabry-Perot analysis package for IRAF is currently under development as a joint effort of ourselves and Frank Valdes of the IRAF group. Although additional portions of the package were also implemented, we report primarily on the development of a robust ring-fitting task, useful for fitting the calibration rings obtained in Fabry-Perot observations. The general equation of an ellipse is fit to the shape of the rings, providing information on ring center, ellipticity, and position angle. Such parameters provide valuable information on the wavelength response of the etalon and the geometric stability of the system. Appropriate statistical weighting is applied to the pixels to account for the increasing number of pixels with radius, the Lorentzian cross-section, and uneven illumination. The major problems of incomplete, non-uniform, and multiple rings are addressed, with the final task capable of fitting rings regardless of center, cross-section, or completeness. The task requires only minimal user intervention, allowing large numbers of rings to be fit in an extremely automated manner.
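
    The core of such a ring-fitting task is a least-squares fit of a general conic to the ring pixel coordinates, from which the ring centre follows. A minimal numpy sketch on synthetic, noise-free ring coordinates; the IRAF task's statistical weighting and robustness features are omitted, and the data are fabricated for illustration.

```python
import numpy as np

def fit_ellipse(x, y):
    """Least-squares fit of the general conic
    a*x^2 + b*x*y + c*y^2 + d*x + e*y = 1 to ring points.
    Returns the conic coefficients and the ring centre, obtained by
    setting the conic's gradient to zero."""
    A = np.column_stack([x**2, x*y, y**2, x, y])
    coef, *_ = np.linalg.lstsq(A, np.ones_like(x), rcond=None)
    a, b, c, d, e = coef
    centre = np.linalg.solve([[2*a, b], [b, 2*c]], [-d, -e])
    return coef, centre

# Synthetic calibration ring: an axis-aligned ellipse centred at (10, -4)
t = np.linspace(0.0, 2.0 * np.pi, 200)
x = 10.0 + 5.0 * np.cos(t)
y = -4.0 + 3.0 * np.sin(t)
coef, centre = fit_ellipse(x, y)   # centre recovers (10, -4)
```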

  11. elevatr: Access Elevation Data from Various APIs | Science ...

    EPA Pesticide Factsheets

    Several web services are available that provide access to elevation data. This package provides access to several of those services and returns elevation data either as a SpatialPointsDataFrame from point elevation services or as a raster object from raster elevation services. Currently, the package supports access to the Mapzen Elevation Service, Mapzen Terrain Service, and the USGS Elevation Point Query Service. The R language for statistical computing is increasingly used for spatial data analysis. This R package, elevatr, responds to this trend and provides access to elevation data from various sources directly in R. The impact of `elevatr` is that it will 1) facilitate spatial analysis in R by providing access to a foundational dataset for many types of analyses (e.g. hydrology, limnology), 2) open up a new set of users and uses for APIs widely used outside of R, and 3) provide an excellent example of federal open-source development as promoted by the Federal Source Code Policy (https://sourcecode.cio.gov/).

  12. Ultrasonic test of resistance spot welds based on wavelet package analysis.

    PubMed

    Liu, Jing; Xu, Guocheng; Gu, Xiaopeng; Zhou, Guanghao

    2015-02-01

    In this paper, ultrasonic testing of spot welds for stainless steel sheets has been studied. It is indicated that traditional ultrasonic signal analysis in either the time domain or the frequency domain remains inadequate to evaluate the nugget diameter of spot welds. However, the method based on wavelet packet analysis in the time-frequency domain can easily distinguish the nugget from the corona bond by extracting high-frequency signals at different positions of the spot welds, thereby quantitatively evaluating the nugget diameter. The results of the ultrasonic test fit the actual measured values well: the mean of the normal distribution fitted to the error statistics is 0.00187, and the standard deviation is 0.1392. Furthermore, the quality of the spot welds was evaluated, and it was shown that ultrasonic nondestructive testing based on wavelet packet analysis can be used to evaluate the quality of spot welds, and that it is more reliable than a single destructive tensile test. Copyright © 2014 Elsevier B.V. All rights reserved.
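
    The essential difference between wavelet packet analysis and the plain wavelet transform is that every subband, not just the approximation branch, is split again at each level, giving uniform time-frequency tiling. A minimal sketch with Haar filters follows; the paper does not state which wavelet family was used, so Haar is an assumption chosen purely for brevity.

```python
import numpy as np

def haar_step(x):
    """One Haar analysis step: (approximation, detail) at half length."""
    x = np.asarray(x, dtype=float)
    a = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    d = (x[0::2] - x[1::2]) / np.sqrt(2.0)
    return a, d

def wavelet_packet(x, depth):
    """Full Haar wavelet packet tree: at each level BOTH branches of
    every node are split again, yielding 2**depth subbands in natural
    (Paley) order."""
    nodes = [np.asarray(x, dtype=float)]
    for _ in range(depth):
        nodes = [half for node in nodes for half in haar_step(node)]
    return nodes

# The filters are orthonormal, so signal energy is preserved exactly
x = np.array([1.0, 2.0, 3.0, 4.0, 4.0, 3.0, 2.0, 1.0])
bands = wavelet_packet(x, depth=2)
energy = sum(float(np.sum(b**2)) for b in bands)  # equals sum(x**2) = 60
```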

  13. Sampling and sensitivity analyses tools (SaSAT) for computational modelling

    PubMed Central

    Hoare, Alexander; Regan, David G; Wilson, David P

    2008-01-01

    SaSAT (Sampling and Sensitivity Analysis Tools) is a user-friendly software package for applying uncertainty and sensitivity analyses to mathematical and computational models of arbitrary complexity and context. The toolbox is built in Matlab®, a numerical mathematical software package, and utilises algorithms contained in the Matlab® Statistics Toolbox. However, Matlab® is not required to use SaSAT as the software package is provided as an executable file with all the necessary supplementary files. The SaSAT package is also designed to work seamlessly with Microsoft Excel but no functionality is forfeited if that software is not available. A comprehensive suite of tools is provided to enable the following tasks to be easily performed: efficient and equitable sampling of parameter space by various methodologies; calculation of correlation coefficients; regression analysis; factor prioritisation; and graphical output of results, including response surfaces, tornado plots, and scatterplots. Use of SaSAT is exemplified by application to a simple epidemic model. To our knowledge, a number of the methods available in SaSAT for performing sensitivity analyses have not previously been used in epidemiological modelling and their usefulness in this context is demonstrated. PMID:18304361
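
    One of the sampling methodologies such toolboxes typically offer is Latin hypercube sampling, which cuts each parameter's range into n equal strata and samples each stratum exactly once, covering parameter space far more evenly than simple random sampling. A short numpy sketch, illustrative only and not SaSAT's actual implementation:

```python
import numpy as np

def latin_hypercube(n_samples, n_params, seed=0):
    """Latin hypercube sample on [0,1)^d: each column is a random
    permutation of the n strata, jittered uniformly within each stratum."""
    rng = np.random.default_rng(seed)
    u = rng.random((n_samples, n_params))
    samples = np.empty_like(u)
    for j in range(n_params):
        perm = rng.permutation(n_samples)
        samples[:, j] = (perm + u[:, j]) / n_samples
    return samples

lhs = latin_hypercube(10, 3)
# every column has exactly one point in each interval [k/10, (k+1)/10)
```

    Mapping the unit-cube sample through each parameter's inverse CDF then yields samples from arbitrary marginal distributions.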

  14. On fitting generalized linear mixed-effects models for binary responses using different statistical packages.

    PubMed

    Zhang, Hui; Lu, Naiji; Feng, Changyong; Thurston, Sally W; Xia, Yinglin; Zhu, Liang; Tu, Xin M

    2011-09-10

    The generalized linear mixed-effects model (GLMM) is a popular paradigm to extend models for cross-sectional data to a longitudinal setting. When applied to modeling binary responses, different software packages and even different procedures within a package may give quite different results. In this report, we describe the statistical approaches that underlie these different procedures and discuss their strengths and weaknesses when applied to fit correlated binary responses. We then illustrate these considerations by applying these procedures implemented in some popular software packages to simulated and real study data. Our simulation results indicate a lack of reliability for most of the procedures considered, which carries significant implications for applying such popular software packages in practice. Copyright © 2011 John Wiley & Sons, Ltd.
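
    The data structure at issue can be made concrete by simulating from a random-intercept logistic model, the simplest GLMM for correlated binary responses; it is on data of exactly this form that the compared fitting procedures can disagree. A hypothetical numpy simulation sketch (parameter values are arbitrary):

```python
import numpy as np

def simulate_glmm_binary(n_subjects, n_obs, beta0, beta1, re_sd, seed=0):
    """Simulate correlated binary responses from a random-intercept
    logistic model: logit P(y_ij = 1) = beta0 + b_i + beta1 * x_ij,
    with subject effects b_i ~ N(0, re_sd)."""
    rng = np.random.default_rng(seed)
    b = rng.normal(0.0, re_sd, n_subjects)          # shared within subject
    x = rng.normal(0.0, 1.0, (n_subjects, n_obs))   # covariate
    eta = beta0 + b[:, None] + beta1 * x
    p = 1.0 / (1.0 + np.exp(-eta))
    y = (rng.random((n_subjects, n_obs)) < p).astype(int)
    return x, y

x, y = simulate_glmm_binary(n_subjects=200, n_obs=10,
                            beta0=-0.5, beta1=1.0, re_sd=1.0)
```

    Responses within a subject are correlated because they share the same random intercept b_i, which is precisely what a marginal (independence) model ignores.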

  15. ExGUtils: A Python Package for Statistical Analysis With the ex-Gaussian Probability Density.

    PubMed

    Moret-Tatay, Carmen; Gamermann, Daniel; Navarro-Pardo, Esperanza; Fernández de Córdoba Castellá, Pedro

    2018-01-01

    The study of reaction times and their underlying cognitive processes is an important field in Psychology. Reaction times are often modeled with the ex-Gaussian distribution because it provides a good fit to many empirical datasets. The complexity of this distribution makes the use of computational tools an essential element, so there is a strong need for efficient and versatile computational tools for research in this area. In this manuscript we discuss some mathematical details of the ex-Gaussian distribution and apply the ExGUtils package, a set of functions and numerical tools programmed in Python and developed for numerical analysis of data involving the ex-Gaussian probability density. In order to validate the package, we present an extensive analysis of fits obtained with it, discuss advantages of and differences between the least-squares and maximum-likelihood methods, and quantitatively evaluate the goodness of the obtained fits (a point usually overlooked in most of the literature in the area). The analysis allows one to identify outliers in the empirical datasets and to determine, on objective criteria, whether data trimming is needed and at which points it should be done.
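
    The ex-Gaussian is simply the sum of a normal and an independent exponential random variable, which makes both simulation and method-of-moments estimation short to write down: the mean is mu + tau, the variance is sigma^2 + tau^2, and the third central moment is 2*tau^3. A numpy sketch on simulated reaction times (ExGUtils itself also offers least-squares and maximum-likelihood fitting, which are not reproduced here):

```python
import numpy as np

def exgauss_sample(mu, sigma, tau, size, seed=0):
    """Draws from the ex-Gaussian: normal(mu, sigma) plus an
    independent exponential with mean tau."""
    rng = np.random.default_rng(seed)
    return rng.normal(mu, sigma, size) + rng.exponential(tau, size)

def exgauss_moments_fit(x):
    """Method-of-moments estimates (mu, sigma, tau) from the sample
    mean, variance and third central moment."""
    m, v = np.mean(x), np.var(x)
    m3 = np.mean((x - m) ** 3)
    tau = max(m3 / 2.0, 1e-12) ** (1.0 / 3.0)   # m3 = 2 * tau**3
    sigma = np.sqrt(max(v - tau**2, 1e-12))     # v  = sigma**2 + tau**2
    return m - tau, sigma, tau                  # m  = mu + tau

# Simulated reaction times in milliseconds (parameters are illustrative)
rt = exgauss_sample(mu=400.0, sigma=40.0, tau=100.0, size=200000)
mu_hat, sigma_hat, tau_hat = exgauss_moments_fit(rt)
```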

  16. ExGUtils: A Python Package for Statistical Analysis With the ex-Gaussian Probability Density

    PubMed Central

    Moret-Tatay, Carmen; Gamermann, Daniel; Navarro-Pardo, Esperanza; Fernández de Córdoba Castellá, Pedro

    2018-01-01

    The study of reaction times and their underlying cognitive processes is an important field in Psychology. Reaction times are often modeled with the ex-Gaussian distribution because it provides a good fit to many empirical datasets. The complexity of this distribution makes the use of computational tools an essential element, so there is a strong need for efficient and versatile computational tools for research in this area. In this manuscript we discuss some mathematical details of the ex-Gaussian distribution and apply the ExGUtils package, a set of functions and numerical tools programmed in Python and developed for numerical analysis of data involving the ex-Gaussian probability density. In order to validate the package, we present an extensive analysis of fits obtained with it, discuss advantages of and differences between the least-squares and maximum-likelihood methods, and quantitatively evaluate the goodness of the obtained fits (a point usually overlooked in most of the literature in the area). The analysis allows one to identify outliers in the empirical datasets and to determine, on objective criteria, whether data trimming is needed and at which points it should be done. PMID:29765345

  17. SimExTargId: A comprehensive package for real-time LC-MS data acquisition and analysis.

    PubMed

    Edmands, William M B; Hayes, Josie; Rappaport, Stephen M

    2018-05-22

    Liquid chromatography mass spectrometry (LC-MS) is the favored method for untargeted metabolomic analysis of small molecules in biofluids. Here we present SimExTargId, an open-source R package for autonomous analysis of metabolomic data and real-time observation of experimental runs. This fully automated and optionally multi-threaded package is a wrapper for vendor-independent format conversion (ProteoWizard), xcms- and CAMERA-based peak-picking, MetMSLine-based pre-processing and covariate-based statistical analysis. Users are notified of detrimental instrument drift or errors by email. Also included are two shiny applications: targetId, for real-time MS2 target identification, and peakMonitor, to monitor targeted metabolites. SimExTargId is publicly available under the GNU LGPL v3.0 license at https://github.com/JosieLHayes/simExTargId, which includes a vignette with example data. SimExTargId should be installed on a dedicated data-processing workstation or server that is networked to the LC-MS platform to facilitate MS1 profiling of metabolomic data. josie.hayes@berkeley.edu. Supplementary data are available at Bioinformatics online.

  18. Survival analysis in hematologic malignancies: recommendations for clinicians

    PubMed Central

    Delgado, Julio; Pereira, Arturo; Villamor, Neus; López-Guillermo, Armando; Rozman, Ciril

    2014-01-01

    The widespread availability of statistical packages has undoubtedly helped hematologists worldwide in the analysis of their data, but has also led to the inappropriate use of statistical methods. In this article, we review some basic concepts of survival analysis and also make recommendations about how and when to perform each particular test using SPSS, Stata and R. In particular, we describe a simple way of defining cut-off points for continuous variables and the appropriate and inappropriate uses of the Kaplan-Meier method and Cox proportional hazard regression models. We also provide practical advice on how to check the proportional hazards assumption and briefly review the role of relative survival and multiple imputation. PMID:25176982
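
    The Kaplan-Meier estimator discussed above multiplies, at each observed event time, the conditional probability of surviving past that time given the number still at risk. A self-contained Python sketch on a six-patient toy dataset; any real analysis should of course use the validated routines in SPSS, Stata or R that the article recommends.

```python
import numpy as np

def kaplan_meier(time, event):
    """Kaplan-Meier survival estimate.  `event` is 1 for an observed
    event and 0 for a censored observation.  Censoring at a given time
    is taken to occur after any events at that same time."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    times, surv, s = [], [], 1.0
    n_at_risk = len(time)
    for t in np.unique(time):
        at_t = time == t
        d = int(event[at_t].sum())            # events at t
        if d > 0:
            s *= (n_at_risk - d) / n_at_risk  # conditional survival
            times.append(float(t))
            surv.append(s)
        n_at_risk -= int(at_t.sum())          # events + censored leave
    return times, surv

# Toy follow-up data: event indicator 0 marks a censored patient
t  = [5, 8, 12, 12, 15, 20]
ev = [1, 0,  1,  1,  0,  1]
times, surv = kaplan_meier(t, ev)   # steps at t = 5, 12 and 20
```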

  19. No direct correlation between rotavirus diarrhea and breast feeding: A meta-analysis.

    PubMed

    Shen, Jian; Zhang, Bi-Meng; Zhu, Sheng-Guo; Chen, Jian-Jie

    2018-04-01

    Some studies have indicated that children with exclusive breast feeding have a reduced prevalence of rotavirus diarrhea, while others have held the opposite view. In this study, we aimed to systematically assess the association between rotavirus diarrhea and breast feeding. A literature search up to June 2016 in electronic literature databases, including PubMed and Embase, was performed. The Newcastle-Ottawa Scale was used to conduct the quality assessment of all the selected studies. Statistical analyses were performed using R version 3.12 (R Foundation for Statistical Computing, Vienna, Austria; meta package), and the odds ratio (OR) and 95% confidence interval (CI) were used to assess the strength of the association. Heterogeneity was assessed by Cochran's Q-statistic and the I² test, and the sensitivity analysis was performed by trimming one study at a time. A total of 17 articles, which included 10,841 participants, were investigated in the present meta-analysis. There was no significant difference between the case group and control group (OR, 0.59; 95% CI, 0.33-1.07) in the meta-analysis of exclusive breast feeding, and no significant difference was found between the case group and the control group (OR, 0.86; 95% CI, 0.63-1.16) in the meta-analysis of breast feeding. No significant difference was found between the case group and control group (OR, 0.78; 95% CI, 0.59-1.04) for all quantitative data. There may be no direct correlation between rotavirus diarrhea and breast feeding. Copyright © 2017. Published by Elsevier B.V.
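
    Pooled odds ratios of the kind reported above come from inverse-variance weighting on the log scale, the standard fixed-effect computation (R's meta package also provides random-effects models and the heterogeneity statistics mentioned). A Python sketch with made-up study results, not the numbers from this meta-analysis:

```python
import math

def pooled_or(odds_ratios, ci_lowers, ci_uppers):
    """Fixed-effect inverse-variance pooling of odds ratios.
    Each study's SE of log(OR) is recovered from its 95% CI."""
    num = den = 0.0
    for or_, lo, hi in zip(odds_ratios, ci_lowers, ci_uppers):
        se = (math.log(hi) - math.log(lo)) / (2.0 * 1.96)
        w = 1.0 / se**2                    # weight = 1 / variance
        num += w * math.log(or_)
        den += w
    log_or = num / den
    se_pool = math.sqrt(1.0 / den)
    ci = (math.exp(log_or - 1.96 * se_pool),
          math.exp(log_or + 1.96 * se_pool))
    return math.exp(log_or), ci

# Three hypothetical studies: OR with 95% CI bounds
or_hat, (lo, hi) = pooled_or([0.7, 0.9, 0.8],
                             [0.5, 0.6, 0.5],
                             [0.98, 1.35, 1.28])
```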

  20. A menu-driven software package of Bayesian nonparametric (and parametric) mixed models for regression analysis and density estimation.

    PubMed

    Karabatsos, George

    2017-02-01

    Most of applied statistics involves regression analysis of data. In practice, it is important to specify a regression model that has minimal assumptions which are not violated by data, to ensure that statistical inferences from the model are informative and not misleading. This paper presents a stand-alone and menu-driven software package, Bayesian Regression: Nonparametric and Parametric Models, constructed from MATLAB Compiler. Currently, this package gives the user a choice from 83 Bayesian models for data analysis. They include 47 Bayesian nonparametric (BNP) infinite-mixture regression models; 5 BNP infinite-mixture models for density estimation; and 31 normal random effects models (HLMs), including normal linear models. Each of the 78 regression models handles either a continuous, binary, or ordinal dependent variable, and can handle multi-level (grouped) data. All 83 Bayesian models can handle the analysis of weighted observations (e.g., for meta-analysis), and the analysis of left-censored, right-censored, and/or interval-censored data. Each BNP infinite-mixture model has a mixture distribution assigned one of various BNP prior distributions, including priors defined by either the Dirichlet process, Pitman-Yor process (including the normalized stable process), beta (two-parameter) process, normalized inverse-Gaussian process, geometric weights prior, dependent Dirichlet process, or the dependent infinite-probits prior. The software user can mouse-click to select a Bayesian model and perform data analysis via Markov chain Monte Carlo (MCMC) sampling. After the sampling completes, the software automatically opens text output that reports MCMC-based estimates of the model's posterior distribution and model predictive fit to the data. Additional text and/or graphical output can be generated by mouse-clicking other menu options. 
This includes output of MCMC convergence analyses, and estimates of the model's posterior predictive distribution, for selected functionals and values of covariates. The software is illustrated through the BNP regression analysis of real data.

  1. Proposing a Mathematical Software Tool in Physics Secondary Education

    ERIC Educational Resources Information Center

    Baltzis, Konstantinos B.

    2009-01-01

    MathCad® is a very popular software tool for mathematical and statistical analysis in science and engineering. Its low cost, ease of use, extensive function library, and worksheet-like user interface distinguish it among other commercial packages. Its features are also well suited to educational process. The use of natural mathematical notation…

  2. New Tools for "New" History: Computers and the Teaching of Quantitative Historical Methods.

    ERIC Educational Resources Information Center

    Burton, Orville Vernon; Finnegan, Terence

    1989-01-01

    Explains the development of an instructional software package and accompanying workbook which teaches students to apply computerized statistical analysis to historical data, improving the study of social history. Concludes that the use of microcomputers and supercomputers to manipulate historical data enhances critical thinking skills and the use…

  3. DCL System Research Using Advanced Approaches for Land-based or Ship-based Real-Time Recognition and Localization of Marine Mammals

    DTIC Science & Technology

    2012-09-30

    recognition. Algorithm design and statistical analysis and feature analysis. Post-Doctoral Associate, Cornell University, Bioacoustics Research...short. The HPC-ADA was designed based on fielded systems [1-4, 6] that offer a variety of desirable attributes, specifically dynamic resource...The software package was designed to utilize parallel and distributed processing for running recognition and other advanced algorithms. DeLMA

  4. HEART: an automated beat-to-beat cardiovascular analysis package using Matlab.

    PubMed

    Schroeder, Mark J; Perreault, Bill; Ewert, Daniel L; Koenig, Steven C

    2004-07-01

    A computer program is described for beat-to-beat analysis of cardiovascular parameters from high-fidelity pressure and flow waveforms. The Hemodynamic Estimation and Analysis Research Tool (HEART) is a post-processing analysis software package developed in Matlab that enables scientists and clinicians to document, load, view, calibrate, and analyze experimental data that have been digitally saved in ASCII or binary format. Analysis routines include traditional hemodynamic parameter estimates as well as more sophisticated analyses such as lumped arterial model parameter estimation and vascular impedance frequency spectra. Cardiovascular parameter values of all analyzed beats can be viewed and statistically analyzed. An attractive feature of the HEART program is the ability to analyze data with visual quality assurance throughout the process, thus establishing a framework toward which Good Laboratory Practice (GLP) compliance can be obtained. Additionally, the development of HEART on the Matlab platform provides users with the flexibility to adapt or create study specific analysis files according to their specific needs. Copyright 2003 Elsevier Ltd.

  5. DMRfinder: efficiently identifying differentially methylated regions from MethylC-seq data.

    PubMed

    Gaspar, John M; Hart, Ronald P

    2017-11-29

    DNA methylation is an epigenetic modification that is studied at a single-base resolution with bisulfite treatment followed by high-throughput sequencing. After alignment of the sequence reads to a reference genome, methylation counts are analyzed to determine genomic regions that are differentially methylated between two or more biological conditions. Although a variety of software packages are available for different aspects of the bioinformatics analysis, they often produce biased results or impose excessive computational demands. DMRfinder is a novel computational pipeline that identifies differentially methylated regions efficiently. Following alignment, DMRfinder extracts methylation counts and performs a modified single-linkage clustering of methylation sites into genomic regions. It then compares methylation levels using beta-binomial hierarchical modeling and Wald tests. Among its innovative attributes are the analyses of novel methylation sites and methylation linkage, as well as the simultaneous statistical analysis of multiple sample groups. To demonstrate its efficiency, DMRfinder is benchmarked against other computational approaches using a large published dataset. Contrasting two replicates of the same sample yielded minimal genomic regions with DMRfinder, whereas two alternative software packages reported a substantial number of false positives. Further analyses of biological samples revealed fundamental differences between DMRfinder and another software package, despite the fact that they utilize the same underlying statistical basis. For each step, DMRfinder completed the analysis in a fraction of the time required by other software. Among the computational approaches for identifying differentially methylated regions from high-throughput bisulfite sequencing datasets, DMRfinder is the first that integrates all the post-alignment steps in a single package.
Compared to other software, DMRfinder is extremely efficient and unbiased in this process. DMRfinder is free and open-source software, available on GitHub ( github.com/jsh58/DMRfinder ); it is written in Python and R, and is supported on Linux.
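
    The clustering step can be illustrated with plain single-linkage on one-dimensional genomic coordinates: sorted sites are merged while consecutive gaps stay below a threshold. DMRfinder's actual clustering is a modified version with additional criteria, and the gap value below is hypothetical.

```python
def cluster_positions(positions, max_gap=100):
    """Group sorted genomic coordinates into candidate regions:
    a new region starts whenever the gap to the previous site
    exceeds max_gap (plain single-linkage on 1-D positions)."""
    regions, current = [], []
    for p in sorted(positions):
        if current and p - current[-1] > max_gap:
            regions.append(current)
            current = []
        current.append(p)
    if current:
        regions.append(current)
    return regions

sites = [100, 130, 145, 400, 420, 1000]
regions = cluster_positions(sites, max_gap=100)
# -> [[100, 130, 145], [400, 420], [1000]]
```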

  6. Nutritional quality and labelling of ready-to-eat breakfast cereals: the contribution of the French observatory of food quality.

    PubMed

    Goglia, R; Spiteri, M; Ménard, C; Dumas, C; Combris, P; Labarbe, B; Soler, L G; Volatier, J L

    2010-11-01

    To assess developments in the nutritional quality of food products in various food groups in France, an Observatory of Food Quality (Oqali) was created in 2008. To achieve its aims, Oqali built up a new database to describe each specific food item at the most detailed level, and also included economic parameters (market share and mean prices). The objective of this paper is to give a detailed analysis of the monitoring of the ready-to-eat breakfast cereals (RTEBCs) sector in order to show the benefits of the Oqali database. Analysis was limited to products with nutritional information on labels. Packaging was provided by manufacturers or retailers, or obtained by buying products in regular stores. Economic parameters were obtained from surveys on French food consumption and data from consumer purchase panels. The breakfast cereal sector was divided into 10 categories and 5 types of brand. Oqali has developed anonymous indicators to describe product characteristics for each category of RTEBC and each type of brand by cross-referencing nutritional values with economic data. Packaging-related data were also analysed. The major nutritional parameters studied were energy, protein, fat, saturated fat, carbohydrates, sugars, fibre and sodium. Analysis was performed on the basis of descriptive statistics, multivariate statistics and a Kruskal-Wallis test. For the RTEBC, there is large variability in nutrient content throughout the sector, both within and between product categories. There is no systematic relation between brand type and nutritional quality within each product category, and the proportion of brand type within each product category is different. Nutritional labels, claims and pictograms are widespread on packages but vary according to the type of brand. These findings form the basis for monitoring developments in the nutritional composition and packaging-related data for breakfast cereals in the future. 
The final objective is to expand the approach illustrated here to all food sectors progressively.

  7. Bayesian hypothesis testing for human threat conditioning research: an introduction and the condir R package

    PubMed Central

    Krypotos, Angelos-Miltiadis; Klugkist, Irene; Engelhard, Iris M.

    2017-01-01

    Threat conditioning procedures have allowed the experimental investigation of the pathogenesis of Post-Traumatic Stress Disorder. The findings of these procedures have also provided stable foundations for the development of relevant intervention programs (e.g. exposure therapy). Statistical inference of threat conditioning procedures is commonly based on p-values and Null Hypothesis Significance Testing (NHST). Nowadays, however, there is a growing concern about this statistical approach, as many scientists point to the various limitations of p-values and NHST. As an alternative, the use of Bayes factors and Bayesian hypothesis testing has been suggested. In this article, we apply this statistical approach to threat conditioning data. In order to enable the easy computation of Bayes factors for threat conditioning data we present a new R package named condir, which can be used either via the R console or via a Shiny application. This article provides both a non-technical introduction to Bayesian analysis for researchers using the threat conditioning paradigm, and the necessary tools for computing Bayes factors easily. PMID:29038683
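
    For readers who want the flavor of Bayes-factor testing without the full machinery, the BIC approximation BF10 ≈ exp((BIC0 − BIC1)/2) gives a quick, rough Bayes factor for "an effect exists" versus "no effect" from paired differences such as CS+ minus CS- responses. This is only the textbook approximation, not the default Bayes factors that condir computes, and the data below are simulated.

```python
import numpy as np

def bic_bayes_factor(d):
    """BIC-approximated Bayes factor BF10 for 'mean differs from zero'
    versus 'mean is zero', applied to paired differences d."""
    d = np.asarray(d, dtype=float)
    n = len(d)
    rss0 = np.sum(d**2)                       # null: mean fixed at 0
    rss1 = np.sum((d - d.mean())**2)          # alternative: mean estimated
    bic0 = n * np.log(rss0 / n)               # constants cancel in the
    bic1 = n * np.log(rss1 / n) + np.log(n)   # difference; one extra parameter
    return float(np.exp((bic0 - bic1) / 2.0))

rng = np.random.default_rng(0)
bf_effect = bic_bayes_factor(rng.normal(1.0, 1.0, 40))  # true effect present
bf_null   = bic_bayes_factor(rng.normal(0.0, 1.0, 40))  # no true effect
```

    BF10 well above 1 supports the presence of an effect; values near or below 1 favor the null, a statement NHST cannot make.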

  8. Enabling More than Moore: Accelerated Reliability Testing and Risk Analysis for Advanced Electronics Packaging

    NASA Technical Reports Server (NTRS)

    Ghaffarian, Reza; Evans, John W.

    2014-01-01

    For five decades, the semiconductor industry has distinguished itself by the rapid pace of improvement in the miniaturization of electronics products: Moore's Law. Now scaling has hit a brick wall, forcing a paradigm shift. The industry roadmaps recognize the scaling limitation and project that packaging technologies will meet further miniaturization needs, a.k.a. "More than Moore". This paper presents packaging technology trends and the accelerated reliability testing methods currently being practiced. It then presents industry status on key advanced electronic packages, factors affecting the accelerated solder joint reliability of area array packages, and IPC/JEDEC/Mil specifications for characterization of assemblies under accelerated thermal and mechanical loading. Finally, it presents an example demonstrating how accelerated testing and analysis have been effectively employed in the development of complex spacecraft, thereby reducing risk. Quantitative assessments necessarily involve the mathematics of probability and statistics. In addition, accelerated tests need to be designed with the desired risk posture and schedule of the particular project in mind. Such assessments relieve risks without imposing additional costs and constraints that are not value added for a particular mission. Furthermore, in the course of development of complex systems, variances and defects will inevitably present themselves and require a decision concerning their disposition, necessitating quantitative assessments. In summary, this paper presents a comprehensive viewpoint, from technology to systems, including the benefits and impact of accelerated testing in offsetting risk.

  9. MAGNAMWAR: an R package for genome-wide association studies of bacterial orthologs.

    PubMed

    Sexton, Corinne E; Smith, Hayden Z; Newell, Peter D; Douglas, Angela E; Chaston, John M

    2018-06-01

    Here we report on an R package for genome-wide association studies of orthologous genes in bacteria. Before using the software, orthologs from bacterial genomes or metagenomes are defined using local or online implementations of OrthoMCL. The resulting ortholog presence-absence patterns are then statistically associated with variation in user-collected phenotypes using the Mono-Associated GNotobiotic Animals Metagenome-Wide Association R package (MAGNAMWAR). Genotype-phenotype associations can be performed with several different statistical tests based on the type and distribution of the data. MAGNAMWAR is available on CRAN. john_chaston@byu.edu.
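    As a sketch of the kind of genotype-phenotype test such a package can run (MAGNAMWAR offers several; this is an illustration of the idea, not its API), a Welch t statistic can compare a phenotype between strains with and without an ortholog. All data are hypothetical:

```python
from statistics import mean, variance

def welch_t(group_a, group_b):
    """Welch's t statistic comparing phenotype values between strains that
    carry an ortholog (group_a) and strains that lack it (group_b)."""
    na, nb = len(group_a), len(group_b)
    va, vb = variance(group_a), variance(group_b)
    return (mean(group_a) - mean(group_b)) / ((va / na + vb / nb) ** 0.5)

# Hypothetical data: ortholog presence (1/0) per strain and a phenotype value
presence  = [1, 1, 1, 1, 0, 0, 0, 0]
phenotype = [5.1, 4.8, 5.4, 5.0, 2.9, 3.2, 3.0, 3.1]

with_og    = [p for og, p in zip(presence, phenotype) if og]
without_og = [p for og, p in zip(presence, phenotype) if not og]
t = welch_t(with_og, without_og)
```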

  10. PyMVPA: A python toolbox for multivariate pattern analysis of fMRI data.

    PubMed

    Hanke, Michael; Halchenko, Yaroslav O; Sederberg, Per B; Hanson, Stephen José; Haxby, James V; Pollmann, Stefan

    2009-01-01

    Decoding patterns of neural activity onto cognitive states is one of the central goals of functional brain imaging. Standard univariate fMRI analysis methods, which correlate cognitive and perceptual function with the blood oxygenation-level dependent (BOLD) signal, have proven successful in identifying anatomical regions based on signal increases during cognitive and perceptual tasks. Recently, researchers have begun to explore new multivariate techniques that have proven to be more flexible, more reliable, and more sensitive than standard univariate analysis. Drawing on the field of statistical learning theory, these new classifier-based analysis techniques possess explanatory power that could provide new insights into the functional properties of the brain. However, unlike the wealth of software packages for univariate analyses, there are few packages that facilitate multivariate pattern classification analyses of fMRI data. Here we introduce a Python-based, cross-platform, and open-source software toolbox, called PyMVPA, for the application of classifier-based analysis techniques to fMRI datasets. PyMVPA makes use of Python's ability to access libraries written in a large variety of programming languages and computing environments to interface with the wealth of existing machine learning packages. We present the framework in this paper and provide illustrative examples on its usage, features, and programmability.
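    The classifier-based analysis PyMVPA supports can be illustrated, independently of the toolbox's actual API, with a minimal leave-one-out cross-validated nearest-centroid decoder on toy two-voxel patterns:

```python
def nearest_centroid_predict(train, labels, sample):
    """Assign `sample` to the class whose mean training pattern is closest."""
    best, best_dist = None, float("inf")
    for c in set(labels):
        pts = [x for x, l in zip(train, labels) if l == c]
        centroid = [sum(v) / len(pts) for v in zip(*pts)]
        dist = sum((a - b) ** 2 for a, b in zip(sample, centroid))
        if dist < best_dist:
            best, best_dist = c, dist
    return best

def loo_accuracy(data, labels):
    """Leave-one-out cross-validation accuracy of the decoder."""
    hits = 0
    for i in range(len(data)):
        train = data[:i] + data[i + 1:]
        lab = labels[:i] + labels[i + 1:]
        hits += nearest_centroid_predict(train, lab, data[i]) == labels[i]
    return hits / len(data)

# Toy "voxel patterns" for two clearly separable conditions
data = [[1.0, 0.1], [0.9, 0.2], [1.1, 0.0],
        [0.1, 1.0], [0.2, 0.9], [0.0, 1.1]]
labels = ["face", "face", "face", "house", "house", "house"]
acc = loo_accuracy(data, labels)
```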

  11. PyMVPA: A Python toolbox for multivariate pattern analysis of fMRI data

    PubMed Central

    Hanke, Michael; Halchenko, Yaroslav O.; Sederberg, Per B.; Hanson, Stephen José; Haxby, James V.; Pollmann, Stefan

    2009-01-01

    Decoding patterns of neural activity onto cognitive states is one of the central goals of functional brain imaging. Standard univariate fMRI analysis methods, which correlate cognitive and perceptual function with the blood oxygenation-level dependent (BOLD) signal, have proven successful in identifying anatomical regions based on signal increases during cognitive and perceptual tasks. Recently, researchers have begun to explore new multivariate techniques that have proven to be more flexible, more reliable, and more sensitive than standard univariate analysis. Drawing on the field of statistical learning theory, these new classifier-based analysis techniques possess explanatory power that could provide new insights into the functional properties of the brain. However, unlike the wealth of software packages for univariate analyses, there are few packages that facilitate multivariate pattern classification analyses of fMRI data. Here we introduce a Python-based, cross-platform, and open-source software toolbox, called PyMVPA, for the application of classifier-based analysis techniques to fMRI datasets. PyMVPA makes use of Python's ability to access libraries written in a large variety of programming languages and computing environments to interface with the wealth of existing machine-learning packages. We present the framework in this paper and provide illustrative examples on its usage, features, and programmability. PMID:19184561

  12. SARTools: A DESeq2- and EdgeR-Based R Pipeline for Comprehensive Differential Analysis of RNA-Seq Data.

    PubMed

    Varet, Hugo; Brillet-Guéguen, Loraine; Coppée, Jean-Yves; Dillies, Marie-Agnès

    2016-01-01

    Several R packages exist for the detection of differentially expressed genes from RNA-Seq data. The analysis process includes three main steps, namely normalization, dispersion estimation and test for differential expression. Quality control steps along this process are recommended but not mandatory, and failing to check the characteristics of the dataset may lead to spurious results. In addition, normalization methods and statistical models are not exchangeable across the packages without adequate transformations the users are often not aware of. Thus, dedicated analysis pipelines are needed to include systematic quality control steps and prevent errors from misusing the proposed methods. SARTools is an R pipeline for differential analysis of RNA-Seq count data. It can handle designs involving two or more conditions of a single biological factor with or without a blocking factor (such as a batch effect or a sample pairing). It is based on DESeq2 and edgeR and is composed of an R package and two R script templates (for DESeq2 and edgeR, respectively). By tuning a small number of parameters and executing one of the R scripts, users gain access to the full results of the analysis, including lists of differentially expressed genes and an HTML report that (i) displays diagnostic plots for quality control and model hypotheses checking and (ii) keeps track of the whole analysis process, parameter values and versions of the R packages used. SARTools provides systematic quality controls of the dataset as well as diagnostic plots that help to tune the model parameters. It gives access to the main parameters of DESeq2 and edgeR and prevents untrained users from misusing some functionalities of both packages. By keeping track of all the parameters of the analysis process it fits the requirements of reproducible research.
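    SARTools delegates normalization to DESeq2/edgeR; as a simplified sketch of one such method (the median-of-ratios idea used by DESeq, not the packages' actual code), size factors can be estimated like this:

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios normalization, sketched: each sample's factor is the
    median ratio of its counts to the per-gene geometric mean across samples."""
    n_samples = len(counts[0])
    # geometric mean per gene, skipping genes with any zero count
    ref = [math.exp(sum(math.log(c) for c in gene) / n_samples)
           if all(gene) else None
           for gene in counts]
    factors = []
    for j in range(n_samples):
        ratios = [gene[j] / r for gene, r in zip(counts, ref) if r]
        factors.append(median(ratios))
    return factors

# Rows = genes, columns = samples; sample 2 sequenced twice as deeply
counts = [[10, 20], [100, 200], [50, 100], [5, 10]]
f = size_factors(counts)
```

    Here the second sample's factor is exactly twice the first's, recovering the twofold difference in sequencing depth.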

  13. Statistical significance approximation in local trend analysis of high-throughput time-series data using the theory of Markov chains.

    PubMed

    Xia, Li C; Ai, Dongmei; Cram, Jacob A; Liang, Xiaoyi; Fuhrman, Jed A; Sun, Fengzhu

    2015-09-21

    Local trend (i.e. shape) analysis of time series data reveals co-changing patterns in dynamics of biological systems. However, slow permutation procedures to evaluate the statistical significance of local trend scores have limited the application of this analysis to high-throughput time series data, e.g. data from next-generation sequencing-based studies. By extending the theories for the tail probability of the range of sum of Markovian random variables, we propose formulae for approximating the statistical significance of local trend scores. Using simulations and real data, we show that the approximate p-value is close to that obtained using a large number of permutations (starting at time points >20 with no delay and >30 with delay of at most three time steps) in that the non-zero decimals of the p-values obtained by the approximation and the permutations are mostly the same when the approximate p-value is less than 0.05. In addition, the approximate p-value is slightly larger than that based on permutations, making hypothesis testing based on the approximate p-value conservative. The approximation enables efficient calculation of p-values for pairwise local trend analysis, making large-scale all-versus-all comparisons possible. We also propose a hybrid approach by integrating the approximation and permutations to obtain accurate p-values for significantly associated pairs. We further demonstrate its use with the analysis of the Plymouth Marine Laboratory (PML) microbial community time series from high-throughput sequencing data and found interesting organism co-occurrence dynamic patterns. The software tool is integrated into the eLSA software package that now provides accelerated local trend and similarity analysis pipelines for time series data. The package is freely available from the eLSA website: http://bitbucket.org/charade/elsa.
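    A minimal sketch of a no-delay local trend score: each series is discretized into up/down/flat steps, then the best contiguous run of agreement (in either direction) is found Kadane-style. The eLSA implementation is more general, with delays and normalization; this toy version only illustrates the idea:

```python
def trend(series):
    """Discretize a time series into up/down/flat steps (+1, -1, 0)."""
    return [(x > 0) - (x < 0) for x in
            (b - a for a, b in zip(series, series[1:]))]

def local_trend_score(x, y):
    """No-delay local trend score: the maximum-magnitude contiguous sum of
    the elementwise product of the two trend series (Kadane's algorithm)."""
    prod = [a * b for a, b in zip(trend(x), trend(y))]
    best = cur_max = cur_min = 0
    for p in prod:
        cur_max = max(p, cur_max + p)
        cur_min = min(p, cur_min + p)
        best = max(best, cur_max, -cur_min)
    return best

# Two series sharing most of their up/down shape
x = [1, 2, 3, 4, 3, 2, 1, 2]
y = [5, 6, 7, 8, 7, 6, 5, 4]
score = local_trend_score(x, y)
```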

  14. Melanie II--a third-generation software package for analysis of two-dimensional electrophoresis images: I. Features and user interface.

    PubMed

    Appel, R D; Palagi, P M; Walther, D; Vargas, J R; Sanchez, J C; Ravier, F; Pasquali, C; Hochstrasser, D F

    1997-12-01

    Although two-dimensional electrophoresis (2-DE) computer analysis software packages have existed ever since 2-DE technology was developed, it is only now that the hardware and software technology allows large-scale studies to be performed on low-cost personal computers or workstations, and that setting up a 2-DE computer analysis system in a small laboratory is no longer considered a luxury. After a first attempt in the seventies and early eighties to develop 2-DE analysis software systems on hardware that had poor or even no graphical capabilities, followed in the late eighties by a wave of innovative software developments that were possible thanks to new graphical interface standards such as XWindows, a third generation of 2-DE analysis software packages has now come to maturity. It can be run on a variety of low-cost, general-purpose personal computers, thus making the purchase of a 2-DE analysis system easily attainable for even the smallest laboratory that is involved in proteome research. Melanie II 2-D PAGE, developed at the University Hospital of Geneva, is such a third-generation software system for 2-DE analysis. Based on unique image processing algorithms, this user-friendly object-oriented software package runs on multiple platforms, including Unix, MS-Windows 95 and NT, and Power Macintosh. It provides efficient spot detection and quantitation, state-of-the-art image comparison, statistical data analysis facilities, and is Internet-ready. Linked to proteome databases such as those available on the World Wide Web, it represents a valuable tool for the "Virtual Lab" of the post-genome area.

  15. The U.S. geological survey rass-statpac system for management and statistical reduction of geochemical data

    USGS Publications Warehouse

    VanTrump, G.; Miesch, A.T.

    1977-01-01

    RASS is an acronym for Rock Analysis Storage System and STATPAC, for Statistical Package. The RASS and STATPAC computer programs are integrated into the RASS-STATPAC system for the management and statistical reduction of geochemical data. The system, in its present form, has been in use for more than 9 yr by scores of U.S. Geological Survey geologists, geochemists, and other scientists engaged in a broad range of geologic and geochemical investigations. The principal advantage of the system is the flexibility afforded the user both in data searches and retrievals and in the manner of statistical treatment of data. The statistical programs provide for most types of statistical reduction normally used in geochemistry and petrology, but also contain bridges to other program systems for statistical processing and automatic plotting. © 1977.

  16. minet: A R/Bioconductor package for inferring large transcriptional networks using mutual information.

    PubMed

    Meyer, Patrick E; Lafitte, Frédéric; Bontempi, Gianluca

    2008-10-29

    This paper presents the R/Bioconductor package minet (version 1.1.6) which provides a set of functions to infer mutual information networks from a dataset. Once fed with a microarray dataset, the package returns a network where nodes denote genes, edges model statistical dependencies between genes and the weight of an edge quantifies the statistical evidence of a specific (e.g. transcriptional) gene-to-gene interaction. Four different entropy estimators are made available in the package minet (empirical, Miller-Madow, Schurmann-Grassberger and shrink) as well as four different inference methods, namely relevance networks, ARACNE, CLR and MRNET. The package also integrates accuracy assessment tools, such as F-scores, PR-curves and ROC-curves, in order to compare the inferred network with a reference one. The package minet provides a series of tools for inferring transcriptional networks from microarray data. It is freely available from the Comprehensive R Archive Network (CRAN) as well as from the Bioconductor website.
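    The mutual information such inference builds on can be sketched with a plug-in (empirical) estimator for discretized expression profiles; this is illustrative only, and minet's estimators are more refined:

```python
import math
from collections import Counter

def mutual_information(x, y):
    """Empirical (plug-in) mutual information between two discrete variables,
    in nats: I(X;Y) = sum p(x,y) * log[p(x,y) / (p(x) * p(y))]."""
    n = len(x)
    pxy = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    return sum((c / n) * math.log((c / n) / ((px[a] / n) * (py[b] / n)))
               for (a, b), c in pxy.items())

# Perfectly coupled discretized expression levels: I(X;Y) = H(X) = log 2
g1 = [0, 0, 1, 1, 0, 1, 0, 1]
g2 = [0, 0, 1, 1, 0, 1, 0, 1]
mi = mutual_information(g1, g2)
```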

  17. Spatial variation in the bacterial and denitrifying bacterial community in a biofilter treating subsurface agricultural drainage.

    PubMed

    Andrus, J Malia; Porter, Matthew D; Rodríguez, Luis F; Kuehlhorn, Timothy; Cooke, Richard A C; Zhang, Yuanhui; Kent, Angela D; Zilles, Julie L

    2014-02-01

    Denitrifying biofilters can remove agricultural nitrates from subsurface drainage, reducing nitrate pollution that contributes to coastal hypoxic zones. The performance and reliability of natural and engineered systems dependent upon microbially mediated processes, such as denitrifying biofilters, can be affected by the spatial structure of their microbial communities. Furthermore, our understanding of the relationship between microbial community composition and function is influenced by the spatial distribution of samples. In this study we characterized the spatial structure of bacterial communities in a denitrifying biofilter in central Illinois. Bacterial communities were assessed using automated ribosomal intergenic spacer analysis for bacteria and terminal restriction fragment length polymorphism of nosZ for denitrifying bacteria. Non-metric multidimensional scaling and analysis of similarity (ANOSIM) analyses indicated that bacteria showed statistically significant spatial structure by depth and transect, while denitrifying bacteria did not exhibit significant spatial structure. For determination of spatial patterns, we developed a package of automated functions for the R statistical environment that allows directional analysis of microbial community composition data using either ANOSIM or Mantel statistics. Applying this package to the biofilter data, the flow path correlation range for the bacterial community was 6.4 m at the shallower, periodically inundated depth and 10.7 m at the deeper, continually submerged depth. These spatial structures suggest a strong influence of hydrology on the microbial community composition in these denitrifying biofilters. Understanding such spatial structure can also guide optimal sample collection strategies for microbial community analyses.
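    The ANOSIM statistic used in this study can be sketched directly from a distance matrix and group labels (ties in distances are ignored in this toy version; the authors' R package handles the directional analysis this sketch does not):

```python
def anosim_r(dist, groups):
    """ANOSIM R: (mean between-group rank - mean within-group rank) of all
    pairwise distances, scaled by n(n-1)/4. R near 1 means communities
    differ more between groups than within them."""
    n = len(groups)
    pairs = sorted((dist[i][j], groups[i] == groups[j])
                   for i in range(n) for j in range(i + 1, n))
    within, between = [], []
    for rank, (_, same) in enumerate(pairs, start=1):
        (within if same else between).append(rank)
    r_w = sum(within) / len(within)
    r_b = sum(between) / len(between)
    return (r_b - r_w) / (n * (n - 1) / 4)

# Hypothetical distances: small within groups A/A and B/B, large between
dist = [[0, 1, 5, 6],
        [1, 0, 3, 4],
        [5, 3, 0, 2],
        [6, 4, 2, 0]]
groups = ["A", "A", "B", "B"]
r_stat = anosim_r(dist, groups)
```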

  18. Eta Squared, Partial Eta Squared, and Misreporting of Effect Size in Communication Research.

    ERIC Educational Resources Information Center

    Levine, Timothy R.; Hullett, Craig R.

    2002-01-01

    Alerts communication researchers to potential errors stemming from the use of SPSS (Statistical Package for the Social Sciences) to obtain estimates of eta squared in analysis of variance (ANOVA). Strives to clarify issues concerning the development and appropriate use of eta squared and partial eta squared in ANOVA. Discusses the reporting of…
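    The distinction at the heart of this article can be shown numerically: eta squared divides an effect's sum of squares by the total, while partial eta squared divides it by the effect plus its own error term, so the two only coincide in single-factor designs. A sketch with hypothetical sums of squares:

```python
def eta_squared(ss_effect, ss_total):
    """Classical eta squared: effect SS over total SS."""
    return ss_effect / ss_total

def partial_eta_squared(ss_effect, ss_error):
    """Partial eta squared: effect SS over (effect SS + its error SS)."""
    return ss_effect / (ss_effect + ss_error)

# Hypothetical two-way ANOVA sums of squares
ss_a, ss_b, ss_ab, ss_err = 40.0, 20.0, 10.0, 30.0
ss_total = ss_a + ss_b + ss_ab + ss_err  # 100.0

eta_a = eta_squared(ss_a, ss_total)            # 0.40
partial_a = partial_eta_squared(ss_a, ss_err)  # 40/70, about 0.571
```

    Reporting the second value under the first label inflates the apparent effect size, which is the misreporting the article warns about.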

  19. The Application of SPSS in Analyzing the Effect of English Vocabulary Strategy Instruction

    ERIC Educational Resources Information Center

    Chen, Shaoying

    2010-01-01

    The vocabulary learning is one of very important part in the college English teaching. Correct analysis of the result of vocabulary strategy instruction can offer feedbacks for English teaching, and help teachers to improve the teaching method. In this article, the issue how to use SPSS (Statistical Package for the Social Science) to…

  20. Cognition of and Demand for Education and Teaching in Medical Statistics in China: A Systematic Review and Meta-Analysis

    PubMed Central

    Li, Gaoming; Yi, Dali; Wu, Xiaojiao; Liu, Xiaoyu; Zhang, Yanqi; Liu, Ling; Yi, Dong

    2015-01-01

    Background Although a substantial number of studies focus on the teaching and application of medical statistics in China, few studies comprehensively evaluate the recognition of and demand for medical statistics. In addition, the results of these various studies differ and are insufficiently comprehensive and systematic. Objectives This investigation aimed to evaluate the general cognition of and demand for medical statistics by undergraduates, graduates, and medical staff in China. Methods We performed a comprehensive database search related to the cognition of and demand for medical statistics from January 2007 to July 2014 and conducted a meta-analysis of non-controlled studies with sub-group analysis for undergraduates, graduates, and medical staff. Results There are substantial differences with respect to the cognition of theory in medical statistics among undergraduates (73.5%), graduates (60.7%), and medical staff (39.6%). The demand for theory in medical statistics is high among graduates (94.6%), undergraduates (86.1%), and medical staff (88.3%). Regarding specific statistical methods, the cognition of basic statistical methods is higher than of advanced statistical methods. The demand for certain advanced statistical methods, including (but not limited to) multiple analysis of variance (ANOVA), multiple linear regression, and logistic regression, is higher than that for basic statistical methods. The use rates of the Statistical Package for the Social Sciences (SPSS) software and statistical analysis software (SAS) are only 55% and 15%, respectively. Conclusion The overall statistical competence of undergraduates, graduates, and medical staff is insufficient, and their ability to practically apply their statistical knowledge is limited, which constitutes an unsatisfactory state of affairs for medical statistics education. Because the demand for skills in this area is increasing, the need to reform medical statistics education in China has become urgent. PMID:26053876

  1. Cognition of and Demand for Education and Teaching in Medical Statistics in China: A Systematic Review and Meta-Analysis.

    PubMed

    Wu, Yazhou; Zhou, Liang; Li, Gaoming; Yi, Dali; Wu, Xiaojiao; Liu, Xiaoyu; Zhang, Yanqi; Liu, Ling; Yi, Dong

    2015-01-01

    Although a substantial number of studies focus on the teaching and application of medical statistics in China, few studies comprehensively evaluate the recognition of and demand for medical statistics. In addition, the results of these various studies differ and are insufficiently comprehensive and systematic. This investigation aimed to evaluate the general cognition of and demand for medical statistics by undergraduates, graduates, and medical staff in China. We performed a comprehensive database search related to the cognition of and demand for medical statistics from January 2007 to July 2014 and conducted a meta-analysis of non-controlled studies with sub-group analysis for undergraduates, graduates, and medical staff. There are substantial differences with respect to the cognition of theory in medical statistics among undergraduates (73.5%), graduates (60.7%), and medical staff (39.6%). The demand for theory in medical statistics is high among graduates (94.6%), undergraduates (86.1%), and medical staff (88.3%). Regarding specific statistical methods, the cognition of basic statistical methods is higher than of advanced statistical methods. The demand for certain advanced statistical methods, including (but not limited to) multiple analysis of variance (ANOVA), multiple linear regression, and logistic regression, is higher than that for basic statistical methods. The use rates of the Statistical Package for the Social Sciences (SPSS) software and statistical analysis software (SAS) are only 55% and 15%, respectively. The overall statistical competence of undergraduates, graduates, and medical staff is insufficient, and their ability to practically apply their statistical knowledge is limited, which constitutes an unsatisfactory state of affairs for medical statistics education. Because the demand for skills in this area is increasing, the need to reform medical statistics education in China has become urgent.

  2. mESAdb: microRNA Expression and Sequence Analysis Database

    PubMed Central

    Kaya, Koray D.; Karakülah, Gökhan; Yakıcıer, Cengiz M.; Acar, Aybar C.; Konu, Özlen

    2011-01-01

    microRNA expression and sequence analysis database (http://konulab.fen.bilkent.edu.tr/mirna/) (mESAdb) is a regularly updated database for the multivariate analysis of sequences and expression of microRNAs from multiple taxa. mESAdb is modular and has a user interface implemented in PHP and JavaScript and coupled with statistical analysis and visualization packages written for the R language. The database primarily comprises mature microRNA sequences and their target data, along with selected human, mouse and zebrafish expression data sets. mESAdb analysis modules allow (i) mining of microRNA expression data sets for subsets of microRNAs selected manually or by motif; (ii) pair-wise multivariate analysis of expression data sets within and between taxa; and (iii) association of microRNA subsets with annotation databases, HUGE Navigator, KEGG and GO. The use of existing and customized R packages facilitates future addition of data sets and analysis tools. Furthermore, the ability to upload and analyze user-specified data sets makes mESAdb an interactive and expandable analysis tool for microRNA sequence and expression data. PMID:21177657

  3. mESAdb: microRNA expression and sequence analysis database.

    PubMed

    Kaya, Koray D; Karakülah, Gökhan; Yakicier, Cengiz M; Acar, Aybar C; Konu, Ozlen

    2011-01-01

    microRNA expression and sequence analysis database (http://konulab.fen.bilkent.edu.tr/mirna/) (mESAdb) is a regularly updated database for the multivariate analysis of sequences and expression of microRNAs from multiple taxa. mESAdb is modular and has a user interface implemented in PHP and JavaScript and coupled with statistical analysis and visualization packages written for the R language. The database primarily comprises mature microRNA sequences and their target data, along with selected human, mouse and zebrafish expression data sets. mESAdb analysis modules allow (i) mining of microRNA expression data sets for subsets of microRNAs selected manually or by motif; (ii) pair-wise multivariate analysis of expression data sets within and between taxa; and (iii) association of microRNA subsets with annotation databases, HUGE Navigator, KEGG and GO. The use of existing and customized R packages facilitates future addition of data sets and analysis tools. Furthermore, the ability to upload and analyze user-specified data sets makes mESAdb an interactive and expandable analysis tool for microRNA sequence and expression data.

  4. User’s guide for MapMark4GUI—A graphical user interface for the MapMark4 R package

    USGS Publications Warehouse

    Shapiro, Jason

    2018-05-29

    MapMark4GUI is an R graphical user interface (GUI) developed by the U.S. Geological Survey to support user implementation of the MapMark4 R statistical software package. MapMark4 was developed by the U.S. Geological Survey to implement probability calculations for simulating undiscovered mineral resources in quantitative mineral resource assessments. The GUI provides an easy-to-use tool to input data, run simulations, and format output results for the MapMark4 package. The GUI is written and accessed in the R statistical programming language. This user’s guide includes instructions on installing and running MapMark4GUI and descriptions of the statistical output processes, output files, and test data files.
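    The probabilistic simulation MapMark4 implements can be caricatured as a Monte Carlo draw of deposit counts and tonnages; all distributions and parameters below are hypothetical stand-ins, not the package's elicited inputs:

```python
import random

random.seed(7)

def simulate_undiscovered_tonnage(n_trials=10000):
    """Toy Monte Carlo in the spirit of a quantitative mineral resource
    assessment: draw a number of undiscovered deposits from a discrete pmf,
    then a lognormal tonnage for each, and accumulate totals per trial."""
    totals = []
    for _ in range(n_trials):
        n_deposits = random.choice([0, 1, 1, 2, 2, 3])  # hypothetical pmf
        total = sum(random.lognormvariate(2.0, 1.0)     # hypothetical tonnage
                    for _ in range(n_deposits))
        totals.append(total)
    totals.sort()
    return {"p50": totals[n_trials // 2],
            "p90": totals[int(0.9 * n_trials)],
            "mean": sum(totals) / n_trials}

summary = simulate_undiscovered_tonnage()
```

    The sorted trial totals give the percentiles an assessment reports; MapMark4's own machinery (and the GUI's output formatting) is considerably richer.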

  5. Guidelines for the analysis of free energy calculations.

    PubMed

    Klimovich, Pavel V; Shirts, Michael R; Mobley, David L

    2015-05-01

    Free energy calculations based on molecular dynamics simulations show considerable promise for applications ranging from drug discovery to prediction of physical properties and structure-function studies. But these calculations are still difficult and tedious to analyze, and best practices for analysis are not well defined or propagated. Essentially, each group analyzing these calculations needs to decide how to conduct the analysis and, usually, develop its own analysis tools. Here, we review and recommend best practices for analysis yielding reliable free energies from molecular simulations. Additionally, we provide a Python tool, alchemical-analysis.py, freely available on GitHub as part of the pymbar package (located at http://github.com/choderalab/pymbar), that implements the analysis practices reviewed here for several reference simulation packages, which can be adapted to handle data from other packages. Both this review and the tool cover analysis of alchemical calculations generally, including free energy estimates via both thermodynamic integration and free energy perturbation-based estimators. Our Python tool also handles output from multiple types of free energy calculations, including expanded ensemble and Hamiltonian replica exchange, as well as standard fixed ensemble calculations. We also survey a range of statistical and graphical ways of assessing the quality of the data and free energy estimates, and provide prototypes of these in our tool. We hope this tool and discussion will serve as a foundation for more standardization of and agreement on best practices for analysis of free energy calculations.
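    One of the estimators covered, free energy perturbation, follows the Zwanzig relation dF = -kT ln <exp(-dU/kT)>_0. For Gaussian dU with mean m and standard deviation s the exact answer is m - s^2/(2kT), which makes a convenient self-check; the data below are synthetic, not simulation output:

```python
import math
import random

random.seed(1)

def fep_delta_f(delta_u, kT=1.0):
    """Zwanzig free energy perturbation estimator:
    dF = -kT * ln < exp(-dU/kT) >, averaged over samples from state 0."""
    n = len(delta_u)
    return -kT * math.log(sum(math.exp(-du / kT) for du in delta_u) / n)

# Gaussian dU with mean m and std s has the analytic result m - s^2/(2 kT)
m, s, kT = 1.0, 0.5, 1.0
samples = [random.gauss(m, s) for _ in range(200000)]
df = fep_delta_f(samples, kT)
analytic = m - s * s / (2 * kT)  # 0.875
```

    With 200,000 samples the estimate lands very close to the analytic value; for poorly overlapping states the exponential average converges far more slowly, which is exactly the kind of diagnostic the reviewed tool helps with.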

  6. The Risa R/Bioconductor package: integrative data analysis from experimental metadata and back again.

    PubMed

    González-Beltrán, Alejandra; Neumann, Steffen; Maguire, Eamonn; Sansone, Susanna-Assunta; Rocca-Serra, Philippe

    2014-01-01

    The ISA-Tab format and software suite have been developed to break the silo effect induced by technology-specific formats for a variety of data types and to better support experimental metadata tracking. Experimentalists seldom use a single technique to monitor biological signals. Providing a multi-purpose, pragmatic and accessible format that abstracts away common constructs for describing Investigations, Studies and Assays, ISA is increasingly popular. To attract further interest towards the format and extend support to ensure reproducible research and reusable data, we present the Risa package, which delivers a central component to support the ISA format by enabling effortless integration with R, the popular, open source data crunching environment. The Risa package bridges the gap between the metadata collection and curation in an ISA-compliant way and the data analysis using the widely used statistical computing environment R. The package offers functionality for: i) parsing ISA-Tab datasets into R objects; ii) augmenting annotation with extra metadata not explicitly stated in the ISA syntax; iii) interfacing with domain-specific R packages; iv) suggesting potentially useful R packages available in Bioconductor for subsequent processing of the experimental data described in the ISA format; and finally v) saving back to ISA-Tab files augmented with analysis-specific metadata from R. We demonstrate these features by presenting use cases for mass spectrometry data and DNA microarray data. The Risa package is open source (with LGPL license) and freely available through Bioconductor. By making Risa available, we aim to facilitate the task of processing experimental data, encouraging a uniform representation of experimental information and results while delivering tools for ensuring traceability and provenance tracking. The Risa package is available since Bioconductor 2.11 (version 1.0.0) and version 1.2.1 appeared in Bioconductor 2.12, both along with documentation and examples. The latest version of the code is at the development branch in Bioconductor and can also be accessed from GitHub https://github.com/ISA-tools/Risa, where the issue tracker allows users to report bugs or feature requests.
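    Risa itself parses ISA-Tab into R objects; the flavor of that parsing step can be sketched in Python with a toy tab-separated reader (the input below is a hypothetical minimal fragment, not the ISA specification):

```python
import csv
import io

def parse_tab(text):
    """Parse a minimal ISA-Tab-like table (tab-separated, one header row)
    into a list of row dicts; a toy stand-in for real ISA-Tab parsing."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return list(reader)

tab = "Sample Name\tFactor Value[dose]\ns1\tlow\ns2\thigh\n"
rows = parse_tab(tab)
```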

  7. The Risa R/Bioconductor package: integrative data analysis from experimental metadata and back again

    PubMed Central

    2014-01-01

    Background The ISA-Tab format and software suite have been developed to break the silo effect induced by technology-specific formats for a variety of data types and to better support experimental metadata tracking. Experimentalists seldom use a single technique to monitor biological signals. Providing a multi-purpose, pragmatic and accessible format that abstracts away common constructs for describing Investigations, Studies and Assays, ISA is increasingly popular. To attract further interest towards the format and extend support to ensure reproducible research and reusable data, we present the Risa package, which delivers a central component to support the ISA format by enabling effortless integration with R, the popular, open source data crunching environment. Results The Risa package bridges the gap between the metadata collection and curation in an ISA-compliant way and the data analysis using the widely used statistical computing environment R. The package offers functionality for: i) parsing ISA-Tab datasets into R objects, ii) augmenting annotation with extra metadata not explicitly stated in the ISA syntax; iii) interfacing with domain specific R packages iv) suggesting potentially useful R packages available in Bioconductor for subsequent processing of the experimental data described in the ISA format; and finally v) saving back to ISA-Tab files augmented with analysis specific metadata from R. We demonstrate these features by presenting use cases for mass spectrometry data and DNA microarray data. Conclusions The Risa package is open source (with LGPL license) and freely available through Bioconductor. By making Risa available, we aim to facilitate the task of processing experimental data, encouraging a uniform representation of experimental information and results while delivering tools for ensuring traceability and provenance tracking. 
Software availability The Risa package is available since Bioconductor 2.11 (version 1.0.0) and version 1.2.1 appeared in Bioconductor 2.12, both along with documentation and examples. The latest version of the code is at the development branch in Bioconductor and can also be accessed from GitHub https://github.com/ISA-tools/Risa, where the issue tracker allows users to report bugs or feature requests. PMID:24564732

  8. Immunohistochemical Analysis of the Role Connective Tissue Growth Factor in Drug-induced Gingival Overgrowth in Response to Phenytoin, Cyclosporine, and Nifedipine

    PubMed Central

    Anand, A. J.; Gopalakrishnan, Sivaram; Karthikeyan, R.; Mishra, Debasish; Mohapatra, Shreeyam

    2018-01-01

    Objective: To evaluate for the presence of connective tissue growth factor (CTGF) in drug (phenytoin, cyclosporine, and nifedipine)-induced gingival overgrowth (DIGO) and to compare it with healthy controls in the absence of overgrowth. Materials and Methods: Thirty-five patients were chosen for the study and segregated into study (25) and control (10) groups. The study group consisted of phenytoin-induced (10), cyclosporine-induced (10), and nifedipine-induced (5) gingival overgrowth. After completing the necessary medical evaluations, biopsy was done. The tissue samples were fixed in 10% formalin and then immunohistochemically evaluated for the presence of CTGF. The statistical analysis of the values was done using the statistical package SPSS PC+ (Statistical Package for the Social Sciences, version 4.01). Results: The immunohistochemistry results show that DIGO samples express more CTGF than the control group, with phenytoin inducing the most CTGF expression, followed by nifedipine and cyclosporine. Conclusion: The study shows that there is an increase in the levels of CTGF in patients with DIGO in comparison to the control group without any gingival overgrowth. In the study, we compared the levels of CTGF in DIGO induced by the three most commonly used drugs: phenytoin, cyclosporine, and nifedipine. By comparing the levels of CTGF, we find that cyclosporine induces the least CTGF production. Therefore, it might be a more viable drug choice with reduced side effects. PMID:29629324

  9. diffuStats: an R package to compute diffusion-based scores on biological networks.

    PubMed

    Picart-Armada, Sergio; Thompson, Wesley K; Buil, Alfonso; Perera-Lluna, Alexandre

    2018-02-01

    Label propagation and diffusion over biological networks are a common mathematical formalism in computational biology for giving context to molecular entities and prioritizing novel candidates in the area of study. There are several choices in designing the diffusion process (the graph kernel, the score definitions and the presence of a posterior statistical normalization) which have an impact on the results. This manuscript describes diffuStats, an R package that provides a collection of graph kernels and diffusion scores, as well as a parallel permutation analysis for the normalized scores, easing the computation of the scores and their benchmarking for an optimal choice. The R package diffuStats is publicly available in Bioconductor, https://bioconductor.org, under the GPL-3 license. sergi.picart@upc.edu. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  10. MutSpec: a Galaxy toolbox for streamlined analyses of somatic mutation spectra in human and mouse cancer genomes.

    PubMed

    Ardin, Maude; Cahais, Vincent; Castells, Xavier; Bouaoun, Liacine; Byrnes, Graham; Herceg, Zdenko; Zavadil, Jiri; Olivier, Magali

    2016-04-18

    The nature of somatic mutations observed in human tumors at single gene or genome-wide levels can reveal information on past carcinogenic exposures and mutational processes contributing to tumor development. While large amounts of sequencing data are being generated, the associated analysis and interpretation of mutation patterns that may reveal clues about the natural history of cancer present complex and challenging tasks that require advanced bioinformatics skills. To make such analyses accessible to a wider community of researchers with no programming expertise, we have developed within the web-based user-friendly platform Galaxy a first-of-its-kind package called MutSpec. MutSpec includes a set of tools that perform variant annotation and use advanced statistics for the identification of mutation signatures present in cancer genomes and for comparing the obtained signatures with those published in the COSMIC database and other sources. MutSpec offers an accessible framework for building reproducible analysis pipelines, integrating existing methods and scripts developed in-house with publicly available R packages. MutSpec may be used to analyse data from whole-exome, whole-genome or targeted sequencing experiments performed on human or mouse genomes. Results are provided in various formats including rich graphical outputs. An example is presented to illustrate the package functionalities, the straightforward workflow analysis and the richness of the statistics and publication-grade graphics produced by the tool. MutSpec offers an easy-to-use graphical interface embedded in the popular Galaxy platform that can be used by researchers with limited programming or bioinformatics expertise to analyse mutation signatures present in cancer genomes. MutSpec can thus effectively assist in the discovery of complex mutational processes resulting from exogenous and endogenous carcinogenic insults.

  11. Using Cell-ID 1.4 with R for Microscope-Based Cytometry

    PubMed Central

    Bush, Alan; Chernomoretz, Ariel; Yu, Richard; Gordon, Andrew

    2012-01-01

    This unit describes a method for quantifying various cellular features (e.g., volume, total and subcellular fluorescence localization) from sets of microscope images of individual cells. It includes procedures for tracking cells over time. One purposefully defocused transmission image (sometimes referred to as bright-field or BF) is acquired to segment the image and locate each cell. Fluorescent images (one for each of the color channels to be analyzed) are then acquired by conventional wide-field epifluorescence or confocal microscopy. This method uses the image processing capabilities of Cell-ID (Gordon et al., 2007, as updated here) and data analysis by the statistical programming framework R (R-Development-Team, 2008), which we have supplemented with a package of routines for analyzing Cell-ID output. Both Cell-ID and the analysis package are open-source. PMID:23026908

  12. MutAIT: an online genetic toxicology data portal and analysis tools.

    PubMed

    Avancini, Daniele; Menzies, Georgina E; Morgan, Claire; Wills, John; Johnson, George E; White, Paul A; Lewis, Paul D

    2016-05-01

    Assessment of genetic toxicity and/or carcinogenic activity is an essential element of chemical screening programs employed to protect human health. Dose-response and gene mutation data are frequently analysed by industry, academia and governmental agencies for regulatory evaluations and decision making. Over the years, a number of efforts at different institutions have led to the creation and curation of databases to house genetic toxicology data, largely, with the aim of providing public access to facilitate research and regulatory assessments. This article provides a brief introduction to a new genetic toxicology portal called Mutation Analysis Informatics Tools (MutAIT) (www.mutait.org) that provides easy access to two of the largest genetic toxicology databases, the Mammalian Gene Mutation Database (MGMD) and TransgenicDB. TransgenicDB is a comprehensive collection of transgenic rodent mutation data initially compiled and collated by Health Canada. The updated MGMD contains approximately 50 000 individual mutation spectral records from the published literature. The portal not only gives access to an enormous quantity of genetic toxicology data, but also provides statistical tools for dose-response analysis and calculation of benchmark dose. Two important R packages for dose-response analysis are provided as web-distributed applications with user-friendly graphical interfaces. The 'drsmooth' package performs dose-response shape analysis and determines various points of departure (PoD) metrics and the 'PROAST' package provides algorithms for dose-response modelling. The MutAIT statistical tools, which are currently being enhanced, provide users with an efficient and comprehensive platform to conduct quantitative dose-response analyses and determine PoD values that can then be used to calculate human exposure limits or margins of exposure. © The Author 2015. Published by Oxford University Press on behalf of the UK Environmental Mutagen Society. 
All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  13. Parallel Climate Data Assimilation PSAS Package Achieves 18 GFLOPs on 512-Node Intel Paragon

    NASA Technical Reports Server (NTRS)

    Ding, H. Q.; Chan, C.; Gennery, D. B.; Ferraro, R. D.

    1995-01-01

    Several algorithms were added to the Physical-space Statistical Analysis System (PSAS) from Goddard, which assimilates observational weather data by correcting for different levels of uncertainty about the data and different locations for mobile observation platforms. The new algorithms and use of the 512-node Intel Paragon allowed a hundred-fold decrease in processing time.

  14. Exact and Monte carlo resampling procedures for the Wilcoxon-Mann-Whitney and Kruskal-Wallis tests.

    PubMed

    Berry, K J; Mielke, P W

    2000-12-01

    Exact and Monte Carlo resampling FORTRAN programs are described for the Wilcoxon-Mann-Whitney rank sum test and the Kruskal-Wallis one-way analysis of variance for ranks test. The program algorithms compensate for tied values and do not depend on asymptotic approximations for probability values, unlike most algorithms contained in PC-based statistical software packages.
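The approach described, a tie-aware Monte Carlo resampling version of the rank-sum test, can be sketched as follows. This is an illustrative Python reimplementation of the general technique, not the FORTRAN programs from the article: midranks compensate for tied values and the p-value comes from random relabelings rather than an asymptotic approximation.

```python
import random

def midranks(values):
    # Assign average (mid) ranks to tied observations.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_sum_perm_test(x, y, n_perm=20000, seed=0):
    """Two-sided Monte Carlo permutation p-value for the
    Wilcoxon-Mann-Whitney rank-sum statistic."""
    rng = random.Random(seed)
    pooled = x + y
    nx = len(x)
    ranks = midranks(pooled)
    mean_w = nx * (len(pooled) + 1) / 2  # null expectation of the rank sum
    obs = abs(sum(ranks[:nx]) - mean_w)
    idx = list(range(len(pooled)))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(idx)
        w = sum(ranks[i] for i in idx[:nx])
        if abs(w - mean_w) >= obs - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one avoids zero p-values
```

An exact version would enumerate all label assignments instead of sampling them; the Monte Carlo form trades exactness for feasibility at larger sample sizes.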

  15. DR-Integrator: a new analytic tool for integrating DNA copy number and gene expression data.

    PubMed

    Salari, Keyan; Tibshirani, Robert; Pollack, Jonathan R

    2010-02-01

    DNA copy number alterations (CNA) frequently underlie gene expression changes by increasing or decreasing gene dosage. However, only a subset of genes with altered dosage exhibit concordant changes in gene expression. This subset is likely to be enriched for oncogenes and tumor suppressor genes, and can be identified by integrating these two layers of genome-scale data. We introduce DNA/RNA-Integrator (DR-Integrator), a statistical software tool to perform integrative analyses on paired DNA copy number and gene expression data. DR-Integrator identifies genes with significant correlations between DNA copy number and gene expression, and implements a supervised analysis that captures genes with significant alterations in both DNA copy number and gene expression between two sample classes. DR-Integrator is freely available for non-commercial use from the Pollack Lab at http://pollacklab.stanford.edu/ and can be downloaded as a plug-in application to Microsoft Excel and as a package for the R statistical computing environment. The R package is available under the name 'DRI' at http://cran.r-project.org/. An example analysis using DR-Integrator is included as supplemental material. Supplementary data are available at Bioinformatics online.
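The core integration idea, correlating each gene's copy number profile with its expression profile across matched samples, can be sketched in a few lines. This is a hypothetical illustration of the technique, not DR-Integrator's actual algorithm or API; the gene names, sample values, and `r_min` threshold are invented.

```python
# Toy sketch of copy-number/expression integration: per-gene Pearson
# correlation across the same samples, filtering for dosage-concordant genes.

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a) ** 0.5
    vb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (va * vb)

def dosage_concordant(copy_number, expression, r_min=0.8):
    """copy_number, expression: dicts gene -> per-sample values.
    Returns genes whose expression tracks copy number (r >= r_min)."""
    return sorted(
        g for g in copy_number
        if pearson(copy_number[g], expression[g]) >= r_min
    )

# Invented example: MYC amplification tracks expression; TP53 does not.
cn   = {"MYC": [2, 3, 4, 5, 6],        "TP53": [2, 2, 1, 2, 2]}
expr = {"MYC": [1.0, 2.1, 2.9, 4.2, 5.1], "TP53": [5, 1, 4, 2, 3]}
hits = dosage_concordant(cn, expr)  # → ["MYC"]
```

The actual tool additionally supports a supervised two-class analysis and significance estimation, which this sketch omits.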

  16. THE CAUSAL ANALYSIS / DIAGNOSIS DECISION ...

    EPA Pesticide Factsheets

    CADDIS is an on-line decision support system that helps investigators in the regions, states and tribes find, access, organize, use and share information to produce causal evaluations in aquatic systems. It is based on the US EPA's Stressor Identification process, which is a formal method for identifying causes of impairments in aquatic systems. CADDIS 2007 increases access to relevant information useful for causal analysis and provides methods and tools that practitioners can use to analyze their own data. The new Candidate Cause section provides overviews of commonly encountered causes of impairments to aquatic systems: metals, sediments, nutrients, flow alteration, temperature, ionic strength, and low dissolved oxygen. CADDIS includes new Conceptual Models that illustrate the relationships from sources to stressors to biological effects. An Interactive Conceptual Model for phosphorus links the diagram with supporting literature citations. The new Analyzing Data section helps practitioners analyze their data sets and interpret and use those results as evidence within the USEPA causal assessment process. Downloadable tools include a graphical user interface statistical package (CADStat), programs for use with the freeware R statistical package, and a Microsoft Excel template. These tools can be used to quantify associations between causes and biological impairments using innovative methods such as species-sensitivity distributions and biological inference.

  17. Readability Analysis of the Package Leaflets for Biological Medicines Available on the Internet Between 2007 and 2013: An Analytical Longitudinal Study

    PubMed Central

    2016-01-01

    Background The package leaflet included in the packaging of all medicinal products plays an important role in the transmission of medicine-related information to patients. Therefore, in 2009, the European Commission published readability guidelines to try to ensure that the information contained in the package leaflet is understood by patients. Objective The main objective of this study was to calculate and compare the readability levels and length (number of words) of the package leaflets for biological medicines in 2007, 2010, and 2013. Methods The sample of this study included 36 biological medicine package leaflets that were downloaded from the European Medicines Agency website in three different years: 2007, 2010, and 2013. The readability of the selected package leaflets was obtained using the following readability formulas: SMOG grade, Flesch-Kincaid grade level, and Szigriszt’s perspicuity index. The length (number of words) of the package leaflets was also measured. Afterwards, the relationship between these quantitative variables (three readability indexes and length) and categorical (or qualitative) variables was analyzed. The categorical variables were the year when the package leaflet was downloaded, the package leaflet section, type of medicine, year of authorization of biological medicine, and marketing authorization holder. Results The readability values of all the package leaflets exceeded the sixth-grade reading level, which is the recommended value for health-related written materials. No statistically significant differences were found between the three years of study in the readability indexes, although differences were observed in the case of the length (P=.002), which increased over the study period. 
When the relationship between readability indexes and length and the other variables was analyzed, statistically significant differences were found between package leaflet sections (P<.001) and between the groups of medicine only with regard to the length over the three studied years (P=.002 in 2007, P=.007 in 2010, P=.009 in 2013). Linear correlation was observed between the readability indexes (SMOG grade and Flesch-Kincaid grade level: r2=.92; SMOG grade and Szigriszt’s perspicuity index: r2=.81; Flesch-Kincaid grade level and Szigriszt’s perspicuity index: r2=.95), but not between the readability indexes and the length (length and SMOG grade: r2=.05; length and Flesch-Kincaid grade level: r2=.03; length and Szigriszt’s perspicuity index: r2=.02). Conclusions There was no improvement in the readability of the package leaflets studied between 2007 and 2013 despite the European Commission’s 2009 guideline on the readability of package leaflets. The results obtained from the different readability formulas coincided from a qualitative point of view. Efforts to improve the readability of package leaflets for biological medicines are required to promote the understandability and accessibility of this online health information by patients and thereby contribute to the appropriate use of medicines and medicine safety. PMID:27226241

  18. Readability Analysis of the Package Leaflets for Biological Medicines Available on the Internet Between 2007 and 2013: An Analytical Longitudinal Study.

    PubMed

    Piñero-López, María Ángeles; Modamio, Pilar; Lastra, Cecilia F; Mariño, Eduardo L

    2016-05-25

    The package leaflet included in the packaging of all medicinal products plays an important role in the transmission of medicine-related information to patients. Therefore, in 2009, the European Commission published readability guidelines to try to ensure that the information contained in the package leaflet is understood by patients. The main objective of this study was to calculate and compare the readability levels and length (number of words) of the package leaflets for biological medicines in 2007, 2010, and 2013. The sample of this study included 36 biological medicine package leaflets that were downloaded from the European Medicines Agency website in three different years: 2007, 2010, and 2013. The readability of the selected package leaflets was obtained using the following readability formulas: SMOG grade, Flesch-Kincaid grade level, and Szigriszt's perspicuity index. The length (number of words) of the package leaflets was also measured. Afterwards, the relationship between these quantitative variables (three readability indexes and length) and categorical (or qualitative) variables was analyzed. The categorical variables were the year when the package leaflet was downloaded, the package leaflet section, type of medicine, year of authorization of biological medicine, and marketing authorization holder. The readability values of all the package leaflets exceeded the sixth-grade reading level, which is the recommended value for health-related written materials. No statistically significant differences were found between the three years of study in the readability indexes, although differences were observed in the case of the length (P=.002), which increased over the study period. 
When the relationship between readability indexes and length and the other variables was analyzed, statistically significant differences were found between package leaflet sections (P<.001) and between the groups of medicine only with regard to the length over the three studied years (P=.002 in 2007, P=.007 in 2010, P=.009 in 2013). Linear correlation was observed between the readability indexes (SMOG grade and Flesch-Kincaid grade level: r(2)=.92; SMOG grade and Szigriszt's perspicuity index: r(2)=.81; Flesch-Kincaid grade level and Szigriszt's perspicuity index: r(2)=.95), but not between the readability indexes and the length (length and SMOG grade: r(2)=.05; length and Flesch-Kincaid grade level: r(2)=.03; length and Szigriszt's perspicuity index: r(2)=.02). There was no improvement in the readability of the package leaflets studied between 2007 and 2013 despite the European Commission's 2009 guideline on the readability of package leaflets. The results obtained from the different readability formulas coincided from a qualitative point of view. Efforts to improve the readability of package leaflets for biological medicines are required to promote the understandability and accessibility of this online health information by patients and thereby contribute to the appropriate use of medicines and medicine safety.
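Of the readability formulas used in the two records above, the Flesch-Kincaid grade level is straightforward to sketch. The implementation below is a rough illustration using a naive vowel-group syllable heuristic, so its scores will only approximate those of the validated formula implementations used in the study.

```python
import re

def count_syllables(word):
    # Crude heuristic: count groups of consecutive vowels; drop a final
    # silent 'e'. Real readability tools use dictionaries or better rules.
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and n > 1:
        n -= 1
    return max(n, 1)

def flesch_kincaid_grade(text):
    """Flesch-Kincaid grade level:
    0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59"""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()] or [text]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words) - 15.59)

simple = flesch_kincaid_grade("The cat sat. The dog ran.")
hard = flesch_kincaid_grade(
    "Comprehensive pharmacological readability evaluations "
    "necessitate considerable interpretative sophistication.")
```

Short, monosyllabic sentences score near or below grade zero, while dense polysyllabic prose of the kind found in package leaflets scores far above the recommended sixth-grade level.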

  19. The Psychometric Toolbox: An Excel Package for Use in Measurement and Psychometrics Courses

    ERIC Educational Resources Information Center

    Ferrando, Pere J.; Masip-Cabrera, Antoni; Navarro-González, David; Lorenzo-Seva, Urbano

    2017-01-01

    The Psychometric Toolbox (PT) is a user-friendly, non-commercial package mainly intended to be used for instructional purposes in introductory courses of educational and psychological measurement, psychometrics and statistics. The PT package is organized in six separate modules or sub-programs: Data preprocessor (descriptive analyses and data…

  20. Quality Assurance Information for R Packages "aqfig" and "M3"

    EPA Science Inventory

    R packages "aqfig" and "M3" are optional modules for use with R statistical software (http://www.r-project.org). Package "aqfig" contains functions to aid users in the preparation of publication-quality figures for the display of air quality and other environmental data (e.g., le...

  1. Optimisation of colour stability of cured ham during packaging and retail display by a multifactorial design.

    PubMed

    Møller, Jens K S; Jakobsen, Marianne; Weber, Claus J; Martinussen, Torben; Skibsted, Leif H; Bertelsen, Grete

    2003-02-01

    A multifactorial design, including (1) percent residual oxygen, (2) oxygen transmission rate of packaging film (OTR), (3) product to headspace volume ratio, (4) illuminance level and (5) nitrite level during curing, was established to investigate factors affecting light-induced oxidative discoloration of cured ham (packaged in a modified atmosphere of 20% carbon dioxide balanced with nitrogen) during 14 days of chill storage. Univariate statistical analysis found significant effects of all main factors on the redness (tristimulus a-value) of the ham. Subsequently, Response Surface Modelling of the data further proved that the interactions between packaging and storage conditions are important when optimising colour stability. The measured content of oxygen in the headspace was incorporated in the model, and the interaction between the measured oxygen content in the headspace and the product to headspace volume ratio was found to be crucial. Thus, it is not enough to keep the headspace oxygen level low: if the headspace volume is large at the same time, there will still be sufficient oxygen for colour-deteriorating processes to take place.

  2. ChIPpeakAnno: a Bioconductor package to annotate ChIP-seq and ChIP-chip data

    PubMed Central

    2010-01-01

    Background Chromatin immunoprecipitation (ChIP) followed by high-throughput sequencing (ChIP-seq) or ChIP followed by genome tiling array analysis (ChIP-chip) have become standard technologies for genome-wide identification of DNA-binding protein target sites. A number of algorithms have been developed in parallel that allow identification of binding sites from ChIP-seq or ChIP-chip datasets and subsequent visualization in the University of California Santa Cruz (UCSC) Genome Browser as custom annotation tracks. However, summarizing these tracks can be a daunting task, particularly if there are a large number of binding sites or the binding sites are distributed widely across the genome. Results We have developed ChIPpeakAnno as a Bioconductor package within the statistical programming environment R to facilitate batch annotation of enriched peaks identified from ChIP-seq, ChIP-chip, cap analysis of gene expression (CAGE) or any experiments resulting in a large number of enriched genomic regions. The binding sites annotated with ChIPpeakAnno can be viewed easily as a table, a pie chart or plotted in histogram form, i.e., the distribution of distances to the nearest genes for each set of peaks. In addition, we have implemented functionalities for determining the significance of overlap between replicates or binding sites among transcription factors within a complex, and for drawing Venn diagrams to visualize the extent of the overlap between replicates. Furthermore, the package includes functionalities to retrieve sequences flanking putative binding sites for PCR amplification, cloning, or motif discovery, and to identify Gene Ontology (GO) terms associated with adjacent genes. Conclusions ChIPpeakAnno enables batch annotation of the binding sites identified from ChIP-seq, ChIP-chip, CAGE or any technology that results in a large number of enriched genomic regions within the statistical programming environment R. 
Allowing users to pass their own annotation data, such as a different chromatin immunoprecipitation (ChIP) preparation or a dataset from the literature, or existing annotation packages such as GenomicFeatures and BSgenome, provides flexibility. Tight integration with the biomaRt package enables up-to-date annotation retrieval from the BioMart database. PMID:20459804
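The central annotation step, assigning each enriched region to its nearest gene and recording the signed distance, can be sketched outside of R as follows. This is an illustrative toy version of the technique, not the ChIPpeakAnno API; the TSS coordinates and gene names are invented.

```python
import bisect

# Hedged sketch: annotate peaks with the nearest transcription start
# site (TSS) on the same chromosome and report signed distances --
# the quantity whose distribution ChIPpeakAnno can histogram.

def nearest_tss(peaks, tss):
    """peaks: {chrom: [peak_center, ...]};
    tss: {chrom: sorted list of (position, gene_name)}."""
    out = []
    for chrom, centers in peaks.items():
        sites = tss.get(chrom, [])
        positions = [p for p, _ in sites]
        for c in centers:
            i = bisect.bisect_left(positions, c)
            # Compare the flanking TSSs and keep the closer one.
            best = min(
                (abs(c - positions[j]), positions[j], sites[j][1])
                for j in (i - 1, i) if 0 <= j < len(positions)
            )
            out.append((chrom, c, best[2], c - best[1]))  # signed distance
    return out

tss = {"chr1": [(1000, "geneA"), (5000, "geneB")]}
peaks = {"chr1": [1200, 4100]}
ann = nearest_tss(peaks, tss)
# → peak 1200 annotates to geneA (+200), peak 4100 to geneB (-900)
```

Sorting the TSS list once and binary-searching per peak keeps the annotation fast even for genome-scale peak sets.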

  3. User manual for Blossom statistical package for R

    USGS Publications Warehouse

    Talbert, Marian; Cade, Brian S.

    2005-01-01

    Blossom is an R package with functions for making statistical comparisons with distance-function based permutation tests developed by P.W. Mielke, Jr. and colleagues at Colorado State University (Mielke and Berry, 2001) and for testing parameters estimated in linear models with permutation procedures developed by B. S. Cade and colleagues at the Fort Collins Science Center, U.S. Geological Survey. This manual is intended to provide identical documentation of the statistical methods and interpretations as the manual by Cade and Richards (2005) does for the original Fortran program, but with changes made with respect to command inputs and outputs to reflect the new implementation as a package for R (R Development Core Team, 2012). This implementation in R has allowed for numerous improvements not supported by the Cade and Richards (2005) Fortran implementation, including use of categorical predictor variables in most routines.

  4. Controlling the joint local false discovery rate is more powerful than meta-analysis methods in joint analysis of summary statistics from multiple genome-wide association studies.

    PubMed

    Jiang, Wei; Yu, Weichuan

    2017-02-15

    In genome-wide association studies (GWASs) of common diseases/traits, we often analyze multiple GWASs with the same phenotype together to discover associated genetic variants with higher power. Since it is difficult to access data with detailed individual measurements, summary-statistics-based meta-analysis methods have become popular to jointly analyze datasets from multiple GWASs. In this paper, we propose a novel summary-statistics-based joint analysis method based on controlling the joint local false discovery rate (Jlfdr). We prove that our method is the most powerful summary-statistics-based joint analysis method when controlling the false discovery rate at a certain level. In particular, the Jlfdr-based method achieves higher power than commonly used meta-analysis methods when analyzing heterogeneous datasets from multiple GWASs. Simulation experiments demonstrate the superior power of our method over meta-analysis methods. Also, our method discovers more associations than meta-analysis methods from empirical datasets of four phenotypes. The R-package is available at: http://bioinformatics.ust.hk/Jlfdr.html . eeyu@ust.hk. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
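For context, the baseline that Jlfdr is compared against, standard fixed-effect meta-analysis of summary statistics, can be sketched in a few lines. This is the textbook inverse-variance method, not the Jlfdr procedure itself; the effect sizes and standard errors below are invented.

```python
import math

def inverse_variance_meta(betas, ses):
    """Fixed-effect meta-analysis of per-study effect sizes and standard
    errors; returns the pooled effect, its SE, and the z-score."""
    weights = [1.0 / se ** 2 for se in ses]          # precision weights
    beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return beta, se, beta / se

# Two hypothetical GWASs reporting the same variant.
beta, se, z = inverse_variance_meta([0.10, 0.14], [0.05, 0.05])
```

Because the fixed-effect model assumes one common effect across studies, it loses power on heterogeneous datasets, which is the setting where the abstract reports the Jlfdr-based method doing better.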

  5. CompareTests-R package

    Cancer.gov

    CompareTests is an R package to estimate agreement and diagnostic accuracy statistics for two diagnostic tests when one is conducted on only a subsample of specimens. A standard test is observed on all specimens.

  6. Assessing landslide susceptibility by statistical data analysis and GIS: the case of Daunia (Apulian Apennines, Italy)

    NASA Astrophysics Data System (ADS)

    Ceppi, C.; Mancini, F.; Ritrovato, G.

    2009-04-01

    This study aims at landslide susceptibility mapping within an area of the Daunia (Apulian Apennines, Italy) by a multivariate statistical method and data manipulation in a Geographical Information System (GIS) environment. Among the variety of existing statistical data analysis techniques, logistic regression was chosen to produce a susceptibility map over an area where small settlements are historically threatened by landslide phenomena. In logistic regression, a best fit between the presence or absence of landslides (dependent variable) and the set of independent variables is computed on the basis of a maximum likelihood criterion, leading to the estimation of regression coefficients. The reliability of such an analysis therefore lies in its ability to quantify proneness to landslide occurrence through the probability level produced by the analysis. The inventory of dependent and independent variables was managed in a GIS, where geometric properties and attributes were translated into raster cells in order to proceed with the logistic regression by means of the SPSS (Statistical Package for the Social Sciences) package. A landslide inventory was used to produce the binary dependent variable, whereas the independent variables comprised slope, aspect, elevation, curvature, drained area, lithology and land use, after their reduction to dummy variables. The effect of the independent parameters on landslide occurrence was assessed by the corresponding coefficients in the logistic regression function, highlighting a major role played by the land use variable in determining the occurrence and distribution of phenomena. Once the outcomes of the logistic regression were determined, the data were re-introduced into the GIS to produce a map reporting proneness to landslides as a predicted level of probability. 
As validation of the results and the regression model, a cell-by-cell comparison between the susceptibility map and the initial inventory of landslide events was performed, and an agreement of 75% was achieved.
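The statistical core of the study, a logistic regression of landslide presence/absence on dummy-coded predictors, can be sketched as follows. The study itself used SPSS; this is a minimal gradient-descent illustration in Python with invented predictor values, not a reproduction of the analysis.

```python
import math

# Toy logistic regression: landslide occurrence (1/0) per raster cell
# against dummy variables. Predictor names and data are invented.

def fit_logistic(X, y, lr=0.1, iters=5000):
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)  # intercept + one coefficient per predictor
    for _ in range(iters):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            err = 1 / (1 + math.exp(-z)) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w

def predict(w, xi):
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
    return 1 / (1 + math.exp(-z))  # probability of landslide occurrence

# Columns: [steep_slope, agricultural_land_use] dummy variables.
X = [[1, 1], [1, 1], [1, 0], [0, 1], [0, 0], [0, 0], [1, 1], [0, 1]]
y = [1, 1, 1, 1, 0, 0, 1, 0]
w = fit_logistic(X, y)
```

The fitted coefficients play the role the abstract describes: their sign and magnitude quantify how much each (dummy-coded) factor raises the predicted probability of landslide occurrence per cell.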

  7. Comparison of classical statistical methods and artificial neural network in traffic noise prediction

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nedic, Vladimir, E-mail: vnedic@kg.ac.rs; Despotovic, Danijela, E-mail: ddespotovic@kg.ac.rs; Cvetanovic, Slobodan, E-mail: slobodan.cvetanovic@eknfak.ni.ac.rs

    2014-11-15

    Traffic is the main source of noise in urban environments and significantly affects human mental and physical health and labor productivity. Therefore it is very important to model the noise produced by various vehicles. Techniques for traffic noise prediction are mainly based on regression analysis, which generally is not good enough to describe the trends of noise. In this paper the application of artificial neural networks (ANNs) for the prediction of traffic noise is presented. As input variables of the neural network, the proposed structure of the traffic flow and the average speed of the traffic flow are chosen. The output variable of the network is the equivalent noise level in the given time period L{sub eq}. Based on these parameters, the network is modeled, trained and tested through a comparative analysis of the calculated values and measured levels of traffic noise using an originally developed, user-friendly software package. It is shown that artificial neural networks can be a useful tool for the prediction of noise with sufficient accuracy. In addition, the measured values were also used to calculate the equivalent noise level by means of classical methods, and a comparative analysis is given. The results clearly show that the ANN approach is superior to any other statistical method in traffic noise level prediction. - Highlights: • We proposed an ANN model for the prediction of traffic noise. • We developed an originally designed, user-friendly software package. • The results are compared with classical statistical methods. • The ANN model showed much better predictive capability.

  8. Multi-response permutation procedure as an alternative to the analysis of variance: an SPSS implementation.

    PubMed

    Cai, Li

    2006-02-01

    A permutation test typically requires fewer assumptions than does a comparable parametric counterpart. The multi-response permutation procedure (MRPP) is a class of multivariate permutation tests of group difference useful for the analysis of experimental data. However, psychologists seldom make use of the MRPP in data analysis, in part because the MRPP is not implemented in popular statistical packages that psychologists use. A set of SPSS macros implementing the MRPP test is provided in this article. The use of the macros is illustrated by analyzing example data sets.
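The MRPP statistic itself is simple to sketch: delta is the weighted mean of average within-group pairwise distances, and significance comes from permuting group labels. The sketch below is a univariate toy illustration of the technique (MRPP is usually applied to multivariate distances), not the SPSS macros from the article.

```python
import random
from itertools import combinations

def delta(groups):
    """Weighted mean of average within-group pairwise distances."""
    total = sum(len(g) for g in groups)
    d = 0.0
    for g in groups:
        pairs = list(combinations(g, 2))
        if pairs:
            avg = sum(abs(a - b) for a, b in pairs) / len(pairs)
            d += len(g) / total * avg
    return d

def mrpp(a, b, n_perm=5000, seed=1):
    """Permutation p-value: fraction of random relabelings whose delta
    is as small as (or smaller than) the observed delta."""
    obs = delta([a, b])
    pooled = a + b
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if delta([pooled[:len(a)], pooled[len(a):]]) <= obs + 1e-12:
            hits += 1
    return obs, (hits + 1) / (n_perm + 1)

obs, p = mrpp([1, 2, 3, 4], [10, 11, 12, 13])
```

A small delta means observations within groups are unusually close together, so a small p-value indicates a real group difference, with no normality or equal-variance assumption.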

  9. Analytic programming with FMRI data: a quick-start guide for statisticians using R.

    PubMed

    Eloyan, Ani; Li, Shanshan; Muschelli, John; Pekar, Jim J; Mostofsky, Stewart H; Caffo, Brian S

    2014-01-01

    Functional magnetic resonance imaging (fMRI) is a thriving field that plays an important role in medical imaging analysis, biological and neuroscience research and practice. This manuscript gives a didactic introduction to the statistical analysis of fMRI data using the R project, along with the relevant R code. The goal is to give statisticians who would like to pursue research in this area a quick tutorial for programming with fMRI data. References of relevant packages and papers are provided for those interested in more advanced analysis.

  10. An improved multiple linear regression and data analysis computer program package

    NASA Technical Reports Server (NTRS)

    Sidik, S. M.

    1972-01-01

    NEWRAP, an improved version of a previous multiple linear regression program called RAPIER, CREDUC, and CRSPLT, allows for a complete regression analysis including cross plots of the independent and dependent variables, correlation coefficients, regression coefficients, analysis of variance tables, t-statistics and their probability levels, rejection of independent variables, plots of residuals against the independent and dependent variables, and a canonical reduction of quadratic response functions useful in optimum seeking experimentation. A major improvement over RAPIER is that all regression calculations are done in double precision arithmetic.

  11. [Intranarcotic infusion therapy -- a computer interpretation using the program package SPSS (Statistical Package for the Social Sciences)].

    PubMed

    Link, J; Pachaly, J

    1975-08-01

In a retrospective 18-month study, the infusion therapy applied in a large anesthesia institute was examined. For this purpose, the routinely recorded anesthesia-course data stored on magnetic tape were analysed by computer with the statistical program package SPSS. It could be shown that the practices of the individual anesthetists differ considerably. Various correlations are discussed.

  12. 'spup' - an R package for uncertainty propagation analysis in spatial environmental modelling

    NASA Astrophysics Data System (ADS)

    Sawicka, Kasia; Heuvelink, Gerard

    2017-04-01

Computer models have become a crucial tool in engineering and the environmental sciences for simulating the behaviour of complex static and dynamic systems. However, while many models are deterministic, the uncertainty in their predictions needs to be estimated before they are used for decision support. Advances in uncertainty propagation and assessment have been paralleled by a growing number of software tools for uncertainty analysis, but none has gained recognition for universal applicability or for handling case studies involving spatial models and spatial model inputs. Given the growing popularity and applicability of the open-source R programming language, we undertook a project to develop an R package that facilitates uncertainty propagation analysis in spatial environmental modelling. In particular, the 'spup' package provides functions for examining uncertainty propagation from input data and model parameters, via the environmental model, to model predictions. The functions cover uncertainty model specification, stochastic simulation and propagation of uncertainty using Monte Carlo (MC) techniques, as well as several uncertainty visualization functions. Uncertain environmental variables are represented in the package as objects whose attribute values may be uncertain and described by probability distributions. Both numerical and categorical data types are handled. Spatial auto-correlation within an attribute and cross-correlation between attributes are also accommodated. For uncertainty propagation the package implements the MC approach with efficient sampling algorithms, i.e. stratified random sampling and Latin hypercube sampling. The design includes facilitation of parallel computing to speed up the MC computation. The MC realizations may be used as input to environmental models called from R, or externally. Selected visualization methods that are understandable by non-experts with limited background in statistics can be used to summarize and visualize uncertainty about the measured input, model parameters and output of the uncertainty propagation. We demonstrate that the 'spup' package is an effective and easy-to-apply tool that can be used in multi-disciplinary research and model-based decision support.
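The Monte Carlo propagation workflow such a package automates, i.e. sample each uncertain input from its distribution, run the model, and summarise the outputs, can be sketched generically. This is a toy illustration with hypothetical names (`propagate`, `dists`), not 'spup' code:

```python
import random
import statistics

def propagate(model, dists, n=10000, seed=42):
    """Monte Carlo uncertainty propagation: `dists` maps input names to
    one-argument samplers (the uncertainty models, each taking an RNG);
    `model` is any function of those inputs. Returns the mean and
    standard deviation of the model output over n realizations."""
    rng = random.Random(seed)
    runs = [model(**{k: d(rng) for k, d in dists.items()}) for _ in range(n)]
    return statistics.mean(runs), statistics.stdev(runs)

# Toy model: runoff = rain * coeff, with Gaussian uncertainty on both inputs
mean, sd = propagate(
    lambda rain, coeff: rain * coeff,
    {"rain": lambda r: r.gauss(100.0, 10.0),
     "coeff": lambda r: r.gauss(0.5, 0.05)},
)
```

For this product of independent Gaussians the output mean is near 50 and the spread near 7, matching first-order error propagation.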

  13. User guide to Exploration and Graphics for RivEr Trends (EGRET) and dataRetrieval: R packages for hydrologic data

    USGS Publications Warehouse

    Hirsch, Robert M.; De Cicco, Laura A.

    2015-01-01

    Evaluating long-term changes in river conditions (water quality and discharge) is an important use of hydrologic data. To carry out such evaluations, the hydrologist needs tools to facilitate several key steps in the process: acquiring the data records from a variety of sources, structuring it in ways that facilitate the analysis, processing the data with routines that extract information about changes that may be happening, and displaying findings with graphical techniques. A pair of tightly linked R packages, called dataRetrieval and EGRET (Exploration and Graphics for RivEr Trends), have been developed for carrying out each of these steps in an integrated manner. They are designed to easily accept data from three sources: U.S. Geological Survey hydrologic data, U.S. Environmental Protection Agency (EPA) STORET data, and user-supplied flat files. The dataRetrieval package not only serves as a “front end” to the EGRET package, it can also be used to easily download many types of hydrologic data and organize it in ways that facilitate many other hydrologic applications. The EGRET package has components oriented towards the description of long-term changes in streamflow statistics (high flow, average flow, and low flow) as well as changes in water quality. For the water-quality analysis, it uses Weighted Regressions on Time, Discharge and Season (WRTDS) to describe long-term trends in both concentration and flux. EGRET also creates a wide range of graphical presentations of the water-quality data and of the WRTDS results. This report serves as a user guide to these two R packages, providing detailed guidance on installation and use of the software, documentation of the analysis methods used, as well as guidance on some of the kinds of questions and approaches that the software can facilitate.

  14. Evaluation of bond strength of resin cements using different general-purpose statistical software packages for two-parameter Weibull statistics.

    PubMed

    Roos, Malgorzata; Stawarczyk, Bogna

    2012-07-01

This study evaluated and compared Weibull parameters of resin bond strength values using six different general-purpose statistical software packages for the two-parameter Weibull distribution. Two hundred human teeth were randomly divided into 4 groups (n=50), prepared and bonded on dentin according to the manufacturers' instructions using the following resin cements: (i) Variolink (VAN, conventional resin cement), (ii) Panavia21 (PAN, conventional resin cement), (iii) RelyX Unicem (RXU, self-adhesive resin cement) and (iv) G-Cem (GCM, self-adhesive resin cement). Subsequently, all specimens were stored in water for 24h at 37°C. Shear bond strength was measured and the data were analyzed using the Anderson-Darling goodness-of-fit test (MINITAB 16) and two-parameter Weibull statistics with the following statistical software packages: Excel 2011, SPSS 19, MINITAB 16, R 2.12.1, SAS 9.1.3 and STATA 11.2 (p≤0.05). Additionally, the three-parameter Weibull distribution was fitted using MINITAB 16. Two-parameter Weibull estimates calculated with MINITAB and STATA can be compared using an omnibus test and 95% CIs. In SAS, only 95% CIs were directly obtained from the output. R provided no estimates of 95% CIs. In both SAS and R, a global comparison of the characteristic bond strength among groups is provided by means of Weibull regression. EXCEL and SPSS provided no default information about 95% CIs and no significance test for the comparison of Weibull parameters among the groups. In summary, the conventional resin cement VAN showed the highest Weibull modulus and characteristic bond strength. There are discrepancies in the Weibull statistics depending on the software package and the estimation method, and the information content of the default output provided by the software packages differs to a very high extent. Copyright © 2012 Academy of Dental Materials. Published by Elsevier Ltd. All rights reserved.
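For reference, one textbook way to obtain the two Weibull parameters, the modulus m and the characteristic strength s0, is linear regression on the linearised CDF with median-rank plotting positions. This is a sketch of that classical method only; the packages compared in the study largely estimate the parameters by maximum likelihood, which is one source of the discrepancies reported:

```python
import math
import random

def weibull_fit(data):
    """Two-parameter Weibull fit by median-rank regression:
    ln(-ln(1 - F)) = m*ln(x) - m*ln(s0), with F_i = (i - 0.3)/(n + 0.4)."""
    xs = sorted(data)
    n = len(xs)
    pts = [(math.log(x),
            math.log(-math.log(1.0 - (i + 1 - 0.3) / (n + 0.4))))
           for i, x in enumerate(xs)]
    mx = sum(px for px, _ in pts) / n
    my = sum(py for _, py in pts) / n
    m = (sum((px - mx) * (py - my) for px, py in pts)
         / sum((px - mx) ** 2 for px, _ in pts))
    s0 = math.exp(mx - my / m)   # intercept is -m*ln(s0)
    return m, s0

# Recover known parameters from simulated strengths (m = 5, s0 = 30)
rng = random.Random(0)
sample = [30.0 * (-math.log(1.0 - rng.random())) ** 0.2 for _ in range(500)]
m, s0 = weibull_fit(sample)
```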

  15. Linear mixed-effects models for within-participant psychology experiments: an introductory tutorial and free, graphical user interface (LMMgui).

    PubMed

    Magezi, David A

    2015-01-01

    Linear mixed-effects models (LMMs) are increasingly being used for data analysis in cognitive neuroscience and experimental psychology, where within-participant designs are common. The current article provides an introductory review of the use of LMMs for within-participant data analysis and describes a free, simple, graphical user interface (LMMgui). LMMgui uses the package lme4 (Bates et al., 2014a,b) in the statistical environment R (R Core Team).

  16. Open-source Software for Exoplanet Atmospheric Modeling

    NASA Astrophysics Data System (ADS)

    Cubillos, Patricio; Blecic, Jasmina; Harrington, Joseph

    2018-01-01

    I will present a suite of self-standing open-source tools to model and retrieve exoplanet spectra implemented for Python. These include: (1) a Bayesian-statistical package to run Levenberg-Marquardt optimization and Markov-chain Monte Carlo posterior sampling, (2) a package to compress line-transition data from HITRAN or Exomol without loss of information, (3) a package to compute partition functions for HITRAN molecules, (4) a package to compute collision-induced absorption, and (5) a package to produce radiative-transfer spectra of transit and eclipse exoplanet observations and atmospheric retrievals.

  17. Bayesian Atmospheric Radiative Transfer (BART): Model, Statistics Driver, and Application to HD 209458b

    NASA Astrophysics Data System (ADS)

    Cubillos, Patricio; Harrington, Joseph; Blecic, Jasmina; Stemm, Madison M.; Lust, Nate B.; Foster, Andrew S.; Rojo, Patricio M.; Loredo, Thomas J.

    2014-11-01

Multi-wavelength secondary-eclipse and transit depths probe the thermo-chemical properties of exoplanets. In recent years, several research groups have developed retrieval codes to analyze the existing data and study the prospects of future facilities. However, the scientific community has limited access to these packages. Here we premiere the open-source Bayesian Atmospheric Radiative Transfer (BART) code. We discuss the key aspects of the radiative-transfer algorithm and the statistical package. The radiation code includes line databases for all HITRAN molecules, high-temperature H2O, TiO, and VO, and includes a preprocessor for adding additional line databases without recompiling the radiation code. Collision-induced absorption lines are available for H2-H2 and H2-He. The parameterized thermal and molecular abundance profiles can be modified arbitrarily without recompilation. The generated spectra are integrated over arbitrary bandpasses for comparison to data. BART's statistical package, Multi-core Markov-chain Monte Carlo (MC3), is a general-purpose MCMC module. MC3 implements the Differential-Evolution Markov-chain Monte Carlo algorithm (ter Braak 2006, 2009). MC3 converges 20-400 times faster than the usual Metropolis-Hastings MCMC algorithm, and in addition uses the Message Passing Interface (MPI) to parallelize the MCMC chains. We apply the BART retrieval code to the HD 209458b data set to estimate the planet's temperature profile and molecular abundances. This work was supported by NASA Planetary Atmospheres grant NNX12AI69G and NASA Astrophysics Data Analysis Program grant NNX13AF38G. JB holds a NASA Earth and Space Science Fellowship.
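The differential-evolution proposal at the heart of such a sampler, perturbing one chain by a scaled difference of two other chains, can be sketched for a one-dimensional target. This is a minimal illustration of ter Braak's DE-MC scheme, not BART's MC3 implementation:

```python
import math
import random

def demc(logpost, n_chains=8, n_iter=3000, seed=3):
    """Minimal differential-evolution MCMC for a 1-D target density.
    Each chain i proposes x_i + gamma*(x_a - x_b) + eps, where a, b
    index two other randomly chosen chains (ter Braak 2006)."""
    rng = random.Random(seed)
    gamma = 2.38 / math.sqrt(2.0)   # recommended scaling for dimension 1
    chains = [rng.uniform(-5.0, 5.0) for _ in range(n_chains)]
    samples = []
    for _ in range(n_iter):
        for i in range(n_chains):
            a, b = rng.sample([j for j in range(n_chains) if j != i], 2)
            prop = (chains[i] + gamma * (chains[a] - chains[b])
                    + rng.gauss(0.0, 1e-4))
            # Metropolis accept/reject on the log-posterior
            if math.log(rng.random() + 1e-300) < logpost(prop) - logpost(chains[i]):
                chains[i] = prop
        samples.extend(chains)
    return samples[len(samples) // 2:]   # discard the first half as burn-in

# Target: standard normal log-posterior; draws should have mean 0, variance 1
draws = demc(lambda x: -0.5 * x * x)
mean = sum(draws) / len(draws)
var = sum((x - mean) ** 2 for x in draws) / len(draws)
```

The appeal of the scheme is that proposal scale and orientation are learned from the chain population itself rather than hand-tuned.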

  18. Proceedings: USACERL/ASCE First Joint Conference on Expert Systems, 29-30 June 1988

    DTIC Science & Technology

    1989-01-01

...a methodology of inductive shallow modeling was developed. Inductive systems may become powerful shallow modeling tools applicable to a large class of ... analysis was conducted using a statistical package, Trajectories. Four different types of relationships were analyzed: linear, logarithmic, power, and ...

  19. A Vignette (User’s Guide) for “An R Package for Statistical Analysis of Chemistry, Histopathology, and Reproduction Endpoints Including Repeated Measures and Multi-Generation Studies (StatCharrms).”

    EPA Science Inventory

    StatCharrms is a graphical user front-end for ease of use in analyzing data generated from OCSPP 890.2200, Medaka Extended One Generation Reproduction Test (MEOGRT) and OCSPP 890.2300, Larval Amphibian Gonad Development Assay (LAGDA). The analyses StatCharrms is capable of perfor...

  20. Use of Computer Statistical Packages to Generate Quality Control Reports on Training

    DTIC Science & Technology

    1980-01-01

Quality Control; Statistical Analysis. ... obtaining timely and efficient ... permanent disk storage space within the computer account. The user may not wish to run the "Audit" program in the same batch flow as the other three ...

  1. Tolerancing aspheres based on manufacturing statistics

    NASA Astrophysics Data System (ADS)

    Wickenhagen, S.; Möhl, A.; Fuchs, U.

    2017-11-01

A standard way of tolerancing optical elements or systems is to perform a Monte Carlo based analysis within a common optical design software package. Although different weightings and distributions are assumed, all of these approaches rely on statistics, which usually means several hundred or several thousand simulated systems for reliable results. Employing these methods for small batch sizes is therefore unreliable, especially when aspheric surfaces are involved. The extensive database of asphericon was used to investigate the correlation between the given tolerance values and measured data sets. The resulting probability distributions of these measured data were analyzed with the aim of establishing a robust optical tolerancing process.

  2. iScreen: Image-Based High-Content RNAi Screening Analysis Tools.

    PubMed

    Zhong, Rui; Dong, Xiaonan; Levine, Beth; Xie, Yang; Xiao, Guanghua

    2015-09-01

    High-throughput RNA interference (RNAi) screening has opened up a path to investigating functional genomics in a genome-wide pattern. However, such studies are often restricted to assays that have a single readout format. Recently, advanced image technologies have been coupled with high-throughput RNAi screening to develop high-content screening, in which one or more cell image(s), instead of a single readout, were generated from each well. This image-based high-content screening technology has led to genome-wide functional annotation in a wider spectrum of biological research studies, as well as in drug and target discovery, so that complex cellular phenotypes can be measured in a multiparametric format. Despite these advances, data analysis and visualization tools are still largely lacking for these types of experiments. Therefore, we developed iScreen (image-Based High-content RNAi Screening Analysis Tool), an R package for the statistical modeling and visualization of image-based high-content RNAi screening. Two case studies were used to demonstrate the capability and efficiency of the iScreen package. iScreen is available for download on CRAN (http://cran.cnr.berkeley.edu/web/packages/iScreen/index.html). The user manual is also available as a supplementary document. © 2014 Society for Laboratory Automation and Screening.

  3. Easing access to R using 'shiny' to create graphical user interfaces: An example for the R package 'Luminescence'

    NASA Astrophysics Data System (ADS)

    Burow, Christoph; Kreutzer, Sebastian; Dietze, Michael; Fuchs, Margret C.; Schmidt, Christoph; Fischer, Manfred; Brückner, Helmut

    2017-04-01

Since the release of the R package 'Luminescence' (Kreutzer et al., 2012) the functionality of the package has been greatly enhanced by implementing further functions for measurement data processing, statistical analysis and graphical output. Despite its capabilities for complex and non-standard analysis of luminescence data, working with the command-line interface (CLI) of R can be tedious at best and overwhelming at worst, especially for users without experience in programming languages. Even though much work is put into simplifying the usage of the package to continuously lower the entry threshold, at least basic knowledge of R will always be required. Thus, the potential user base of the package cannot be fully reached as long as the CLI is the only means of utilising the 'Luminescence' package. But even experienced users may find it tedious to iteratively run a function until a satisfying result is produced. For example, plotting data is at least partly subject to personal aesthetic taste, in accordance with the information it is supposed to convey, and iterating through all the possible options in the R CLI can be a time-consuming task. An alternative to the CLI is the graphical user interface (GUI), which allows direct, interactive manipulation of and interaction with the underlying software. For users with little or no command-line experience, a GUI offers intuitive access that counteracts the perceived steep learning curve of a CLI. Even though R lacks native support for GUI functions, its capability of linking to other programming languages makes it possible to utilise external frameworks for building graphical user interfaces. A recent attempt to provide a GUI toolkit for R was the introduction of the 'shiny' package (Chang et al., 2016), which allows automatic construction of HTML-, CSS- and JavaScript-based user interfaces straight from R.
Here, we give (1) a brief introduction to the 'shiny' framework for R, before we (2) present a GUI for the R package 'Luminescence' in the form of interactive web applications. These applications can be accessed online, so that a user is not even required to have a local installation of R, and they provide access to most of the plotting functions of the R package 'Luminescence'. These functionalities will be demonstrated live during the PICO session. References: Chang, W., Cheng, J., Allaire, JJ., Xie, Y., McPherson, J., 2016. shiny: Web Application Framework for R. R package version 0.13.2. https://CRAN.R-project.org/package=shiny; Kreutzer, S., Schmidt, C., Fuchs, M.C., Dietze, M., Fischer, M., Fuchs, M., 2012. Introducing an R package for luminescence dating analysis. Ancient TL, 30: 1-8.

  4. PCIPS 2.0: Powerful multiprofile image processing implemented on PCs

    NASA Technical Reports Server (NTRS)

    Smirnov, O. M.; Piskunov, N. E.

    1992-01-01

Over the years, the processing power of personal computers has steadily increased. Now, 386- and 486-based PCs are fast enough for many image processing applications, and inexpensive enough even for amateur astronomers. PCIPS is an image processing system based on these platforms that was designed to satisfy a broad range of data analysis needs, while requiring minimum hardware and providing maximum expandability. It will run (albeit at a slow pace) even on an 80286 with 640K memory, but will take full advantage of more memory and faster CPUs. Because the actual image processing is performed by external modules, the system can be easily upgraded by the user for all sorts of scientific data analysis. PCIPS supports large-format 1D and 2D images in any numeric type from 8-bit integer to 64-bit floating point. The images can be displayed, overlaid, printed, and any part of the data examined via an intuitive graphical user interface that employs buttons, pop-up menus, and a mouse. PCIPS automatically converts images between different types and sizes to satisfy the requirements of various applications. PCIPS features an API that lets users develop custom applications in C or FORTRAN. While doing so, a programmer can concentrate on the actual data processing, because PCIPS assumes responsibility for accessing images and interacting with the user. This also ensures that all applications, even custom ones, have a consistent and user-friendly interface. The API is compatible with factory programming, a metaphor for constructing image processing procedures that will be implemented in future versions of the system. Several application packages were created under PCIPS. The basic package includes elementary arithmetic and statistics, geometric transformations, and import/export in various formats (FITS, binary, ASCII, and GIF). The CCD processing package and the spectral analysis package were successfully used to reduce spectra from the Nordic Telescope at La Palma.
A photometry package is also available, and other packages are being developed. A multitasking version of PCIPS that utilizes the factory programming concept is currently under development. This version will remain compatible (on the source code level) with existing application packages and custom applications.

  5. KMWin--a convenient tool for graphical presentation of results from Kaplan-Meier survival time analysis.

    PubMed

    Gross, Arnd; Ziepert, Marita; Scholz, Markus

    2012-01-01

    Analysis of clinical studies often necessitates multiple graphical representations of the results. Many professional software packages are available for this purpose. Most packages are either only commercially available or hard to use especially if one aims to generate or customize a huge number of similar graphical outputs. We developed a new, freely available software tool called KMWin (Kaplan-Meier for Windows) facilitating Kaplan-Meier survival time analysis. KMWin is based on the statistical software environment R and provides an easy to use graphical interface. Survival time data can be supplied as SPSS (sav), SAS export (xpt) or text file (dat), which is also a common export format of other applications such as Excel. Figures can directly be exported in any graphical file format supported by R. On the basis of a working example, we demonstrate how to use KMWin and present its main functions. We show how to control the interface, customize the graphical output, and analyse survival time data. A number of comparisons are performed between KMWin and SPSS regarding graphical output, statistical output, data management and development. Although the general functionality of SPSS is larger, KMWin comprises a number of features useful for survival time analysis in clinical trials and other applications. These are for example number of cases and number of cases under risk within the figure or provision of a queue system for repetitive analyses of updated data sets. Moreover, major adjustments of graphical settings can be performed easily on a single window. We conclude that our tool is well suited and convenient for repetitive analyses of survival time data. It can be used by non-statisticians and provides often used functions as well as functions which are not supplied by standard software packages. The software is routinely applied in several clinical study groups.
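The estimator underlying such tools is compact enough to state directly. Here is a minimal sketch of the Kaplan-Meier product-limit estimate in Python (illustrative only, not KMWin or R code):

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate S(t). `events[i]` is True for an
    observed event, False for a censored observation. Returns (t, S(t))
    pairs at each event time."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    at_risk = len(times)
    s = 1.0
    curve = []
    i = 0
    while i < len(order):
        t = times[order[i]]
        d = n_t = 0
        while i < len(order) and times[order[i]] == t:   # group ties
            n_t += 1
            d += events[order[i]]
            i += 1
        if d:                                            # step only at events
            s *= 1.0 - d / at_risk
            curve.append((t, s))
        at_risk -= n_t                                   # censored leave the risk set
    return curve

# Six participants; censoring at t=2 and t=4 shrinks the risk set
# without stepping the curve
curve = kaplan_meier([1, 2, 2, 3, 4, 5],
                     [True, True, False, True, False, True])
```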

  6. KMWin – A Convenient Tool for Graphical Presentation of Results from Kaplan-Meier Survival Time Analysis

    PubMed Central

    Gross, Arnd; Ziepert, Marita; Scholz, Markus

    2012-01-01

    Background Analysis of clinical studies often necessitates multiple graphical representations of the results. Many professional software packages are available for this purpose. Most packages are either only commercially available or hard to use especially if one aims to generate or customize a huge number of similar graphical outputs. We developed a new, freely available software tool called KMWin (Kaplan-Meier for Windows) facilitating Kaplan-Meier survival time analysis. KMWin is based on the statistical software environment R and provides an easy to use graphical interface. Survival time data can be supplied as SPSS (sav), SAS export (xpt) or text file (dat), which is also a common export format of other applications such as Excel. Figures can directly be exported in any graphical file format supported by R. Results On the basis of a working example, we demonstrate how to use KMWin and present its main functions. We show how to control the interface, customize the graphical output, and analyse survival time data. A number of comparisons are performed between KMWin and SPSS regarding graphical output, statistical output, data management and development. Although the general functionality of SPSS is larger, KMWin comprises a number of features useful for survival time analysis in clinical trials and other applications. These are for example number of cases and number of cases under risk within the figure or provision of a queue system for repetitive analyses of updated data sets. Moreover, major adjustments of graphical settings can be performed easily on a single window. Conclusions We conclude that our tool is well suited and convenient for repetitive analyses of survival time data. It can be used by non-statisticians and provides often used functions as well as functions which are not supplied by standard software packages. The software is routinely applied in several clinical study groups. PMID:22723912

  7. Multivariate Statistical Analysis Software Technologies for Astrophysical Research Involving Large Data Bases

    NASA Technical Reports Server (NTRS)

    Djorgovski, S. G.

    1994-01-01

    We developed a package to process and analyze the data from the digital version of the Second Palomar Sky Survey. This system, called SKICAT, incorporates the latest in machine learning and expert systems software technology, in order to classify the detected objects objectively and uniformly, and facilitate handling of the enormous data sets from digital sky surveys and other sources. The system provides a powerful, integrated environment for the manipulation and scientific investigation of catalogs from virtually any source. It serves three principal functions: image catalog construction, catalog management, and catalog analysis. Through use of the GID3* Decision Tree artificial induction software, SKICAT automates the process of classifying objects within CCD and digitized plate images. To exploit these catalogs, the system also provides tools to merge them into a large, complex database which may be easily queried and modified when new data or better methods of calibrating or classifying become available. The most innovative feature of SKICAT is the facility it provides to experiment with and apply the latest in machine learning technology to the tasks of catalog construction and analysis. SKICAT provides a unique environment for implementing these tools for any number of future scientific purposes. Initial scientific verification and performance tests have been made using galaxy counts and measurements of galaxy clustering from small subsets of the survey data, and a search for very high redshift quasars. All of the tests were successful and produced new and interesting scientific results. Attachments to this report give detailed accounts of the technical aspects of the SKICAT system, and of some of the scientific results achieved to date. We also developed a user-friendly package for multivariate statistical analysis of small and moderate-size data sets, called STATPROG. 
The package was tested extensively on a number of real scientific applications and has produced real, published results.

  8. LFSTAT - An R-Package for Low-Flow Analysis

    NASA Astrophysics Data System (ADS)

    Koffler, D.; Laaha, G.

    2012-04-01

When analysing daily streamflow data focusing on low flow and drought, the state of the art is well documented in the Manual on Low-Flow Estimation and Prediction [1] published by the WMO. While it is clear what has to be done, it is not so clear how to perform the analysis and make the calculation as reproducible as possible. Our software solution extends the high-performing open-source statistical software package R to analyse daily streamflow data focusing on low flows. As command-line based programs are not everyone's preference, we also offer a plug-in for the R-Commander, an easy-to-use graphical user interface (GUI) for analysing data in R. Functionality includes estimation of the most important low-flow indices. Besides the standard flow indices, the base flow index (BFI) and recession constants can also be computed. The main applications of L-moment based extreme value analysis and regional frequency analysis (RFA) are available. Calculation of streamflow deficits is another important feature. The most common graphics are prepared and can easily be modified according to the user's preferences. Graphics include hydrographs for different periods, flexible streamflow deficit plots, baseflow visualisation, flow duration curves, and double mass curves, to name a few. The package uses an S3 class called lfobj (low-flow objects). Once these objects are created, analyses can be performed by mouse click, and a script can be saved to make the analysis easily reproducible. At the moment we offer implementations of all major methods proposed in the WMO Manual on Low-Flow Estimation and Prediction. Future plans include, e.g., report export to odt files using odfWeave. We hope to offer a tool that eases and structures the analysis of streamflow data focusing on low flows, and makes the analysis transparent and communicable. The package is designed for hydrological research and water management practice, but can also be used to teach students the first steps in low-flow hydrology.
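Two of the low-flow indices mentioned, a flow-duration-curve percentile such as Q95 and an n-day annual minimum, can be sketched generically. The conventions below (plotting position, single-year MAM) are illustrative assumptions; lfstat's own definitions may differ in detail:

```python
def q95(flows):
    """Q95 low-flow index: the discharge exceeded 95% of the time,
    read off the empirical flow-duration curve (flows ranked in
    descending order)."""
    fdc = sorted(flows, reverse=True)
    return fdc[int(0.95 * (len(fdc) - 1))]

def mam(flows, window=7):
    """Annual minimum of the `window`-day moving-average flow for a
    single year of daily data (the building block of MAM7)."""
    means = [sum(flows[i:i + window]) / window
             for i in range(len(flows) - window + 1)]
    return min(means)

# 95 days at 10 m3/s and 5 low-flow days at 1 m3/s
q = q95([10.0] * 95 + [1.0] * 5)
# A short synthetic recession and recovery, 3-day window
low = mam([5, 4, 3, 2, 1, 2, 3, 4, 5], window=3)
```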

  9. P-MartCancer–Interactive Online Software to Enable Analysis of Shotgun Cancer Proteomic Datasets

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Webb-Robertson, Bobbie-Jo M.; Bramer, Lisa M.; Jensen, Jeffrey L.

P-MartCancer is a new interactive web-based software environment that enables biomedical and biological scientists to perform in-depth analyses of global proteomics data without requiring direct interaction with the data or with statistical software. P-MartCancer offers a series of statistical modules associated with quality assessment, peptide and protein statistics, protein quantification and exploratory data analyses driven by the user via customized workflows and interactive visualization. Currently, P-MartCancer offers access to multiple cancer proteomic datasets generated through the Clinical Proteomics Tumor Analysis Consortium (CPTAC) at the peptide, gene and protein levels. P-MartCancer is deployed using Azure technologies (http://pmart.labworks.org/cptac.html), the web service is alternatively available via Docker Hub (https://hub.docker.com/r/pnnl/pmart-web/), and many statistical functions can be utilized directly from an R package available on GitHub (https://github.com/pmartR).

  10. Low-Income Employees’ Choices Regarding Employment Benefits Aimed at Improving the Socioeconomic Determinants of Health

    PubMed Central

    Danis, Marion; Lovett, Francis; Sabik, Lindsay; Adikes, Katherin; Cheng, Glen; Aomo, Tom

    2007-01-01

    Objectives. Socioeconomic factors are associated with reduced health status in low-income populations. We sought to identify affordable employment benefit packages that might ameliorate these socioeconomic factors and would be consonant with employees’ priorities. Methods. Working in groups (n = 53), low-income employees (n = 408; 62% women, 65% Black) from the Washington, DC, and Baltimore, Md, metropolitan area, participated in a computerized exercise in which they expressed their preference for employment benefit packages intended to address socioeconomic determinants of health. The hypothetical costs of these benefits reflected those of the average US benefit package available to low-income employees. Questionnaires ascertained sociodemographic information and attitudes. Descriptive statistics and logistic regression analysis were used to examine benefit choices. Results. Groups chose offered benefits in the following descending rank order: health care, retirement, vacation, disability pay, training, job flexibility, family time, dependent care, monetary advice, anxiety assistance, wellness, housing assistance, and nutrition programs. Participants varied in their personal choices, but 78% expressed willingness to abide by their groups’ choices. Conclusions. It is possible to design employment benefits that ameliorate socioeconomic determinants of health and are acceptable to low-income employees. These benefit packages can be provided at the cost of benefit packages currently available to some low-income employees. PMID:17666702

  11. Low-income employees' choices regarding employment benefits aimed at improving the socioeconomic determinants of health.

    PubMed

    Danis, Marion; Lovett, Francis; Sabik, Lindsay; Adikes, Katherin; Cheng, Glen; Aomo, Tom

    2007-09-01

    Socioeconomic factors are associated with reduced health status in low-income populations. We sought to identify affordable employment benefit packages that might ameliorate these socioeconomic factors and would be consonant with employees' priorities. Working in groups (n = 53), low-income employees (n = 408; 62% women, 65% Black) from the Washington, DC, and Baltimore, Md, metropolitan area, participated in a computerized exercise in which they expressed their preference for employment benefit packages intended to address socioeconomic determinants of health. The hypothetical costs of these benefits reflected those of the average US benefit package available to low-income employees. Questionnaires ascertained sociodemographic information and attitudes. Descriptive statistics and logistic regression analysis were used to examine benefit choices. Groups chose offered benefits in the following descending rank order: health care, retirement, vacation, disability pay, training, job flexibility, family time, dependent care, monetary advice, anxiety assistance, wellness, housing assistance, and nutrition programs. Participants varied in their personal choices, but 78% expressed willingness to abide by their groups' choices. It is possible to design employment benefits that ameliorate socioeconomic determinants of health and are acceptable to low-income employees. These benefit packages can be provided at the cost of benefit packages currently available to some low-income employees.

  12. Application of the AMBUR R package for spatio-temporal analysis of shoreline change: Jekyll Island, Georgia, USA

    NASA Astrophysics Data System (ADS)

    Jackson, Chester W.; Alexander, Clark R.; Bush, David M.

    2012-04-01

    The AMBUR (Analyzing Moving Boundaries Using R) package for the R software environment provides a collection of functions for assisting with analyzing and visualizing historical shoreline change. The package allows import and export of geospatial data in ESRI shapefile format, which is compatible with most commercial and open-source GIS software. The "baseline and transect" method is the primary technique used to quantify distances and rates of shoreline movement, and to detect classification changes across time. Along with the traditional "perpendicular" transect method, two new transect methods, "near" and "filtered," assist with quantifying changes along curved shorelines that are problematic for perpendicular transect methods. Output from the analyses includes data tables, graphics, and geospatial data, which are useful in rapidly assessing trends and potential errors in the dataset. A forecasting function also allows the user to estimate the future location of the shoreline and store the results in a shapefile. Other utilities and tools provided in the package assist with preparing and manipulating geospatial data, error checking, and generating supporting graphics and shapefiles. The package can be customized to perform additional statistical, graphical, and geospatial functions, and it is capable of analyzing the movement of any boundary (e.g., shorelines, glacier terminus, fire edge, and marine and terrestrial ecozones).
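    The transect-based rate computations that AMBUR automates can be illustrated with a minimal sketch (in Python rather than R, purely for illustration; the function below is hypothetical and not part of AMBUR's API). An end-point rate divides the net shoreline movement measured along a transect by the time elapsed between surveys:

```python
from datetime import date

def end_point_rate(d_old, d_new, t_old, t_new):
    """Net shoreline movement (m) divided by elapsed time (yr).

    d_old, d_new: distances (m) from the transect baseline to the
    shoreline at the older and newer survey dates t_old, t_new.
    """
    years = (t_new - t_old).days / 365.25
    return (d_new - d_old) / years

# A transect where the shoreline retreated 18 m over ~30 years;
# a negative rate indicates erosion.
rate = end_point_rate(52.0, 34.0, date(1974, 6, 1), date(2004, 6, 1))
print(round(rate, 2))  # → -0.6
```

Real shoreline-change tools fit rates over many survey dates and propagate positional error; this sketch only shows the two-date end-point case.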

  13. Comparison of software packages for detecting differential expression in RNA-seq studies

    PubMed Central

    Seyednasrollah, Fatemeh; Laiho, Asta

    2015-01-01

    RNA-sequencing (RNA-seq) has rapidly become a popular tool to characterize transcriptomes. A fundamental research problem in many RNA-seq studies is the identification of reliable molecular markers that show differential expression between distinct sample groups. Together with the growing popularity of RNA-seq, a number of data analysis methods and pipelines have already been developed for this task. Currently, however, there is no clear consensus about the best practices yet, which makes the choice of an appropriate method a daunting task especially for a basic user without a strong statistical or computational background. To assist the choice, we perform here a systematic comparison of eight widely used software packages and pipelines for detecting differential expression between sample groups in a practical research setting and provide general guidelines for choosing a robust pipeline. In general, our results demonstrate how the data analysis tool utilized can markedly affect the outcome of the data analysis, highlighting the importance of this choice. PMID:24300110

  14. Comparison of software packages for detecting differential expression in RNA-seq studies.

    PubMed

    Seyednasrollah, Fatemeh; Laiho, Asta; Elo, Laura L

    2015-01-01

    RNA-sequencing (RNA-seq) has rapidly become a popular tool to characterize transcriptomes. A fundamental research problem in many RNA-seq studies is the identification of reliable molecular markers that show differential expression between distinct sample groups. Together with the growing popularity of RNA-seq, a number of data analysis methods and pipelines have already been developed for this task. Currently, however, there is no clear consensus about the best practices yet, which makes the choice of an appropriate method a daunting task especially for a basic user without a strong statistical or computational background. To assist the choice, we perform here a systematic comparison of eight widely used software packages and pipelines for detecting differential expression between sample groups in a practical research setting and provide general guidelines for choosing a robust pipeline. In general, our results demonstrate how the data analysis tool utilized can markedly affect the outcome of the data analysis, highlighting the importance of this choice. © The Author 2013. Published by Oxford University Press.

  15. Hydrological analysis in R: Topmodel and beyond

    NASA Astrophysics Data System (ADS)

    Buytaert, W.; Reusser, D.

    2011-12-01

    R is quickly gaining popularity in the hydrological sciences community. Its wide range of statistical and mathematical functionality makes it an excellent tool for data analysis, modelling and uncertainty analysis. Topmodel was one of the first hydrological models to be implemented as an R package and distributed through R's own distribution network, CRAN. This facilitated pre- and postprocessing of data, such as parameter sampling, calculation of prediction bounds, and advanced visualisation. However, apart from these basic functionalities, the package did not use many of the more advanced features of the R environment, especially R's object oriented functionality. With R's increasing expansion in arenas such as high performance computing, big data analysis, and cloud services, we revisit the topmodel package and use it as an example of how to build and deploy the next generation of hydrological models. R provides a convenient environment and attractive features to build and couple hydrological - and, by extension, other environmental - models, to develop flexible and effective data assimilation strategies, and to take the model beyond the individual computer by linking into cloud services for both data provision and computing. However, in order to maximise the benefit of these approaches, it will be necessary to adopt standards and ontologies for model interaction and information exchange. Some of these are currently being developed, such as the OGC web processing standards, while others will need to be developed.

  16. GENE-Counter: A Computational Pipeline for the Analysis of RNA-Seq Data for Gene Expression Differences

    PubMed Central

    Di, Yanming; Schafer, Daniel W.; Wilhelm, Larry J.; Fox, Samuel E.; Sullivan, Christopher M.; Curzon, Aron D.; Carrington, James C.; Mockler, Todd C.; Chang, Jeff H.

    2011-01-01

    GENE-counter is a complete Perl-based computational pipeline for analyzing RNA-Sequencing (RNA-Seq) data for differential gene expression. In addition to its use in studying transcriptomes of eukaryotic model organisms, GENE-counter is applicable for prokaryotes and non-model organisms without an available genome reference sequence. For alignments, GENE-counter is configured for CASHX, Bowtie, and BWA, but an end user can use any Sequence Alignment/Map (SAM)-compliant program of preference. To analyze data for differential gene expression, GENE-counter can be run with any one of three statistics packages that are based on variations of the negative binomial distribution. The default method is a new and simple statistical test we developed based on an over-parameterized version of the negative binomial distribution. GENE-counter also includes three different methods for assessing differentially expressed features for enriched gene ontology (GO) terms. Results are transparent and data are systematically stored in a MySQL relational database to facilitate additional analyses as well as quality assessment. We used next generation sequencing to generate a small-scale RNA-Seq dataset derived from the heavily studied defense response of Arabidopsis thaliana and used GENE-counter to process the data. Collectively, the support from analysis of microarrays as well as the observed and substantial overlap in results from each of the three statistics packages demonstrates that GENE-counter is well suited for handling the unique characteristics of small sample sizes and high variability in gene counts. PMID:21998647

  17. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bennett, Janine Camille; Thompson, David; Pebay, Philippe Pierre

    Statistical analysis is typically used to reduce the dimensionality of and infer meaning from data. A key challenge of any statistical analysis package aimed at large-scale, distributed data is to address the orthogonal issues of parallel scalability and numerical stability. Many statistical techniques, e.g., descriptive statistics or principal component analysis, are based on moments and co-moments and, using robust online update formulas, can be computed in an embarrassingly parallel manner, amenable to a map-reduce style implementation. In this paper we focus on contingency tables, through which numerous derived statistics such as joint and marginal probability, point-wise mutual information, information entropy, and χ² independence statistics can be directly obtained. However, contingency tables can become large as data size increases, requiring a correspondingly large amount of communication between processors. This potential increase in communication prevents optimal parallel speedup and is the main difference with moment-based statistics (which we discussed in [1]) where the amount of inter-processor communication is independent of data size. Here we present the design trade-offs which we made to implement the computation of contingency tables in parallel. We also study the parallel speedup and scalability properties of our open source implementation. In particular, we observe optimal speed-up and scalability when the contingency statistics are used in their appropriate context, namely, when the data input is not quasi-diffuse.
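    The derived statistics named above can be sketched serially (the parallel, map-reduce-style implementation is the paper's contribution; this minimal Python sketch only shows how the joint and marginal counts of a contingency table yield the χ² independence statistic):

```python
from collections import Counter
from itertools import product

def chi2_independence(pairs):
    """Chi-squared statistic for independence from raw (x, y) pairs,
    via the joint and marginal counts of a contingency table."""
    joint = Counter(pairs)                    # joint counts: the table cells
    n = sum(joint.values())
    row = Counter(x for x, _ in pairs)        # marginal counts over x
    col = Counter(y for _, y in pairs)        # marginal counts over y
    chi2 = 0.0
    for x, y in product(row, col):
        expected = row[x] * col[y] / n        # expected count under independence
        observed = joint.get((x, y), 0)
        chi2 += (observed - expected) ** 2 / expected
    return chi2

# Perfectly independent 2x2 data yields a statistic of 0.
data = [("a", "u")] * 4 + [("a", "v")] * 4 + [("b", "u")] * 4 + [("b", "v")] * 4
print(chi2_independence(data))  # → 0.0
```

In the distributed setting the paper describes, each processor would hold a partial `joint` Counter and the communication cost grows with the number of distinct table cells, which is exactly the trade-off discussed above.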

  18. chipPCR: an R package to pre-process raw data of amplification curves.

    PubMed

    Rödiger, Stefan; Burdukiewicz, Michał; Schierack, Peter

    2015-09-01

    Both the quantitative real-time polymerase chain reaction (qPCR) and quantitative isothermal amplification (qIA) are standard methods for nucleic acid quantification. Numerous real-time read-out technologies have been developed. Despite the continuous interest in amplification-based techniques, there are only a few tools for pre-processing of amplification data. However, a transparent tool for precise control of raw data is indispensable in several scenarios, for example, during the development of new instruments. chipPCR is an R package for the pre-processing and quality analysis of raw data of amplification curves. The package takes advantage of R's S4 object model and offers an extensible environment. chipPCR contains tools for raw data exploration: normalization, baselining, imputation of missing values, a powerful wrapper for amplification curve smoothing and a function to detect the start and end of an amplification curve. The capabilities of the software are enhanced by the implementation of algorithms unavailable in R, such as a 5-point stencil for derivative interpolation. Simulation tools, statistical tests, plots for data quality management, amplification efficiency/quantification cycle calculation, and datasets from qPCR and qIA experiments are part of the package. Core functionalities are integrated in GUIs (web-based and standalone shiny applications), thus streamlining analysis and report generation. http://cran.r-project.org/web/packages/chipPCR. Source code: https://github.com/michbur/chipPCR. stefan.roediger@b-tu.de Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
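    The 5-point stencil mentioned in the abstract is a standard central finite-difference formula. The sketch below (Python, not chipPCR's R implementation) shows the idea for derivative estimation on evenly spaced samples:

```python
def stencil_derivative(y, h=1.0):
    """First derivative of evenly spaced samples using the central
    5-point stencil (-y[i+2] + 8*y[i+1] - 8*y[i-1] + y[i-2]) / (12*h).
    Returns derivatives for the interior points i = 2 .. len(y) - 3.
    """
    return [(-y[i + 2] + 8 * y[i + 1] - 8 * y[i - 1] + y[i - 2]) / (12 * h)
            for i in range(2, len(y) - 2)]

# For y = x**2 sampled at x = 0..6, dy/dx = 2x is recovered exactly
# at the interior points x = 2, 3, 4.
print(stencil_derivative([x ** 2 for x in range(7)]))  # → [4.0, 6.0, 8.0]
```

The 5-point stencil is fourth-order accurate, which is why it is attractive for locating the steep exponential phase of an amplification curve more precisely than a simple two-point difference.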

  19. HDBStat!: a platform-independent software suite for statistical analysis of high dimensional biology data.

    PubMed

    Trivedi, Prinal; Edwards, Jode W; Wang, Jelai; Gadbury, Gary L; Srinivasasainagendra, Vinodh; Zakharkin, Stanislav O; Kim, Kyoungmi; Mehta, Tapan; Brand, Jacob P L; Patki, Amit; Page, Grier P; Allison, David B

    2005-04-06

    Many efforts in microarray data analysis are focused on providing tools and methods for the qualitative analysis of microarray data. HDBStat! (High-Dimensional Biology-Statistics) is a software package designed for the analysis of high-dimensional biology data such as microarray data. It was initially developed for the analysis of microarray gene expression data, but it can also be used for some applications in proteomics and other aspects of genomics. HDBStat! provides statisticians and biologists with a flexible and easy-to-use interface for analyzing complex microarray data using a variety of methods for data preprocessing, quality control analysis and hypothesis testing. Results generated from data preprocessing, quality control analysis and hypothesis testing methods are output as Excel CSV tables, graphs and an HTML report summarizing the data analysis. HDBStat! is platform-independent software that is freely available to academic institutions and non-profit organizations. It can be downloaded from our website http://www.soph.uab.edu/ssg_content.asp?id=1164.

  20. EQS Goes R: Simulations for SEM Using the Package REQS

    ERIC Educational Resources Information Center

    Mair, Patrick; Wu, Eric; Bentler, Peter M.

    2010-01-01

    The REQS package is an interface between the R environment of statistical computing and the EQS software for structural equation modeling. The package consists of 3 main functions that read EQS script files and import the results into R, call EQS script files from R, and run EQS script files from R and import the results after EQS computations.…

  1. Mousetrap: An integrated, open-source mouse-tracking package.

    PubMed

    Kieslich, Pascal J; Henninger, Felix

    2017-10-01

    Mouse-tracking - the analysis of mouse movements in computerized experiments - is becoming increasingly popular in the cognitive sciences. Mouse movements are taken as an indicator of commitment to or conflict between choice options during the decision process. Using mouse-tracking, researchers have gained insight into the temporal development of cognitive processes across a growing number of psychological domains. In the current article, we present software that offers easy and convenient means of recording and analyzing mouse movements in computerized laboratory experiments. In particular, we introduce and demonstrate the mousetrap plugin that adds mouse-tracking to OpenSesame, a popular general-purpose graphical experiment builder. By integrating with this existing experimental software, mousetrap allows for the creation of mouse-tracking studies through a graphical interface, without requiring programming skills. Thus, researchers can benefit from the core features of a validated software package and the many extensions available for it (e.g., the integration with auxiliary hardware such as eye-tracking, or the support of interactive experiments). In addition, the recorded data can be imported directly into the statistical programming language R using the mousetrap package, which greatly facilitates analysis. Mousetrap is cross-platform, open-source and available free of charge from https://github.com/pascalkieslich/mousetrap-os.
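    As an illustration of the kind of trajectory statistic mouse-tracking analyses compute, a common measure is the maximum absolute deviation of the cursor path from the straight line joining its start and end points. The sketch below is a hypothetical Python implementation for illustration, not mousetrap's own (R) code:

```python
def max_abs_deviation(points):
    """Maximum perpendicular distance of a 2-D trajectory from the
    straight line joining its first and last points."""
    (x0, y0), (x1, y1) = points[0], points[-1]
    dx, dy = x1 - x0, y1 - y0
    length = (dx * dx + dy * dy) ** 0.5
    # Perpendicular point-to-line distance via the 2-D cross product.
    return max(abs(dy * (x - x0) - dx * (y - y0)) / length
               for x, y in points)

# A trajectory that bows 2 units away from the direct path.
path = [(0, 0), (1, 2), (2, 0)]  # starts and ends on the x-axis
print(max_abs_deviation(path))  # → 2.0
```

Larger deviations are commonly read as stronger attraction toward the non-chosen option during the decision.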

  2. TCGAbiolinks: an R/Bioconductor package for integrative analysis of TCGA data

    PubMed Central

    Colaprico, Antonio; Silva, Tiago C.; Olsen, Catharina; Garofano, Luciano; Cava, Claudia; Garolini, Davide; Sabedot, Thais S.; Malta, Tathiane M.; Pagnotta, Stefano M.; Castiglioni, Isabella; Ceccarelli, Michele; Bontempi, Gianluca; Noushmehr, Houtan

    2016-01-01

    The Cancer Genome Atlas (TCGA) research network has made public a large collection of clinical and molecular phenotypes of more than 10 000 tumor patients across 33 different tumor types. Using this cohort, TCGA has published over 20 marker papers detailing the genomic and epigenomic alterations associated with these tumor types. Although many important discoveries have been made by TCGA's research network, opportunities still exist to implement novel methods, thereby elucidating new biological pathways and diagnostic markers. However, mining the TCGA data presents several bioinformatics challenges, such as data retrieval and integration with clinical data and other molecular data types (e.g. RNA and DNA methylation). We developed an R/Bioconductor package called TCGAbiolinks to address these challenges and offer bioinformatics solutions by using a guided workflow to allow users to query, download and perform integrative analyses of TCGA data. We combined methods from computer science and statistics into the pipeline and incorporated methodologies developed in previous TCGA marker studies and in our own group. Using four different TCGA tumor types (Kidney, Brain, Breast and Colon) as examples, we provide case studies to illustrate examples of reproducibility, integrative analysis and utilization of different Bioconductor packages to advance and accelerate novel discoveries. PMID:26704973

  3. A Study of the NASS-CDS System for Injury/Fatality Rates of Occupants in Various Restraints and A Discussion of Alternative Presentation Methods

    PubMed Central

    Stucki, Sheldon Lee; Biss, David J.

    2000-01-01

    An analysis was performed using the National Automotive Sampling System Crashworthiness Data System (NASS-CDS) database to compare the injury/fatality rates of variously restrained driver occupants as compared to unrestrained driver occupants in the total database of drivers/frontals, and also by Delta-V. A structured search of the NASS-CDS was done using the SAS® statistical analysis software to extract the data for this analysis and the SUDAAN software package was used to arrive at statistical significance indicators. In addition, this paper goes on to investigate different methods for presenting results of accident database searches including significance results; a risk versus Delta-V format for specific exposures; and, a percent cumulative injury versus Delta-V format to characterize injury trends. These alternative analysis presentation methods are then discussed by example using the present study results. PMID:11558105

  4. An Overview of R in Health Decision Sciences.

    PubMed

    Jalal, Hawre; Pechlivanoglou, Petros; Krijkamp, Eline; Alarid-Escudero, Fernando; Enns, Eva; Hunink, M G Myriam

    2017-10-01

    As the complexity of health decision science applications increases, high-level programming languages are increasingly adopted for statistical analyses and numerical computations. These programming languages facilitate sophisticated modeling, model documentation, and analysis reproducibility. Among the high-level programming languages, the statistical programming framework R is gaining increased recognition. R is freely available, cross-platform compatible, and open source. A large community of users who have generated an extensive collection of well-documented packages and functions supports it. These functions facilitate applications of health decision science methodology as well as the visualization and communication of results. Although R's popularity is increasing among health decision scientists, methodological extensions of R in the field of decision analysis remain isolated. The purpose of this article is to provide an overview of existing R functionality that is applicable to the various stages of decision analysis, including model design, input parameter estimation, and analysis of model outputs.

  5. MEG and EEG data analysis with MNE-Python.

    PubMed

    Gramfort, Alexandre; Luessi, Martin; Larson, Eric; Engemann, Denis A; Strohmeier, Daniel; Brodbeck, Christian; Goj, Roman; Jas, Mainak; Brooks, Teon; Parkkonen, Lauri; Hämäläinen, Matti

    2013-12-26

    Magnetoencephalography and electroencephalography (M/EEG) measure the weak electromagnetic signals generated by neuronal activity in the brain. Using these signals to characterize and locate neural activation in the brain is a challenge that requires expertise in physics, signal processing, statistics, and numerical methods. As part of the MNE software suite, MNE-Python is an open-source software package that addresses this challenge by providing state-of-the-art algorithms implemented in Python that cover multiple methods of data preprocessing, source localization, statistical analysis, and estimation of functional connectivity between distributed brain regions. All algorithms and utility functions are implemented in a consistent manner with well-documented interfaces, enabling users to create M/EEG data analysis pipelines by writing Python scripts. Moreover, MNE-Python is tightly integrated with the core Python libraries for scientific computation (NumPy, SciPy) and visualization (matplotlib and Mayavi), as well as the greater neuroimaging ecosystem in Python via the Nibabel package. The code is provided under the new BSD license allowing code reuse, even in commercial products. Although MNE-Python has only been under heavy development for a couple of years, it has rapidly evolved with expanded analysis capabilities and pedagogical tutorials because multiple labs have collaborated during code development to help share best practices. MNE-Python also gives easy access to preprocessed datasets, helping users to get started quickly and facilitating reproducibility of methods by other researchers. Full documentation, including dozens of examples, is available at http://martinos.org/mne.

  6. MEG and EEG data analysis with MNE-Python

    PubMed Central

    Gramfort, Alexandre; Luessi, Martin; Larson, Eric; Engemann, Denis A.; Strohmeier, Daniel; Brodbeck, Christian; Goj, Roman; Jas, Mainak; Brooks, Teon; Parkkonen, Lauri; Hämäläinen, Matti

    2013-01-01

    Magnetoencephalography and electroencephalography (M/EEG) measure the weak electromagnetic signals generated by neuronal activity in the brain. Using these signals to characterize and locate neural activation in the brain is a challenge that requires expertise in physics, signal processing, statistics, and numerical methods. As part of the MNE software suite, MNE-Python is an open-source software package that addresses this challenge by providing state-of-the-art algorithms implemented in Python that cover multiple methods of data preprocessing, source localization, statistical analysis, and estimation of functional connectivity between distributed brain regions. All algorithms and utility functions are implemented in a consistent manner with well-documented interfaces, enabling users to create M/EEG data analysis pipelines by writing Python scripts. Moreover, MNE-Python is tightly integrated with the core Python libraries for scientific computation (NumPy, SciPy) and visualization (matplotlib and Mayavi), as well as the greater neuroimaging ecosystem in Python via the Nibabel package. The code is provided under the new BSD license allowing code reuse, even in commercial products. Although MNE-Python has only been under heavy development for a couple of years, it has rapidly evolved with expanded analysis capabilities and pedagogical tutorials because multiple labs have collaborated during code development to help share best practices. MNE-Python also gives easy access to preprocessed datasets, helping users to get started quickly and facilitating reproducibility of methods by other researchers. Full documentation, including dozens of examples, is available at http://martinos.org/mne. PMID:24431986

  7. ProUCL version 4.1.00 Documentation Downloads

    EPA Pesticide Factsheets

    ProUCL version 4.1.00 represents a comprehensive statistical software package equipped with statistical methods and graphical tools needed to address many environmental sampling and statistical issues as described in various guidance documents.

  8. Metacoder: An R package for visualization and manipulation of community taxonomic diversity data.

    PubMed

    Foster, Zachary S L; Sharpton, Thomas J; Grünwald, Niklaus J

    2017-02-01

    Community-level data, the type generated by an increasing number of metabarcoding studies, is often graphed as stacked bar charts or pie graphs that use color to represent taxa. These graph types do not convey the hierarchical structure of taxonomic classifications and are limited by the use of color for categories. As an alternative, we developed metacoder, an R package for easily parsing, manipulating, and graphing publication-ready plots of hierarchical data. Metacoder includes a dynamic and flexible function that can parse most text-based formats that contain taxonomic classifications, taxon names, taxon identifiers, or sequence identifiers. Metacoder can then subset, sample, and order this parsed data using a set of intuitive functions that take into account the hierarchical nature of the data. Finally, an extremely flexible plotting function enables quantitative representation of up to 4 arbitrary statistics simultaneously in a tree format by mapping statistics to the color and size of tree nodes and edges. Metacoder also allows exploration of barcode primer bias by integrating functions to run digital PCR. Although it has been designed for data from metabarcoding research, metacoder can easily be applied to any data that has a hierarchical component such as gene ontology or geographic location data. Our package complements currently available tools for community analysis and is provided open source with an extensive online user manual.
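    The text-based classification formats metacoder parses can be illustrated with a small sketch (Python rather than R, and not metacoder's API): semicolon-delimited lineages are expanded so that counts accumulate at every level of the taxonomic hierarchy:

```python
from collections import Counter

def tally_taxa(classifications):
    """Count reads at every level of a set of semicolon-delimited
    taxonomic classifications (a common metabarcoding text format)."""
    counts = Counter()
    for lineage in classifications:
        taxa = lineage.split(";")
        # Credit each read to every ancestor taxon along its lineage.
        for depth in range(1, len(taxa) + 1):
            counts[";".join(taxa[:depth])] += 1
    return counts

reads = ["Bacteria;Firmicutes;Bacilli",
         "Bacteria;Firmicutes;Clostridia",
         "Bacteria;Proteobacteria"]
print(tally_taxa(reads)["Bacteria"])             # → 3
print(tally_taxa(reads)["Bacteria;Firmicutes"])  # → 2
```

Per-taxon statistics like these are what a heat-tree plot then maps to node size and color.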

  9. Metacoder: An R package for visualization and manipulation of community taxonomic diversity data

    PubMed Central

    Foster, Zachary S. L.; Sharpton, Thomas J.

    2017-01-01

    Community-level data, the type generated by an increasing number of metabarcoding studies, is often graphed as stacked bar charts or pie graphs that use color to represent taxa. These graph types do not convey the hierarchical structure of taxonomic classifications and are limited by the use of color for categories. As an alternative, we developed metacoder, an R package for easily parsing, manipulating, and graphing publication-ready plots of hierarchical data. Metacoder includes a dynamic and flexible function that can parse most text-based formats that contain taxonomic classifications, taxon names, taxon identifiers, or sequence identifiers. Metacoder can then subset, sample, and order this parsed data using a set of intuitive functions that take into account the hierarchical nature of the data. Finally, an extremely flexible plotting function enables quantitative representation of up to 4 arbitrary statistics simultaneously in a tree format by mapping statistics to the color and size of tree nodes and edges. Metacoder also allows exploration of barcode primer bias by integrating functions to run digital PCR. Although it has been designed for data from metabarcoding research, metacoder can easily be applied to any data that has a hierarchical component such as gene ontology or geographic location data. Our package complements currently available tools for community analysis and is provided open source with an extensive online user manual. PMID:28222096

  10. Implementation of building information modeling in Malaysian construction industry

    NASA Astrophysics Data System (ADS)

    Memon, Aftab Hameed; Rahman, Ismail Abdul; Harman, Nur Melly Edora

    2014-10-01

    This study assessed the implementation level of Building Information Modeling (BIM) in the construction industry of Malaysia. It also investigated several computer software packages facilitating BIM and the challenges affecting its implementation. Data collection was carried out through a questionnaire survey among construction practitioners: of 150 questionnaire sets distributed to consultant, contractor and client organizations, 95 completed forms were received and analyzed statistically. The findings indicated that the level of BIM implementation in the construction industry of Malaysia is very low. The average index method, employed to assess the effectiveness of various BIM software packages, highlighted Bentley construction, AutoCAD and ArchiCAD as the three most popular and effective packages. The major challenges to BIM implementation are the need for enhanced collaboration, the additional work it imposes on designers, and interoperability issues. To improve the level of BIM implementation in the Malaysian industry, it is recommended that a flexible BIM training program be created for all practitioners.

  11. ROOT: A C++ framework for petabyte data storage, statistical analysis and visualization

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Antcheva, I.; /CERN; Ballintijn, M.

    2009-01-01

    ROOT is an object-oriented C++ framework conceived in the high-energy physics (HEP) community, designed for storing and analyzing petabytes of data in an efficient way. Any instance of a C++ class can be stored into a ROOT file in a machine-independent compressed binary format. In ROOT the TTree object container is optimized for statistical data analysis over very large data sets by using vertical data storage techniques. These containers can span a large number of files on local disks, the web or a number of different shared file systems. In order to analyze this data, the user can choose from a wide set of mathematical and statistical functions, including linear algebra classes, numerical algorithms such as integration and minimization, and various methods for performing regression analysis (fitting). In particular, the RooFit package allows the user to perform complex data modeling and fitting while the RooStats library provides abstractions and implementations for advanced statistical tools. Multivariate classification methods based on machine learning techniques are available via the TMVA package. A central piece in these analysis tools is the set of histogram classes which provide binning of one- and multi-dimensional data. Results can be saved in high-quality graphical formats like Postscript and PDF or in bitmap formats like JPG or GIF. The result can also be stored into ROOT macros that allow a full recreation and rework of the graphics. Users typically create their analysis macros step by step, making use of the interactive C++ interpreter CINT, while running over small data samples. Once the development is finished, they can run these macros at full compiled speed over large data sets, using on-the-fly compilation, or by creating a stand-alone batch program. Finally, if processing farms are available, the user can reduce the execution time of intrinsically parallel tasks - e.g. data mining in HEP - by using PROOF, which will take care of optimally distributing the work over the available resources in a transparent way.

  12. PhenStat | Informatics Technology for Cancer Research (ITCR)

    Cancer.gov

    PhenStat is a freely available R package that provides a variety of statistical methods for the identification of phenotypic associations in model organisms, developed for the International Mouse Phenotyping Consortium (IMPC at www.mousephenotype.org ). The methods have been developed for high-throughput phenotyping pipelines implemented across various experimental designs, with an emphasis on managing temporal variation, and are being adapted for analysis with PDX mouse strains.

  13. Data Preparation 101: How to Use Query-by-Example to Get Your Research Dataset Ready for Primetime

    ERIC Educational Resources Information Center

    Lazarony, Paul J.; Driscoll, Donna A.

    2011-01-01

    Researchers are often distressed to discover that the data they wanted to use in their landmark study is not configured in a way that is usable by a Statistical Analysis Software Package (SASP). For example, the data needed may come from two or more sources and it may not be clear to the researcher how to get them combined into one analyzable…

  14. Statistical package for improved analysis of hillslope monitoring data collected as part of the Board of Forestry's long-term monitoring program

    Treesearch

    Jack Lewis; Jim Baldwin

    1997-01-01

    The State of California has embarked upon a Long-Term Monitoring Program whose primary goal is to assess the effectiveness of the Forest Practice Rules and Review Process in protecting the beneficial uses of waters from the impacts of timber operations on private timberlands. The Board of Forestry's Monitoring Study Group concluded that hillslope monitoring should...

  15. climwin: An R Toolbox for Climate Window Analysis.

    PubMed

    Bailey, Liam D; van de Pol, Martijn

    2016-01-01

    When studying the impacts of climate change, there is a tendency to select climate data from a small set of arbitrary time periods or climate windows (e.g., spring temperature). However, these arbitrary windows may not encompass the strongest periods of climatic sensitivity and may lead to erroneous biological interpretations. Therefore, there is a need to consider a wider range of climate windows to better predict the impacts of future climate change. We introduce the R package climwin that provides a number of methods to test the effect of different climate windows on a chosen response variable and compare these windows to identify potential climate signals. climwin extracts the relevant data for each possible climate window and uses this data to fit a statistical model, the structure of which is chosen by the user. Models are then compared using an information criteria approach. This allows users to determine how well each window explains variation in the response variable and compare model support between windows. climwin also contains methods to detect type I and II errors, which are often a problem with this type of exploratory analysis. This article presents the statistical framework and technical details behind the climwin package and demonstrates the applicability of the method with a number of worked examples.
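
    The window-scan approach climwin takes can be sketched generically. The Python toy below (synthetic data and a plain AIC comparison, not climwin's actual R interface) fits a linear model for every candidate window of a hypothetical daily temperature series and keeps the best-supported one:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical data: 20 years of daily temperature and one annual
# biological response truly driven by mean temperature over days 100-130.
n_years = 20
temp = rng.normal(10, 3, size=(n_years, 365))
response = 2.0 * temp[:, 100:130].mean(axis=1) + rng.normal(0, 0.5, n_years)

def aic_linear(x, y):
    """AIC of a simple linear regression y ~ x with Gaussian errors."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    n, k = len(y), 3  # intercept, slope, residual variance
    return n * np.log(rss / n) + 2 * k

# Score every candidate window and keep the best-supported one.
candidates = [(s, e) for s in range(0, 360, 10) for e in range(s + 10, 366, 10)]
scores = {w: aic_linear(temp[:, w[0]:w[1]].mean(axis=1), response)
          for w in candidates}
best = min(scores, key=scores.get)
print(best)
```

    climwin additionally randomizes the response to gauge how often a window this well supported arises by chance, which is how it addresses the type I error problem the abstract mentions.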

  16. Introduction to the Practice of Statistics. David Moore and George McCabe. W.H. Freeman, 850pp, £39.99, ISBN 071676282X.

    PubMed

    2005-10-01

    This is a very well-written and beautifully presented book. It is North American in origin and, while it will be invaluable for teachers of statistics to nurses and other healthcare professionals, it is probably not suitable for many pre- or post-registration students in health in the UK. The material is quite advanced and, while well illustrated, exemplified and with numerous examples for students, it takes a fairly mathematical approach in places. Nevertheless, the book has much to commend it, including a CD-ROM package containing tutorials, a statistical package, solutions based on the exercises in the text and case studies.

  17. The Path Toward Universal Health Coverage.

    PubMed

    Yassoub, Rami; Alameddine, Mohamad; Saleh, Shadi

    2017-04-01

    Lebanon is a middle-income country with a market-maximized healthcare system that provides limited social protection for its citizens. Estimates reveal that half of the population lacks sufficient health coverage and resorts to out-of-pocket payments. This study triangulated data from a comprehensive review of health packages of countries similar to Lebanon, the Ministry of Public Health statistics, and services suggested by the World Health Organization for inclusion in a health benefits package (HBP). To determine the acceptability and viability of implementing the HBP, a stakeholder analysis was conducted to identify the knowledge, positions, and available resources for the package. The results revealed that the private health sector, having the most resources, is least in favor of implementing the package, whereas the political and civil society sectors support implementation. The main divergence in opinions among stakeholders was on the abolishment of out-of-pocket payments, mainly attributed to the potential abuse of the HBP's services by users. The study's findings encourage health decision makers to capitalize on the current political readiness by proposing the HBP for implementation in the path toward universal health coverage. This requires a consultative process, involving all stakeholders, in devising the strategy and implementation framework of an HBP.

  18. csaw: a Bioconductor package for differential binding analysis of ChIP-seq data using sliding windows

    PubMed Central

    Lun, Aaron T.L.; Smyth, Gordon K.

    2016-01-01

    Chromatin immunoprecipitation with massively parallel sequencing (ChIP-seq) is widely used to identify binding sites for a target protein in the genome. An important scientific application is to identify changes in protein binding between different treatment conditions, i.e. to detect differential binding (DB). This can reveal potential mechanisms through which changes in binding may contribute to the treatment effect. The csaw package provides a framework for the de novo detection of differentially bound genomic regions. It uses a window-based strategy to summarize read counts across the genome. It exploits existing statistical software to test for significant differences in each window. Finally, it clusters windows into regions for output and controls the false discovery rate properly over all detected regions. The csaw package can handle arbitrarily complex experimental designs involving biological replicates. It can be applied to both transcription factor and histone mark datasets, and, more generally, to any type of sequencing data measuring genomic coverage. csaw performs favorably against existing methods for de novo DB analyses on both simulated and real data. csaw is implemented as an R software package and is freely available from the open-source Bioconductor project. PMID:26578583
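
    The window-then-cluster strategy can be illustrated outside Bioconductor. This Python toy (synthetic read positions, and a crude normal-approximation score standing in for the proper count-based tests csaw delegates to existing statistical software) counts reads in sliding windows and merges adjacent significant windows into output regions:

```python
import numpy as np
from itertools import groupby

rng = np.random.default_rng(0)

# Hypothetical read positions on one chromosome for two conditions;
# condition B has extra reads around position 5000 (a differential site).
reads_a = rng.integers(0, 10_000, 2_000)
reads_b = np.concatenate([rng.integers(0, 10_000, 2_000),
                          rng.integers(4_800, 5_200, 400)])

width, step = 200, 100
starts = np.arange(0, 10_000 - width + 1, step)

def window_counts(reads):
    return np.array([((reads >= s) & (reads < s + width)).sum() for s in starts])

ca, cb = window_counts(reads_a), window_counts(reads_b)

# Crude per-window score for a count difference (illustration only).
z = (cb - ca) / np.sqrt(ca + cb + 1)
significant = z > 3.0

# Cluster runs of overlapping significant windows into regions.
regions = []
for flag, grp in groupby(enumerate(significant), key=lambda t: t[1]):
    if flag:
        idx = [i for i, _ in grp]
        regions.append((starts[idx[0]], starts[idx[-1]] + width))
print(regions)
```

    Because windows overlap, a true binding change lights up a run of consecutive windows, and reporting the merged run (rather than each window) is what makes region-level FDR control meaningful.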

  19. cit: hypothesis testing software for mediation analysis in genomic applications.

    PubMed

    Millstein, Joshua; Chen, Gary K; Breton, Carrie V

    2016-08-01

    The challenges of successfully applying causal inference methods include: (i) satisfying underlying assumptions, (ii) limitations in data/models accommodated by the software and (iii) low power of common multiple testing approaches. The causal inference test (CIT) is based on hypothesis testing rather than estimation, allowing the testable assumptions to be evaluated in the determination of statistical significance. A user-friendly software package provides P-values and optionally permutation-based FDR estimates (q-values) for potential mediators. It can handle single and multiple binary and continuous instrumental variables, binary or continuous outcome variables and adjustment covariates. Also, the permutation-based FDR option provides a non-parametric implementation. Simulation studies demonstrate the validity of the cit package and show a substantial advantage of permutation-based FDR over other common multiple testing strategies. The cit open-source R package is freely available from the CRAN website (https://cran.r-project.org/web/packages/cit/index.html) with embedded C++ code that utilizes the GNU Scientific Library, also freely available (http://www.gnu.org/software/gsl/). Contact: joshua.millstein@usc.edu. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
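
    The permutation-based FDR idea is easy to sketch in isolation. The Python toy below (synthetic statistics, not cit's actual test) estimates the FDR at a threshold as the ratio of the average number of permuted null statistics exceeding it to the number of observed exceedances:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical mediation screen: 1000 candidate mediators, 50 of which
# carry a real signal (larger test statistics).
n_tests, n_signal, n_perm = 1000, 50, 20
obs = np.concatenate([rng.normal(4, 1, n_signal),
                      rng.normal(0, 1, n_tests - n_signal)])

# Null statistics obtained by re-testing under n_perm permuted outcomes.
perm = rng.normal(0, 1, (n_perm, n_tests))

def perm_fdr(threshold):
    """Permutation-estimated FDR for calling statistics >= threshold."""
    expected_false = (perm >= threshold).sum() / n_perm
    called = max((obs >= threshold).sum(), 1)
    return min(expected_false / called, 1.0)

print(round(perm_fdr(3.0), 3))
```

    Because the null distribution is built from the data itself rather than a parametric assumption, the estimate is non-parametric, which is the advantage the abstract claims for the permutation-based option.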

  20. Review of Statistical Learning Methods in Integrated Omics Studies (An Integrated Information Science).

    PubMed

    Zeng, Irene Sui Lan; Lumley, Thomas

    2018-01-01

    Integrated omics is becoming a new channel for investigating the complex molecular system in modern biological science and sets a foundation for systematic learning for precision medicine. The statistical/machine learning methods that have emerged in the past decade for integrated omics are not only innovative but also multidisciplinary, with integrated knowledge in biology, medicine, statistics, machine learning, and artificial intelligence. Here, we review the nontrivial classes of learning methods from the statistical aspects and streamline these learning methods within the statistical learning framework. The intriguing findings from the review are that the methods used are generalizable to other disciplines with complex systematic structure, and that integrated omics is part of an integrated information science which has collated and integrated different types of information for inferences and decision making. We review the statistical learning methods of exploratory and supervised learning from 42 publications. We also discuss the strengths and limitations of the extended principal component analysis, cluster analysis, network analysis, and regression methods. Statistical techniques, such as penalization for sparsity induction when there are fewer observations than features and the use of a Bayesian approach when there is prior knowledge to be integrated, are also included in the commentary. For completeness, a table of currently available software and packages for omics from 23 publications is summarized in the appendix.

  1. Vertical bone measurements from cone beam computed tomography images using different software packages.

    PubMed

    Vasconcelos, Taruska Ventorini; Neves, Frederico Sampaio; Moraes, Lívia Almeida Bueno; Freitas, Deborah Queiroz

    2015-01-01

    This article aimed at comparing the accuracy of linear measurement tools of different commercial software packages. Eight fully edentulous dry mandibles were selected for this study. Incisor, canine, premolar, first molar and second molar regions were selected. Cone beam computed tomography (CBCT) images were obtained with i-CAT Next Generation. Linear bone measurements were performed by one observer on the cross-sectional images using three different software packages: XoranCat®, OnDemand3D® and KDIS3D®, all able to assess DICOM images. In addition, 25% of the sample was reevaluated for the purpose of reproducibility. The mandibles were sectioned to obtain the gold standard for each region. Intraclass coefficients (ICC) were calculated to examine the agreement between the two periods of evaluation; the one-way analysis of variance performed with the post-hoc Dunnett test was used to compare each of the software-derived measurements with the gold standard. The ICC values were excellent for all software packages. The least difference between the software-derived measurements and the gold standard was obtained with the OnDemand3D and KDIS3D (-0.11 and -0.14 mm, respectively), and the greatest, with the XoranCAT (+0.25 mm). However, there was no statistically significant difference between the measurements obtained with the different software packages and the gold standard (p > 0.05). In conclusion, linear bone measurements were not influenced by the software package used to reconstruct the image from CBCT DICOM data.

  2. PIV Data Validation Software Package

    NASA Technical Reports Server (NTRS)

    Blackshire, James L.

    1997-01-01

    A PIV data validation and post-processing software package was developed to provide semi-automated data validation and data reduction capabilities for Particle Image Velocimetry data sets. The software provides three primary capabilities including (1) removal of spurious vector data, (2) filtering, smoothing, and interpolating of PIV data, and (3) calculations of out-of-plane vorticity, ensemble statistics, and turbulence statistics information. The software runs on an IBM PC/AT host computer working either under Microsoft Windows 3.1 or Windows 95 operating systems.
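
    Capability (3) can be illustrated for the vorticity part. A minimal Python sketch (hypothetical solid-body-rotation field, not the package's own code) computes the out-of-plane vorticity ω_z = ∂v/∂x − ∂u/∂y by finite differences on a regular PIV grid:

```python
import numpy as np

# Hypothetical PIV vector field on a regular grid: solid-body rotation
# u = -omega*y, v = omega*x, whose out-of-plane vorticity is 2*omega.
omega = 1.5
dx = dy = 0.1
y, x = np.mgrid[-1:1:21j, -1:1:21j]
u, v = -omega * y, omega * x

# Out-of-plane vorticity via central differences (one-sided at edges).
dvdx = np.gradient(v, dx, axis=1)
dudy = np.gradient(u, dy, axis=0)
vort = dvdx - dudy
print(vort.mean())
```

    For this linear field the differences are exact, so the computed vorticity is 2ω = 3.0 everywhere; on real PIV data the same stencil is applied after the spurious-vector removal and smoothing steps listed above.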

  3. Bayesian inference for psychology. Part II: Example applications with JASP.

    PubMed

    Wagenmakers, Eric-Jan; Love, Jonathon; Marsman, Maarten; Jamil, Tahira; Ly, Alexander; Verhagen, Josine; Selker, Ravi; Gronau, Quentin F; Dropmann, Damian; Boutin, Bruno; Meerhoff, Frans; Knight, Patrick; Raj, Akash; van Kesteren, Erik-Jan; van Doorn, Johnny; Šmíra, Martin; Epskamp, Sacha; Etz, Alexander; Matzke, Dora; de Jong, Tim; van den Bergh, Don; Sarafoglou, Alexandra; Steingroever, Helen; Derks, Koen; Rouder, Jeffrey N; Morey, Richard D

    2018-02-01

    Bayesian hypothesis testing presents an attractive alternative to p value hypothesis testing. Part I of this series outlined several advantages of Bayesian hypothesis testing, including the ability to quantify evidence and the ability to monitor and update this evidence as data come in, without the need to know the intention with which the data were collected. Despite these and other practical advantages, Bayesian hypothesis tests are still reported relatively rarely. An important impediment to the widespread adoption of Bayesian tests is arguably the lack of user-friendly software for the run-of-the-mill statistical problems that confront psychologists for the analysis of almost every experiment: the t-test, ANOVA, correlation, regression, and contingency tables. In Part II of this series we introduce JASP ( http://www.jasp-stats.org ), an open-source, cross-platform, user-friendly graphical software package that allows users to carry out Bayesian hypothesis tests for standard statistical problems. JASP is based in part on the Bayesian analyses implemented in Morey and Rouder's BayesFactor package for R. Armed with JASP, the practical advantages of Bayesian hypothesis testing are only a mouse click away.
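
    For a feel of what a Bayesian two-sample test reports, a rough stand-in is the BIC approximation of Wagenmakers (2007); note that JASP itself relies on the default priors of the BayesFactor package, not this shortcut. A hedged Python sketch on synthetic two-group data:

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical two-group experiment with a clear true effect.
a = rng.normal(0.0, 1.0, 40)
b = rng.normal(1.5, 1.0, 40)

def bic_bayes_factor(x, y):
    """Approximate BF10 for 'the group means differ' via the BIC
    approximation (Wagenmakers, 2007): BF10 ~= exp((BIC0 - BIC1) / 2)."""
    n = len(x) + len(y)
    pooled = np.concatenate([x, y])
    rss0 = np.sum((pooled - pooled.mean()) ** 2)                      # H0: one mean
    rss1 = np.sum((x - x.mean()) ** 2) + np.sum((y - y.mean()) ** 2)  # H1: two means
    bic0 = n * np.log(rss0 / n) + 1 * np.log(n)
    bic1 = n * np.log(rss1 / n) + 2 * np.log(n)
    return float(np.exp((bic0 - bic1) / 2))

print(bic_bayes_factor(a, b))
```

    A BF10 well above 1 quantifies evidence for a group difference, and recomputing it as observations arrive is legitimate, which is the monitoring-and-updating advantage described above.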

  4. Distance education methods are useful for delivering education to palliative caregivers: A single-arm trial of an education package (PalliativE Caregivers Education Package).

    PubMed

    Forbat, Liz; Robinson, Rowena; Bilton-Simek, Rachel; Francois, Karemah; Lewis, Marsha; Haraldsdottir, Erna

    2018-02-01

    Face-to-face/group education for palliative caregivers is successful, but relies on caregivers travelling, being absent from the patient, and rigid timings. This presents inequities for those in rural locations. To design and test an innovative distance-learning educational package (PrECEPt: PalliativE Caregivers Education Package). Single-arm mixed-method feasibility proof-of-concept trial (ACTRN12616000601437). The primary outcome was carer self-efficacy, with secondary outcomes focused on caregiver preparedness and carer tasks/needs. Analysis focused on three outcome measures (taken at baseline and 6 weeks) and feasibility/acceptability qualitative data. A single specialist palliative care service. Eligible informal caregivers were those of patients registered with the outpatient or community service, where the patient had a prognosis of ⩾12 weeks, supporting someone with nutrition/hydration and/or pain management needs, proficient in English and no major mental health diagnosis. Two modules were developed and tested (nutrition/hydration and pain management) with 18 caregivers. The materials did not have a statistically significant impact on carer self-efficacy. However, statistically significant improvements were observed on the two subsidiary measures of (1) caregiving tasks, consequences and needs (p = 0.03, confidence interval: 0.72, 9.4) and (2) caregiver preparedness (p = 0.001, confidence interval: -1.22, -0.46). The study determined that distance learning is acceptable and feasible for both caregivers and healthcare professionals. Distance education improves caregiver preparedness and is a feasible and acceptable approach. A two-arm trial would determine whether the materials benefitted caregivers and patients compared to a control group not receiving the materials. Additional modules could be fruitfully developed and offered.

  5. A RESEARCH DATABASE FOR IMPROVED DATA MANAGEMENT AND ANALYSIS IN LONGITUDINAL STUDIES

    PubMed Central

    BIELEFELD, ROGER A.; YAMASHITA, TOYOKO S.; KEREKES, EDWARD F.; ERCANLI, EHAT; SINGER, LYNN T.

    2014-01-01

    We developed a research database for a five-year prospective investigation of the medical, social, and developmental correlates of chronic lung disease during the first three years of life. We used the Ingres database management system and the Statit statistical software package. The database includes records containing 1300 variables each, the results of 35 psychological tests, each repeated five times (providing longitudinal data on the child, the parents, and behavioral interactions), both raw and calculated variables, and both missing and deferred values. The four-layer menu-driven user interface incorporates automatic activation of complex functions to handle data verification, missing and deferred values, static and dynamic backup, determination of calculated values, display of database status, reports, bulk data extraction, and statistical analysis. PMID:7596250

  6. Analysis to evaluate predictors of fiberboard aging to guide surveillance sampling for the 9975 life extension program

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kelly, Elizabeth J.; Daugherty, William L.; Hackney, Elizabeth R.

    During surveillance of the 9975 shipping package at the Savannah River Site K-Area Complex, several package dimensions are recorded. The analysis described in this report shows that, based on the current data analysis, two of these measurements, Upper Assembly Outer Diameter (UAOD) and Upper Assembly Inside Height (UAIH), do not have statistically significant aging trends regardless of wattage levels. In contrast, this analysis indicates that the measurement of Air Shield Gap (ASGap) does show a significant increase with age. It appears that the increase is greater for high wattage containers, but this result is dominated by two measurements from high-wattage containers. For all three indicators, additional high-wattage, older containers need to be examined before any definitive conclusions can be reached. In addition, the current analysis indicates that ASGap measurements for low and medium wattage containers are increasing slowly over time. To reduce uncertainties and better capture the aging trend for these containers, additional low and medium wattage older containers should also be examined. Based on this analysis, surveillance guidance is to augment surveillance containers resulting from 3013 surveillance with 9975-focused sampling that targets older, high wattage containers and also includes some older, low and medium wattage containers. This focused sampling began in 2015 and will continue in 2016. The UAOD, UAIH and ASGap data are highly variable. It is possible that additional factors such as seasonal variation and packaging site location might reduce variability and be useful for focusing surveillance and predicting aging.

  7. Statistical analysis of water-quality data containing multiple detection limits: S-language software for regression on order statistics

    USGS Publications Warehouse

    Lee, L.; Helsel, D.

    2005-01-01

    Trace contaminants in water, including metals and organics, often are measured at sufficiently low concentrations to be reported only as values below the instrument detection limit. Interpretation of these "less thans" is complicated when multiple detection limits occur. Statistical methods for multiply censored, or multiple-detection limit, datasets have been developed for medical and industrial statistics, and can be employed to estimate summary statistics or model the distributions of trace-level environmental data. We describe S-language-based software tools that perform robust linear regression on order statistics (ROS). The ROS method has been evaluated as one of the most reliable procedures for developing summary statistics of multiply censored data. It is applicable to any dataset that has 0 to 80% of its values censored. These tools are a part of a software library, or add-on package, for the R environment for statistical computing. This library can be used to generate ROS models and associated summary statistics, plot modeled distributions, and predict exceedance probabilities of water-quality standards. © 2005 Elsevier Ltd. All rights reserved.
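
    The essence of ROS can be sketched for the simplest case of a single detection limit (the package itself handles multiple limits). In this hedged Python toy, log-concentrations of the detected values are regressed on normal quantiles of their plotting positions, and the censored values are imputed from the fitted line:

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(3)

# Hypothetical lognormal concentrations; anything below the detection
# limit would be reported only as "<0.5".
true = rng.lognormal(mean=0.0, sigma=1.0, size=200)
dl = 0.5
detected = np.sort(true[true >= dl])
n_cens = int((true < dl).sum())
n = len(true)

# Plotting positions: with one detection limit, the censored values
# occupy the lowest ranks and the detected values the rest.
pp_det = (n_cens + np.arange(1, len(detected) + 1)) / (n + 1)
pp_cen = np.arange(1, n_cens + 1) / (n + 1)

# Core of ROS: fit log-concentration vs. normal quantiles of the
# detected values, then impute the censored values from that line.
q_det = np.array([NormalDist().inv_cdf(p) for p in pp_det])
slope, intercept = np.polyfit(q_det, np.log(detected), 1)
imputed = np.exp(intercept + slope *
                 np.array([NormalDist().inv_cdf(p) for p in pp_cen]))

est_mean = np.concatenate([imputed, detected]).mean()
print(round(est_mean, 3))
```

    Summary statistics are then computed from the combined imputed-plus-detected values, avoiding the bias of substituting a fixed value such as half the detection limit.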

  8. User’s guide for MapMark4—An R package for the probability calculations in three-part mineral resource assessments

    USGS Publications Warehouse

    Ellefsen, Karl J.

    2017-06-27

    MapMark4 is a software package that implements the probability calculations in three-part mineral resource assessments. Functions within the software package are written in the R statistical programming language. These functions, their documentation, and a copy of this user’s guide are bundled together in R’s unit of shareable code, which is called a “package.” This user’s guide includes step-by-step instructions showing how the functions are used to carry out the probability calculations. The calculations are demonstrated using test data, which are included in the package.

  9. GWAMA: software for genome-wide association meta-analysis.

    PubMed

    Mägi, Reedik; Morris, Andrew P

    2010-05-28

    Despite the recent success of genome-wide association studies in identifying novel loci contributing effects to complex human traits, such as type 2 diabetes and obesity, much of the genetic component of variation in these phenotypes remains unexplained. One way to improve power to detect further novel loci is through meta-analysis of studies from the same population, increasing the sample size over any individual study. Although statistical analysis software packages incorporate routines for meta-analysis, they are ill equipped to meet the challenges of the scale and complexity of data generated in genome-wide association studies. We have developed flexible, open-source software for the meta-analysis of genome-wide association studies. The software incorporates a variety of error trapping facilities, and provides a range of meta-analysis summary statistics. The software is distributed with scripts that allow simple formatting of files containing the results of each association study and generate graphical summaries of genome-wide meta-analysis results. The GWAMA (Genome-Wide Association Meta-Analysis) software has been developed to perform meta-analysis of summary statistics generated from genome-wide association studies of dichotomous phenotypes or quantitative traits. Software with source files, documentation and example data files are freely available online at http://www.well.ox.ac.uk/GWAMA.
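
    The per-variant computation at the heart of such a meta-analysis is the standard fixed-effect inverse-variance scheme. A minimal Python sketch with hypothetical summary statistics for one SNP from three studies (illustrative values, not GWAMA output):

```python
import numpy as np

# Hypothetical per-study summary statistics for one SNP: effect size
# (e.g. log-odds ratio) and its standard error from three GWAS.
betas = np.array([0.12, 0.09, 0.15])
ses = np.array([0.05, 0.04, 0.08])

# Fixed-effect inverse-variance meta-analysis: weight each study by
# its precision, then combine.
w = 1.0 / ses**2
beta_meta = np.sum(w * betas) / np.sum(w)
se_meta = np.sqrt(1.0 / np.sum(w))
z = beta_meta / se_meta
print(round(beta_meta, 4), round(se_meta, 4), round(z, 2))
```

    Because each study contributes in proportion to its precision, the combined standard error is smaller than any single study's, which is exactly where the power gain over individual studies comes from.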

  10. airGRteaching: an R-package designed for teaching hydrology with lumped hydrological models

    NASA Astrophysics Data System (ADS)

    Thirel, Guillaume; Delaigue, Olivier; Coron, Laurent; Andréassian, Vazken; Brigode, Pierre

    2017-04-01

    Lumped hydrological models are useful and convenient tools for research, engineering and educational purposes. They propose catchment-scale representations of the precipitation-discharge relationship. Thanks to their limited data requirements, they can be easily implemented and run. With such models, it is possible to simulate a number of key hydrological processes over the catchment with limited structural and parametric complexity, typically evapotranspiration, runoff, underground losses, etc. The Hydrology Group at Irstea (Antony) has been developing a suite of rainfall-runoff models over the past 30 years. This resulted in a suite of models running at different time steps (from hourly to annual) applicable to various issues including water balance estimation, forecasting, simulation of impacts and scenario testing. Recently, Irstea has developed an easy-to-use R-package (R Core Team, 2016), called airGR (Coron et al., 2016, 2017), to make these models widely available. Although its initial target audience was hydrological modellers, the package is already used for educational purposes. Indeed, simple models allow for rapidly visualising the effects of parameterizations and model components on flow hydrographs. In order to avoid the difficulties that students may have when manipulating R and datasets, we developed (Delaigue and Coron, 2016):
    - three simplified functions to prepare data, calibrate a model and run a simulation;
    - simplified and dynamic plot functions;
    - a shiny (Chang et al., 2016) interface that connects this R-package to a browser-based visualisation tool.
    On this interface, the students can use different hydrological models (including the possibility to use a snow-accounting model), manually modify their parameters and automatically calibrate them with diverse objective functions.
One of the visualisation tabs of the interface includes observed precipitation and temperature, simulated snowpack (if any), observed and simulated discharges, which are updated immediately (a calibration only needs a couple of seconds or less, a simulation is almost immediate). In addition, time series of internal variables, live-visualisation of internal variables evolution and performance statistics are provided. This interface allows for hands-on exercises that can include for instance the analysis by students of: - The effects of each parameter and model components on simulated discharge - The effects of objective functions based on high flows- or low flows-focused criteria on simulated discharge - The seasonality of the model components. References Winston Chang, Joe Cheng, JJ Allaire, Yihui Xie and Jonathan McPherson (2016). shiny: Web Application Framework for R. R package version 0.13.2. https://CRAN.R-project.org/package=shiny Coron L., Thirel G., Perrin C., Delaigue O., Andréassian V., airGR: a suite of lumped hydrological models in an R-package, Environmental Modelling and software, 2017, submitted. Coron, L., Perrin, C. and Michel, C. (2016). airGR: Suite of GR hydrological models for precipitation-runoff modelling. R package version 1.0.3. https://webgr.irstea.fr/airGR/?lang=en. Olivier Delaigue and Laurent Coron (2016). airGRteaching: Tools to simplify the use of the airGR hydrological package by students. R package version 0.0.1. https://webgr.irstea.fr/airGR/?lang=en R Core Team (2016). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.
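
    The lumped-model idea itself fits in a few lines. The Python toy below is a single-bucket rainfall-runoff model (illustrative only; the GR models in airGR have different, more refined structures): evapotranspiration is subtracted, saturation excess spills, and a linear reservoir drains the store:

```python
# Minimal toy bucket model illustrating the lumped rainfall-runoff idea
# (not GR4J or any airGR model): one storage S, linear outflow.
def bucket_model(precip, pet, capacity=100.0, k=0.2, s0=30.0):
    s, runoff = s0, []
    for p, e in zip(precip, pet):
        s = max(s + p - e, 0.0)        # net input after evapotranspiration
        if s > capacity:               # saturation excess spills immediately
            spill, s = s - capacity, capacity
        else:
            spill = 0.0
        q = k * s                      # linear-reservoir drainage
        s -= q
        runoff.append(q + spill)
    return runoff

# Six time steps of hypothetical rainfall (mm) and potential evapotranspiration.
q = bucket_model([10, 0, 0, 50, 120, 0], [2, 2, 2, 2, 2, 2])
print([round(v, 2) for v in q])
```

    Even this toy shows the behaviours students explore in the interface: storage smooths small events into a recession, while the large event saturates the store and produces a sharp peak.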

  11. Time-series RNA-seq analysis package (TRAP) and its application to the analysis of rice, Oryza sativa L. ssp. Japonica, upon drought stress.

    PubMed

    Jo, Kyuri; Kwon, Hawk-Bin; Kim, Sun

    2014-06-01

    Measuring expression levels of genes at the whole-genome level can be useful for many purposes, especially for revealing biological pathways underlying specific phenotype conditions. When gene expression is measured over a time period, we have opportunities to understand how organisms react to stress conditions over time. Thus many biologists routinely measure whole-genome gene expression at multiple time points. However, there are several technical difficulties in analyzing such data. In addition, gene expression is now often measured by RNA sequencing rather than microarray technologies, which further complicates the analysis: the process must start from mapping short reads and ultimately produce differentially activated pathways and, possibly, interactions among pathways. Moreover, many useful tools for analyzing microarray gene expression data are not applicable to RNA-seq data. Thus a comprehensive package for analyzing time-series transcriptome data is much needed. In this article, we present such a package, the Time-series RNA-seq Analysis Package (TRAP), integrating all necessary tasks, such as mapping short reads, measuring gene expression levels, finding differentially expressed genes (DEGs), clustering and pathway analysis for time-series data, in a single environment. In addition to implementing useful algorithms that are not available for RNA-seq data, we extended the existing pathway analysis methods ORA and SPIA for time-series analysis and estimate statistical values for the combined dataset by an advanced metric. TRAP also produces a visual summary of pathway interactions. Gene expression change labeling, a practical clustering method used in TRAP, enables more accurate interpretation of the data when combined with pathway analysis. We applied our methods to a real dataset for the analysis of rice (Oryza sativa L. Japonica nipponbare) upon drought stress. The results showed that TRAP was able to detect pathways more accurately than several existing methods. TRAP is available at http://biohealth.snu.ac.kr/software/TRAP/. Copyright © 2014 Elsevier Inc. All rights reserved.
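
    One generic way to pool per-time-point evidence for a gene, in the spirit of TRAP's combined statistic (whose actual metric differs), is Fisher's method. A self-contained Python sketch using the closed-form chi-square tail for even degrees of freedom:

```python
import math

# Generic stand-in for combining evidence across T time points:
# Fisher's method pools independent p-values for one gene.
def fisher_combine(pvals):
    stat = -2.0 * sum(math.log(p) for p in pvals)
    k = len(pvals)  # chi-square df is 2k, so the tail has a closed form:
    # P(chi2_{2k} > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term, total = 1.0, 0.0
    for i in range(k):
        total += term
        term *= (stat / 2) / (i + 1)
    return math.exp(-stat / 2) * total

# Three moderately significant time points combine into strong evidence.
print(fisher_combine([0.01, 0.04, 0.03]))
```

    Consistent moderate signals across time points combine into a much smaller pooled p-value, which is why time-series designs can detect responses that no single time point shows convincingly.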

  12. Scout 2008 Version 1.0 User Guide

    EPA Science Inventory

    The Scout 2008 version 1.0 software package provides a wide variety of classical and robust statistical methods that are not typically available in other commercial software packages. A major part of Scout deals with classical, robust, and resistant univariate and multivariate ou...

  13. smwrGraphs—An R package for graphing hydrologic data, version 1.1.2

    USGS Publications Warehouse

    Lorenz, David L.; Diekoff, Aliesha L.

    2017-01-31

    This report describes an R package called smwrGraphs, which consists of a collection of graphing functions for hydrologic data within R, a programming language and software environment for statistical computing. The functions in the package have been developed by the U.S. Geological Survey to create high-quality graphs for publication or presentation of hydrologic data that meet U.S. Geological Survey graphics guidelines.

  14. Play It Again: Teaching Statistics with Monte Carlo Simulation

    ERIC Educational Resources Information Center

    Sigal, Matthew J.; Chalmers, R. Philip

    2016-01-01

    Monte Carlo simulations (MCSs) provide important information about statistical phenomena that would be impossible to assess otherwise. This article introduces MCS methods and their applications to research and statistical pedagogy using a novel software package for the R Project for Statistical Computing constructed to lessen the often steep…

  15. Rtop - an R package for interpolation of data with a variable spatial support - examples from river networks

    NASA Astrophysics Data System (ADS)

    Olav Skøien, Jon; Laaha, Gregor; Koffler, Daniel; Blöschl, Günter; Pebesma, Edzer; Parajka, Juraj; Viglione, Alberto

    2013-04-01

    Geostatistical methods have been applied only to a limited extent for spatial interpolation in applications where the observations have an irregular support, such as runoff characteristics or population health data. Several studies have shown the potential of such methods (Gottschalk 1993, Sauquet et al. 2000, Gottschalk et al. 2006, Skøien et al. 2006, Goovaerts 2008), but these developments have so far not led to easily accessible, versatile, easy-to-apply, open-source software. Based on the top-kriging approach suggested by Skøien et al. (2006), here we present the package rtop, implemented in the statistical environment R (R Core Team 2012). Taking advantage of the existing methods in R for analysis of spatial objects (Bivand et al. 2008) and of the extensive possibilities for visualizing results, rtop makes it easy to apply geostatistical interpolation methods when observations have a non-point spatial support. Although the package is flexible regarding data input, the main application so far has been interpolation along river networks, and we present examples showing how the package can be used for this purpose. The package will soon be uploaded to CRAN; in the meantime it is also available from R-Forge and can be installed with: > install.packages("rtop", repos="http://R-Forge.R-project.org")
    Bivand, R.S., Pebesma, E.J. & Gómez-Rubio, V., 2008. Applied spatial data analysis with R. Springer.
    Goovaerts, P., 2008. Kriging and semivariogram deconvolution in the presence of irregular geographical units. Mathematical Geosciences, 40 (1), 101-128.
    Gottschalk, L., 1993. Interpolation of runoff applying objective methods. Stochastic Hydrology and Hydraulics, 7, 269-281.
    Gottschalk, L., Krasovskaia, I., Leblois, E. & Sauquet, E., 2006. Mapping mean and variance of runoff in a river basin. Hydrology and Earth System Sciences, 10, 469-484.
    R Core Team, 2012. R: A language and environment for statistical computing. Vienna, Austria, ISBN 3-900051-07-0.
    Sauquet, E., Gottschalk, L. & Leblois, E., 2000. Mapping average annual runoff: A hierarchical approach applying a stochastic interpolation scheme. Hydrological Sciences Journal, 45 (6), 799-815.
    Skøien, J.O., Merz, R. & Blöschl, G., 2006. Top-kriging - geostatistics on stream networks. Hydrology and Earth System Sciences, 10, 277-287.
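    The kriging machinery underlying approaches like top-kriging reduces, at each prediction location, to solving a small linear system for interpolation weights. As a language-neutral illustration (not rtop's code, which handles non-point supports and operates on R spatial objects), here is a minimal ordinary point-kriging solver in Python with an assumed exponential semivariogram:

```python
import math

def ok_weights(coords, target, gamma):
    """Solve the ordinary-kriging system for interpolation weights.

    coords : list of (x, y) observation locations
    target : (x, y) prediction location
    gamma  : semivariogram function of distance
    Returns the kriging weights (one per observation).
    """
    n = len(coords)
    d = lambda a, b: math.hypot(a[0] - b[0], a[1] - b[1])
    # (n+1) x (n+1) system: semivariances plus the Lagrange
    # row/column enforcing that the weights sum to one.
    A = [[gamma(d(coords[i], coords[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    A.append([1.0] * n + [0.0])
    b = [gamma(d(c, target)) for c in coords] + [1.0]
    # Gaussian elimination with partial pivoting.
    for col in range(n + 1):
        piv = max(range(col, n + 1), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n + 1):
            f = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    x = [0.0] * (n + 1)
    for r in range(n, -1, -1):
        s = sum(A[r][c] * x[c] for c in range(r + 1, n + 1))
        x[r] = (b[r] - s) / A[r][r]
    return x[:n]  # drop the Lagrange multiplier

# Toy exponential semivariogram; two observations placed
# symmetrically about the target should receive equal weight.
gamma = lambda h: 1.0 - math.exp(-h)
w = ok_weights([(0.0, 0.0), (2.0, 0.0)], (1.0, 0.0), gamma)
```

    Top-kriging generalizes the semivariance terms to averages over the areas (e.g. catchments) that each observation represents, which is what makes it suitable for runoff along river networks.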

  16. Intrinsic noise analyzer: a software package for the exploration of stochastic biochemical kinetics using the system size expansion.

    PubMed

    Thomas, Philipp; Matuschek, Hannes; Grima, Ramon

    2012-01-01

    The accepted stochastic descriptions of biochemical dynamics under well-mixed conditions are given by the Chemical Master Equation and the Stochastic Simulation Algorithm, which are equivalent. The latter is a Monte-Carlo method which, despite enjoying broad availability in a large number of existing software packages, is computationally expensive due to the extensive ensemble averaging required to obtain accurate statistical information. The former is a set of coupled differential-difference equations for the probability of the system being in any one of the possible mesoscopic states; these equations are typically computationally intractable because of the inherently large state space. Here we introduce the software package intrinsic Noise Analyzer (iNA), which allows for systematic analysis of stochastic biochemical kinetics by means of van Kampen's system size expansion of the Chemical Master Equation. iNA is platform independent and supports the popular SBML format natively. The present implementation is the first to adopt a complementary approach that combines state-of-the-art analysis tools using the computer algebra system GiNaC with traditional methods of stochastic simulation. iNA integrates two approximation methods based on the system size expansion, the Linear Noise Approximation and effective mesoscopic rate equations, which to date have not been available to non-expert users, into an easy-to-use graphical user interface. In particular, the present methods allow for quick approximate analysis of time-dependent mean concentrations, variances, covariances and correlation coefficients, which typically outperforms stochastic simulations. These analytical tools are complemented by automated multi-core stochastic simulations with direct statistical evaluation and visualization. 
    We showcase iNA's performance by using it to explore the stochastic properties of cooperative and non-cooperative enzyme kinetics and a gene network associated with circadian rhythms. The software iNA is freely available as executable binaries for Linux, Mac OS X and Microsoft Windows, as well as the full source code under an open source license.
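    To illustrate the kind of computation the system size expansion enables, consider the simplest birth-death process: production at rate k and degradation at rate g per molecule. The sketch below (plain Python, not iNA's implementation) integrates the macroscopic rate equation together with the Linear Noise Approximation variance equation; both settle at k/g, recovering the Poissonian stationary statistics with no ensemble averaging at all:

```python
# Linear Noise Approximation for a birth-death process:
#   0 --k--> X,  X --g--> 0
# Macroscopic mean:      d(phi)/dt = k - g*phi
# LNA variance equation: d(var)/dt = -2*g*var + k + g*phi
k, g = 10.0, 1.0
phi, var = 0.0, 0.0      # start from an empty system
dt = 0.001
for _ in range(20000):   # integrate to t = 20 (many relaxation times)
    dphi = k - g * phi
    dvar = -2.0 * g * var + k + g * phi
    phi += dt * dphi
    var += dt * dvar
# Stationary LNA prediction: mean = k/g and variance = k/g,
# i.e. Poisson-like fluctuations for this linear network.
```

    A Stochastic Simulation Algorithm run would need many thousands of trajectories to estimate the same variance to comparable accuracy, which is the efficiency argument made in the abstract.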

  17. Intrinsic Noise Analyzer: A Software Package for the Exploration of Stochastic Biochemical Kinetics Using the System Size Expansion

    PubMed Central

    Grima, Ramon

    2012-01-01

    The accepted stochastic descriptions of biochemical dynamics under well-mixed conditions are given by the Chemical Master Equation and the Stochastic Simulation Algorithm, which are equivalent. The latter is a Monte-Carlo method which, despite enjoying broad availability in a large number of existing software packages, is computationally expensive due to the extensive ensemble averaging required to obtain accurate statistical information. The former is a set of coupled differential-difference equations for the probability of the system being in any one of the possible mesoscopic states; these equations are typically computationally intractable because of the inherently large state space. Here we introduce the software package intrinsic Noise Analyzer (iNA), which allows for systematic analysis of stochastic biochemical kinetics by means of van Kampen’s system size expansion of the Chemical Master Equation. iNA is platform independent and supports the popular SBML format natively. The present implementation is the first to adopt a complementary approach that combines state-of-the-art analysis tools using the computer algebra system GiNaC with traditional methods of stochastic simulation. iNA integrates two approximation methods based on the system size expansion, the Linear Noise Approximation and effective mesoscopic rate equations, which to date have not been available to non-expert users, into an easy-to-use graphical user interface. In particular, the present methods allow for quick approximate analysis of time-dependent mean concentrations, variances, covariances and correlation coefficients, which typically outperforms stochastic simulations. These analytical tools are complemented by automated multi-core stochastic simulations with direct statistical evaluation and visualization. 
    We showcase iNA’s performance by using it to explore the stochastic properties of cooperative and non-cooperative enzyme kinetics and a gene network associated with circadian rhythms. The software iNA is freely available as executable binaries for Linux, Mac OS X and Microsoft Windows, as well as the full source code under an open source license. PMID:22723865

  18. Evaluating performances of simplified physically based landslide susceptibility models.

    NASA Astrophysics Data System (ADS)

    Capparelli, Giovanna; Formetta, Giuseppe; Versace, Pasquale

    2015-04-01

    Rainfall-induced shallow landslides cause significant damage, involving loss of life and property. Predicting locations susceptible to shallow landslides is a complex task that involves many disciplines: hydrology, geotechnical science, geomorphology, and statistics. Usually one of two main approaches is used to accomplish this task: statistical or physically based models. This paper presents a package of GIS-based models for landslide susceptibility analysis, integrated in the NewAge-JGrass hydrological model using the Object Modeling System (OMS) modeling framework. The package includes three simplified physically based models for landslide susceptibility analysis (M1, M2, and M3) and a component for model verification, which computes eight goodness-of-fit (GOF) indices by comparing model results and measured data pixel by pixel. Moreover, the package's integration in NewAge-JGrass allows the use of other components, such as geographic information system tools to manage input-output processes, and automatic calibration algorithms to estimate model parameters. The system offers the possibility to investigate and fairly compare the quality and robustness of models and model parameters, according to a procedure that includes: i) model parameter estimation by optimizing each GOF index separately, ii) model evaluation in the ROC plane using each optimal parameter set, and iii) GOF robustness evaluation by assessing sensitivity to input parameter variation. This procedure was repeated for all three models. The system was applied to a case study in Calabria (Italy) along the Salerno-Reggio Calabria highway, between Cosenza and Altilia municipality. The analysis showed that, among all the optimized indices and all three models, Average Index (AI) optimization coupled with model M3 is the best modeling solution for our test case. This research was funded by PON Project No. 
    01_01503 "Integrated Systems for Hydrogeological Risk Monitoring, Early Warning and Mitigation Along the Main Lifelines", CUP B31H11000370005, in the framework of the National Operational Program for "Research and Competitiveness" 2007-2013.
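    Model evaluation in the ROC plane, as in step ii) above, is commonly summarized by the area under the ROC curve. A minimal generic computation (not the package's verification component, which also reports several other GOF indices) via the Mann-Whitney statistic:

```python
def auc(scores_pos, scores_neg):
    """Area under the ROC curve via the Mann-Whitney U statistic:
    the probability that a randomly chosen positive (e.g. landslide)
    pixel scores higher than a randomly chosen negative one
    (ties count one half)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))
```

    Perfectly separated susceptibility scores give an AUC of 1.0, a reversed ranking gives 0.0, and an uninformative model sits at 0.5.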

  19. Quick Overview Scout 2008 Version 1.0

    EPA Science Inventory

    The Scout 2008 version 1.0 statistical software package has been updated from past DOS and Windows versions to provide classical and robust univariate and multivariate graphical and statistical methods that are not typically available in commercial or freeware statistical softwar...

  20. Statistical analysis of solid waste composition data: Arithmetic mean, standard deviation and correlation coefficients.

    PubMed

    Edjabou, Maklawe Essonanawe; Martín-Fernández, Josep Antoni; Scheutz, Charlotte; Astrup, Thomas Fruergaard

    2017-11-01

    Data for fractional solid waste composition provide relative magnitudes of individual waste fractions, the percentages of which always sum to 100, thereby connecting them intrinsically. Due to this sum constraint, waste composition data represent closed data, and their interpretation and analysis require statistical methods other than classical statistics, which are suitable only for non-constrained data such as absolute values. However, the closed character of waste composition data is often ignored in analyses. The results of this study showed, for example, that unavoidable animal-derived food waste amounted to 2.21±3.12% with a confidence interval of (-4.03; 8.45), which highlights the problem of spurious negative proportions. A Pearson's correlation test, applied to waste fraction generation (kg mass), indicated a positive correlation between avoidable vegetable food waste and plastic packaging. However, correlation tests applied to waste fraction compositions (percentage values) showed a negative association in this regard, demonstrating that statistical analyses applied to compositional waste fraction data, without addressing their closed character, have the potential to generate spurious or misleading results. Therefore, compositional data should be transformed adequately prior to any statistical analysis, such as computing means, standard deviations and correlation coefficients. Copyright © 2017 Elsevier Ltd. All rights reserved.
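    A standard remedy for the closure problem described above is a log-ratio transformation. The sketch below (Python, with invented waste fractions purely for illustration) applies closure to a hypothetical four-part composition and then the centred log-ratio (CLR) transform, after which ordinary means, standard deviations and correlations can be computed:

```python
import math

def closure(parts, total=100.0):
    """Rescale a composition so its parts sum to a constant (closure)."""
    s = sum(parts)
    return [total * p / s for p in parts]

def clr(parts):
    """Centred log-ratio transform: log of each part minus the
    per-sample mean of the logs. This opens closed data so that
    standard statistics become meaningful."""
    logs = [math.log(p) for p in parts]
    m = sum(logs) / len(logs)
    return [v - m for v in logs]

sample = closure([12.0, 30.0, 8.0, 50.0])   # hypothetical fractions, %
z = clr(sample)
```

    Note that CLR values are invariant to the closure constant, so analyses no longer depend on whether data are expressed as percentages or masses.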

  1. [Development of an Excel spreadsheet for meta-analysis of indirect and mixed treatment comparisons].

    PubMed

    Tobías, Aurelio; Catalá-López, Ferrán; Roqué, Marta

    2014-01-01

    Meta-analyses in clinical research usually aim to evaluate treatment efficacy and safety through direct comparison with a single comparator. Indirect comparisons, using Bucher's method, can summarize primary data when information from direct comparisons is limited or nonexistent. Mixed comparisons allow combining estimates from direct and indirect comparisons, increasing statistical power. There is a need for simple applications for meta-analysis of indirect and mixed comparisons, which can easily be conducted using a Microsoft Office Excel spreadsheet. We developed a user-friendly spreadsheet for indirect and mixed comparisons aimed at clinical researchers who are interested in systematic reviews but not familiar with more advanced statistical packages. The proposed Excel spreadsheet for indirect and mixed comparisons can be of great use in clinical epidemiology, extending the knowledge provided by traditional meta-analysis when evidence from direct comparisons is limited or nonexistent.
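    Bucher's adjusted indirect comparison and the subsequent mixed (direct plus indirect) combination involve only elementary arithmetic, which is why a spreadsheet suffices. A sketch with hypothetical effect estimates (e.g. log odds ratios):

```python
import math

def bucher_indirect(d_ab, se_ab, d_cb, se_cb):
    """Adjusted indirect comparison of A vs C through a common
    comparator B: the effects subtract and the variances add."""
    return d_ab - d_cb, math.sqrt(se_ab ** 2 + se_cb ** 2)

def mixed(d_dir, se_dir, d_ind, se_ind):
    """Inverse-variance combination of direct and indirect estimates."""
    w1, w2 = 1.0 / se_dir ** 2, 1.0 / se_ind ** 2
    d = (w1 * d_dir + w2 * d_ind) / (w1 + w2)
    return d, math.sqrt(1.0 / (w1 + w2))

# Hypothetical trial results: A vs B, and C vs B.
d_ac, se_ac = bucher_indirect(-0.5, 0.2, -0.1, 0.2)
# Combine with a hypothetical direct A-vs-C estimate.
d_mix, se_mix = mixed(-0.45, 0.15, d_ac, se_ac)
```

    The mixed estimate lies between the direct and indirect ones and has a smaller standard error than either, which is the power gain the abstract refers to.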

  2. Trend-surface analysis of morphometric parameters: A case study in southeastern Brazil

    NASA Astrophysics Data System (ADS)

    Grohmann, Carlos Henrique

    2005-10-01

    Trend-surface analysis was carried out on data for the morphometric parameters isobase and hydraulic gradient. The study area, located on the eastern border of the Quadrilátero Ferrífero, southeastern Brazil, presents four main geomorphological units: one characterized by fluvial dissection, two of mountainous relief separated by a scarp several hundred meters high, and a flat plateau in the central portion of the fluvially dissected terrain. Morphometric maps were produced in GRASS GIS, and statistics were computed in the R statistical language using the spatial package. Analysis of variance (ANOVA) was performed to test the significance of each surface and the significance of increasing the polynomial degree. The best results were achieved with a sixth-order surface for isobase and a second-order surface for hydraulic gradient. The shape and orientation of residual-map contours for selected trends were compared with structures inferred from several morphometric maps, and a good correlation was found.
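    Trend-surface analysis itself is polynomial least-squares fitting of a surface z = f(x, y), with the residuals inspected afterwards. A self-contained sketch (Python rather than the R spatial package used in the study; the grid and coefficients are invented for illustration):

```python
def trend_surface(points, z, degree):
    """Fit a polynomial trend surface z = f(x, y) of the given degree by
    ordinary least squares (normal equations), returning the coefficient
    of each monomial x**i * y**j."""
    terms = [(i, j) for i in range(degree + 1) for j in range(degree + 1 - i)]
    X = [[x ** i * y ** j for (i, j) in terms] for (x, y) in points]
    m = len(terms)
    A = [[sum(row[a] * row[b] for row in X) for b in range(m)] for a in range(m)]
    rhs = [sum(row[a] * zz for row, zz in zip(X, z)) for a in range(m)]
    # Solve (X'X) coef = X'z by Gaussian elimination with partial pivoting.
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        rhs[col], rhs[piv] = rhs[piv], rhs[col]
        for r in range(col + 1, m):
            f = A[r][col] / A[col][col]
            for c in range(col, m):
                A[r][c] -= f * A[col][c]
            rhs[r] -= f * rhs[col]
    coef = [0.0] * m
    for r in range(m - 1, -1, -1):
        s = sum(A[r][c] * coef[c] for c in range(r + 1, m))
        coef[r] = (rhs[r] - s) / A[r][r]
    return dict(zip(terms, coef))

# Recover an exact second-order surface from a 4 x 4 grid of samples.
pts = [(float(x), float(y)) for x in range(4) for y in range(4)]
zs = [2.0 + 1.0 * x - 3.0 * y + 0.5 * x * y for (x, y) in pts]
coef = trend_surface(pts, zs, 2)
```

    In practice the residuals z minus the fitted surface are mapped, and an F test (as in the study's ANOVA) decides whether raising the polynomial degree is justified.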

  3. An evaluation of object-oriented image analysis techniques to identify motorized vehicle effects in semi-arid to arid ecosystems of the American West

    USGS Publications Warehouse

    Mladinich, C.

    2010-01-01

    Human disturbance is a leading ecosystem stressor. Human-induced modifications include transportation networks, areal disturbances due to resource extraction, and recreation activities. High-resolution imagery and object-oriented classification rather than pixel-based techniques have successfully identified roads, buildings, and other anthropogenic features. Three commercial, automated feature-extraction software packages (Visual Learning Systems' Feature Analyst, ENVI Feature Extraction, and Definiens Developer) were evaluated by comparing their ability to effectively detect the disturbed surface patterns from motorized vehicle traffic. Each package achieved overall accuracies in the 70% range, demonstrating the potential to map the surface patterns. The Definiens classification was more consistent and statistically valid. Copyright © 2010 by Bellwether Publishing, Ltd. All rights reserved.

  4. Statistical Measurement and Analysis of Claimant and Demographic Variables Affecting Processing and Adjudication Duration in The United States Army Physical Disability Evaluation System.

    DTIC Science & Technology

    1997-02-06

    This retrospective study analyzes relationships of variables to adjudication and processing duration in the Army...Package for Social Scientists (SPSS), Standard Version 6.1, June 1994, to determine relationships among the dependent and independent variables... consanguinity between variables. Content and criterion validity is employed to determine the measure of scientific validity. Reliability is also

  5. Quantifying the Relationship between AMC Resources and U.S. Army Materiel Readiness

    DTIC Science & Technology

    1989-08-25

    Resource Management report 984 for the same period. Insufficient data precluded analysis of the OMA PEs Total Package Fielding and Life Cycle Software...procurement, had the greatest failure rates when subjected to the statistical tests merely because of the reduced number of data pairs. Analyses of...ENGINEERING DEVELOPMENT 6.5 - MANAGEMENT AND SUPPORT 6.7 - OPERATIONAL SYSTEM DEVELOPMENT P2 - GENERAL PURPOSE FORCES P3 - INTELLIGENCE AND COMMUNICATIONS P7

  6. Data exploration, quality control and statistical analysis of ChIP-exo/nexus experiments

    PubMed Central

    Welch, Rene; Chung, Dongjun; Grass, Jeffrey; Landick, Robert

    2017-01-01

    ChIP-exo/nexus experiments rely on innovative modifications of the commonly used ChIP-seq protocol for high resolution mapping of transcription factor binding sites. Although many aspects of the ChIP-exo data analysis are similar to those of ChIP-seq, these high throughput experiments pose a number of unique quality control and analysis challenges. We develop a novel statistical quality control pipeline and accompanying R/Bioconductor package, ChIPexoQual, to enable exploration and analysis of ChIP-exo and related experiments. ChIPexoQual evaluates a number of key issues including strand imbalance, library complexity, and signal enrichment of data. Assessment of these features is facilitated through diagnostic plots and summary statistics computed over regions of the genome with varying levels of coverage. We evaluated our QC pipeline with both large collections of public ChIP-exo/nexus data and multiple new ChIP-exo datasets from Escherichia coli. ChIPexoQual analysis of these datasets resulted in guidelines for using these QC metrics across a wide range of sequencing depths and provided further insights for modelling ChIP-exo data. PMID:28911122

  7. Data exploration, quality control and statistical analysis of ChIP-exo/nexus experiments.

    PubMed

    Welch, Rene; Chung, Dongjun; Grass, Jeffrey; Landick, Robert; Keles, Sündüz

    2017-09-06

    ChIP-exo/nexus experiments rely on innovative modifications of the commonly used ChIP-seq protocol for high resolution mapping of transcription factor binding sites. Although many aspects of the ChIP-exo data analysis are similar to those of ChIP-seq, these high throughput experiments pose a number of unique quality control and analysis challenges. We develop a novel statistical quality control pipeline and accompanying R/Bioconductor package, ChIPexoQual, to enable exploration and analysis of ChIP-exo and related experiments. ChIPexoQual evaluates a number of key issues including strand imbalance, library complexity, and signal enrichment of data. Assessment of these features is facilitated through diagnostic plots and summary statistics computed over regions of the genome with varying levels of coverage. We evaluated our QC pipeline with both large collections of public ChIP-exo/nexus data and multiple new ChIP-exo datasets from Escherichia coli. ChIPexoQual analysis of these datasets resulted in guidelines for using these QC metrics across a wide range of sequencing depths and provided further insights for modelling ChIP-exo data. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  8. mapDIA: Preprocessing and statistical analysis of quantitative proteomics data from data independent acquisition mass spectrometry.

    PubMed

    Teo, Guoshou; Kim, Sinae; Tsou, Chih-Chiang; Collins, Ben; Gingras, Anne-Claude; Nesvizhskii, Alexey I; Choi, Hyungwon

    2015-11-03

    Data independent acquisition (DIA) mass spectrometry is an emerging technique that offers more complete detection and quantification of peptides and proteins across multiple samples. DIA allows fragment-level quantification, which can be considered repeated measurements of the abundance of the corresponding peptides and proteins in the downstream statistical analysis. However, few statistical approaches are available for aggregating these complex fragment-level data into peptide- or protein-level statistical summaries. In this work, we describe a software package, mapDIA, for statistical analysis of differential protein expression using DIA fragment-level intensities. The workflow consists of three major steps: intensity normalization, peptide/fragment selection, and statistical analysis. First, mapDIA offers normalization of fragment-level intensities by total intensity sums, as well as a novel alternative normalization by local intensity sums in retention time space. Second, mapDIA removes outlier observations and selects peptides/fragments that preserve the major quantitative patterns across all samples for each protein. Last, using the selected fragments and peptides, mapDIA performs model-based statistical significance analysis of protein-level differential expression between specified groups of samples. Using a comprehensive set of simulation datasets, we show that mapDIA detects differentially expressed proteins with accurate control of the false discovery rate. We also describe the analysis procedure in detail using two recently published DIA datasets generated for the 14-3-3β dynamic interaction network and the prostate cancer glycoproteome. The software was written in the C++ language and the source code is available for free through the SourceForge website http://sourceforge.net/projects/mapdia/. This article is part of a Special Issue entitled: Computational Proteomics. Copyright © 2015 Elsevier B.V. All rights reserved.
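    The first workflow step, normalization by total intensity sums, can be sketched as follows (illustrative Python with hypothetical intensities, not mapDIA's C++ code):

```python
def normalize_total(samples):
    """Scale each sample's fragment intensities so that every sample
    ends up with the same total intensity (the mean of the original
    per-sample totals), removing sample-loading differences."""
    totals = [sum(s) for s in samples]
    target = sum(totals) / len(totals)
    return [[v * target / t for v in s] for s, t in zip(samples, totals)]

runs = [[100.0, 200.0, 700.0],     # hypothetical fragment intensities
        [150.0, 250.0, 1100.0]]
norm = normalize_total(runs)
```

    mapDIA's alternative, normalization by local intensity sums, applies the same idea within retention-time windows instead of over the whole run.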

  9. Spotlight-8 Image Analysis Software

    NASA Technical Reports Server (NTRS)

    Klimek, Robert; Wright, Ted

    2006-01-01

    Spotlight is a cross-platform GUI-based software package designed to perform image analysis on sequences of images generated by combustion and fluid physics experiments run in a microgravity environment. Spotlight can perform analysis on a single image in an interactive mode or perform analysis on a sequence of images in an automated fashion. Image processing operations can be employed to enhance the image before various statistics and measurement operations are performed. An arbitrarily large number of objects can be analyzed simultaneously with independent areas of interest. Spotlight saves results in a text file that can be imported into other programs for graphing or further analysis. Spotlight can be run on Microsoft Windows, Linux, and Apple OS X platforms.

  10. 'spup' - an R package for uncertainty propagation in spatial environmental modelling

    NASA Astrophysics Data System (ADS)

    Sawicka, Kasia; Heuvelink, Gerard

    2016-04-01

    Computer models have become a crucial tool in engineering and environmental sciences for simulating the behaviour of complex static and dynamic systems. However, while many models are deterministic, the uncertainty in their predictions needs to be estimated before they are used for decision support. Advances in uncertainty propagation and assessment have been paralleled by a growing number of software tools for uncertainty analysis, but none has gained recognition as universally applicable, including for case studies with spatial models and spatial model inputs. Due to the growing popularity and applicability of the open-source R programming language, we undertook a project to develop an R package that facilitates uncertainty propagation analysis in spatial environmental modelling. In particular, the 'spup' package provides functions for examining uncertainty propagation from input data and model parameters, via the environmental model, onto model predictions. The functions include uncertainty model specification, stochastic simulation and propagation of uncertainty using Monte Carlo (MC) techniques, as well as several uncertainty visualization functions. Uncertain environmental variables are represented in the package as objects whose attribute values may be uncertain and described by probability distributions. Both numerical and categorical data types are handled. Spatial auto-correlation within an attribute and cross-correlation between attributes are also accommodated. For uncertainty propagation the package implements the MC approach with efficient sampling algorithms, i.e. stratified random sampling and Latin hypercube sampling. The design includes facilitation of parallel computing to speed up MC computation. The MC realizations may be used as input to environmental models called from R, or externally. 
    Selected static and interactive visualization methods, understandable by non-experts with limited background in statistics, can be used to summarize and visualize uncertainty about the measured input, model parameters and output of the uncertainty propagation. We demonstrate that the 'spup' package is an effective and easy-to-apply tool that can be used in multi-disciplinary research and model-based decision support.
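    The core of the MC propagation step is drawing a stratified sample of the uncertain inputs and pushing it through the model. A minimal Latin hypercube sampler (Python sketch; 'spup' itself operates on R spatial objects and additionally supports correlated inputs):

```python
import random

def latin_hypercube(n, dims, rng=None):
    """Latin hypercube sample of n points in [0, 1)^dims: every
    dimension is split into n equal-width strata and each stratum
    contains exactly one point, giving better space coverage than
    simple random sampling."""
    rng = rng or random.Random(42)
    cols = []
    for _ in range(dims):
        col = [(i + rng.random()) / n for i in range(n)]  # one draw per stratum
        rng.shuffle(col)  # decouple the strata across dimensions
        cols.append(col)
    return [tuple(c[k] for c in cols) for k in range(n)]

# Propagate a toy model y = a + b through the sample of its two
# uncertain inputs; mean/quantiles of ys estimate output uncertainty.
pts = latin_hypercube(10, 2)
ys = [a + b for (a, b) in pts]
```

    In a real application the uniform draws would be mapped through the inverse distribution functions of the uncertain inputs before the model is evaluated.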

  11. Quantitative analysis of bayberry juice acidity based on visible and near-infrared spectroscopy

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shao Yongni; He Yong; Mao Jingyuan

    Visible and near-infrared (Vis/NIR) reflectance spectroscopy has been investigated for its ability to nondestructively detect acidity in bayberry juice. What we believe to be a new, better mathematical model is put forward, which we have named principal component analysis-stepwise regression analysis-backpropagation neural network (PCA-SRA-BPNN), to build a correlation between the spectral reflectivity data and the acidity of bayberry juice. In this model, the optimum network parameters, such as the number of input nodes, hidden nodes, learning rate, and momentum, are chosen by the value of root-mean-square (rms) error. The results show that its prediction statistical parameters are a correlation coefficient (r) of 0.9451 and a root-mean-square error of prediction (RMSEP) of 0.1168. Partial least-squares (PLS) regression is also established to compare with this model. Before doing this, the influences of various spectral pretreatments (standard normal variate, multiplicative scatter correction, Savitzky-Golay first derivative, and wavelet packet transform) are compared. The PLS approach with wavelet-packet-transform preprocessed spectra is found to provide the best results, and its prediction statistical parameters are a correlation coefficient (r) of 0.9061 and an RMSEP of 0.1564. Hence, these two models are both suitable for analyzing data from Vis/NIR spectroscopy and for solving the problem of acidity prediction of bayberry juice. This provides basic research toward ultimately realizing online measurement of the juice's internal quality through the Vis/NIR spectroscopy technique.

  12. Analysis of repeated measurement data in the clinical trials

    PubMed Central

    Singh, Vineeta; Rana, Rakesh Kumar; Singhal, Richa

    2013-01-01

    Statistics is an integral part of clinical trials. Elements of statistics span clinical trial design, data monitoring, analyses and reporting. A solid understanding of statistical concepts by clinicians improves the comprehension and resulting quality of clinical trials. In biomedical research, researchers frequently use the t-test and ANOVA to compare means between groups of interest, irrespective of the nature of the data. In clinical trials, however, data are often recorded on the same patients at more than two time points. In such a situation, standard ANOVA procedures are not appropriate, as they do not account for dependencies between observations within subjects. Repeated-measures ANOVA should be used to analyse such data. In this article the application of one-way repeated-measures ANOVA is demonstrated using SPSS (Statistical Package for the Social Sciences) Version 15.0 on data collected at four time points (day 0, 15th day, 30th day, and 45th day) of a multicentre clinical trial conducted on Pandu Roga (~Iron Deficiency Anemia) with an Ayurvedic formulation, Dhatrilauha. PMID:23930038
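    The one-way repeated-measures ANOVA used in the article partitions total variation into subject, time and error components before forming the F ratio. A from-scratch sketch with invented data (SPSS, as used in the study, reports the same quantities):

```python
def rm_anova(data):
    """One-way repeated-measures ANOVA.
    data[s][t] = response of subject s at time point t.
    Partitions total variation into subject, time and error sums of
    squares and returns the F statistic with its degrees of freedom."""
    ns, nt = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (ns * nt)
    subj_means = [sum(row) / nt for row in data]
    time_means = [sum(data[s][t] for s in range(ns)) / ns for t in range(nt)]
    ss_subj = nt * sum((m - grand) ** 2 for m in subj_means)
    ss_time = ns * sum((m - grand) ** 2 for m in time_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_subj - ss_time  # within-subject residual
    df_time, df_error = nt - 1, (ns - 1) * (nt - 1)
    f = (ss_time / df_time) / (ss_error / df_error)
    return f, df_time, df_error

# Hypothetical responses of 3 subjects at 3 time points.
F, df1, df2 = rm_anova([[3, 5, 4], [2, 4, 6], [4, 6, 5]])
```

    Removing the subject sum of squares from the error term is exactly the dependency adjustment that a standard one-way ANOVA omits.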

  13. An automated approach to Litchfield and Wilcoxon's evaluation of dose–effect experiments using the R package LW1949

    USGS Publications Warehouse

    Adams, Jean V.; Slaght, Karen; Boogaard, Michael A.

    2016-01-01

    The authors developed a package, LW1949, for use with the statistical software R to automatically carry out the manual steps of Litchfield and Wilcoxon's method of evaluating dose–effect experiments. The LW1949 package consistently finds the best fitting dose–effect relation by minimizing the chi-squared statistic of the observed and expected number of affected individuals and substantially speeds up the line-fitting process and other calculations that Litchfield and Wilcoxon originally carried out by hand. Environ Toxicol Chem 2016;9999:1–4. Published 2016 Wiley Periodicals Inc. on behalf of SETAC. This article is a US Government work and, as such, is in the public domain in the United States of America.
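    Litchfield and Wilcoxon's original procedure is a sequence of graphical and table-based steps; its automated core, as the abstract notes, is minimizing the chi-squared statistic between observed and expected numbers of affected individuals. A simplified grid-search sketch over a probit dose-effect model (not the LW1949 package's actual algorithm; doses and parameters are invented):

```python
import math

def probit_p(logd, mu, slope):
    """Expected proportion affected at log10-dose logd under a
    probit model with median log-dose mu."""
    return 0.5 * (1.0 + math.erf(slope * (logd - mu) / math.sqrt(2.0)))

def fit_chi2(logdoses, affected, n, mus, slopes):
    """Grid search for the (mu, slope) pair minimising chi-squared
    between observed and expected numbers of affected individuals."""
    best, best_chi2 = None, float("inf")
    for mu in mus:
        for slope in slopes:
            chi2 = 0.0
            for logd, a in zip(logdoses, affected):
                e = n * probit_p(logd, mu, slope)
                chi2 += (a - e) ** 2 / max(e * (1.0 - e / n), 1e-12)
            if chi2 < best_chi2:
                best, best_chi2 = (mu, slope), chi2
    return best, best_chi2

# Data generated exactly from mu = 0.0 (LD50 = 1 unit), slope = 2.
logdoses = [-1.0, -0.5, 0.0, 0.5, 1.0]
affected = [100 * probit_p(d, 0.0, 2.0) for d in logdoses]
(best_mu, best_slope), chi2 = fit_chi2(
    logdoses, affected, 100,
    mus=[-0.2, -0.1, 0.0, 0.1, 0.2],
    slopes=[1.0, 1.5, 2.0, 2.5, 3.0])
```

    A continuous optimizer would replace the grid in practice; the point is that fitting reduces to minimizing one explicit statistic, which is what makes the hand method automatable.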

  14. ppcor: An R Package for a Fast Calculation to Semi-partial Correlation Coefficients.

    PubMed

    Kim, Seongho

    2015-11-01

    Lack of a general matrix formula hampers implementation of the semi-partial correlation, also known as the part correlation, for higher-order coefficients. This is because calculating a higher-order semi-partial correlation through a recursive formula requires an enormous number of recursive calculations to obtain the correlation coefficients. To resolve this difficulty, we derive a general matrix formula of the semi-partial correlation for fast computation. The semi-partial correlations are then implemented in the R package ppcor, along with the partial correlation. Owing to the general matrix formulas, users can readily calculate the coefficients of both partial and semi-partial correlations without computational burden. The package ppcor further provides the level of statistical significance along with the corresponding test statistic.
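    For a single controlling variable the recursive formulas are still simple; the package's contribution is a matrix identity that avoids applying them recursively at higher orders. The classic first-order definitions, for reference (Python sketch):

```python
import math

def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, with z
    controlled in both variables."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

def semipartial_r(r_xy, r_xz, r_yz):
    """First-order semi-partial (part) correlation of x and y,
    with z removed from y only."""
    return (r_xy - r_xz * r_yz) / math.sqrt(1 - r_yz ** 2)

pr = partial_r(0.5, 0.5, 0.5)
sr = semipartial_r(0.5, 0.5, 0.5)
```

    Each additional controlling variable nests another layer of these formulas, which is the combinatorial explosion the matrix formula eliminates.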

  15. Meta-analyses and Forest plots using a microsoft excel spreadsheet: step-by-step guide focusing on descriptive data analysis.

    PubMed

    Neyeloff, Jeruza L; Fuchs, Sandra C; Moreira, Leila B

    2012-01-20

    Meta-analyses are necessary to synthesize data obtained from primary research, and in many situations reviews of observational studies are the only available alternative. General-purpose statistical packages can meta-analyze data, but usually require external macros or coding. Commercial specialist software is available, but may be expensive and focused on a particular type of primary data. Most available software packages have limitations in dealing with descriptive data, and the graphical display of summary statistics such as incidence and prevalence is unsatisfactory. Analyses can be conducted using Microsoft Excel, but no guide was previously available. We constructed a step-by-step guide to performing a meta-analysis in a Microsoft Excel spreadsheet, using either fixed-effect or random-effects models. We have also developed a second spreadsheet capable of producing customized forest plots. It is possible to conduct a meta-analysis using only Microsoft Excel. More importantly, to our knowledge this is the first description of a method for producing a statistically adequate but graphically appealing forest plot summarizing descriptive data, using widely available software.
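    The fixed-effect model implemented in such spreadsheets is inverse-variance weighting. A compact sketch (hypothetical study estimates; the same arithmetic maps directly onto Excel formulas):

```python
import math

def fixed_effect(effects, ses):
    """Fixed-effect (inverse-variance) meta-analysis: pooled estimate,
    its standard error, and the per-study weights used for the
    square sizes in a forest plot."""
    w = [1.0 / se ** 2 for se in ses]
    pooled = sum(wi * e for wi, e in zip(w, effects)) / sum(w)
    return pooled, math.sqrt(1.0 / sum(w)), w

# Two hypothetical studies with equal precision.
est, se, weights = fixed_effect([1.0, 3.0], [1.0, 1.0])
```

    A random-effects analysis adds a between-study variance term to each study's variance before weighting, but the pooling formula is otherwise identical.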

  16. Meta-analyses and Forest plots using a microsoft excel spreadsheet: step-by-step guide focusing on descriptive data analysis

    PubMed Central

    2012-01-01

    Background Meta-analyses are necessary to synthesize data obtained from primary research, and in many situations reviews of observational studies are the only available alternative. General-purpose statistical packages can meta-analyze data, but usually require external macros or coding. Commercial specialist software is available, but may be expensive and focused on a particular type of primary data. Most available software packages have limitations in dealing with descriptive data, and the graphical display of summary statistics such as incidence and prevalence is unsatisfactory. Analyses can be conducted using Microsoft Excel, but no guide was previously available. Findings We constructed a step-by-step guide to performing a meta-analysis in a Microsoft Excel spreadsheet, using either fixed-effect or random-effects models. We have also developed a second spreadsheet capable of producing customized forest plots. Conclusions It is possible to conduct a meta-analysis using only Microsoft Excel. More importantly, to our knowledge this is the first description of a method for producing a statistically adequate but graphically appealing forest plot summarizing descriptive data, using widely available software. PMID:22264277

  17. BCM: toolkit for Bayesian analysis of Computational Models using samplers.

    PubMed

    Thijssen, Bram; Dijkstra, Tjeerd M H; Heskes, Tom; Wessels, Lodewyk F A

    2016-10-21

    Computational models in biology are characterized by a large degree of uncertainty. This uncertainty can be analyzed with Bayesian statistics; however, the sampling algorithms frequently used for calculating Bayesian statistical estimates are computationally demanding, and each algorithm has unique advantages and disadvantages. It is typically unclear, before starting an analysis, which algorithm will perform well on a given computational model. We present BCM, a toolkit for the Bayesian analysis of Computational Models using samplers. It provides efficient, multithreaded implementations of eleven algorithms for sampling from posterior probability distributions and for calculating marginal likelihoods. BCM includes tools to simplify the process of model specification and scripts for visualizing the results. The flexible architecture allows it to be used on diverse types of biological computational models. In an example inference task using a model of the cell cycle based on ordinary differential equations, BCM is significantly more efficient than existing software packages, allowing more challenging inference problems to be solved. BCM represents an efficient one-stop shop for computational modelers wishing to use sampler-based Bayesian statistics.
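    As a minimal illustration of the sampler-based approach (not one of BCM's eleven multithreaded algorithms), a random-walk Metropolis sampler drawing from a toy one-dimensional posterior:

```python
import math
import random

def metropolis(logpost, x0, n, step, rng):
    """Random-walk Metropolis: propose x' ~ N(x, step^2) and accept
    with probability min(1, post(x') / post(x)); otherwise stay put.
    Returns the chain of n samples."""
    x, out = x0, []
    lp = logpost(x)
    for _ in range(n):
        xp = x + rng.gauss(0.0, step)
        lpp = logpost(xp)
        if math.log(rng.random()) < lpp - lp:
            x, lp = xp, lpp
        out.append(x)
    return out

# Toy target: a standard normal "posterior" (log density up to a constant).
rng = random.Random(1)
draws = metropolis(lambda x: -0.5 * x * x, 0.0, 20000, 1.0, rng)
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)
```

    For a real biological model, logpost would evaluate the ODE model against data; the relative cost of that evaluation is what makes efficient, parallel sampler implementations worthwhile.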

  18. REddyProc: Enabling researchers to process Eddy-Covariance data

    NASA Astrophysics Data System (ADS)

    Wutzler, Thomas; Moffat, Antje; Migliavacca, Mirco; Knauer, Jürgen; Menzer, Olaf; Sickel, Kerstin; Reichstein, Markus

    2017-04-01

Analysing Eddy-Covariance measurements involves extensive processing, which places a considerable technical burden on researchers. There is a need to overcome difficulties in data processing associated with deploying, adapting and using existing software and online tools. We addressed that need by developing the REddyProc package in the open-source, cross-platform language R. It provides standard processing routines: reading half-hourly files in different formats, including the recently released FLUXNET 2015 dataset; uStar threshold estimation and its associated uncertainty; gap-filling; flux partitioning (both night-time and daytime based); and visualization of results. Although different in some features, the package mimics the online tool that has been used extensively by many users and site Principal Investigators (PIs) in recent years and is available on the website of the Max Planck Institute for Biogeochemistry. Generally, REddyProc results are statistically equivalent to results based on state-of-the-art tools. The provided routines can be easily installed, configured, used, and integrated with further analysis. Hence, the eddy covariance community will benefit from the package, which allows easier integration of standard processing with extended analysis. This complements activities by AmeriFlux, ICOS, NEON, and other regional networks to develop code for standardized data processing of multiple sites in FLUXNET.

  19. GAC: Gene Associations with Clinical, a web based application.

    PubMed

    Zhang, Xinyan; Rupji, Manali; Kowalski, Jeanne

    2017-01-01

We present GAC, a Shiny R based tool for interactive visualization of clinical associations based on high-dimensional data. The tool provides a web-based suite to perform supervised principal component analysis (SuperPC), an approach that combines high-dimensional data, such as gene expression, with clinical data to infer clinical associations. We extended the approach to address binary outcomes, in addition to continuous and time-to-event data, in our package, thereby increasing the use and flexibility of SuperPC. Additionally, the tool provides an interactive visualization for summarizing results based on a forest plot for both binary and time-to-event data. In summary, the GAC suite of tools provides a one-stop shop for conducting statistical analysis to identify and visualize the association between a clinical outcome of interest and high-dimensional data types, such as genomic data. Our GAC package has been implemented in R and is available via http://shinygispa.winship.emory.edu/GAC/. The developmental repository is available at https://github.com/manalirupji/GAC.

  20. dendextend: an R package for visualizing, adjusting and comparing trees of hierarchical clustering

    PubMed Central

    2015-01-01

Summary: dendextend is an R package for creating and comparing visually appealing tree diagrams. dendextend provides utility functions for manipulating dendrogram objects (their color, shape and content) as well as several advanced methods for comparing trees to one another (both statistically and visually). As such, dendextend offers a flexible framework for enhancing R's rich ecosystem of packages for performing hierarchical clustering of items. Availability and implementation: The dendextend R package (including detailed introductory vignettes) is available under the GPL-2 Open Source license and is freely available to download from CRAN at http://cran.r-project.org/package=dendextend. Contact: Tal.Galili@math.tau.ac.il PMID:26209431
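One common statistical comparison between two trees cut into flat clusterings is the Fowlkes-Mallows index, the quantity underlying Bk plots. The index itself can be sketched on flat label vectors (an illustrative Python sketch, unrelated to dendextend's R code):

```python
from itertools import combinations

def fowlkes_mallows(labels_a, labels_b):
    """Fowlkes-Mallows index: geometric mean of pairwise precision and
    recall between two flat clusterings of the same items."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a and same_b:
            tp += 1      # pair grouped together in both clusterings
        elif same_a:
            fp += 1      # together in A only
        elif same_b:
            fn += 1      # together in B only
    if tp == 0:
        return 0.0
    return tp / ((tp + fp) * (tp + fn)) ** 0.5

# Identical partitions (up to label names) score 1.0.
score = fowlkes_mallows([0, 0, 1, 1], [1, 1, 0, 0])
```

Computing this at successive cut heights k of two dendrograms gives the Bk curve used to judge how similar two hierarchical clusterings are.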

  1. A flexible, interpretable framework for assessing sensitivity to unmeasured confounding.

    PubMed

    Dorie, Vincent; Harada, Masataka; Carnegie, Nicole Bohme; Hill, Jennifer

    2016-09-10

When estimating causal effects, unmeasured confounding and model misspecification are both potential sources of bias. We propose a method to simultaneously address both issues in the form of a semi-parametric sensitivity analysis. In particular, our approach incorporates Bayesian Additive Regression Trees into a two-parameter sensitivity analysis strategy that assesses sensitivity of posterior distributions of treatment effects to choices of sensitivity parameters. This results in an easily interpretable framework for testing for the impact of an unmeasured confounder that also limits the number of modeling assumptions. We evaluate our approach in a large-scale simulation setting and with high blood pressure data taken from the Third National Health and Nutrition Examination Survey. The model is implemented as open-source software, integrated into the treatSens package for the R statistical programming language. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.

  2. Parametric Analysis to Study the Influence of Aerogel-Based Renders' Components on Thermal and Mechanical Performance.

    PubMed

    Ximenes, Sofia; Silva, Ana; Soares, António; Flores-Colen, Inês; de Brito, Jorge

    2016-05-04

Statistical models using multiple linear regression are some of the most widely used methods to study the influence of independent variables on a given phenomenon. This study's objective is to understand the influence of the various components of aerogel-based renders on their thermal and mechanical performance, namely cement (three types), fly ash, aerial lime, silica sand, expanded clay, type of aerogel, expanded cork granules, expanded perlite, air entrainers, resins (two types), and rheological agent. The statistical analysis was performed using SPSS (Statistical Package for Social Sciences), based on 85 mortar mixes produced in the laboratory and on their values of thermal conductivity and compressive strength obtained using tests in small-scale samples. The results showed that aerial lime assumes the main role in improving the thermal conductivity of the mortars. Aerogel type, fly ash, expanded perlite and air entrainers are also relevant components for a good thermal conductivity. Expanded clay can improve the mechanical behavior, and aerogel has the opposite effect.

  3. Parametric Analysis to Study the Influence of Aerogel-Based Renders’ Components on Thermal and Mechanical Performance

    PubMed Central

    Ximenes, Sofia; Silva, Ana; Soares, António; Flores-Colen, Inês; de Brito, Jorge

    2016-01-01

Statistical models using multiple linear regression are some of the most widely used methods to study the influence of independent variables on a given phenomenon. This study's objective is to understand the influence of the various components of aerogel-based renders on their thermal and mechanical performance, namely cement (three types), fly ash, aerial lime, silica sand, expanded clay, type of aerogel, expanded cork granules, expanded perlite, air entrainers, resins (two types), and rheological agent. The statistical analysis was performed using SPSS (Statistical Package for Social Sciences), based on 85 mortar mixes produced in the laboratory and on their values of thermal conductivity and compressive strength obtained using tests in small-scale samples. The results showed that aerial lime assumes the main role in improving the thermal conductivity of the mortars. Aerogel type, fly ash, expanded perlite and air entrainers are also relevant components for a good thermal conductivity. Expanded clay can improve the mechanical behavior, and aerogel has the opposite effect. PMID:28773460

  4. Mathematical and Statistical Software Index. Final Report.

    ERIC Educational Resources Information Center

    Black, Doris E., Comp.

    Brief descriptions are provided of general-purpose mathematical and statistical software, including 27 "stand-alone" programs, three subroutine systems, and two nationally recognized statistical packages, which are available in the Air Force Human Resources Laboratory (AFHRL) software library. This index was created to enable researchers…

  5. Automated spectral and timing analysis of AGNs

    NASA Astrophysics Data System (ADS)

    Munz, F.; Karas, V.; Guainazzi, M.

    2006-12-01

We have developed an autonomous script that helps the user automate XMM-Newton data analysis for the purposes of extensive statistical investigations. We test this approach by examining X-ray spectra of bright AGNs pre-selected from the public database. The event lists extracted in this process were studied further by constructing their energy-resolved Fourier power-spectrum density. This analysis combines energy distributions, light curves, and their power spectra, and it proves useful for assessing the variability patterns present in the data. As another example, an automated search based on the XSPEC package was used to reveal emission features in the 2-8 keV range.

  6. TCC: an R package for comparing tag count data with robust normalization strategies

    PubMed Central

    2013-01-01

Background Differential expression analysis based on “next-generation” sequencing technologies is a fundamental means of studying RNA expression. We recently developed a multi-step normalization method (called TbT) for two-group RNA-seq data with replicates and demonstrated that the statistical methods available in four R packages (edgeR, DESeq, baySeq, and NBPSeq) together with TbT can produce a well-ranked gene list in which true differentially expressed genes (DEGs) are top-ranked and non-DEGs are bottom-ranked. However, the advantages of the current TbT method come at the cost of a huge computation time. Moreover, the R packages did not have normalization methods based on such a multi-step strategy. Results TCC (an acronym for Tag Count Comparison) is an R package that provides a series of functions for differential expression analysis of tag count data. The package incorporates multi-step normalization methods, whose strategy is to remove potential DEGs before performing the data normalization. The normalization function based on this DEG elimination strategy (DEGES) includes (i) the original TbT method based on DEGES for two-group data with or without replicates, (ii) much faster methods for two-group data with or without replicates, and (iii) methods for multi-group comparison. TCC provides a simple unified interface to perform such analyses with combinations of functions provided by edgeR, DESeq, and baySeq. Additionally, a function for generating simulation data under various conditions and alternative DEGES procedures consisting of functions in the existing packages are provided. Bioinformatics scientists can use TCC to evaluate their methods, and biologists familiar with other R packages can easily learn what is done in TCC. Conclusion DEGES in TCC is essential for accurate normalization of tag count data, especially when up- and down-regulated DEGs in one of the samples are extremely biased in their number. TCC is useful for analyzing tag count data in various scenarios ranging from unbiased to extremely biased differential expression. TCC is available at http://www.iu.a.u-tokyo.ac.jp/~kadota/TCC/ and will appear in Bioconductor (http://bioconductor.org/) from ver. 2.13. PMID:23837715

  7. GIS and statistical analysis for landslide susceptibility mapping in the Daunia area, Italy

    NASA Astrophysics Data System (ADS)

    Mancini, F.; Ceppi, C.; Ritrovato, G.

    2010-09-01

    This study focuses on landslide susceptibility mapping in the Daunia area (Apulian Apennines, Italy) and achieves this by using a multivariate statistical method and data processing in a Geographical Information System (GIS). The Logistic Regression (hereafter LR) method was chosen to produce a susceptibility map over an area of 130 000 ha where small settlements are historically threatened by landslide phenomena. By means of LR analysis, the tendency to landslide occurrences was, therefore, assessed by relating a landslide inventory (dependent variable) to a series of causal factors (independent variables) which were managed in the GIS, while the statistical analyses were performed by means of the SPSS (Statistical Package for the Social Sciences) software. The LR analysis produced a reliable susceptibility map of the investigated area and the probability level of landslide occurrence was ranked in four classes. The overall performance achieved by the LR analysis was assessed by local comparison between the expected susceptibility and an independent dataset extrapolated from the landslide inventory. Of the samples classified as susceptible to landslide occurrences, 85% correspond to areas where landslide phenomena have actually occurred. In addition, the consideration of the regression coefficients provided by the analysis demonstrated that a major role is played by the "land cover" and "lithology" causal factors in determining the occurrence and distribution of landslide phenomena in the Apulian Apennines.
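The LR method used for the susceptibility map is plain binary logistic regression: the probability of a landslide is modeled as a sigmoid of a linear combination of the causal factors. Its core can be sketched in a few lines (an illustrative Python sketch with a single made-up predictor, not the SPSS analysis):

```python
import math

def fit_logistic(xs, ys, lr=0.5, epochs=2000):
    """Fit y ~ sigmoid(b0 + b1*x) by gradient descent on the log-loss."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            b0 += lr * (y - p) / n        # gradient of the log-likelihood
            b1 += lr * (y - p) * x / n
    return b0, b1

# Hypothetical rescaled causal factor (e.g. slope) vs. landslide (1) / none (0).
xs = [0.1, 0.3, 0.4, 0.6, 0.7, 0.9]
ys = [0, 0, 0, 1, 1, 1]
b0, b1 = fit_logistic(xs, ys)
susceptibility = 1.0 / (1.0 + math.exp(-(b0 + b1 * 0.8)))
```

The fitted coefficients play the same interpretive role as the regression coefficients discussed in the abstract: a larger positive coefficient marks a factor that raises the modeled probability of landslide occurrence.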

  8. Interactive Visualization of Assessment Data: The Software Package Mondrian

    ERIC Educational Resources Information Center

    Unlu, Ali; Sargin, Anatol

    2009-01-01

    Mondrian is state-of-the-art statistical data visualization software featuring modern interactive visualization techniques for a wide range of data types. This article reviews the capabilities, functionality, and interactive properties of this software package. Key features of Mondrian are illustrated with data from the Programme for International…

  9. 77 FR 50677 - Proposed Information Collection; Comment Request; Boundary and Annexation Survey

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-08-22

    ..., preparing population estimates, and supporting other statistical programs of the Census Bureau, and the... survey. The typical BAS package contains: 1. Introductory letter from the Director of the Census Bureau... Census Bureau. The typical Digital BAS package contains: 1. Introductory letter from the Director of the...

  10. Analysis of longitudinal data from animals with missing values using SPSS.

    PubMed

    Duricki, Denise A; Soleman, Sara; Moon, Lawrence D F

    2016-06-01

    Testing of therapies for disease or injury often involves the analysis of longitudinal data from animals. Modern analytical methods have advantages over conventional methods (particularly when some data are missing), yet they are not used widely by preclinical researchers. Here we provide an easy-to-use protocol for the analysis of longitudinal data from animals, and we present a click-by-click guide for performing suitable analyses using the statistical package IBM SPSS Statistics software (SPSS). We guide readers through the analysis of a real-life data set obtained when testing a therapy for brain injury (stroke) in elderly rats. If a few data points are missing, as in this example data set (for example, because of animal dropout), repeated-measures analysis of covariance may fail to detect a treatment effect. An alternative analysis method, such as the use of linear models (with various covariance structures), and analysis using restricted maximum likelihood estimation (to include all available data) can be used to better detect treatment effects. This protocol takes 2 h to carry out.
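The core contrast in the protocol, that listwise deletion discards an entire animal while likelihood-based models keep every observed value, can be made concrete with a toy example (illustrative Python with made-up animal data; the protocol itself uses SPSS):

```python
# Toy longitudinal dataset: one row per animal, one entry per session;
# None marks a missing observation (e.g. a missed test or dropout).
data = {
    "rat1": [10, 12, 13, 15],
    "rat2": [9, 11, None, 14],   # one missed session
    "rat3": [8, 10, 11, None],   # dropped out before the last session
}

# Listwise deletion (classic repeated-measures ANOVA): an animal with
# ANY missing value is removed from the analysis entirely.
complete_cases = {k: v for k, v in data.items() if None not in v}

# Likelihood-based linear (mixed) models instead use every observed value.
observed = sum(v is not None for vals in data.values() for v in vals)
```

Here listwise deletion keeps only one of three animals (4 data points), while a maximum-likelihood analysis retains all 10 observed values, which is why the latter can still detect a treatment effect when a few points are missing.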

  11. Young adolescents' engagement in dietary behaviour - the impact of gender, socio-economic status, self-efficacy and scientific literacy. Methodological aspects of constructing measures in nutrition literacy research using the Rasch model.

    PubMed

    Guttersrud, Øystein; Petterson, Kjell Sverre

    2015-10-01

The present study validates a revised scale measuring individuals' level of the 'engagement in dietary behaviour' aspect of 'critical nutrition literacy' and describes how background factors affect this aspect of Norwegian tenth-grade students' nutrition literacy. Data were gathered electronically during a field trial of a standardised sample test in science. Test items and questionnaire constructs were distributed evenly across four electronic field-test booklets. Data management and analysis were performed using the RUMM2030 item analysis package and the IBM SPSS Statistics 20 statistical software package. Students responded on computers at school. Seven hundred and forty tenth-grade students at twenty-seven randomly sampled public schools were enrolled in the field-test study. The engagement in dietary behaviour scale and the self-efficacy in science scale were distributed to 178 of these students. The dietary behaviour scale and the self-efficacy in science scale came out as valid, reliable and well-targeted instruments usable for the construction of measurements. Girls and students with high self-efficacy reported higher engagement in dietary behaviour than other students. Socio-economic status and scientific literacy - measured as ability in science using an achievement test - showed no significant correlation with students' engagement in dietary behaviour.

  12. Detection of micro solder balls using active thermography and probabilistic neural network

    NASA Astrophysics Data System (ADS)

    He, Zhenzhi; Wei, Li; Shao, Minghui; Lu, Xingning

    2017-03-01

Micro solder balls/bumps have been widely used in electronic packaging. It has been challenging to inspect these structures, as the solder balls/bumps are often embedded between the component and the substrate, especially in flip-chip packaging. In this paper, a detection method for micro solder balls/bumps based on active thermography and a probabilistic neural network is investigated. A VH680 infrared imager is used to capture the thermal image of the test vehicle, SFA10 packages. The temperature curves are processed using a moving-average technique to remove the peak noise, and principal component analysis (PCA) is adopted to reconstruct the thermal images. The missed solder balls can be recognized explicitly in the second principal component image. A probabilistic neural network (PNN) is then established to identify defective bumps intelligently. The hot spots corresponding to the solder balls are segmented from the PCA-reconstructed image, and statistical parameters are calculated. To characterize the thermal properties of solder bumps quantitatively, three representative features are selected and used as the input vector in PNN clustering. The results show that the actual outputs and the expected outputs are consistent in identification of the missed solder balls, and all the bumps were recognized accurately, which demonstrates the viability of the PNN for effective defect inspection in high-density microelectronic packaging.
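The moving-average step used to suppress peak noise in the temperature curves can be sketched as follows (a generic centered moving average in Python with hypothetical values, not the authors' code):

```python
def moving_average(signal, window=3):
    """Centered moving average; suppresses isolated peak noise in a
    temperature-time curve (windows shrink at the edges)."""
    out = []
    half = window // 2
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out

# A flat temperature curve with one spurious peak sample.
smoothed = moving_average([20.0, 20.0, 35.0, 20.0, 20.0])
# The isolated 35.0 spike is spread out and reduced to 25.0.
```

Smoothing before PCA matters because isolated spikes would otherwise dominate the variance that the principal components capture.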

  13. Dynamic assessment of microbial ecology (DAME): a web app for interactive analysis and visualization of microbial sequencing data.

    PubMed

    Piccolo, Brian D; Wankhade, Umesh D; Chintapalli, Sree V; Bhattacharyya, Sudeepa; Chunqiao, Luo; Shankar, Kartik

    2018-03-15

    Dynamic assessment of microbial ecology (DAME) is a Shiny-based web application for interactive analysis and visualization of microbial sequencing data. DAME provides researchers not familiar with R programming the ability to access the most current R functions utilized for ecology and gene sequencing data analyses. Currently, DAME supports group comparisons of several ecological estimates of α-diversity and β-diversity, along with differential abundance analysis of individual taxa. Using the Shiny framework, the user has complete control of all aspects of the data analysis, including sample/experimental group selection and filtering, estimate selection, statistical methods and visualization parameters. Furthermore, graphical and tabular outputs are supported by R packages using D3.js and are fully interactive. DAME was implemented in R but can be modified by Hypertext Markup Language (HTML), Cascading Style Sheets (CSS), and JavaScript. It is freely available on the web at https://acnc-shinyapps.shinyapps.io/DAME/. Local installation and source code are available through Github (https://github.com/bdpiccolo/ACNC-DAME). Any system with R can launch DAME locally provided the shiny package is installed. bdpiccolo@uams.edu.

  14. Software engineering the mixed model for genome-wide association studies on large samples.

    PubMed

    Zhang, Zhiwu; Buckler, Edward S; Casstevens, Terry M; Bradbury, Peter J

    2009-11-01

    Mixed models improve the ability to detect phenotype-genotype associations in the presence of population stratification and multiple levels of relatedness in genome-wide association studies (GWAS), but for large data sets the resource consumption becomes impractical. At the same time, the sample size and number of markers used for GWAS is increasing dramatically, resulting in greater statistical power to detect those associations. The use of mixed models with increasingly large data sets depends on the availability of software for analyzing those models. While multiple software packages implement the mixed model method, no single package provides the best combination of fast computation, ability to handle large samples, flexible modeling and ease of use. Key elements of association analysis with mixed models are reviewed, including modeling phenotype-genotype associations using mixed models, population stratification, kinship and its estimation, variance component estimation, use of best linear unbiased predictors or residuals in place of raw phenotype, improving efficiency and software-user interaction. The available software packages are evaluated, and suggestions made for future software development.

  15. TOMS and SBUV Data: Comparison to 3D Chemical-Transport Model Results

    NASA Technical Reports Server (NTRS)

    Stolarski, Richard S.; Douglass, Anne R.; Steenrod, Steve; Frith, Stacey

    2003-01-01

We have updated our merged ozone data (MOD) set using the TOMS data from the new version 8 algorithm. We then analyzed these data for contributions from solar cycle, volcanoes, QBO, and halogens using a standard statistical time series model. We have recently completed a hindcast run of our 3D chemical-transport model for the same years. This model uses off-line winds from the finite-volume GCM, a full stratospheric photochemistry package, and time-varying forcing due to halogens, solar UV, and volcanic aerosols. We will report on a parallel analysis of these model results using the same statistical time series technique as used for the MOD data.

  16. Low-cost digital image processing at the University of Oklahoma

    NASA Technical Reports Server (NTRS)

    Harrington, J. A., Jr.

    1981-01-01

Computer-assisted instruction in remote sensing at the University of Oklahoma involves two separate approaches and depends upon initial preprocessing of a LANDSAT computer-compatible tape using software developed for an IBM 370/158 computer. In-house-generated preprocessing algorithms permit students or researchers to select a subset of a LANDSAT scene for subsequent analysis using either general-purpose statistical packages or color graphic image processing software developed for Apple II microcomputers. Procedures for preprocessing the data and for image analysis using either of the two approaches to low-cost LANDSAT data processing are described.

  17. FluxPyt: a Python-based free and open-source software for 13C-metabolic flux analyses.

    PubMed

    Desai, Trunil S; Srivastava, Shireesh

    2018-01-01

13C-Metabolic flux analysis (MFA) is a powerful approach to estimate intracellular reaction rates which could be used in strain analysis and design. Processing and analysis of labeling data for calculation of fluxes and associated statistics is an essential part of MFA. However, various software currently available for data analysis employ proprietary platforms and thus limit accessibility. We developed FluxPyt, a Python-based truly open-source software package for conducting stationary 13C-MFA data analysis. The software is based on the efficient elementary metabolite unit framework. The standard deviations in the calculated fluxes are estimated using the Monte-Carlo analysis. FluxPyt also automatically creates flux maps based on a template for visualization of the MFA results. The flux distributions calculated by FluxPyt for two separate models: a small tricarboxylic acid cycle model and a larger Corynebacterium glutamicum model, were found to be in good agreement with those calculated by a previously published software. FluxPyt was tested in Microsoft™ Windows 7 and 10, as well as in Linux Mint 18.2. The availability of a free and open 13C-MFA software that works in various operating systems will enable more researchers to perform 13C-MFA and to further modify and develop the package.

  18. FluxPyt: a Python-based free and open-source software for 13C-metabolic flux analyses

    PubMed Central

    Desai, Trunil S.

    2018-01-01

    13C-Metabolic flux analysis (MFA) is a powerful approach to estimate intracellular reaction rates which could be used in strain analysis and design. Processing and analysis of labeling data for calculation of fluxes and associated statistics is an essential part of MFA. However, various software currently available for data analysis employ proprietary platforms and thus limit accessibility. We developed FluxPyt, a Python-based truly open-source software package for conducting stationary 13C-MFA data analysis. The software is based on the efficient elementary metabolite unit framework. The standard deviations in the calculated fluxes are estimated using the Monte-Carlo analysis. FluxPyt also automatically creates flux maps based on a template for visualization of the MFA results. The flux distributions calculated by FluxPyt for two separate models: a small tricarboxylic acid cycle model and a larger Corynebacterium glutamicum model, were found to be in good agreement with those calculated by a previously published software. FluxPyt was tested in Microsoft™ Windows 7 and 10, as well as in Linux Mint 18.2. The availability of a free and open 13C-MFA software that works in various operating systems will enable more researchers to perform 13C-MFA and to further modify and develop the package. PMID:29736347

  19. Program for narrow-band analysis of aircraft flyover noise using ensemble averaging techniques

    NASA Technical Reports Server (NTRS)

    Gridley, D.

    1982-01-01

    A package of computer programs was developed for analyzing acoustic data from an aircraft flyover. The package assumes the aircraft is flying at constant altitude and constant velocity in a fixed attitude over a linear array of ground microphones. Aircraft position is provided by radar and an option exists for including the effects of the aircraft's rigid-body attitude relative to the flight path. Time synchronization between radar and acoustic recording stations permits ensemble averaging techniques to be applied to the acoustic data thereby increasing the statistical accuracy of the acoustic results. Measured layered meteorological data obtained during the flyovers are used to compute propagation effects through the atmosphere. Final results are narrow-band spectra and directivities corrected for the flight environment to an equivalent static condition at a specified radius.
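The statistical benefit of ensemble averaging time-synchronized records is that uncorrelated noise variance shrinks roughly as 1/N for N ensemble members. This can be sketched as follows (an illustrative Python sketch with simulated records; the actual program processes radar-synchronized acoustic data):

```python
import random

def ensemble_average(records):
    """Point-by-point average of time-synchronized records of equal length."""
    n = len(records)
    return [sum(vals) / n for vals in zip(*records)]

rng = random.Random(0)
true_signal = [1.0] * 200        # a flat "true" acoustic level
# 50 noisy, time-aligned records of the same underlying signal.
records = [[s + rng.gauss(0.0, 0.5) for s in true_signal]
           for _ in range(50)]
avg = ensemble_average(records)

# Worst-case deviation of the ensemble average from the true signal.
residual = max(abs(a - t) for a, t in zip(avg, true_signal))
```

With 50 members the residual noise standard deviation drops from 0.5 to roughly 0.5 / sqrt(50), which is the accuracy gain the flyover program exploits.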

  20. A hands-on practical tutorial on performing meta-analysis with Stata.

    PubMed

    Chaimani, Anna; Mavridis, Dimitris; Salanti, Georgia

    2014-11-01

    Statistical synthesis of research findings via meta-analysis is widely used to assess the relative effectiveness of competing interventions. A series of three papers aimed at familiarising mental health scientists with the key statistical concepts and problems in meta-analysis was recently published in this journal. One paper focused on the selection and interpretation of the appropriate model to synthesise results (fixed effect or random effects model) whereas the other two papers focused on two major threats that compromise the validity of meta-analysis results, namely publication bias and missing outcome data. In this paper we provide guidance on how to undertake meta-analysis using Stata, one of the most commonly used software packages for meta-analysis. We address the three topics covered in the previous issues of the journal, focusing on their implementation in Stata using a working example from mental health research. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.
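The fixed-effect versus random-effects choice discussed in the series hinges on the between-study variance tau^2. The commonly used DerSimonian-Laird moment estimator can be sketched as follows (a generic Python sketch with made-up study values, not Stata's implementation):

```python
def dersimonian_laird(estimates, std_errors):
    """Random-effects meta-analysis with the DerSimonian-Laird tau^2
    moment estimator; returns (pooled effect, tau^2)."""
    w = [1.0 / se**2 for se in std_errors]
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    # Cochran's Q heterogeneity statistic around the fixed-effect mean.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    df = len(estimates) - 1
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)          # truncated at zero
    # Re-pool with weights that include the between-study variance.
    w_star = [1.0 / (se**2 + tau2) for se in std_errors]
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    return pooled, tau2

pooled, tau2 = dersimonian_laird([0.1, 0.5, 0.9], [0.1, 0.1, 0.1])
```

When the studies are heterogeneous, tau^2 is positive and the random-effects weights pull the studies toward equal influence; when they agree, tau^2 truncates to zero and the result reduces to the fixed-effect model.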

  1. TCGA Workflow: Analyze cancer genomics and epigenomics data using Bioconductor packages

    PubMed Central

    Bontempi, Gianluca; Ceccarelli, Michele; Noushmehr, Houtan

    2016-01-01

    Biotechnological advances in sequencing have led to an explosion of publicly available data via large international consortia such as The Cancer Genome Atlas (TCGA), The Encyclopedia of DNA Elements (ENCODE), and The NIH Roadmap Epigenomics Mapping Consortium (Roadmap). These projects have provided unprecedented opportunities to interrogate the epigenome of cultured cancer cell lines as well as normal and tumor tissues with high genomic resolution. The Bioconductor project offers more than 1,000 open-source software and statistical packages to analyze high-throughput genomic data. However, most packages are designed for specific data types (e.g. expression, epigenetics, genomics) and there is no one comprehensive tool that provides a complete integrative analysis of the resources and data provided by all three public projects. A need to create an integration of these different analyses was recently proposed. In this workflow, we provide a series of biologically focused integrative analyses of different molecular data. We describe how to download, process and prepare TCGA data and by harnessing several key Bioconductor packages, we describe how to extract biologically meaningful genomic and epigenomic data. Using Roadmap and ENCODE data, we provide a work plan to identify biologically relevant functional epigenomic elements associated with cancer. To illustrate our workflow, we analyzed two types of brain tumors: low-grade glioma (LGG) versus high-grade glioma (glioblastoma multiform or GBM). This workflow introduces the following Bioconductor packages: AnnotationHub, ChIPSeeker, ComplexHeatmap, pathview, ELMER, GAIA, MINET, RTCGAToolbox,  TCGAbiolinks. PMID:28232861

  2. TCGA Workflow: Analyze cancer genomics and epigenomics data using Bioconductor packages.

    PubMed

    Silva, Tiago C; Colaprico, Antonio; Olsen, Catharina; D'Angelo, Fulvio; Bontempi, Gianluca; Ceccarelli, Michele; Noushmehr, Houtan

    2016-01-01

    Biotechnological advances in sequencing have led to an explosion of publicly available data via large international consortia such as The Cancer Genome Atlas (TCGA), The Encyclopedia of DNA Elements (ENCODE), and The NIH Roadmap Epigenomics Mapping Consortium (Roadmap). These projects have provided unprecedented opportunities to interrogate the epigenome of cultured cancer cell lines as well as normal and tumor tissues with high genomic resolution. The Bioconductor project offers more than 1,000 open-source software and statistical packages to analyze high-throughput genomic data. However, most packages are designed for specific data types (e.g. expression, epigenetics, genomics) and there is no one comprehensive tool that provides a complete integrative analysis of the resources and data provided by all three public projects. A need to create an integration of these different analyses was recently proposed. In this workflow, we provide a series of biologically focused integrative analyses of different molecular data. We describe how to download, process and prepare TCGA data and by harnessing several key Bioconductor packages, we describe how to extract biologically meaningful genomic and epigenomic data. Using Roadmap and ENCODE data, we provide a work plan to identify biologically relevant functional epigenomic elements associated with cancer. To illustrate our workflow, we analyzed two types of brain tumors: low-grade glioma (LGG) versus high-grade glioma (glioblastoma multiform or GBM). This workflow introduces the following Bioconductor packages: AnnotationHub, ChIPSeeker, ComplexHeatmap, pathview, ELMER, GAIA, MINET, RTCGAToolbox,  TCGAbiolinks.

  3. smwrBase—An R package for managing hydrologic data, version 1.1.1

    USGS Publications Warehouse

    Lorenz, David L.

    2015-12-09

    This report describes an R package called smwrBase, which consists of a collection of functions to import, transform, manipulate, and manage hydrologic data within the R statistical environment. Functions in the package allow users to import surface-water and groundwater data from the U.S. Geological Survey’s National Water Information System database and other sources. Additional functions are provided to transform, manipulate, and manage hydrologic data in ways necessary for analyzing the data.

  4. Guidelines for the analysis of free energy calculations

    PubMed Central

    Klimovich, Pavel V.; Shirts, Michael R.; Mobley, David L.

    2015-01-01

    Free energy calculations based on molecular dynamics (MD) simulations show considerable promise for applications ranging from drug discovery to prediction of physical properties and structure-function studies. But these calculations are still difficult and tedious to analyze, and best practices for analysis are not well defined or propagated. Essentially, each group analyzing these calculations needs to decide how to conduct the analysis and, usually, develop its own analysis tools. Here, we review and recommend best practices for analysis yielding reliable free energies from molecular simulations. Additionally, we provide a Python tool, alchemical-analysis.py, freely available on GitHub at https://github.com/choderalab/pymbar-examples, that implements the analysis practices reviewed here for several reference simulation packages, and which can be adapted to handle data from other packages. Both this review and the tool cover analysis of alchemical calculations generally, including free energy estimates via both thermodynamic integration and free energy perturbation-based estimators. Our Python tool also handles output from multiple types of free energy calculations, including expanded ensemble and Hamiltonian replica exchange, as well as standard fixed ensemble calculations. We also survey a range of statistical and graphical ways of assessing the quality of the data and free energy estimates, and provide prototypes of these in our tool. We hope these tools and discussion will serve as a foundation for more standardization of and agreement on best practices for analysis of free energy calculations. PMID:25808134
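    The thermodynamic integration route mentioned in the abstract reduces, at analysis time, to numerically integrating averaged dU/dλ values over the coupling parameter. A minimal illustrative sketch in plain Python (a trapezoid-rule quadrature, not the alchemical-analysis.py implementation):

    ```python
    def ti_free_energy(lambdas, dudl):
        """Thermodynamic integration: trapezoid-rule estimate of
        dF = integral over lambda of <dU/dlambda>."""
        if len(lambdas) != len(dudl) or len(lambdas) < 2:
            raise ValueError("need matching lists with at least two points")
        total = 0.0
        for i in range(len(lambdas) - 1):
            total += 0.5 * (dudl[i] + dudl[i + 1]) * (lambdas[i + 1] - lambdas[i])
        return total

    # A linear <dU/dlambda> = 10*lambda integrates exactly to 5.0 over [0, 1].
    lams = [0.0, 0.25, 0.5, 0.75, 1.0]
    print(ti_free_energy(lams, [10.0 * l for l in lams]))  # 5.0
    ```

    Real analyses add uncertainty estimation over decorrelated samples, which is part of what the reviewed tool automates.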

  5. Model-Based Linkage Analysis of a Quantitative Trait.

    PubMed

    Song, Yeunjoo E; Song, Sunah; Schnell, Audrey H

    2017-01-01

    Linkage analysis is a family-based method for examining whether any typed genetic markers cosegregate with a given trait, in this case a quantitative trait. If linkage exists, this is taken as evidence in support of a genetic basis for the trait. Historically, linkage analysis was performed using a binary disease trait, but it has been extended to include quantitative disease measures. Quantitative traits are desirable as they provide more information than binary traits. Linkage analysis can be performed using single-marker methods (one marker at a time) or multipoint methods (using multiple markers simultaneously). In model-based linkage analysis, the genetic model for the trait of interest is specified. There are many software options for performing linkage analysis. Here, we use the program package Statistical Analysis for Genetic Epidemiology (S.A.G.E.). S.A.G.E. was chosen because, in addition to performing linkage analysis, it also includes programs to perform data cleaning procedures and to generate and test genetic models for a quantitative trait. We demonstrate in detail the process of running the program LODLINK to perform single-marker analysis, and MLOD to perform multipoint analysis using output from SEGREG, where SEGREG was used to determine the best-fitting statistical model for the trait.
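    The core quantity a single-marker linkage program reports is a LOD score: the log10 likelihood ratio of a trial recombination fraction against free recombination (θ = 0.5). A toy phase-known sketch with hypothetical counts (illustrative Python only; S.A.G.E. handles far more general pedigrees and trait models):

    ```python
    import math

    def lod_score(recombinants, total, theta):
        """Two-point LOD score for phase-known meioses: log10 likelihood of
        recombination fraction theta versus free recombination (0.5)."""
        non_recombinants = total - recombinants
        likelihood = theta ** recombinants * (1.0 - theta) ** non_recombinants
        return math.log10(likelihood / 0.5 ** total)

    # 2 recombinants in 20 informative meioses: the conventional evidence
    # threshold of LOD > 3 is met near theta = 0.1.
    print(round(lod_score(2, 20, 0.1), 2))  # 3.2
    ```

    At θ = 0.5 the score is zero by construction, so positive values quantify support for linkage.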

  6. Teacher's Corner: Structural Equation Modeling with the Sem Package in R

    ERIC Educational Resources Information Center

    Fox, John

    2006-01-01

    R is free, open-source, cooperatively developed software that implements the S statistical programming language and computing environment. The current capabilities of R are extensive, and it is in wide use, especially among statisticians. The sem package provides basic structural equation modeling facilities in R, including the ability to fit…

  7. Meeting the needs of an ever-demanding market.

    PubMed

    Rigby, Richard

    2002-04-01

    Balancing cost and performance in packaging is critical. This article outlines techniques to assist in this whilst delivering added value and product differentiation. The techniques include a rigorous statistical process capable of delivering cost reduction and improved quality and a computer modelling process that can save time when validating new packaging options.

  8. Report of the 64th National Conference on Weights and Measures

    NASA Astrophysics Data System (ADS)

    Wollin, H. F.; Babeoq, L. E.; Heffernan, A. P.

    1980-03-01

    Major issues discussed at this conference include metric conversion in the United States, particularly the conversion of gasoline dispensers; problems relating to the quantity fill of packaged commodities, especially as affected by moisture loss; and a statistical approach to package checking. Federal grain inspection and a legal metrology control system are also discussed.

  9. CompGO: an R package for comparing and visualizing Gene Ontology enrichment differences between DNA binding experiments.

    PubMed

    Waardenberg, Ashley J; Basset, Samuel D; Bouveret, Romaric; Harvey, Richard P

    2015-09-02

    Gene ontology (GO) enrichment is commonly used for inferring biological meaning from systems biology experiments. However, determining differential GO and pathway enrichment between DNA-binding experiments, or using the GO structure to classify experiments, has received little attention. Herein, we present a bioinformatics tool, CompGO, for identifying Differentially Enriched Gene Ontologies, called DiEGOs, and pathways, through the use of a z-score derivation of log odds ratios, and for visualizing these differences at the GO and pathway level. Through public experimental data focused on the cardiac transcription factor NKX2-5, we illustrate the problems associated with comparing GO enrichments between experiments using a simple overlap approach. We have developed an R/Bioconductor package, CompGO, which implements a statistic normally used in epidemiological studies for performing comparative GO analyses and visualizing comparisons from .BED data containing genomic coordinates as well as from gene lists as inputs. We justify the statistic through inclusion of experimental data and compare it to the commonly used overlap method. CompGO is freely available as an R/Bioconductor package, enabling easy integration into existing pipelines, at: http://www.bioconductor.org/packages/release/bioc/html/CompGO.html.
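    The epidemiological statistic CompGO borrows can be illustrated as a z-score for the difference of two log odds ratios, each computed from a 2x2 enrichment table. A sketch in plain Python with invented counts (not the package's R implementation):

    ```python
    import math

    def log_odds_ratio(a, b, c, d):
        """Natural-log odds ratio and its standard error from a 2x2 table
        (a: annotated genes in list, b: unannotated in list,
         c: annotated outside list, d: unannotated outside list)."""
        lor = math.log(a * d / (b * c))
        se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
        return lor, se

    def z_diff(table1, table2):
        """z-score for the difference between two log odds ratios, i.e. how
        strongly one experiment's enrichment exceeds the other's."""
        l1, s1 = log_odds_ratio(*table1)
        l2, s2 = log_odds_ratio(*table2)
        return (l1 - l2) / math.sqrt(s1 ** 2 + s2 ** 2)

    z = z_diff((30, 70, 10, 90), (15, 85, 12, 88))
    print(round(z, 2))  # 1.9
    ```

    Comparing the two z-scored log odds ratios, rather than the raw overlap of enriched terms, is what distinguishes this approach from the simple overlap comparison criticized in the abstract.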

  10. LakeMetabolizer: An R package for estimating lake metabolism from free-water oxygen using diverse statistical models

    USGS Publications Warehouse

    Winslow, Luke; Zwart, Jacob A.; Batt, Ryan D.; Dugan, Hilary; Woolway, R. Iestyn; Corman, Jessica; Hanson, Paul C.; Read, Jordan S.

    2016-01-01

    Metabolism is a fundamental process in ecosystems that crosses multiple scales of organization from individual organisms to whole ecosystems. To improve sharing and reuse of published metabolism models, we developed LakeMetabolizer, an R package for estimating lake metabolism from in situ time series of dissolved oxygen, water temperature, and, optionally, additional environmental variables. LakeMetabolizer implements 5 different metabolism models with diverse statistical underpinnings: bookkeeping, ordinary least squares, maximum likelihood, Kalman filter, and Bayesian. Each of these 5 metabolism models can be combined with 1 of 7 models for computing the coefficient of gas exchange across the air–water interface (k). LakeMetabolizer also features a variety of supporting functions that compute conversions and implement calculations commonly applied to raw data prior to estimating metabolism (e.g., oxygen saturation and optical conversion models). These tools have been organized into an R package that contains example data, example use-cases, and function documentation. The release package version is available on the Comprehensive R Archive Network (CRAN), and the full open-source GPL-licensed code is freely available for examination and extension online. With this unified, open-source, and freely available package, we hope to improve access and facilitate the application of metabolism in studies and management of lentic ecosystems.

  11. The Application of a Statistical Analysis Software Package to Explosive Testing

    DTIC Science & Technology

    1993-12-01

    Notation fragments: M refers to equation 2; s refers to equation 3; G refers to section 2.1. Contents fragments: Appendix I, Program Structured Diagrams; Appendix II, Bruceton Reference Graphs; Appendix III, Input and Output Data File Format; Appendix IV, ... Graph II has been digitised and incorporated into the program; if M falls below 0.3, the curve closest to diff (eq. 3a) is ...

  12. Characterizing Giant Exoplanets through Multiwavelength Transit Observations: HD 189733b

    NASA Astrophysics Data System (ADS)

    Kar, Aman; Cole, Jackson Lane; Gardner, Cristilyn N.; Garver, Bethany Ray; Jarka, Kyla L.; McGough, Aylin Marie; PeQueen, David Jeffrey; Rivera, Daniel Ivan; Kasper, David; Jang-Condell, Hannah; Kobulnicky, Henry; Dale, Daniel

    2018-01-01

    Observing the transits of exoplanets in multiple wavelengths enables the characterization of their atmospheres. We used the Wyoming Infrared Observatory to obtain high precision photometry on HD 189733b, one of the most studied exoplanets. We employed the photometry package AIJ and Bayesian statistics in our analysis. Preliminary results suggest a wavelength dependence in the size of the exoplanet, indicative of scattering in the atmosphere. This work is supported by the National Science Foundation under REU grant AST 1560461.

  13. The analysis using GENSTAT of anaemia, sugar intake and Quetelet's index as prognostic indicators in women.

    PubMed

    Campbell, M J

    1983-01-01

    I describe methods of analysing possible aetiological factors in a follow-up survey, all of which can be carried out using the statistical package GENSTAT. A high haemoglobin level carried a significantly increased risk of ischaemic heart disease, and a low one an increased risk of cancer. Smoking was also an important factor. The increased risk was reasonably constant over time. Sugar intake and Quetelet's index did not significantly affect the relative risk.
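    The relative-risk comparisons described here are straightforward to reproduce for a 2x2 cohort layout; a small sketch with purely illustrative numbers (the study itself was analysed in GENSTAT):

    ```python
    def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
        """Relative risk from a cohort (follow-up) study: the ratio of the
        event rate in the exposed group to that in the unexposed group."""
        return (exposed_cases / exposed_total) / (unexposed_cases / unexposed_total)

    # Purely illustrative: 30 events among 500 exposed vs 20 among 1000 unexposed.
    print(round(relative_risk(30, 500, 20, 1000), 2))  # 3.0
    ```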

  14. S.P.S.S. User's Manual #1-#4. Basic Program Construction in S.P.S.S.; S.P.S.S. Non-Procedural Statements and Procedural Commands; System Control Language and S.P.S.S.; Quick File Equate Statement Reference.

    ERIC Educational Resources Information Center

    Earl, Lorna L.

    This series of manuals describing and illustrating the Statistical Package for the Social Sciences (SPSS) was planned as a self-teaching instrument, beginning with the basics and progressing to an advanced level. Information on what the searcher must know to define the data and write a program for preliminary analysis is contained in manual 1,…

  15. An Integrated Nursing Management Information System: From Concept to Reality

    PubMed Central

    Pinkley, Connie L.; Sommer, Patricia K.

    1988-01-01

    This paper addresses the transition from the conceptualization of a Nursing Management Information System (NMIS) integrated and interdependent with the Hospital Information System (HIS) to its realization. Concepts of input, throughput, and output are presented to illustrate developmental strategies used to achieve nursing information products. Essential processing capabilities include: 1) the ability to interact with multiple data sources; 2) database management, statistical, and graphics software packages; 3) online and batch reporting; and 4) interactive data analysis. Challenges encountered in system construction are examined.

  16. Kinematics Simulation Analysis of Packaging Robot with Joint Clearance

    NASA Astrophysics Data System (ADS)

    Zhang, Y. W.; Meng, W. J.; Wang, L. Q.; Cui, G. H.

    2018-03-01

    Considering the influence of joint clearance on motion error, repeated positioning accuracy, and the overall position of the machine, this paper presents a simulation analysis of a packaging robot, a 2-degree-of-freedom (DOF) planar parallel robot, chosen for the high precision and fast speed characteristic of packaging equipment. The motion constraint equation of the mechanism is established, and the analysis and simulation of the motion error are carried out for the case of clearance in the revolute joint. The simulation results show that the size of the joint clearance affects the movement accuracy and packaging efficiency of the packaging robot. The analysis provides a reference for packaging equipment design and selection criteria and is of significance for packaging industry automation.

  17. Multivariate statistical analysis software technologies for astrophysical research involving large data bases

    NASA Technical Reports Server (NTRS)

    Djorgovski, S. George

    1994-01-01

    We developed a package to process and analyze the data from the digital version of the Second Palomar Sky Survey. This system, called SKICAT, incorporates the latest in machine learning and expert systems software technology, in order to classify the detected objects objectively and uniformly, and facilitate handling of the enormous data sets from digital sky surveys and other sources. The system provides a powerful, integrated environment for the manipulation and scientific investigation of catalogs from virtually any source. It serves three principal functions: image catalog construction, catalog management, and catalog analysis. Through use of the GID3* Decision Tree artificial induction software, SKICAT automates the process of classifying objects within CCD and digitized plate images. To exploit these catalogs, the system also provides tools to merge them into a large, complete database which may be easily queried and modified when new data or better methods of calibrating or classifying become available. The most innovative feature of SKICAT is the facility it provides to experiment with and apply the latest in machine learning technology to the tasks of catalog construction and analysis. SKICAT provides a unique environment for implementing these tools for any number of future scientific purposes. Initial scientific verification and performance tests have been made using galaxy counts and measurements of galaxy clustering from small subsets of the survey data, and a search for very high redshift quasars. All of the tests were successful, and produced new and interesting scientific results. Attachments to this report give detailed accounts of the technical aspects of STATPROG, a package for multivariate statistical analysis of small and moderate-size data sets. The package was tested extensively on a number of real scientific applications and has produced real, published results.

  18. StatsDB: platform-agnostic storage and understanding of next generation sequencing run metrics

    PubMed Central

    Ramirez-Gonzalez, Ricardo H.; Leggett, Richard M.; Waite, Darren; Thanki, Anil; Drou, Nizar; Caccamo, Mario; Davey, Robert

    2014-01-01

    Modern sequencing platforms generate enormous quantities of data in ever-decreasing amounts of time. Additionally, techniques such as multiplex sequencing allow one run to contain hundreds of different samples. With such data comes a significant challenge to understand its quality and to understand how the quality and yield are changing across instruments and over time. As well as the desire to understand historical data, sequencing centres often have a duty to provide clear summaries of individual run performance to collaborators or customers. We present StatsDB, an open-source software package for storage and analysis of next generation sequencing run metrics. The system has been designed for incorporation into a primary analysis pipeline, either at the programmatic level or via integration into existing user interfaces. Statistics are stored in an SQL database and APIs provide the ability to store and access the data while abstracting the underlying database design. This abstraction allows simpler, wider querying across multiple fields than is possible by the manual steps and calculation required to dissect individual reports, e.g. "provide metrics about nucleotide bias in libraries using adaptor barcode X, across all runs on sequencer A, within the last month". The software is supplied with modules for storage of statistics from FastQC, a commonly used tool for analysis of sequence reads, but the open nature of the database schema means it can be easily adapted to other tools. Currently at The Genome Analysis Centre (TGAC), reports are accessed through our LIMS system or through a standalone GUI tool, but the API and supplied examples make it easy to develop custom reports and to interface with other packages. PMID:24627795
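    The storage pattern described, per-run metrics in an SQL database behind an abstracting API, can be sketched with SQLite; the schema and metric names below are illustrative inventions, not StatsDB's actual design:

    ```python
    import sqlite3

    # Illustrative schema: one row per (run, instrument, metric) observation.
    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE run_metrics (
        run_id TEXT, instrument TEXT, metric TEXT, value REAL)""")
    conn.executemany(
        "INSERT INTO run_metrics VALUES (?, ?, ?, ?)",
        [("run1", "seqA", "q30_fraction", 0.91),
         ("run2", "seqA", "q30_fraction", 0.88),
         ("run3", "seqB", "q30_fraction", 0.95)])

    # The kind of cross-run query an abstracting API would expose:
    avg = conn.execute(
        "SELECT AVG(value) FROM run_metrics"
        " WHERE instrument = 'seqA' AND metric = 'q30_fraction'").fetchone()[0]
    print(round(avg, 3))  # 0.895
    ```

    Because the aggregation happens in SQL, the same table supports queries across instruments, time windows, or barcodes without reparsing individual reports.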

  19. Comparison of a non-stationary voxelation-corrected cluster-size test with TFCE for group-level MRI inference.

    PubMed

    Li, Huanjie; Nickerson, Lisa D; Nichols, Thomas E; Gao, Jia-Hong

    2017-03-01

    Two powerful methods for statistical inference on MRI brain images have been proposed recently, a non-stationary voxelation-corrected cluster-size test (CST) based on random field theory and threshold-free cluster enhancement (TFCE) based on calculating the level of local support for a cluster, then using permutation testing for inference. Unlike other statistical approaches, these two methods do not rest on the assumptions of a uniform and high degree of spatial smoothness of the statistic image. Thus, they are strongly recommended for group-level fMRI analysis compared to other statistical methods. In this work, the non-stationary voxelation-corrected CST and TFCE methods for group-level analysis were evaluated for both stationary and non-stationary images under varying smoothness levels, degrees of freedom and signal to noise ratios. Our results suggest that both methods provide adequate control for the number of voxel-wise statistical tests being performed during inference on fMRI data and they are both superior to current CSTs implemented in popular MRI data analysis software packages. However, TFCE is more sensitive and stable for group-level analysis of VBM data. Thus, the voxelation-corrected CST approach may confer some advantages by being computationally less demanding for fMRI data analysis than TFCE with permutation testing and by also being applicable for single-subject fMRI analyses, while the TFCE approach is advantageous for VBM data. Hum Brain Mapp 38:1269-1280, 2017. © 2016 Wiley Periodicals, Inc.

  20. Is liver perfusion CT reproducible? A study on intra- and interobserver agreement of normal hepatic haemodynamic parameters obtained with two different software packages.

    PubMed

    Bretas, Elisa Almeida Sathler; Torres, Ulysses S; Torres, Lucas Rios; Bekhor, Daniel; Saito Filho, Celso Fernando; Racy, Douglas Jorge; Faggioni, Lorenzo; D'Ippolito, Giuseppe

    2017-10-01

    To evaluate the agreement between the measurements of perfusion CT parameters in normal livers by using two different software packages. This retrospective study was based on 78 liver perfusion CT examinations acquired for detecting suspected liver metastasis. Patients with any morphological or functional hepatic abnormalities were excluded. The final analysis included 37 patients (59.7 ± 14.9 y). Two readers (1 and 2) independently measured perfusion parameters using different software packages from two major manufacturers (A and B). Arterial perfusion (AP) and portal perfusion (PP) were determined using the dual-input vascular one-compartmental model. Inter-reader agreement for each package and intrareader agreement between both packages were assessed with intraclass correlation coefficients (ICC) and Bland-Altman statistics. Inter-reader agreement was substantial for AP using software A (ICC = 0.82) and B (ICC = 0.85-0.86), fair for PP using software A (ICC = 0.44) and fair to moderate for PP using software B (ICC = 0.56-0.77). Intrareader agreement between software A and B ranged from slight to moderate (ICC = 0.32-0.62) for readers 1 and 2 considering the AP parameters, and from fair to moderate (ICC = 0.40-0.69) for readers 1 and 2 considering the PP parameters. At best there was only moderate agreement between both software packages, resulting in some uncertainty and suboptimal reproducibility. Advances in knowledge: Software-dependent factors may contribute to variance in perfusion measurements, demanding further technical improvements. AP measurements seem to be the most reproducible parameter to be adopted when evaluating liver perfusion CT.
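    The Bland-Altman part of an agreement analysis like this computes the mean difference (bias) between paired measurements and its 95% limits of agreement; a self-contained sketch with invented perfusion values (not data from the study):

    ```python
    import math

    def bland_altman(method_a, method_b):
        """Bland-Altman agreement: mean difference (bias) between paired
        measurements and the 95% limits of agreement (bias +/- 1.96 SD)."""
        diffs = [a - b for a, b in zip(method_a, method_b)]
        n = len(diffs)
        bias = sum(diffs) / n
        sd = math.sqrt(sum((d - bias) ** 2 for d in diffs) / (n - 1))
        return bias, bias - 1.96 * sd, bias + 1.96 * sd

    # Invented paired perfusion readings from two software packages.
    a = [12.1, 14.0, 9.8, 11.5, 13.2]
    b = [11.5, 13.1, 10.4, 11.0, 12.0]
    bias, lower, upper = bland_altman(a, b)
    print(round(bias, 2))  # 0.52
    ```

    Wide limits of agreement relative to the clinically relevant range, like the moderate ICCs reported above, would indicate suboptimal interchangeability of the two packages.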

  1. Good analytical practice: statistics and handling data in biomedical science. A primer and directions for authors. Part 1: Introduction. Data within and between one or two sets of individuals.

    PubMed

    Blann, A D; Nation, B R

    2008-01-01

    The biomedical scientist is bombarded on a daily basis by information, almost all of which refers to the health status of an individual or groups of individuals. This review is the first of a two-part article written to explain some of the issues related to the presentation and analysis of data. The first part focuses on types of data and how to present and analyse data from an individual or from one or two groups of persons. The second part will examine data from three or more sets of persons, what methods are available to allow this analysis (i.e., statistical software packages), and will conclude with a statement on appropriate descriptors of data, their analyses, and presentation for authors considering submission of their data to this journal.

  2. HDX Workbench: Software for the Analysis of H/D Exchange MS Data

    NASA Astrophysics Data System (ADS)

    Pascal, Bruce D.; Willis, Scooter; Lauer, Janelle L.; Landgraf, Rachelle R.; West, Graham M.; Marciano, David; Novick, Scott; Goswami, Devrishi; Chalmers, Michael J.; Griffin, Patrick R.

    2012-09-01

    Hydrogen/deuterium exchange mass spectrometry (HDX-MS) is an established method for the interrogation of protein conformation and dynamics. While the data analysis challenge of HDX-MS has been addressed by a number of software packages, new computational tools are needed to keep pace with the improved methods and throughput of this technique. To address these needs, we report an integrated desktop program titled HDX Workbench, which facilitates automation, management, visualization, and statistical cross-comparison of large HDX data sets. Using the software, validated data analysis can be achieved at the rate of data generation. The application is available at the project home page http://hdx.florida.scripps.edu.

  3. heatmaply: an R package for creating interactive cluster heatmaps for online publishing.

    PubMed

    Galili, Tal; O'Callaghan, Alan; Sidi, Jonathan; Sievert, Carson

    2018-05-01

    heatmaply is an R package for easily creating interactive cluster heatmaps that can be shared online as a stand-alone HTML file. Interactivity includes a tooltip display of values when hovering over cells, as well as the ability to zoom in to specific sections of the figure from the data matrix, the side dendrograms, or annotated labels. Thanks to the synergistic relationship between heatmaply and other R packages, the user is empowered by a refined control over the statistical and visual aspects of the heatmap layout. The heatmaply package is available under the GPL-2 Open Source license. It comes with a detailed vignette, and is freely available from: http://cran.r-project.org/package=heatmaply. tal.galili@math.tau.ac.il. Supplementary data are available at Bioinformatics online.

  4. Efficient and Flexible Climate Analysis with Python in a Cloud-Based Distributed Computing Framework

    NASA Astrophysics Data System (ADS)

    Gannon, C.

    2017-12-01

    As climate models become progressively more advanced, and spatial resolution further improved through various downscaling projects, climate projections at a local level are increasingly insightful and valuable. However, the raw size of climate datasets presents numerous hurdles for analysts wishing to develop customized climate risk metrics or perform site-specific statistical analysis. Four Twenty Seven, a climate risk consultancy, has implemented a Python-based distributed framework to analyze large climate datasets in the cloud. With the freedom afforded by efficiently processing these datasets, we are able to customize and continually develop new climate risk metrics using the most up-to-date data. Here we outline our process for using Python packages such as XArray and Dask to evaluate netCDF files in a distributed framework, StarCluster to operate in a cluster-computing environment, cloud computing services to access publicly hosted datasets, and how this setup is particularly valuable for generating climate change indicators and performing localized statistical analysis.
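    The out-of-core pattern that xarray and Dask automate, holding only chunk-level aggregates in memory while streaming through a dataset too large to load whole, can be illustrated with a plain-Python running mean (the chunking and values here are invented):

    ```python
    def chunked_mean(chunks):
        """Mean over data delivered chunk by chunk, keeping only running
        totals in memory rather than the full array."""
        total, count = 0.0, 0
        for chunk in chunks:
            total += sum(chunk)
            count += len(chunk)
        return total / count

    # Three invented chunks of daily temperature values.
    print(chunked_mean([[20.0, 21.5, 19.0], [22.0, 23.5], [18.5]]))  # 20.75
    ```

    Dask generalizes exactly this idea: computations are expressed over chunks of a netCDF-backed array and the partial results are combined, so cluster workers never need the full dataset at once.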

  5. Pathway-GPS and SIGORA: identifying relevant pathways based on the over-representation of their gene-pair signatures

    PubMed Central

    Foroushani, Amir B.K.; Brinkman, Fiona S.L.

    2013-01-01

    Motivation. Predominant pathway analysis approaches treat pathways as collections of individual genes and consider all pathway members as equally informative. As a result, at times spurious and misleading pathways are inappropriately identified as statistically significant, solely due to components that they share with the more relevant pathways. Results. We introduce the concept of Pathway Gene-Pair Signatures (Pathway-GPS) as pairs of genes that, as a combination, are specific to a single pathway. We devised and implemented a novel approach to pathway analysis, Signature Over-representation Analysis (SIGORA), which focuses on the statistically significant enrichment of Pathway-GPS in a user-specified gene list of interest. In a comparative evaluation of several published datasets, SIGORA outperformed traditional methods by delivering biologically more plausible and relevant results. Availability. An efficient implementation of SIGORA, as an R package with precompiled GPS data for several human and mouse pathway repositories, is available for download from http://sigora.googlecode.com/svn/. PMID:24432194
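    Over-representation of signatures in a gene list, the kind of test underlying this approach, is classically a hypergeometric tail probability; a stdlib-only sketch with illustrative counts (SIGORA's R implementation differs in detail, e.g. in how gene pairs are weighted):

    ```python
    from math import comb

    def hypergeom_tail(k, K, n, N):
        """P(X >= k) for X ~ Hypergeometric(N, K, n): the probability of
        seeing at least k signature hits in a selection of n items drawn
        from a universe of N, K of which carry the signature."""
        denom = comb(N, n)
        return sum(comb(K, i) * comb(N - K, n - i)
                   for i in range(k, min(K, n) + 1)) / denom

    # 4 of a pathway's 5 signatures among 10 selected items from a
    # universe of 100 is very unlikely by chance (a few in 10,000).
    p = hypergeom_tail(4, 5, 10, 100)
    print(p < 1e-3)  # True
    ```

    Applying the test to pathway-specific gene pairs, rather than to individual shared genes, is what reduces the spurious hits described in the motivation.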

  6. ProFound: Source Extraction and Application to Modern Survey Data

    NASA Astrophysics Data System (ADS)

    Robotham, A. S. G.; Davies, L. J. M.; Driver, S. P.; Koushan, S.; Taranu, D. S.; Casura, S.; Liske, J.

    2018-05-01

    We introduce PROFOUND, a source finding and image analysis package. PROFOUND provides methods to detect sources in noisy images, generate segmentation maps identifying the pixels belonging to each source, and measure statistics like flux, size, and ellipticity. These inputs are key requirements of PROFIT, our recently released galaxy profiling package, where the design aim is that these two software packages will be used in unison to semi-automatically profile large samples of galaxies. The key novel feature introduced in PROFOUND is that all photometry is executed on dilated segmentation maps that fully contain the identifiable flux, rather than using more traditional circular or ellipse-based photometry. Also, to be less sensitive to pathological segmentation issues, the de-blending is made across saddle points in flux. We apply PROFOUND in a number of simulated and real-world cases, and demonstrate that it behaves reasonably given its stated design goals. In particular, it offers good initial parameter estimation for PROFIT, and also segmentation maps that follow the sometimes complex geometry of resolved sources, whilst capturing nearly all of the flux. A number of bulge-disc decomposition projects are already making use of the PROFOUND and PROFIT pipeline, and adoption is being encouraged by publicly releasing the software for the open source R data analysis platform under an LGPL-3 license on GitHub (github.com/asgr/ProFound).

  7. From reads to regions: a Bioconductor workflow to detect differential binding in ChIP-seq data

    PubMed Central

    Lun, Aaron T. L.; Smyth, Gordon K.

    2016-01-01

    Chromatin immunoprecipitation with massively parallel sequencing (ChIP-seq) is widely used to identify the genomic binding sites for a protein of interest. Most conventional approaches to ChIP-seq data analysis involve the detection of the absolute presence (or absence) of a binding site. However, an alternative strategy is to identify changes in the binding intensity between two biological conditions, i.e., differential binding (DB). This may yield more relevant results than conventional analyses, as changes in binding can be associated with the biological difference being investigated. The aim of this article is to facilitate the implementation of DB analyses, by comprehensively describing a computational workflow for the detection of DB regions from ChIP-seq data. The workflow is based primarily on R software packages from the open-source Bioconductor project and covers all steps of the analysis pipeline, from alignment of read sequences to interpretation and visualization of putative DB regions. In particular, detection of DB regions will be conducted using the counts for sliding windows from the csaw package, with statistical modelling performed using methods in the edgeR package. Analyses will be demonstrated on real histone mark and transcription factor data sets. This will provide readers with practical usage examples that can be applied in their own studies. PMID:26834993

  8. Beneficial effects of polyethylene packages containing micrometer-sized silver particles on the quality and shelf life of dried barberry (Berberis vulgaris).

    PubMed

    Motlagh, N Valipoor; Mosavian, M T Hamed; Mortazavi, S A; Tamizi, A

    2012-01-01

    In this research, the effects of low-density polyethylene (LDPE) packages containing micrometer-sized silver particles (LDPE-Ag) on microbial and sensory factors of dried barberry were investigated in comparison with pure LDPE packages. LDPE-Ag packages with 1% and 2% concentrations of silver particles caused a statistically significant decrease in the microbial growth of barberry, especially in mold and total bacteria counts, compared with the pure LDPE packages. The taste, aroma, appearance, and total acceptance were evaluated by trained panelists using the 9-point hedonic scale. This test showed improvement in all these factors for the samples packed in packages containing 1% and 2% concentrations of silver particles in comparison with the other samples. Low-density polyethylene packages containing micrometer-sized silver particles thus had beneficial effects on the sensory and microbial quality of barberry when compared with normal packing material. © 2011 Institute of Food Technologists®

  9. MetaGenyo: a web tool for meta-analysis of genetic association studies.

    PubMed

    Martorell-Marugan, Jordi; Toro-Dominguez, Daniel; Alarcon-Riquelme, Marta E; Carmona-Saez, Pedro

    2017-12-16

    Genetic association studies (GAS) aim to evaluate the association between genetic variants and phenotypes. In the last few years, the number of studies of this type has increased exponentially, but the results are not always reproducible due to flawed experimental designs, low sample sizes and other methodological errors. In this field, meta-analysis techniques are becoming very popular tools to combine results across studies, increase statistical power and resolve discrepancies in genetic association studies. A meta-analysis summarizes research findings, increases statistical power and enables the identification of genuine associations between genotypes and phenotypes. Meta-analysis techniques are increasingly used in GAS, but the number of published meta-analyses containing errors is also increasing. Although there are several software packages that implement meta-analysis, none of them is specifically designed for genetic association studies, and in most cases their use requires advanced programming or scripting expertise. We have developed MetaGenyo, a web tool for meta-analysis in GAS. MetaGenyo implements a complete and comprehensive workflow that can be executed in an easy-to-use environment without programming knowledge. MetaGenyo has been developed to guide users through the main steps of a GAS meta-analysis, covering the Hardy-Weinberg test, statistical association for different genetic models, analysis of heterogeneity, testing for publication bias, subgroup analysis and robustness testing of the results. MetaGenyo is a useful tool to conduct comprehensive genetic association meta-analyses. The application is freely available at http://bioinfo.genyo.es/metagenyo/.
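    The core combination step of such a meta-analysis, inverse-variance fixed-effect pooling of per-study effect sizes, can be sketched as follows (illustrative Python with hypothetical study effects; MetaGenyo additionally covers heterogeneity, publication-bias and subgroup analyses):

    ```python
    import math

    def fixed_effect_meta(effects, std_errors):
        """Inverse-variance fixed-effect pooling of per-study effect sizes
        (e.g. log odds ratios): each study is weighted by 1/SE^2."""
        weights = [1.0 / se ** 2 for se in std_errors]
        pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
        pooled_se = math.sqrt(1.0 / sum(weights))
        return pooled, pooled_se

    # Three hypothetical studies on the log odds-ratio scale.
    pooled, pooled_se = fixed_effect_meta([0.40, 0.25, 0.35], [0.10, 0.20, 0.15])
    print(round(pooled, 3))  # 0.365
    ```

    The pooled standard error is smaller than any single study's, which is the statistical-power gain the abstract refers to.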

  10. Image analysis tools and emerging algorithms for expression proteomics

    PubMed Central

    English, Jane A.; Lisacek, Frederique; Morris, Jeffrey S.; Yang, Guang-Zhong; Dunn, Michael J.

    2012-01-01

    Since their origins in academic endeavours in the 1970s, computational analysis tools have matured into a number of established commercial packages that underpin research in expression proteomics. In this paper we describe the image analysis pipeline for the established 2-D Gel Electrophoresis (2-DE) technique of protein separation, and by first covering signal analysis for Mass Spectrometry (MS), we also explain the current image analysis workflow for the emerging high-throughput ‘shotgun’ proteomics platform of Liquid Chromatography coupled to MS (LC/MS). The bioinformatics challenges for both methods are illustrated and compared, whilst existing commercial and academic packages and their workflows are described from both a user’s and a technical perspective. Attention is given to the importance of sound statistical treatment of the resultant quantifications in the search for differential expression. Despite wide availability of proteomics software, a number of challenges have yet to be overcome regarding algorithm accuracy, objectivity and automation, generally due to deterministic spot-centric approaches that discard information early in the pipeline, propagating errors. We review recent advances in signal and image analysis algorithms in 2-DE, MS, LC/MS and Imaging MS. Particular attention is given to wavelet techniques, automated image-based alignment and differential analysis in 2-DE, Bayesian peak mixture models and functional mixed modelling in MS, and group-wise consensus alignment methods for LC/MS. PMID:21046614

  11. The Statistical Consulting Center for Astronomy (SCCA)

    NASA Technical Reports Server (NTRS)

    Akritas, Michael

    2001-01-01

    The process by which raw astronomical data acquisition is transformed into scientifically meaningful results and interpretation typically involves many statistical steps. Traditional astronomy limits itself to a narrow range of old and familiar statistical methods: means and standard deviations; least-squares methods like chi(sup 2) minimization; and simple nonparametric procedures such as the Kolmogorov-Smirnov test. These tools are often inadequate for the complex problems and datasets under investigation, and recent years have witnessed increased usage of maximum-likelihood, survival analysis, multivariate analysis, wavelet and advanced time-series methods. The Statistical Consulting Center for Astronomy (SCCA) assisted astronomers in using sophisticated statistical tools and in matching these tools to specific problems. The SCCA operated with two professors of statistics and a professor of astronomy working together. Questions were received by e-mail and discussed in detail with the questioner. Summaries of those questions and answers leading to new approaches were posted on the Web (www.state.psu.edu/ mga/SCCA). In addition to serving individual astronomers, the SCCA established a Web site for general use that provides hypertext links to selected on-line public-domain statistical software and services. The StatCodes site (www.astro.psu.edu/statcodes) provides over 200 links in the areas of: Bayesian statistics; censored and truncated data; correlation and regression; density estimation and smoothing; general statistics packages and information; image analysis; interactive Web tools; multivariate analysis; multivariate clustering and classification; nonparametric analysis; software written by astronomers; spatial statistics; statistical distributions; time series analysis; and visualization tools. StatCodes has received a remarkably high and steady hit rate of 250 hits/week (over 10,000/year) since its inception in mid-1997. It is of interest to scientists both within and outside of astronomy. The most popular sections are multivariate techniques, image analysis, and time series analysis. Hundreds of copies of the ASURV, SLOPES and CENS-TAU codes developed by SCCA scientists were also downloaded from the StatCodes site. In addition to formal SCCA duties, SCCA scientists continued a variety of related activities in astrostatistics, including refereeing statistically oriented papers submitted to the Astrophysical Journal, giving talks at meetings, including Feigelson's talk to science journalists entitled "The reemergence of astrostatistics" at the American Association for the Advancement of Science meeting, and publishing papers of astrostatistical content.

  12. Identifying biologically relevant differences between metagenomic communities.

    PubMed

    Parks, Donovan H; Beiko, Robert G

    2010-03-15

    Metagenomics is the study of genetic material recovered directly from environmental samples. Taxonomic and functional differences between metagenomic samples can highlight the influence of ecological factors on patterns of microbial life in a wide range of habitats. Statistical hypothesis tests can help us distinguish ecological influences from sampling artifacts, but knowledge of only the P-value from a statistical hypothesis test is insufficient to make inferences about biological relevance. Current reporting practices for pairwise comparative metagenomics are inadequate, and better tools are needed for comparative metagenomic analysis. We have developed a new software package, STAMP, for comparative metagenomics that supports best practices in analysis and reporting. Examination of a pair of iron mine metagenomes demonstrates that deeper biological insights can be gained using the statistical techniques available in our software. An analysis of the functional potential of 'Candidatus Accumulibacter phosphatis' in two enhanced biological phosphorus removal metagenomes identified several subsystems that differ between the A. phosphatis strains in these related communities, including phosphate metabolism, secretion and metal transport. Python source code and binaries are freely available from our website at http://kiwi.cs.dal.ca/Software/STAMP. Contact: beiko@cs.dal.ca. Supplementary data are available at Bioinformatics online.
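
    The point about pairing p-values with measures of biological relevance can be illustrated with a two-sample comparison of feature proportions, the style of analysis STAMP supports. The read counts below are hypothetical, and this is a generic sketch rather than STAMP's own code:

```python
import math
from scipy.stats import fisher_exact

# hypothetical reads assigned to one functional category in two metagenomes
hits1, total1 = 120, 10000
hits2, total2 = 60, 9000

# hypothesis test: Fisher's exact test on the 2x2 contingency table
table = [[hits1, total1 - hits1], [hits2, total2 - hits2]]
odds, p = fisher_exact(table)

# effect size: difference between proportions with a 95% Wald interval,
# which conveys the magnitude of the difference, not just its significance
p1, p2 = hits1 / total1, hits2 / total2
diff = p1 - p2
se = math.sqrt(p1 * (1 - p1) / total1 + p2 * (1 - p2) / total2)
ci = (diff - 1.96 * se, diff + 1.96 * se)
```

    Reporting `diff` and `ci` alongside `p` lets a reader judge whether a statistically significant difference is large enough to matter biologically.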

  13. Facilitating the Transition from Bright to Dim Environments

    DTIC Science & Technology

    2016-03-04

    For the parametric data, a multivariate ANOVA was used to determine the systematic presence of any statistically significant performance differences...performed. All significance levels were p < 0.05, and statistical analyses were performed with the Statistical Package for the Social Sciences (SPSS)...1950. Age changes in rate and level of visual dark adaptation. Journal of Applied Physiology, 2, 407–411. Field, A. 2009. Discovering statistics

  14. Kinematic analysis of total knee prosthesis designed for Asian population.

    PubMed

    Low, F H; Khoo, L P; Chua, C K; Lo, N N

    2000-01-01

    In designing a total knee replacement (TKR) prosthesis catering to the Asian population, 62 sets of femora were harvested and analyzed. The morphometrical data obtained were found to be in good agreement with dimensions typical of the Asian knee and reaffirmed that Caucasian knees are generally larger than Asian knees. Subsequently, these data, treated using a multivariate statistical technique, yielded the major design parameters for six different sizes of femoral implants. An extra-small implant size with established dimensions and geometrical shape emerged from the study. The differences between Asian and Caucasian knees are discussed. Employing the established femoral dimensions and the motion path of the knee joint, the articulating tibia profile was generated. All sizes of implants were modeled using a computer-aided design software package. These models, which accurately fit the local Asian knee, were then imported into a dynamic and kinematic analysis software package. The tibiofemoral joint was modeled successfully as a slide-curve joint to study intuitively the motion of the femur when articulating on the tibia surface. An optimal tibia profile could be synthesized to mimic the natural knee-path motion. Details of the analysis are presented and discussed.

  15. Desensitized Optimal Filtering and Sensor Fusion Toolkit

    NASA Technical Reports Server (NTRS)

    Karlgaard, Christopher D.

    2015-01-01

    Analytical Mechanics Associates, Inc., has developed a software toolkit that filters and processes navigational data from multiple sensor sources. A key component of the toolkit is a trajectory optimization technique that reduces the sensitivity of Kalman filters with respect to model parameter uncertainties. The sensor fusion toolkit also integrates recent advances in adaptive Kalman and sigma-point filters for problems with non-Gaussian error statistics. This Phase II effort provides new filtering and sensor fusion techniques in a convenient package that can be used as a stand-alone application for ground support and/or onboard use. Its modular architecture enables ready integration with existing tools. A suite of sensor models and noise distributions, as well as a Monte Carlo analysis capability, is included to enable statistical performance evaluations.
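
    As a minimal illustration of the filtering at the heart of such a toolkit (a textbook sketch, not AMA's implementation), a scalar Kalman filter estimating a constant state from noisy measurements looks like this:

```python
import random

random.seed(1)
truth = 5.0              # unknown constant state to be estimated
x, P = 0.0, 1.0          # state estimate and its error variance
Q, R = 1e-5, 0.5 ** 2    # process and measurement noise variances

for _ in range(500):
    z = truth + random.gauss(0, 0.5)   # noisy measurement
    P += Q                             # predict step (constant-state model)
    K = P / (P + R)                    # Kalman gain
    x += K * (z - x)                   # update estimate toward measurement
    P *= (1 - K)                       # update error variance
```

    After a few hundred measurements, `x` converges near the true value and `P` shrinks well below the measurement noise variance; sensitivity to misspecified `Q` and `R` is exactly what the toolkit's desensitized optimization targets.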

  16. Fast emulation of track reconstruction in the CMS simulation

    NASA Astrophysics Data System (ADS)

    Komm, Matthias; CMS Collaboration

    2017-10-01

    Simulated samples of various physics processes are a key ingredient of analyses unlocking the physics behind LHC collision data. Samples with ever larger statistics are required to keep up with the increasing amounts of recorded data. During sample generation, significant computing time is spent on the reconstruction of charged-particle tracks from energy deposits, a cost that additionally scales with the pileup conditions. In CMS, the FastSimulation package is developed to provide a fast alternative to the standard simulation and reconstruction workflow. It employs various techniques to emulate track-reconstruction effects in particle collision events. Several analysis groups in CMS are utilizing the package, in particular those requiring many samples to scan the parameter space of physics models (e.g. SUSY) or for the purpose of estimating systematic uncertainties. The strategies for and recent developments in this emulation are presented, including a novel, flexible implementation of tracking emulation while retaining a sufficient, tuneable accuracy.

  17. A statistical method for measuring activation of gene regulatory networks.

    PubMed

    Esteves, Gustavo H; Reis, Luiz F L

    2018-06-13

    Gene expression data analysis is of great importance for modern molecular biology, given our ability to measure the expression profiles of thousands of genes, enabling studies rooted in systems biology. In this work, we propose a simple statistical model for measuring the activation of gene regulatory networks, rather than the traditional gene co-expression networks. We present the mathematical construction of a statistical procedure for testing hypotheses regarding gene regulatory network activation. The actual probability distribution of the test statistic is evaluated by a permutation-based study. To illustrate the functionality of the proposed methodology, we also present a simple example based on a small hypothetical network and the activation measurement of two KEGG networks, both based on gene expression data collected from gastric and esophageal samples. The two KEGG networks were also analyzed for a public database, available through NCBI-GEO, presented as Supplementary Material. This method was implemented in an R package that is available at the BioConductor project website under the name maigesPack.
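
    The permutation study used to evaluate the distribution of a test statistic under the null can be sketched generically. The activation scores and group sizes below are hypothetical, and this is not the maigesPack implementation:

```python
import random

random.seed(0)
# hypothetical network-activation scores for two sample groups
group_a = [2.1, 2.5, 1.9, 2.8, 2.3]
group_b = [1.2, 1.0, 1.5, 0.9, 1.3]

observed = sum(group_a) / len(group_a) - sum(group_b) / len(group_b)
pooled = group_a + group_b

# shuffle group labels repeatedly to build the null distribution
count, n_perm = 0, 10000
for _ in range(n_perm):
    random.shuffle(pooled)
    a, b = pooled[:5], pooled[5:]
    if sum(a) / 5 - sum(b) / 5 >= observed:
        count += 1

# add-one correction keeps the estimated p-value strictly positive
p_value = (count + 1) / (n_perm + 1)
```

    The same scheme works for any scalar statistic; only the statistic computed on each relabeled split changes.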

  18. XAPiir: A recursive digital filtering package

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Harris, D.

    1990-09-21

    XAPiir is a basic recursive digital filtering package, containing both design and implementation subroutines. XAPiir was developed for the experimental array processor (XAP) software package, and is written in FORTRAN. However, it is intended to be incorporated into any general- or special-purpose signal analysis program. It replaces the older package RECFIL, offering several enhancements. RECFIL is used in several large analysis programs developed at LLNL, including the seismic analysis package SAC, several expert systems (NORSEA and NETSEA), and two general-purpose signal analysis packages (SIG and VIEW). This report is divided into two sections: the first describes the use of the subroutine package, and the second its internal organization. In the first section, the filter design problem is briefly reviewed, along with the definitions of the filter design parameters and their relationship to the subroutine input parameters. In the second section, the internal organization is documented to simplify maintenance and extensions to the package. 5 refs., 9 figs.
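
    In a modern scripting environment, the kind of recursive (IIR) filter design and application that XAPiir provides in FORTRAN can be sketched with SciPy. This is an analogy for illustration, not XAPiir's interface; the signal and filter parameters are made up:

```python
import numpy as np
from scipy import signal

fs = 100.0                                        # sampling rate, Hz
t = np.arange(0, 10, 1 / fs)
clean = np.sin(2 * np.pi * 1.0 * t)               # 1 Hz signal of interest
noisy = clean + 0.5 * np.sin(2 * np.pi * 20.0 * t)  # 20 Hz interference

# design step: 4th-order Butterworth low-pass, 5 Hz cutoff
b, a = signal.butter(4, 5.0, btype="low", fs=fs)

# implementation step: zero-phase application of the recursive filter
filtered = signal.filtfilt(b, a, noisy)
```

    `b` and `a` are the recursion coefficients of the difference equation, the same quantities a XAPiir-style design routine would hand to its implementation routine.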

  19. Statistical approach for selection of biologically informative genes.

    PubMed

    Das, Samarendra; Rai, Anil; Mishra, D C; Rai, Shesh N

    2018-05-20

    Selection of informative genes from high-dimensional gene expression data has emerged as an important research area in genomics. Most gene selection techniques proposed so far are based on either a relevancy or a redundancy measure, and their performance has been judged through post-selection classification accuracy computed by a classifier using the selected genes. This performance metric may be statistically sound but may not be biologically relevant. A statistical approach, Boot-MRMR, was proposed based on a composite measure of maximum relevance and minimum redundancy, which is both statistically sound and biologically relevant for informative gene selection. For comparative evaluation of the proposed approach, we developed two biological sufficiency criteria, i.e. Gene Set Enrichment with QTL (GSEQ) and a biological similarity score based on Gene Ontology (GO). Further, a systematic and rigorous evaluation of the proposed technique against 12 existing gene selection techniques was carried out using five gene expression datasets. This evaluation was based on a broad spectrum of statistically sound (e.g. subject classification) and biologically relevant (based on QTL and GO) criteria under a multiple-criteria decision-making framework. The performance analysis showed that the proposed technique selects informative genes that are more biologically relevant. The proposed technique is also quite competitive with the existing techniques with respect to subject classification and computational time. Our results also showed that under the multiple-criteria decision-making setup, the proposed technique is best for informative gene selection over the available alternatives. Based on the proposed approach, an R package, BootMRMR, has been developed and is available at https://cran.r-project.org/web/packages/BootMRMR. This study will provide a practical guide for selecting statistical techniques to identify informative genes from high-dimensional expression data for breeding and systems biology studies. Published by Elsevier B.V.
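
    The maximum-relevance, minimum-redundancy idea can be sketched as a greedy, correlation-based selection. This toy version is not the Boot-MRMR algorithm itself, and all data below are simulated:

```python
import numpy as np

rng = np.random.default_rng(0)
n, g = 100, 8
X = rng.normal(size=(n, g))                       # simulated expression matrix
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=n)      # gene 1 is redundant with gene 0
y = X[:, 0] + X[:, 2] + 0.5 * rng.normal(size=n)  # trait driven by genes 0 and 2

# relevance: absolute correlation of each gene with the trait
relevance = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(g)])
selected = [int(np.argmax(relevance))]            # start with the most relevant gene

while len(selected) < 2:
    best, best_score = None, -np.inf
    for j in range(g):
        if j in selected:
            continue
        # redundancy: mean absolute correlation with already-selected genes
        redundancy = np.mean([abs(np.corrcoef(X[:, j], X[:, k])[0, 1])
                              for k in selected])
        score = relevance[j] - redundancy         # relevance minus redundancy
        if score > best_score:
            best, best_score = j, score
    selected.append(best)
```

    Because gene 1 is nearly a copy of gene 0, the redundancy penalty steers the second pick toward the independently informative gene 2 instead.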

  20. SPARSKIT: A basic tool kit for sparse matrix computations

    NASA Technical Reports Server (NTRS)

    Saad, Youcef

    1990-01-01

    Presented here are the main features of a tool package for manipulating and working with sparse matrices. One of the goals of the package is to provide basic tools to facilitate the exchange of software and data between researchers in sparse matrix computations. The starting point is the Harwell/Boeing collection of matrices for which the authors provide a number of tools. Among other things, the package provides programs for converting data structures, printing simple statistics on a matrix, plotting a matrix profile, and performing linear algebra operations with sparse matrices.
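
    The Python ecosystem offers analogous tools in scipy.sparse; the snippet below mirrors SPARSKIT-style tasks (building a matrix in a triplet exchange format, converting storage schemes, printing simple statistics, basic linear algebra) but is not SPARSKIT code:

```python
import numpy as np
from scipy import sparse

# assemble a small matrix in COO (triplet) form, a typical exchange format,
# then convert to CSR storage for efficient row operations
rows = np.array([0, 0, 1, 2, 2])
cols = np.array([0, 2, 1, 0, 2])
vals = np.array([4.0, -1.0, 3.0, -1.0, 5.0])
A = sparse.coo_matrix((vals, (rows, cols)), shape=(3, 3)).tocsr()

# simple statistics on the matrix
density = A.nnz / (A.shape[0] * A.shape[1])

# linear algebra: sparse matrix-vector product
x = np.ones(3)
y = A @ x
```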

  1. Geostatistics and GIS: tools for characterizing environmental contamination.

    PubMed

    Henshaw, Shannon L; Curriero, Frank C; Shields, Timothy M; Glass, Gregory E; Strickland, Paul T; Breysse, Patrick N

    2004-08-01

    Geostatistics is a set of statistical techniques used in the analysis of georeferenced data that can be applied to environmental contamination and remediation studies. In this study, the 1,1-dichloro-2,2-bis(p-chlorophenyl)ethylene (DDE) contamination at a Superfund site in western Maryland is evaluated. Concern about the site and its future cleanup has triggered interest within the community because residential development surrounds the area. Spatial statistical methods, of which geostatistics is a subset, are becoming increasingly popular, in part due to the availability of geographic information system (GIS) software in a variety of application packages. In this article, the joint use of ArcGIS software and the R statistical computing environment is demonstrated as an approach for comprehensive geostatistical analyses. The spatial regression method kriging is used to provide predictions of DDE levels at unsampled locations both within the site and in the surrounding areas where residential development is ongoing.
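
    Ordinary kriging can be sketched directly from its normal equations. The measurement locations, values, and covariance parameters below are hypothetical stand-ins, not the study's data or the R packages' code:

```python
import numpy as np

# hypothetical contaminant measurements at sampled (x, y) locations
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
z = np.array([2.0, 4.0, 3.0, 5.0])

def cov(h, sill=1.0, rng_=1.5):
    """Exponential covariance model (assumed, would be fit from a variogram)."""
    return sill * np.exp(-h / rng_)

# kriging system: covariances between samples, plus a Lagrange multiplier
# row/column that enforces the unbiasedness constraint (weights sum to 1)
d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
n = len(z)
K = np.zeros((n + 1, n + 1))
K[:n, :n] = cov(d)
K[:n, n] = 1.0
K[n, :n] = 1.0

# predict at an unsampled location
target = np.array([0.5, 0.5])
k = np.append(cov(np.linalg.norm(pts - target, axis=1)), 1.0)
w = np.linalg.solve(K, k)
prediction = w[:n] @ z
```

    At the center of this symmetric layout the weights come out equal, so the prediction is the sample mean; with irregular data the weights reflect both distance to the target and clustering among samples.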

  2. A SAS macro for testing differences among three or more independent groups using Kruskal-Wallis and Nemenyi tests.

    PubMed

    Liu, Yuewei; Chen, Weihong

    2012-02-01

    As a nonparametric method, the Kruskal-Wallis test is widely used to compare three or more independent groups when ordinal- or interval-level data are available, especially when the assumptions of analysis of variance (ANOVA) are not met. If the Kruskal-Wallis statistic is statistically significant, the Nemenyi test is an option for further pairwise multiple comparisons to locate the source of significance. Unfortunately, most popular statistical packages do not integrate the Nemenyi test, which is not easy to calculate by hand. We described the theory and applications of the Kruskal-Wallis and Nemenyi tests, and presented a flexible SAS macro to implement the two tests. The SAS macro was demonstrated by two examples from our cohort study in occupational epidemiology. It provides a useful tool for SAS users to test the differences among three or more independent groups using a nonparametric method.
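
    Outside SAS, the same two-step analysis can be sketched in Python. The data are made up, and the Nemenyi step uses one common formulation (a chi-square approximation for the critical rank difference), not necessarily the exact form implemented in the authors' macro:

```python
import numpy as np
from scipy import stats

g1 = [2.9, 3.0, 2.5, 2.6, 3.2]   # three hypothetical exposure groups
g2 = [3.8, 2.7, 4.0, 2.4, 3.6]
g3 = [2.8, 3.4, 3.7, 2.2, 2.0]

# step 1: omnibus Kruskal-Wallis test
H, p = stats.kruskal(g1, g2, g3)

# step 2 (if significant): Nemenyi pairwise comparison; |Ri - Rj| exceeds
# sqrt(chi2(alpha, k-1) * N(N+1)/12 * (1/ni + 1/nj)) to be significant
data = np.concatenate([g1, g2, g3])
ranks = stats.rankdata(data)
groups = np.repeat([0, 1, 2], 5)
mean_ranks = [ranks[groups == i].mean() for i in range(3)]
N, k = len(data), 3
crit = np.sqrt(stats.chi2.ppf(0.95, k - 1) * N * (N + 1) / 12 * (1 / 5 + 1 / 5))
sig_01 = abs(mean_ranks[0] - mean_ranks[1]) > crit
```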

  3. beachmat: A Bioconductor C++ API for accessing high-throughput biological data from a variety of R matrix types

    PubMed Central

    Pagès, Hervé

    2018-01-01

    Biological experiments involving genomics or other high-throughput assays typically yield a data matrix that can be explored and analyzed using the R programming language with packages from the Bioconductor project. Improvements in the throughput of these assays have resulted in an explosion of data even from routine experiments, which poses a challenge to the existing computational infrastructure for statistical data analysis. For example, single-cell RNA sequencing (scRNA-seq) experiments frequently generate large matrices containing expression values for each gene in each cell, requiring sparse or file-backed representations for memory-efficient manipulation in R. These alternative representations are not easily compatible with high-performance C++ code used for computationally intensive tasks in existing R/Bioconductor packages. Here, we describe a C++ interface named beachmat, which enables agnostic data access from various matrix representations. This allows package developers to write efficient C++ code that is interoperable with dense, sparse and file-backed matrices, amongst others. We evaluated the performance of beachmat for accessing data from each matrix representation using both simulated and real scRNA-seq data, and defined a clear memory/speed trade-off to motivate the choice of an appropriate representation. We also demonstrate how beachmat can be incorporated into the code of other packages to drive analyses of a very large scRNA-seq data set. PMID:29723188

  4. An Algorithm and R Program for Fitting and Simulation of Pharmacokinetic and Pharmacodynamic Data.

    PubMed

    Li, Jijie; Yan, Kewei; Hou, Lisha; Du, Xudong; Zhu, Ping; Zheng, Li; Zhu, Cairong

    2017-06-01

    Pharmacokinetic/pharmacodynamic link models are widely used in dose-finding studies. By applying such models, the results of initial pharmacokinetic/pharmacodynamic studies can be used to predict the potential therapeutic dose range. This knowledge can improve the design of later comparative large-scale clinical trials by reducing the number of participants and saving time and resources. However, the modeling process can be challenging, time consuming, and costly, even when using cutting-edge, powerful pharmacological software. Here, we provide a freely available R program for expediently analyzing pharmacokinetic/pharmacodynamic data, including data importation, parameter estimation, simulation, and model diagnostics. First, we explain the theory related to the establishment of the pharmacokinetic/pharmacodynamic link model. Subsequently, we present the algorithms used for parameter estimation and potential therapeutic dose computation. The implementation of the R program is illustrated by a clinical example. The software package is then validated by comparing the model parameters and the goodness-of-fit statistics generated by our R package with those generated by the widely used pharmacological software WinNonlin. The pharmacokinetic and pharmacodynamic parameters as well as the potential recommended therapeutic dose can be acquired with the R package. The validation process shows that the parameters estimated using our package are satisfactory. The R program developed and presented here provides pharmacokinetic researchers with a simple and easy-to-access tool for pharmacokinetic/pharmacodynamic analysis on personal computers.
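
    The core fitting step of such a program can be sketched as nonlinear least squares on a one-compartment model with first-order absorption and elimination. The parameter values, sampling times, and noise level below are hypothetical, and this is not the authors' R package:

```python
import numpy as np
from scipy.optimize import curve_fit

def one_compartment(t, ka, ke, scale):
    """Concentration-time profile, first-order absorption/elimination (ka != ke)."""
    return scale * (np.exp(-ke * t) - np.exp(-ka * t))

# simulate noisy concentration data from assumed "true" parameters
t = np.array([0.25, 0.5, 1.0, 2.0, 4.0, 6.0, 8.0, 12.0])
true_curve = one_compartment(t, 1.5, 0.2, 10.0)
rng = np.random.default_rng(42)
conc = true_curve * (1 + 0.02 * rng.normal(size=t.size))

# parameter estimation by nonlinear least squares from a rough initial guess
params, cov = curve_fit(one_compartment, t, conc, p0=[1.0, 0.1, 5.0])
ka_hat, ke_hat, scale_hat = params
```

    Goodness-of-fit diagnostics then compare the fitted curve against the observations, which is the kind of check the authors use to validate their package against WinNonlin.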

  5. beachmat: A Bioconductor C++ API for accessing high-throughput biological data from a variety of R matrix types.

    PubMed

    Lun, Aaron T L; Pagès, Hervé; Smith, Mike L

    2018-05-01

    Biological experiments involving genomics or other high-throughput assays typically yield a data matrix that can be explored and analyzed using the R programming language with packages from the Bioconductor project. Improvements in the throughput of these assays have resulted in an explosion of data even from routine experiments, which poses a challenge to the existing computational infrastructure for statistical data analysis. For example, single-cell RNA sequencing (scRNA-seq) experiments frequently generate large matrices containing expression values for each gene in each cell, requiring sparse or file-backed representations for memory-efficient manipulation in R. These alternative representations are not easily compatible with high-performance C++ code used for computationally intensive tasks in existing R/Bioconductor packages. Here, we describe a C++ interface named beachmat, which enables agnostic data access from various matrix representations. This allows package developers to write efficient C++ code that is interoperable with dense, sparse and file-backed matrices, amongst others. We evaluated the performance of beachmat for accessing data from each matrix representation using both simulated and real scRNA-seq data, and defined a clear memory/speed trade-off to motivate the choice of an appropriate representation. We also demonstrate how beachmat can be incorporated into the code of other packages to drive analyses of a very large scRNA-seq data set.

  6. pyLIMA : an open source microlensing software

    NASA Astrophysics Data System (ADS)

    Bachelet, Etienne

    2017-01-01

    Planetary microlensing is a unique tool for detecting cold planets around low-mass stars, and it is approaching a watershed in discoveries as near-future missions incorporate dedicated surveys. NASA and ESA have decided to complement WFIRST-AFTA and Euclid with microlensing programs to enrich our statistics about this planetary population. Of the many challenges inherent in these missions, the data analysis is of primary importance, yet it is often perceived as a time-consuming, complex and daunting barrier to participation in the field. We present the first open-source modeling software for conducting a microlensing analysis. This software is written in Python and uses existing packages as much as possible.

  7. SPSS and SAS programs for determining the number of components using parallel analysis and velicer's MAP test.

    PubMed

    O'Connor, B P

    2000-08-01

    Popular statistical software packages do not have the proper procedures for determining the number of components in factor and principal components analyses. Parallel analysis and Velicer's minimum average partial (MAP) test are validated procedures, recommended widely by statisticians. However, many researchers continue to use alternative, simpler, but flawed procedures, such as the eigenvalues-greater-than-one rule. Use of the proper procedures might be increased if these procedures could be conducted within familiar software environments. This paper describes brief and efficient programs for using SPSS and SAS to conduct parallel analyses and the MAP test.
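
    Horn's parallel analysis itself is straightforward to express outside SPSS or SAS. The sketch below (simulated two-component data, 95th-percentile retention criterion) is illustrative rather than a port of the authors' programs:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 300, 6
# hypothetical data with two underlying components
f = rng.normal(size=(n, 2))
load = rng.normal(size=(2, p))
X = f @ load + rng.normal(size=(n, p))

# eigenvalues of the observed correlation matrix, largest first
obs_eig = np.sort(np.linalg.eigvalsh(np.corrcoef(X, rowvar=False)))[::-1]

# Horn's parallel analysis: eigenvalues from random data of the same shape
n_iter = 200
rand_eigs = np.empty((n_iter, p))
for i in range(n_iter):
    R = rng.normal(size=(n, p))
    rand_eigs[i] = np.sort(
        np.linalg.eigvalsh(np.corrcoef(R, rowvar=False)))[::-1]

# retain components whose eigenvalues beat the 95th percentile of random ones
threshold = np.percentile(rand_eigs, 95, axis=0)
n_components = int(np.sum(obs_eig > threshold))
```

    Unlike the eigenvalues-greater-than-one rule, the retention threshold here adapts to the sample size and number of variables, which is why parallel analysis is the recommended procedure.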

  8. USGS library for S-PLUS for Windows -- Release 4.0

    USGS Publications Warehouse

    Lorenz, David L.; Ahearn, Elizabeth A.; Carter, Janet M.; Cohn, Timothy A.; Danchuk, Wendy J.; Frey, Jeffrey W.; Helsel, Dennis R.; Lee, Kathy E.; Leeth, David C.; Martin, Jeffrey D.; McGuire, Virginia L.; Neitzert, Kathleen M.; Robertson, Dale M.; Slack, James R.; Starn, J. Jeffrey; Vecchia, Aldo V.; Wilkison, Donald H.; Williamson, Joyce E.

    2011-01-01

    Release 4.0 of the U.S. Geological Survey S-PLUS library supersedes release 2.1. It comprises functions, dialogs, and datasets used in the U.S. Geological Survey for the analysis of water-resources data. This version does not contain ESTREND, which was in version 2.1. See Release 2.1 for information and access to that version. This library requires Release 8.1 or later of S-PLUS for Windows. S-PLUS is a commercial statistical and graphical analysis software package produced by TIBCO Corporation (http://www.tibco.com/). The USGS library is not supported by TIBCO or its technical support staff.

  9. Effects of perceived parental attitudes on children's views of smoking.

    PubMed

    Ozturk, Candan; Kahraman, Seniha; Bektas, Murat

    2013-01-01

    The aim of this study was to examine the effects of perceived parental attitudes on children's views of smoking. The study sample consisted of 250 children attending grades 6, 7 and 8. Data were collected via a socio-demographic questionnaire, the Parental Attitude Scale (PAS) and the Decisional Balance Scale (DBS). Data analysis covered percentages, medians, one-way analysis of variance (ANOVA) and post-hoc tests using a statistical package. Of the 250 participants, 117 were male and 133 were female. The mean age was 13.1 ± 0.98 for the females and 13.3 ± 0.88 for the males. A statistically significant difference was found in the children's mean scores for the 'pros' subscale of the DBS according to perceived parental attitudes (F=3.172, p=0.025). There were no statistically significant differences in the DBS 'cons' subscale scores by perceived parental attitudes. While perceived parental attitudes affect children's views on the advantages of smoking, they have no effect on children's views on its disadvantages.

  10. PEPA test: fast and powerful differential analysis from relative quantitative proteomics data using shared peptides.

    PubMed

    Jacob, Laurent; Combes, Florence; Burger, Thomas

    2018-06-18

    We propose a new hypothesis test for the differential abundance of proteins in mass-spectrometry-based relative quantification. An important feature of this type of high-throughput analysis is that it involves an enzymatic digestion of the sample proteins into peptides prior to identification and quantification. Because of sequence homology, different proteins can yield peptides with identical amino acid chains, so that their parent protein is ambiguous. These so-called shared peptides make protein-level statistical analysis a challenge and are often not accounted for. In this article, we use a linear model describing peptide-protein relationships to build a likelihood ratio test of differential abundance for proteins. We show that the likelihood ratio statistic can be computed in time linear in the number of peptides. We also provide the asymptotic null distribution of a regularized version of our statistic. Experiments on both real and simulated datasets show that our procedure outperforms state-of-the-art methods. The procedures are available via the pepa.test function of the DAPAR Bioconductor R package.

  11. A Database of Herbaceous Vegetation Responses to Elevated Atmospheric CO2 (NDP-073)

    DOE Data Explorer

    Jones, Michael H [The Ohio State Univ., Columbus, OH (United States); Curtis, Peter S [The Ohio State Univ., Columbus, OH (United States); Cushman, Robert M [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Brenkert, Antoinette L [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)

    1999-01-01

    To perform a statistically rigorous meta-analysis of research results on the response by herbaceous vegetation to increased atmospheric CO2 levels, a multiparameter database of responses was compiled from the published literature. Seventy-eight independent CO2-enrichment studies, covering 53 species and 26 response parameters, reported mean response, sample size, and variance of the response (either as standard deviation or standard error). An additional 43 studies, covering 25 species and 6 response parameters, did not report variances. This numeric data package accompanies the Carbon Dioxide Information Analysis Center's (CDIAC's) NDP-072, which provides similar information for woody vegetation. This numeric data package contains a 30-field data set of CO2-exposure experiment responses by herbaceous plants (as both a flat ASCII file and a spreadsheet file), files listing the references to the CO2-exposure experiments and specific comments relevant to the data in the data sets, and this documentation file (which includes SAS and Fortran codes to read the ASCII data file; SAS is a registered trademark of the SAS Institute, Inc., Cary, North Carolina 27511).
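
    With mean response, sample size, and variance reported per study, a fixed-effect inverse-variance pooling is the basic meta-analytic step such a database enables. The effect sizes below are invented for illustration:

```python
import math

# hypothetical per-study effects (e.g. log response ratios to elevated CO2)
# and their reported variances
effects = [0.30, 0.45, 0.25, 0.40]
variances = [0.010, 0.020, 0.015, 0.010]

# weight each study by the inverse of its variance: precise studies count more
weights = [1 / v for v in variances]
pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)

# standard error and 95% CI of the pooled effect
se = math.sqrt(1 / sum(weights))
ci = (pooled - 1.96 * se, pooled + 1.96 * se)
```

    The pooled standard error is smaller than that of any single study, which is the statistical payoff of compiling variances in the database; studies lacking variances (like the 43 noted above) cannot contribute to this weighting.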

  12. GAC: Gene Associations with Clinical, a web based application

    PubMed Central

    Zhang, Xinyan; Rupji, Manali; Kowalski, Jeanne

    2018-01-01

    We present GAC, a Shiny R-based tool for interactive visualization of clinical associations based on high-dimensional data. The tool provides a web-based suite to perform supervised principal component analysis (SuperPC), an approach that combines high-dimensional data, such as gene expression, with clinical data to infer clinical associations. We extended the approach to address binary outcomes, in addition to continuous and time-to-event data, in our package, thereby increasing the use and flexibility of SuperPC. Additionally, the tool provides an interactive visualization for summarizing results based on a forest plot for both binary and time-to-event data. In summary, the GAC suite of tools provides a one-stop shop for conducting statistical analysis to identify and visualize associations between a clinical outcome of interest and high-dimensional data types, such as genomic data. Our GAC package has been implemented in R and is available via http://shinygispa.winship.emory.edu/GAC/. The developmental repository is available at https://github.com/manalirupji/GAC. PMID:29263780

  13. [Is there life beyond SPSS? Discover R].

    PubMed

    Elosua Oliden, Paula

    2009-11-01

    R is a GNU statistical and programming environment with very strong graphical capabilities. It is very powerful for research purposes, but it is also an exceptional tool for teaching. R comprises more than 1400 packages, which support everything from simple statistics to the most complex and most recent formal models. Graphical interfaces such as the Rcommander package permit working in user-friendly environments similar to the graphical environment used by SPSS. This last characteristic allows non-statisticians to overcome the obstacle of accessibility, and it makes R an excellent tool for teaching. Is there anything better? Open, free, affordable, accessible and always on the cutting edge.

  14. Explorations in Statistics: Standard Deviations and Standard Errors

    ERIC Educational Resources Information Center

    Curran-Everett, Douglas

    2008-01-01

    Learning about statistics is a lot like learning about science: the learning is more meaningful if you can actively explore. This series in "Advances in Physiology Education" provides an opportunity to do just that: we will investigate basic concepts in statistics using the free software package R. Because this series uses R solely as a vehicle…
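    The distinction the series explores can be shown in a few lines. Below is a minimal sketch (plain Python rather than the R used in the series; function name and data are illustrative) computing a sample standard deviation and the standard error of the mean.

```python
import math

def sd_and_se(xs):
    """Return the sample standard deviation and the standard error of the mean."""
    n = len(xs)
    mean = sum(xs) / n
    # Sample variance uses n - 1 in the denominator (Bessel's correction).
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)
    sd = math.sqrt(var)
    se = sd / math.sqrt(n)  # the SE shrinks as n grows; the SD does not
    return sd, se

sd, se = sd_and_se([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
```

The SD describes spread in the data themselves; the SE describes uncertainty in the estimated mean, which is why it decreases with sample size.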

  15. JP-8+100: The development of high-thermal-stability jet fuel

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Heneghan, S.P.; Zabarnick, S.; Ballal, D.R.

    1996-09-01

    Jet fuel requirements have evolved over the years as a balance of the demands placed by advanced aircraft performance (technological need), fuel cost (economic factors), and fuel availability (strategic factors). In a modern aircraft, the jet fuel not only provides the propulsive energy for flight, but also is the primary coolant for aircraft and engine subsystems. To meet the evolving challenge of improving the cooling potential of jet fuel while maintaining the current availability at a minimal price increase, the US Air Force, industry, and academia have teamed to develop an additive package for JP-8 fuels. This paper describes the development of an additive package for JP-8, to produce JP-8+100. This new fuel offers a 55 C increase in the bulk maximum temperature (from 325 F to 425 F) and improves the heat sink capability by 50%. Major advances made during the development of JP-8+100 fuel include the development of several new quantitative fuel analysis tests, a free radical theory of autooxidation, adaptation of new chemistry models to computational fluid dynamics programs, and a nonparametric statistical analysis to evaluate thermal stability. Hundreds of additives were tested for effectiveness, and a package of additives was then formulated for JP-8 fuel. This package has been tested for fuel system materials compatibility and general fuel applicability. To date, flight testing has shown an improvement in thermal stability of JP-8 fuel. This improvement has resulted in a significant reduction in fuel-related maintenance costs and a threefold increase in mean time between fuel-related failures. In this manner, a novel high-thermal-stability jet fuel for the 21st century has been successfully developed.

  16. Novel features and enhancements in BioBin, a tool for the biologically inspired binning and association analysis of rare variants

    PubMed Central

    Byrska-Bishop, Marta; Wallace, John; Frase, Alexander T; Ritchie, Marylyn D

    2018-01-01

    Motivation: BioBin is an automated bioinformatics tool for the multi-level biological binning of sequence variants. Herein, we present a significant update to BioBin which expands the software to facilitate a comprehensive rare variant analysis and incorporates novel features and analysis enhancements. Results: In BioBin 2.3, we extend our software tool by implementing statistical association testing, updating the binning algorithm, and incorporating novel analysis features, providing a robust, highly customizable, and unified rare variant analysis tool. Availability and implementation: The BioBin software package is open source and freely available to users at http://www.ritchielab.com/software/biobin-download. Contact: mdritchie@geisinger.edu. Supplementary information: Supplementary data are available at Bioinformatics online. PMID:28968757

  17. Computational statistics using the Bayesian Inference Engine

    NASA Astrophysics Data System (ADS)

    Weinberg, Martin D.

    2013-09-01

    This paper introduces the Bayesian Inference Engine (BIE), a general parallel, optimized software package for parameter inference and model selection. This package is motivated by the analysis needs of modern astronomical surveys and the need to organize and reuse expensive derived data. The BIE is the first platform for computational statistics designed explicitly to enable Bayesian update and model comparison for astronomical problems. Bayesian update is based on the representation of high-dimensional posterior distributions using metric-ball-tree based kernel density estimation. Among its algorithmic offerings, the BIE emphasizes hybrid tempered Markov chain Monte Carlo schemes that robustly sample multimodal posterior distributions in high-dimensional parameter spaces. Moreover, the BIE implements a full persistence or serialization system that stores the full byte-level image of the running inference and previously characterized posterior distributions for later use. Two new algorithms to compute the marginal likelihood from the posterior distribution, developed for and implemented in the BIE, enable model comparison for complex models and data sets. Finally, the BIE was designed to be a collaborative platform for applying Bayesian methodology to astronomy. It includes an extensible, object-oriented framework that implements every aspect of the Bayesian inference workflow. By providing a variety of statistical algorithms for all phases of the inference problem, a scientist may explore a variety of approaches with a single model and data implementation. Additional technical and download details are available from http://www.astro.umass.edu/bie. The BIE is distributed under the GNU General Public License.
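    The posterior representation mentioned above rests on kernel density estimation. As a hedged, one-dimensional illustration of the idea (the BIE itself uses metric-ball-tree KDE over high-dimensional samples; the function and sample values below are a generic sketch, not BIE code):

```python
import math

def gaussian_kde(samples, bandwidth):
    """Density estimate built as an equally weighted mixture of Gaussian
    kernels, one centred on each posterior sample."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))
    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2)
                          for s in samples)
    return density

# Hypothetical posterior samples concentrated near 1.0 (e.g. from an MCMC run).
post = gaussian_kde([0.9, 1.0, 1.1, 1.0, 0.95, 1.05], bandwidth=0.2)
```

A smooth density like this can then stand in for the raw sample cloud when re-using a previously characterized posterior as the prior of a later update.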

  18. Reminder packaging for improving adherence to self-administered long-term medications.

    PubMed

    Heneghan, C J; Glasziou, P; Perera, R

    2006-01-25

    Current methods of improving medication adherence for health problems are mostly complex, labour-intensive, and not reliably effective. Medication 'reminder packaging', which incorporates a date or time for a medication to be taken into the packaging, can act as a reminder system to improve adherence. The objective of this review was to determine the effects of reminder packaging in enhancing patient adherence to self-administered medications taken for one month or more. We searched the Cochrane Central Register of Controlled Trials (CENTRAL) and the Database of Abstracts of Reviews of Effects (DARE) (The Cochrane Library Issue 3, 2004), MEDLINE, EMBASE, CINAHL and PsycINFO from the start of the databases to 1 September 2004. We also searched the internet, contacted packaging manufacturers, and checked abstracts from the Pharm-line database and reference lists from relevant articles. We did not apply any language restrictions. We selected randomised controlled trials with at least 80% follow-up, comparing a reminder packaging device with no device in participants taking self-administered medications for a minimum of one month. Two reviewers independently assessed studies for inclusion, assessed quality, and extracted data from included studies. Where considered appropriate, data were combined for meta-analysis, or were reported and discussed in a narrative. Eight studies containing data on 1,137 participants were included. Six intervention groups in four trials provided data on the percentage of pills taken. Reminder packaging showed a significant increase in the percentage of pills taken, weighted mean difference 11% (95% confidence interval (CI) 6% to 17%). Notable heterogeneity occurred among these trials (I² = 96.3%). Two trials provided data for the proportion of self-reported adherent patients, reporting a reduction in the intervention group which was not statistically significant, odds ratio = 0.89 (95% CI 0.56 to 1.40). No appropriate data were available for meta-analysis of different clinical outcomes, the most common of these being blood pressure (three out of eight trials). Other clinical outcomes reported were glycated haemoglobin, serum vitamin C and E levels, and self-reported psychological symptoms (one trial each). Reminder packaging may represent a simple method for improving adherence for patients with the selected conditions examined to date. Further research is warranted to improve the design and targeting of these devices.
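    The pooled estimate and I² heterogeneity figures reported above come from standard inverse-variance meta-analysis. A minimal sketch of that machinery, with hypothetical study inputs (the review's actual trial data are not reproduced here):

```python
import math

def fixed_effect_pool(effects, ses):
    """Inverse-variance fixed-effect pooling of study effect sizes, plus
    Cochran's Q and the I^2 heterogeneity percentage."""
    w = [1.0 / se ** 2 for se in ses]          # weight = 1 / variance
    pooled = sum(wi * e for wi, e in zip(w, effects)) / sum(w)
    pooled_se = math.sqrt(1.0 / sum(w))
    # Cochran's Q: weighted squared deviations of study effects from the pool.
    q = sum(wi * (e - pooled) ** 2 for wi, e in zip(w, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, pooled_se, q, i2

# Hypothetical adherence differences (percentage points) from four trials.
pooled, pooled_se, q, i2 = fixed_effect_pool([12.0, 8.0, 15.0, 6.0],
                                             [2.0, 3.0, 4.0, 2.5])
```

An I² near 96%, as in the review, would signal that most of the observed variation between trials reflects genuine heterogeneity rather than sampling error, which is why the authors interpret the pooled difference cautiously.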

  19. A comparison of three approaches to non-stationary flood frequency analysis

    NASA Astrophysics Data System (ADS)

    Debele, S. E.; Strupczewski, W. G.; Bogdanowicz, E.

    2017-08-01

    Non-stationary flood frequency analysis (FFA) is applied to the statistical analysis of seasonal flow maxima from Polish and Norwegian catchments. Three non-stationary estimation methods, namely maximum likelihood (ML), two-stage (WLS/TS) and GAMLSS (generalized additive model for location, scale and shape parameters), are compared in the context of capturing the effect of non-stationarity on the estimation of time-dependent moments and design quantiles. The use of a multimodel approach is recommended to reduce errors in the magnitude of quantiles due to model misspecification. The results of calculations based on observed seasonal daily flow maxima and computer simulation experiments showed that GAMLSS gave the best results with respect to the relative bias and root mean square error in the estimates of the trend in the standard deviation and the constant shape parameter, while WLS/TS provided better accuracy in the estimates of the trend in the mean value. Among the three compared methods, WLS/TS is recommended for dealing with non-stationarity in short time series. Some practical aspects of applying the GAMLSS package are also presented. A detailed discussion of general issues related to the consequences of climate change for FFA is presented in the second part of the article, entitled "Around and about an application of the GAMLSS package in non-stationary flood frequency analysis".
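    As a toy illustration of the trend-in-the-mean component that the compared estimators fit (the actual methods use ML, WLS/TS or GAMLSS on distribution parameters; this is a plain least-squares sketch with made-up flow maxima):

```python
def linear_trend(ys):
    """Ordinary least-squares slope and intercept of a series against time
    (t = 0, 1, 2, ...), a minimal stand-in for a time-dependent mean."""
    n = len(ys)
    ts = range(n)
    t_mean = sum(ts) / n
    y_mean = sum(ys) / n
    slope = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys)) / \
            sum((t - t_mean) ** 2 for t in ts)
    intercept = y_mean - slope * t_mean
    return slope, intercept

# Hypothetical annual flow maxima drifting upward over five seasons.
slope, intercept = linear_trend([100.0, 104.0, 103.0, 110.0, 112.0])
```

In a non-stationary FFA, such a fitted trend replaces the constant mean, so design quantiles become functions of time rather than fixed values.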

  20. DISFIT: A PROGRAM FOR FITTING DISTRIBUTIONS IN DATA

    EPA Science Inventory

    Although distribution fitting methods abound in the statistical literature, very few of these methods are found in the major statistical packages. In particular, SPSS (1975), BMD-P (1981) and SAS (1979) only give some overall tests for normality. There are a few specialized distribut...

  1. Materials of acoustic analysis: sustained vowel versus sentence.

    PubMed

    Moon, Kyung Ray; Chung, Sung Min; Park, Hae Sang; Kim, Han Su

    2012-09-01

    The sustained vowel is a widely used material for acoustic analysis. However, vowel phonation does not sufficiently reflect sentence-based real-life phonation, and biases may occur depending on the test subject's intent during pronunciation. The purpose of this study was to investigate the differences between the results of acoustic analysis using each material. An individual prospective study. Two hundred two individuals (87 men and 115 women) with normal findings on videostroboscopy were enrolled. Acoustic analysis was done using the speech pattern element acquisition and display program. Fundamental frequency (Fx), amplitude (Ax), contact quotient (Qx), jitter, and shimmer were measured with sustained vowel-based acoustic analysis. Average fundamental frequency (FxM), average amplitude (AxM), average contact quotient (QxM), Fx perturbation (CFx), and amplitude perturbation (CAx) were measured with sentence-based acoustic analysis. Corresponding data from the two methods were compared with each other. SPSS (Statistical Package for the Social Sciences, Version 12.0; SPSS, Inc., Chicago, IL) software was used for statistical analysis. FxM was higher than Fx in men (Fx, 124.45 Hz; FxM, 133.09 Hz; P=0.000). In women, FxM appeared to be lower than Fx, but the difference was not statistically significant (Fx, 210.58 Hz; FxM, 208.34 Hz; P=0.065). There was no statistically significant difference between Ax and AxM in either group. QxM was higher than Qx in both men and women. Jitter was lower in men, but CFx was lower in women. Both shimmer and CAx were higher in men. Sustained vowel phonation cannot be a complete substitute for real-life phonation in acoustic analysis. The characteristics of acoustic materials should be considered when choosing the material for acoustic analysis and interpreting the results. Copyright © 2012 The Voice Foundation. Published by Mosby, Inc. All rights reserved.
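    Jitter and shimmer, as used above, are relative cycle-to-cycle perturbation measures. A minimal sketch of the common "local" definition (the program used in the study will differ in detail; the period values below are hypothetical):

```python
def local_perturbation(values):
    """Relative mean absolute difference between consecutive cycles: the
    usual local definition of jitter (applied to glottal periods) and
    shimmer (applied to cycle amplitudes)."""
    n = len(values)
    mean_abs_diff = sum(abs(values[i + 1] - values[i])
                        for i in range(n - 1)) / (n - 1)
    mean_val = sum(values) / n
    return mean_abs_diff / mean_val

# Hypothetical glottal periods in ms; a perfectly steady voice has zero jitter.
jitter = local_perturbation([8.0, 8.1, 7.9, 8.05, 7.95])
```

Applying the same function to a sequence of cycle amplitudes yields a shimmer-like measure, which is why the two statistics are usually reported together.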

  2. GWAR: robust analysis and meta-analysis of genome-wide association studies.

    PubMed

    Dimou, Niki L; Tsirigos, Konstantinos D; Elofsson, Arne; Bagos, Pantelis G

    2017-05-15

    In the context of genome-wide association studies (GWAS), a variety of statistical techniques can be used to conduct the analysis, but in most cases the underlying genetic model is unknown. Under these circumstances, the classical Cochran-Armitage trend test (CATT) is suboptimal. Robust procedures that maximize the power and preserve the nominal type I error rate are preferable. Moreover, performing a meta-analysis using robust procedures is of great interest and has never been addressed in the past. The primary goal of this work is to implement several robust methods for analysis and meta-analysis in the statistical package Stata and subsequently to make the software available to the scientific community. The CATT under recessive, additive and dominant models of inheritance, as well as robust methods based on the Maximum Efficiency Robust Test statistic, the MAX statistic and MIN2, were implemented in Stata. For MAX and MIN2, we calculated their asymptotic null distributions relying on numerical integration, resulting in a great gain in computational time without losing accuracy. All the aforementioned approaches were employed in a fixed- or random-effects meta-analysis setting using summary data, with weights equal to the reciprocal of the combined cases and controls. Overall, this is the first complete effort to implement procedures for analysis and meta-analysis in GWAS using Stata. A Stata program and a web-server are freely available for academic users at http://www.compgen.org/tools/GWAR. pbagos@compgen.org. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
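    The classical CATT that these robust methods generalize can be written down compactly. A sketch for a 2 x k genotype table (this is the textbook statistic, not the GWAR implementation; the counts are hypothetical):

```python
import math

def cochran_armitage(cases, controls, scores):
    """Cochran-Armitage trend test for a 2 x k table. `cases` and
    `controls` are counts per genotype category; `scores` encode the
    assumed genetic model, e.g. (0, 1, 2) for additive."""
    r1, r2 = sum(cases), sum(controls)
    n = r1 + r2
    cols = [a + b for a, b in zip(cases, controls)]
    # Trend statistic and its variance under the null of no association.
    t = sum(s * (a * r2 - b * r1) for s, a, b in zip(scores, cases, controls))
    var = (r1 * r2 / n) * (
        sum(s ** 2 * c * (n - c) for s, c in zip(scores, cols))
        - 2 * sum(scores[i] * scores[j] * cols[i] * cols[j]
                  for i in range(len(cols)) for j in range(i + 1, len(cols)))
    )
    return t / math.sqrt(var)  # asymptotically standard normal under H0

# Additive model: case counts rise with the number of risk alleles.
z = cochran_armitage(cases=[10, 20, 30], controls=[30, 20, 10], scores=(0, 1, 2))
```

The dependence on `scores` is exactly the weakness the abstract describes: when the true genetic model is unknown, a fixed score vector can be badly mis-specified, motivating MAX-type statistics that take the extreme over several models.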

  3. Efficacy of UV-C irradiation for inactivation of food-borne pathogens on sliced cheese packaged with different types and thicknesses of plastic films.

    PubMed

    Ha, Jae-Won; Back, Kyeong-Hwan; Kim, Yoon-Hee; Kang, Dong-Hyun

    2016-08-01

    In this study, the efficacy of using UV-C light to inactivate Escherichia coli O157:H7, Salmonella Typhimurium, and Listeria monocytogenes on sliced cheese packaged with 0.07 mm films of polyethylene terephthalate (PET), polyvinylchloride (PVC), polypropylene (PP), and polyethylene (PE) was investigated. The results show that PP and PE films, unlike PET and PVC, allowed significantly reduced levels of the three pathogens compared to inoculated but non-treated controls. Therefore, PP and PE films of different thicknesses (0.07 mm, 0.10 mm, and 0.13 mm) were then evaluated for pathogen reduction on inoculated sliced cheese samples. Unlike the 0.10 and 0.13 mm films, the 0.07 mm thick PP and PE films showed reductions that did not differ statistically from those of non-packaged treated samples. Moreover, there were no statistically significant differences between the efficacy of PP and PE films. These results suggest that suitably chosen PP or PE film packaging in conjunction with UV-C radiation can be applied to control foodborne pathogens in the dairy industry. Copyright © 2016. Published by Elsevier Ltd.

  4. Stan : A Probabilistic Programming Language

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Carpenter, Bob; Gelman, Andrew; Hoffman, Matthew D.

    Stan is a probabilistic programming language for specifying statistical models. A Stan program imperatively defines a log probability function over parameters conditioned on specified data and constants. As of version 2.14.0, Stan provides full Bayesian inference for continuous-variable models through Markov chain Monte Carlo methods such as the No-U-Turn sampler, an adaptive form of Hamiltonian Monte Carlo sampling. Penalized maximum likelihood estimates are calculated using optimization methods such as the limited memory Broyden-Fletcher-Goldfarb-Shanno algorithm. Stan is also a platform for computing log densities and their gradients and Hessians, which can be used in alternative algorithms such as variational Bayes, expectation propagation, and marginal inference using approximate integration. To this end, Stan is set up so that the densities, gradients, and Hessians, along with intermediate quantities of the algorithm such as acceptance probabilities, are easily accessible. Stan can also be called from the command line using the cmdstan package, through R using the rstan package, and through Python using the pystan package. All three interfaces support sampling and optimization-based inference with diagnostics and posterior analysis. rstan and pystan also provide access to log probabilities, gradients, Hessians, parameter transforms, and specialized plotting.
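    The core object a Stan program defines is a log probability function whose gradient drives Hamiltonian Monte Carlo. A hedged Python sketch of that idea for a normal-mean model (this is not Stan code; the data, prior scale, and function names are illustrative):

```python
def log_posterior(mu, data, sigma=1.0, prior_sd=10.0):
    """Unnormalized log posterior for the mean of a normal with known sigma
    and a normal(0, prior_sd) prior -- the kind of log density a Stan
    program defines over its parameters."""
    log_lik = sum(-0.5 * ((x - mu) / sigma) ** 2 for x in data)
    log_prior = -0.5 * (mu / prior_sd) ** 2
    return log_lik + log_prior

def grad_log_posterior(mu, data, sigma=1.0, prior_sd=10.0):
    """Analytic gradient with respect to mu, the quantity HMC/NUTS needs
    at every leapfrog step."""
    return sum((x - mu) / sigma ** 2 for x in data) - mu / prior_sd ** 2

data = [0.8, 1.2, 1.0]
g = grad_log_posterior(1.0, data)
```

Stan automates exactly this pairing: the user writes only the model (the log density), and automatic differentiation supplies the gradient that the samplers and optimizers consume.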

  5. WhopGenome: high-speed access to whole-genome variation and sequence data in R.

    PubMed

    Wittelsbürger, Ulrich; Pfeifer, Bastian; Lercher, Martin J

    2015-02-01

    The statistical programming language R has become a de facto standard for the analysis of many types of biological data, and is well suited for the rapid development of new algorithms. However, variant call data from population-scale resequencing projects are typically too large to be read and processed efficiently with R's built-in I/O capabilities. WhopGenome can efficiently read whole-genome variation data stored in the widely used variant call format (VCF) into several R data types. VCF files can be accessed either on local hard drives or on remote servers. WhopGenome can associate variants with annotations such as those available from the UCSC genome browser, and can accelerate the reading process by filtering loci according to user-defined criteria. WhopGenome can also read other Tabix-indexed files and create indices to allow fast selective access to FASTA-formatted sequence files. The WhopGenome R package is available on CRAN at http://cran.r-project.org/web/packages/WhopGenome/. A Bioconductor package has been submitted. lercher@cs.uni-duesseldorf.de. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  6. Stan : A Probabilistic Programming Language

    DOE PAGES

    Carpenter, Bob; Gelman, Andrew; Hoffman, Matthew D.; ...

    2017-01-01

    Stan is a probabilistic programming language for specifying statistical models. A Stan program imperatively defines a log probability function over parameters conditioned on specified data and constants. As of version 2.14.0, Stan provides full Bayesian inference for continuous-variable models through Markov chain Monte Carlo methods such as the No-U-Turn sampler, an adaptive form of Hamiltonian Monte Carlo sampling. Penalized maximum likelihood estimates are calculated using optimization methods such as the limited memory Broyden-Fletcher-Goldfarb-Shanno algorithm. Stan is also a platform for computing log densities and their gradients and Hessians, which can be used in alternative algorithms such as variational Bayes, expectation propagation, and marginal inference using approximate integration. To this end, Stan is set up so that the densities, gradients, and Hessians, along with intermediate quantities of the algorithm such as acceptance probabilities, are easily accessible. Stan can also be called from the command line using the cmdstan package, through R using the rstan package, and through Python using the pystan package. All three interfaces support sampling and optimization-based inference with diagnostics and posterior analysis. rstan and pystan also provide access to log probabilities, gradients, Hessians, parameter transforms, and specialized plotting.

  7. A comparison of the Health Star Rating system when used for restaurant fast foods and packaged foods.

    PubMed

    Dunford, Elizabeth K; Wu, Jason H Y; Wellard-Cole, Lyndal; Watson, Wendy; Crino, Michelle; Petersen, Kristina; Neal, Bruce

    2017-10-01

    In June 2014, the Australian government agreed to the voluntary implementation of an interpretive 'Health Star Rating' (HSR) front-of-pack labelling system for packaged foods. The aim of the system is to make it easier for consumers to compare the healthiness of products based on the number of stars. With many Australians consuming fast food, there is a strong rationale for extending the HSR system to include fast food items. The objective was to examine the performance of the HSR system when applied to fast foods. Nutrient content data for fast food menu items were collected from the websites of 13 large Australian fast-food chains. The HSR was calculated for each menu item. Statistics describing HSR values for fast foods were calculated and compared to results for comparable packaged foods. Data for 1529 fast food products were compared to data for 3810 packaged food products across 16 of 17 fast food product categories. The mean HSR for the fast foods was 2.5, with a range of 0.5 to 5.0; the corresponding values for the comparator packaged foods were 2.6 and 0.5 to 5.0. Visual inspection of the data showed broadly comparable distributions of HSR values across the fast food and packaged food categories, although statistically significant differences were apparent for seven categories (all p < 0.04). In some cases these differences reflected the large sample size and the power to detect small variations across fast foods and packaged foods, and in others they appeared to reflect primarily differences in the mix of product types within a category. These data support the idea that the HSR system could be extended to Australian fast foods. There are likely to be significant benefits to the community from the use of a single standardised signposting system for healthiness across all fresh, packaged and restaurant foods. Copyright © 2017 Elsevier Ltd. All rights reserved.

  8. Mean values of Arnett's soft tissue analysis in Maratha ethnic (Indian) population - A cephalometric study.

    PubMed

    Singh, Shikha; Deshmukh, Sonali; Merani, Varsha; Rejintal, Neeta

    2016-01-01

    The aim of this article is to evaluate the mean cephalometric values for Arnett's soft tissue analysis in the Maratha ethnic (Indian) population. Lateral cephalograms of 60 patients (30 males and 30 females) aged 18-26 years were obtained with the patients in the Natural Head Position (NHP), with teeth in maximum intercuspation and lips in the rest position. Hand tracings were also done. The statistical analysis was performed with the Statistical Package for the Social Sciences version 16, and Microsoft Word and Excel (Microsoft Office 2007) were used to generate the analytical data. Statistical significance was tested at the 1% and 5% levels using Student's unpaired t-test. Various cephalometric values for the Maratha ethnic (Indian) population differed from Caucasian cephalometric values, such as nasolabial inclination, incisor proclination, and exposure, which may affect the outcome of orthodontic and orthognathic treatment. Marathas have more proclined maxillary incisors, a less prominent chin, less facial length, and an acute nasolabial angle, and all soft tissue thicknesses are greater in Marathas except lower lip thickness (in Maratha males and females) and upper lip angle (in Maratha males) compared with the Caucasian population. Different ethnic groups have different facial characteristics. The variability of the soft tissue integument in people of different ethnic origin makes it necessary to study the soft tissue standards of a particular community and to consider those norms when planning orthodontic and orthognathic treatment for patients of that racial and ethnic background.

  9. The Importance of Take-Out Food Packaging Attributes: Conjoint Analysis and Quality Function Deployment Approach

    NASA Astrophysics Data System (ADS)

    Lestari Widaningrum, Dyah

    2014-03-01

    This research aims to investigate the importance of take-out food packaging attributes, using conjoint analysis and a QFD approach, among consumers of take-out food products in Jakarta, Indonesia. The conjoint results indicate that perception of the packaging material (such as paper, plastic, or polystyrene foam) plays the most important role overall in consumer perception. The clustering results show strong segmentation in which take-out food packaging attributes consumers consider most important. Some consumers are mostly oriented toward the colour of the packaging, while another segment of customers focuses on packaging shape and packaging information. Segmentation variables based on packaging response can provide very useful information for maximizing the image of products through the package's impact. The development of the House of Quality showed that Conjoint Analysis-QFD is a useful combination of the two methodologies for product development, market segmentation, and the trade-off between customers' requirements in the early stages of the HOQ process.
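    In a balanced full-factorial rating-based conjoint design, part-worth utilities reduce to level means minus the grand mean. A hedged sketch with hypothetical packaging profiles (the study's actual attributes, levels, and ratings are not reproduced here):

```python
def part_worths(profiles, ratings, attribute):
    """Part-worth utilities for one attribute in a balanced, full-factorial
    rating-based conjoint design: mean rating per level minus the grand mean."""
    grand = sum(ratings) / len(ratings)
    levels = {}
    for prof, r in zip(profiles, ratings):
        levels.setdefault(prof[attribute], []).append(r)
    return {lvl: sum(rs) / len(rs) - grand for lvl, rs in levels.items()}

# Hypothetical 2x2 design: packaging material x shape, one rating per profile.
profiles = [
    {"material": "paper", "shape": "box"},
    {"material": "paper", "shape": "bowl"},
    {"material": "foam", "shape": "box"},
    {"material": "foam", "shape": "bowl"},
]
ratings = [8.0, 7.0, 4.0, 3.0]
pw = part_worths(profiles, ratings, "material")
```

The spread of part-worths within an attribute is what drives the attribute-importance ranking that the conjoint analysis reports; the segmentation step then clusters respondents by their individual part-worth profiles.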

  10. Safety analysis report for packaging (onsite) steel drum

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    McCormick, W.A.

    This Safety Analysis Report for Packaging (SARP) provides the analyses and evaluations necessary to demonstrate that the steel drum packaging system meets the transportation safety requirements of HNF-PRO-154, Responsibilities and Procedures for all Hazardous Material Shipments, for an onsite packaging containing Type B quantities of solid and liquid radioactive materials. The basic component of the steel drum packaging system is the 208 L (55-gal) steel drum.

  11. A user-friendly workflow for analysis of Illumina gene expression bead array data available at the arrayanalysis.org portal.

    PubMed

    Eijssen, Lars M T; Goelela, Varshna S; Kelder, Thomas; Adriaens, Michiel E; Evelo, Chris T; Radonjic, Marijana

    2015-06-30

    Illumina whole-genome expression bead arrays are a widely used platform for transcriptomics. Most of the tools available for the analysis of the resulting data are not easily applicable by less experienced users. ArrayAnalysis.org provides researchers with an easy-to-use and comprehensive interface to the functionality of R and Bioconductor packages for microarray data analysis. As a modular open source project, it allows developers to contribute modules that provide support for additional types of data or extend workflows. To enable data analysis of Illumina bead arrays for a broad user community, we have developed a module for ArrayAnalysis.org that provides a free and user-friendly web interface for quality control and pre-processing for these arrays. This module can be used together with existing modules for statistical and pathway analysis to provide a full workflow for Illumina gene expression data analysis. The module accepts data exported from Illumina's GenomeStudio, and provides the user with quality control plots and normalized data. The outputs are directly linked to the existing statistics module of ArrayAnalysis.org, but can also be downloaded for further downstream analysis in third-party tools. The Illumina bead arrays analysis module is available at http://www.arrayanalysis.org. A user guide, a tutorial demonstrating the analysis of an example dataset, and R scripts are available. The module can be used as a starting point for statistical evaluation and pathway analysis provided on the website or to generate processed input data for a broad range of applications in life sciences research.

  12. Skills Analysis. Workshop Package on Skills Analysis, Skills Audit and Training Needs Analysis.

    ERIC Educational Resources Information Center

    Hayton, Geoff; And Others

    This four-part package is designed to assist Australian workshop leaders running 2-day workshops on skills analysis, skills audit, and training needs analysis. Part A contains information on how to use the package and a list of workshop aims. Parts B, C, and D consist, respectively, of the workshop leader's guide; overhead transparency sheets and…

  13. Waste reduction through consumer education. Final report

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Harrison, E.Z.

    The Waste Reduction through Consumer Education research project was conducted to determine how environmental educational strategies influence purchasing behavior in the supermarket. The objectives were to develop, demonstrate, and evaluate consumer education strategies for waste reduction. The amount of waste generated by packaging size and form, with an adjustment for local recyclability of waste, was determined for 14 product categories identified as having more waste generating and less waste generating product choices (a total of 484 products). Using supermarket scan data and shopper identification numbers, the research tracked the purchases of shoppers in groups receiving different education treatments for 9 months. Statistical tests applied to the purchase data assessed patterns of change between the groups by treatment period. Analysis of the data revealed few meaningful statistical differences between study groups or changes in behavior over time. Findings suggest that broad brush consumer education about waste reduction is not effective in changing purchasing behaviors in the short term. However, it may help create a general awareness of the issues surrounding excess packaging and consumer responsibility. The study concludes that the answer to waste reduction in the future may be a combination of voluntary initiatives by manufacturers and retailers, governmental intervention, and better-informed consumers.

  14. Evaluation of Sensibility Threshold for Interocclusal Thickness of Patients Wearing Complete Dentures

    PubMed Central

    Shala, Kujtim Sh.; Ahmedi, Enis F.; Tmava-Dragusha, Arlinda

    2017-01-01

    Objective: The aim of this study was to evaluate the sensibility threshold for interocclusal thickness in experienced and nonexperienced denture wearers after the insertion of new complete dentures. Materials and Methods: A total of 88 patients with complete dentures participated in this study. Participants were divided into two experimental groups according to previous experience with prosthetic dental treatment. The sensibility threshold for interocclusal thickness was measured with metal foil of 8 μm thickness and 8 mm width, placed between the upper and lower incisors. Statistical analysis was performed using the standard software package BMDP (biomedical statistical package). Results: The results suggest that the time of measurement affects the average values of the sensibility threshold for interocclusal thickness (F = 242.68, p = 0.0000). Gender appeared to be a significant factor when it interacted with time of measurement, resulting in differences in the sensibility threshold for interocclusal thickness (gender: F = 9.84, p = 0.018; F = 4.83, p = 0.0003). Conclusion: The sensibility threshold for interocclusal thickness was the most important measure of functional adaptation in patients with complete dentures. A unique trait of this indicator is the progressive reduction of initial values and a tendency to re-establish the stationary state in the fifteenth week after the dentures are taken off. PMID:28702055

  15. Statistical Package User’s Guide.

    DTIC Science & Technology

    1980-08-01

    261 C. STACH Nonparametric Descriptive Statistics ... 265 D. CHIRA Coefficient of Concordance ... Test Data: This program was tested using data from John Neter and William Wasserman, Applied Linear Statistical Models: Regression ... length of data file; new file name (not same as raw data file); printout as optioned for only. Comments: Ranked data are used for program CHIRA

  16. Quantitative analysis of tympanic membrane perforation: a simple and reliable method.

    PubMed

    Ibekwe, T S; Adeosun, A A; Nwaorgu, O G

    2009-01-01

    Accurate assessment of the features of tympanic membrane perforation, especially size, site, duration and aetiology, is important, as it enables optimum management. To describe a simple, cheap and effective method of quantitatively analysing tympanic membrane perforations. The system described comprises a video-otoscope (capable of generating still and video images of the tympanic membrane), adapted via a universal serial bus box to a computer screen, with images analysed using the Image J geometrical analysis software package. The reproducibility of results and their correlation with conventional otoscopic methods of estimation were tested statistically with the paired t-test and correlational tests, using the Statistical Package for the Social Sciences version 11 software. The following equation was generated: P/T × 100 per cent = percentage perforation, where P is the area (in pixels²) of the tympanic membrane perforation and T is the total area (in pixels²) of the entire tympanic membrane (including the perforation). Illustrations are shown. Comparison of blinded data on tympanic membrane perforation area obtained independently from assessments by two trained otologists, of comparative years of experience, using the video-otoscopy system described, showed similar findings, with strong correlations devoid of inter-observer error (p = 0.000, r = 1). Comparison with conventional otoscopic assessment also indicated significant correlation, comparing results for two trained otologists, but some inter-observer variation was present (p = 0.000, r = 0.896). Correlation between the two methods for each of the otologists was also highly significant (p = 0.000). A computer-adapted video-otoscope, with images analysed by Image J software, represents a cheap, reliable, technology-driven, clinical method of quantitative analysis of tympanic membrane perforations and injuries.
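    The quantitative step reduces to two pixel counts and the stated equation. A minimal sketch (the masks and names here are illustrative, not part of the Image J workflow):

```python
def perforation_percent(perforation_mask, membrane_mask):
    """Percentage perforation = P / T * 100, where P and T are pixel areas.

    Masks are 2-D lists of booleans; the membrane mask is assumed to
    include the perforated region, matching the definition of T above.
    """
    P = sum(bool(v) for row in perforation_mask for v in row)
    T = sum(bool(v) for row in membrane_mask for v in row)
    return 100.0 * P / T

# Example: a 4x4 membrane region containing a 2x2 perforation.
membrane = [[True] * 4 for _ in range(4)]
perforation = [[r < 2 and c < 2 for c in range(4)] for r in range(4)]
```

    With these toy masks, P = 4 and T = 16, giving a 25 per cent perforation.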

  17. Microcomputer package for statistical analysis of microbial populations.

    PubMed

    Lacroix, J M; Lavoie, M C

    1987-11-01

    We have developed a Pascal system for comparing microbial populations from different ecological sites using microcomputers. The values calculated are: the coverage value and its standard error; the minimum similarity and the geometric similarity between two biological samples; and the Lambda test, which consists of calculating the ratio of the mean similarity between two subsets to the mean similarity within subsets. The system is written for Apple II, IBM, and compatible computers, but it can run on any computer that supports CP/M if the programs are recompiled for that system.
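    The abstract names the quantities but not their formulas, so the sketch below uses standard textbook versions as assumptions: Good's coverage estimator for the coverage value, and the Renkonen percent-similarity index as one common member of the similarity family. The package's own definitions (and its standard-error and Lambda computations) may differ.

```python
def goods_coverage(counts):
    """Good's coverage estimate: 1 - (singleton taxa / total individuals)."""
    n = sum(counts)
    singletons = sum(1 for c in counts if c == 1)
    return 1.0 - singletons / n

def percent_similarity(counts_a, counts_b):
    """Renkonen index: sum over taxa of the minimum relative abundance."""
    total_a, total_b = sum(counts_a), sum(counts_b)
    return sum(min(a / total_a, b / total_b)
               for a, b in zip(counts_a, counts_b))
```

    Both indices run from 0 (no shared structure) to 1 (identical relative abundances).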

  18. Experimental Design and Power Calculation for RNA-seq Experiments.

    PubMed

    Wu, Zhijin; Wu, Hao

    2016-01-01

    Power calculation is a critical component of RNA-seq experimental design. The flexibility of RNA-seq experiments and the wide dynamic range of transcription they measure make the technology attractive for whole-transcriptome analysis. These features, together with the high dimensionality of RNA-seq data, complicate experimental design and make a purely analytical power calculation unrealistic. In this chapter we review the major factors that influence the statistical power to detect differential expression, and give examples of power assessment using the R package PROPER.
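    PROPER's actual interface is documented in its Bioconductor vignette; the stdlib-only sketch below merely illustrates the simulation-based idea the chapter describes: simulate negative-binomial counts under an assumed fold change and dispersion, test each simulated experiment, and report the rejection rate as power. All parameter names here are illustrative assumptions, not PROPER's API.

```python
import math
import random

def nb_sample(mu, phi, rng):
    """Negative-binomial draw via the gamma-Poisson mixture (dispersion phi)."""
    lam = rng.gammavariate(1.0 / phi, phi * mu)
    # Knuth's Poisson sampler (adequate for moderate means).
    L, k, p = math.exp(-lam), 0, 1.0
    while p > L:
        k += 1
        p *= rng.random()
    return k - 1

def welch_z_pvalue(x, y):
    """Two-sided p-value from a normal approximation to Welch's t statistic."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    vx = sum((v - mx) ** 2 for v in x) / (nx - 1)
    vy = sum((v - my) ** 2 for v in y) / (ny - 1)
    se = math.sqrt(vx / nx + vy / ny)
    if se == 0.0:
        return 1.0
    return math.erfc(abs(mx - my) / se / math.sqrt(2))

def simulated_power(n, mu, phi, fold_change, alpha=0.05, nsims=300, seed=7):
    """Fraction of simulated two-group experiments rejected at level alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(nsims):
        g1 = [math.log2(nb_sample(mu, phi, rng) + 1) for _ in range(n)]
        g2 = [math.log2(nb_sample(mu * fold_change, phi, rng) + 1)
              for _ in range(n)]
        if welch_z_pvalue(g1, g2) < alpha:
            hits += 1
    return hits / nsims
```

    For a single gene with mean 50, dispersion 0.1, and 5 samples per group, a two-fold change is detected most of the time, while a null fold change of 1 rejects at roughly the nominal rate.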

  19. Missing data imputation: focusing on single imputation.

    PubMed

    Zhang, Zhongheng

    2016-01-01

    Complete case analysis is widely used for handling missing data, and it is the default method in many statistical packages. However, this method may introduce bias, and useful information is omitted from the analysis. Many imputation methods have therefore been developed to address this problem. The present article focuses on single imputation. Imputation with the mean, median, or mode is simple but, like complete case analysis, can bias estimates of the mean and standard deviation; moreover, these methods ignore the relationships with other variables. Regression imputation preserves the relationship between the variable with missing values and the other variables. Many more sophisticated methods exist for handling missing values in longitudinal data. This article focuses primarily on how to implement R code to perform single imputation, while avoiding complex mathematical calculations.
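    The techniques named above are easy to state concretely. A minimal Python sketch (the article itself works in R; the function names here are illustrative):

```python
import statistics

def impute_constant(values, stat=statistics.fmean):
    """Fill missing entries (None) with a statistic of the observed values
    (mean by default; pass statistics.median or statistics.mode instead)."""
    observed = [v for v in values if v is not None]
    fill = stat(observed)
    return [fill if v is None else v for v in values]

def impute_regression(pairs):
    """Fill missing y values by least-squares regression of y on x, fitted
    on the complete cases, preserving the x-y relationship."""
    complete = [(x, y) for x, y in pairs if y is not None]
    mx = sum(x for x, _ in complete) / len(complete)
    my = sum(y for _, y in complete) / len(complete)
    slope = (sum((x - mx) * (y - my) for x, y in complete)
             / sum((x - mx) ** 2 for x, _ in complete))
    intercept = my - slope * mx
    return [(x, intercept + slope * x if y is None else y)
            for x, y in pairs]
```

    Note the trade-off the article describes: constant imputation shrinks the variance of the imputed variable, whereas regression imputation keeps its relationship to the predictor.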

  20. Missing data imputation: focusing on single imputation

    PubMed Central

    2016-01-01

    Complete case analysis is widely used for handling missing data, and it is the default method in many statistical packages. However, this method may introduce bias, and useful information is omitted from the analysis. Many imputation methods have therefore been developed to address this problem. The present article focuses on single imputation. Imputation with the mean, median, or mode is simple but, like complete case analysis, can bias estimates of the mean and standard deviation; moreover, these methods ignore the relationships with other variables. Regression imputation preserves the relationship between the variable with missing values and the other variables. Many more sophisticated methods exist for handling missing values in longitudinal data. This article focuses primarily on how to implement R code to perform single imputation, while avoiding complex mathematical calculations. PMID:26855945

  1. MORPH-I (Ver 1.0) a software package for the analysis of scanning electron micrograph (binary formatted) images for the assessment of the fractal dimension of enclosed pore surfaces

    USGS Publications Warehouse

    Mossotti, Victor G.; Eldeeb, A. Raouf; Oscarson, Robert

    1998-01-01

    MORPH-I is a set of C-language computer programs for the IBM PC and compatible minicomputers. The programs in MORPH-I are used for the fractal analysis of scanning electron microscope and electron microprobe images of pore profiles exposed in cross-section. The program isolates and traces the cross-sectional profiles of exposed pores and computes the Richardson fractal dimension for each pore. Other programs in the set provide for image calibration, display, and statistical analysis of the computed dimensions for highly complex porous materials. Requirements: IBM PC or compatible; minimum 640 K RAM; math coprocessor; SVGA graphics board providing mode 103 display.
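    The Richardson (divider) dimension that MORPH-I computes for each pore profile comes from how the measured perimeter changes with the length of the measuring "ruler". A stdlib-only sketch of that idea (an illustration of the divider method, not MORPH-I's C implementation):

```python
import math

def divider_length(points, step):
    """Walk a ruler of length ~step along a densely sampled curve and
    return the measured length (chords plus the final partial step)."""
    total = 0.0
    anchor = points[0]
    for pt in points[1:]:
        d = math.dist(anchor, pt)
        if d >= step:
            total += d
            anchor = pt
    total += math.dist(anchor, points[-1])
    return total

def richardson_dimension(points, steps):
    """Fractal dimension D = 1 - slope of log(length) versus log(step)."""
    xs = [math.log(s) for s in steps]
    ys = [math.log(divider_length(points, s)) for s in steps]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return 1.0 - slope

# Sanity check: a smooth profile (a circle) should have dimension close to 1,
# because its measured length barely changes with ruler size.
circle = [(math.cos(2 * math.pi * k / 2000), math.sin(2 * math.pi * k / 2000))
          for k in range(2001)]
```

    A rough pore boundary would instead show measured length growing as the ruler shrinks, giving D between 1 and 2.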

  2. Picante: R tools for integrating phylogenies and ecology.

    PubMed

    Kembel, Steven W; Cowan, Peter D; Helmus, Matthew R; Cornwell, William K; Morlon, Helene; Ackerly, David D; Blomberg, Simon P; Webb, Campbell O

    2010-06-01

    Picante is a software package that provides a comprehensive set of tools for analyzing the phylogenetic and trait diversity of ecological communities. The package calculates phylogenetic diversity metrics, performs trait comparative analyses, manipulates phenotypic and phylogenetic data, and performs tests for phylogenetic signal in trait distributions, community structure and species interactions. Picante is a package for the R statistical language and environment written in R and C, released under a GPL v2 open-source license, and freely available on the web (http://picante.r-forge.r-project.org) and from CRAN (http://cran.r-project.org).

  3. Environmental statistics with S-Plus

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Millard, S.P.; Neerchal, N.K.

    1999-12-01

    The combination of easy-to-use software with easy access to a description of the statistical methods (definitions, concepts, etc.) makes this book an excellent resource. One of the major features of this book is the inclusion of general information on environmental statistical methods and examples of how to implement these methods using the statistical software package S-Plus and the add-in modules Environmental-Stats for S-Plus, S+SpatialStats, and S-Plus for ArcView.

  4. PLATSIM: A Simulation and Analysis Package for Large-Order Flexible Systems. Version 2.0

    NASA Technical Reports Server (NTRS)

    Maghami, Peiman G.; Kenny, Sean P.; Giesy, Daniel P.

    1997-01-01

    The software package PLATSIM provides efficient time and frequency domain analysis of large-order generic space platforms. PLATSIM can perform open-loop analysis or closed-loop analysis with linear or nonlinear control system models. PLATSIM exploits the particular sparsity structure of the plant matrices for very efficient linear and nonlinear time domain analysis, as well as frequency domain analysis. An original algorithm for the efficient computation of open-loop and closed-loop frequency response functions for large-order systems has been developed and is implemented within the package. Furthermore, an efficient jitter analysis routine, which determines jitter and stability values directly from time simulations, has been developed and incorporated in the PLATSIM package. In the time domain analysis, PLATSIM simulates the response of the space platform to disturbances and calculates the jitter and stability values from the response time histories. In the frequency domain analysis, PLATSIM calculates frequency response function matrices and provides the corresponding Bode plots. The PLATSIM software package is written in the MATLAB script language. A graphical user interface provides convenient access to its various features.
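    PLATSIM's precise jitter definition is not given in this abstract. A common choice in pointing studies, sketched below purely as an assumption, is the worst-case peak-to-peak excursion of a response time history over a sliding window representing the instrument's observation time:

```python
def peak_to_peak_jitter(signal, window):
    """Largest (max - min) excursion over any `window` consecutive samples
    of a response time history."""
    return max(max(signal[i:i + window]) - min(signal[i:i + window])
               for i in range(len(signal) - window + 1))
```

    Scanning every window of the time history is what makes computing such metrics efficiently worthwhile for long simulations.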

  5. 2 × 2 Tables: a note on Campbell's recommendation.

    PubMed

    Busing, F M T A; Weaver, B; Dubois, S

    2016-04-15

    For 2 × 2 tables, Egon Pearson's N - 1 chi-squared statistic is theoretically more sound than Karl Pearson's chi-squared statistic, and provides more accurate p values. Moreover, Egon Pearson's N - 1 chi-squared statistic is equal to the Mantel-Haenszel chi-squared statistic for a single 2 × 2 table, and as such, is often available in statistical software packages like SPSS, SAS, Stata, or R, which facilitates compliance with Ian Campbell's recommendations. Copyright © 2015 John Wiley & Sons, Ltd.
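    For a 2 × 2 table with cell counts a, b, c, d, the 'N − 1' statistic is simply Karl Pearson's statistic rescaled by (N − 1)/N, which is one reason Campbell's recommendation is easy to follow even without dedicated software support. A minimal sketch:

```python
def pearson_chi2(a, b, c, d):
    """Karl Pearson's chi-squared for a 2x2 table:
    N (ad - bc)^2 / [(a+b)(c+d)(a+c)(b+d)]."""
    N = a + b + c + d
    return N * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

def n_minus_1_chi2(a, b, c, d):
    """Egon Pearson's 'N - 1' statistic: the Karl Pearson statistic scaled
    by (N - 1)/N, equal to the Mantel-Haenszel statistic for one table."""
    N = a + b + c + d
    return pearson_chi2(a, b, c, d) * (N - 1) / N
```

    For the table (7, 3 / 2, 8), N = 20 and the Karl Pearson statistic 5.051 shrinks to 4.798 under the N − 1 correction.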

  6. Multiple-Group Analysis Using the sem Package in the R System

    ERIC Educational Resources Information Center

    Evermann, Joerg

    2010-01-01

    Multiple-group analysis in covariance-based structural equation modeling (SEM) is an important technique to ensure the invariance of latent construct measurements and the validity of theoretical models across different subpopulations. However, not all SEM software packages provide multiple-group analysis capabilities. The sem package for the R…

  7. CORSSA: Community Online Resource for Statistical Seismicity Analysis

    NASA Astrophysics Data System (ADS)

    Zechar, J. D.; Hardebeck, J. L.; Michael, A. J.; Naylor, M.; Steacy, S.; Wiemer, S.; Zhuang, J.

    2011-12-01

    Statistical seismology is critical to the understanding of seismicity, the evaluation of proposed earthquake prediction and forecasting methods, and the assessment of seismic hazard. Unfortunately, despite its importance to seismology, especially to those aspects with great impact on public policy, statistical seismology is mostly ignored in the education of seismologists, and there is no central repository for the existing open-source software tools. To remedy these deficiencies, and with the broader goal of enhancing the quality of statistical seismology research, we have begun building the Community Online Resource for Statistical Seismicity Analysis (CORSSA, www.corssa.org). We anticipate that the users of CORSSA will range from beginning graduate students to experienced researchers. More than 20 scientists from around the world met for a week in Zurich in May 2010 to kick-start the creation of CORSSA: the format and initial table of contents were defined, a governing structure was organized, and workshop participants began drafting articles. CORSSA materials are organized around six themes, each of which will contain between four and eight articles. CORSSA now includes seven articles, with an additional six in draft form, along with forums for discussion, a glossary, and news about upcoming meetings, special issues, and recent papers. Each article is peer-reviewed and presents a balanced discussion, including illustrative examples and code snippets. Topics in the initial set of articles include: introductions to both CORSSA and statistical seismology; basic statistical tests and their role in seismology; understanding seismicity catalogs and their problems; basic techniques for modeling seismicity; and methods for testing earthquake predictability hypotheses. We have also begun curating a collection of statistical seismology software packages.

  8. PANDA: a pipeline toolbox for analyzing brain diffusion images.

    PubMed

    Cui, Zaixu; Zhong, Suyu; Xu, Pengfei; He, Yong; Gong, Gaolang

    2013-01-01

    Diffusion magnetic resonance imaging (dMRI) is widely used in both scientific research and clinical practice in in-vivo studies of the human brain. While a number of post-processing packages have been developed, fully automated processing of dMRI datasets remains challenging. Here, we developed a MATLAB toolbox named "Pipeline for Analyzing braiN Diffusion imAges" (PANDA) for fully automated processing of brain diffusion images. The processing modules of a few established packages, including FMRIB Software Library (FSL), Pipeline System for Octave and Matlab (PSOM), Diffusion Toolkit and MRIcron, were employed in PANDA. Using any number of raw dMRI datasets from different subjects, in either DICOM or NIfTI format, PANDA can automatically perform a series of steps to process DICOM/NIfTI to diffusion metrics [e.g., fractional anisotropy (FA) and mean diffusivity (MD)] that are ready for statistical analysis at the voxel-level, the atlas-level and the Tract-Based Spatial Statistics (TBSS)-level and can finish the construction of anatomical brain networks for all subjects. In particular, PANDA can process different subjects in parallel, using multiple cores either in a single computer or in a distributed computing environment, thus greatly reducing the time cost when dealing with a large number of datasets. In addition, PANDA has a friendly graphical user interface (GUI), allowing the user to be interactive and to adjust the input/output settings, as well as the processing parameters. As an open-source package, PANDA is freely available at http://www.nitrc.org/projects/panda/. This novel toolbox is expected to substantially simplify the image processing of dMRI datasets and facilitate human structural connectome studies.

  9. PANDA: a pipeline toolbox for analyzing brain diffusion images

    PubMed Central

    Cui, Zaixu; Zhong, Suyu; Xu, Pengfei; He, Yong; Gong, Gaolang

    2013-01-01

    Diffusion magnetic resonance imaging (dMRI) is widely used in both scientific research and clinical practice in in-vivo studies of the human brain. While a number of post-processing packages have been developed, fully automated processing of dMRI datasets remains challenging. Here, we developed a MATLAB toolbox named “Pipeline for Analyzing braiN Diffusion imAges” (PANDA) for fully automated processing of brain diffusion images. The processing modules of a few established packages, including FMRIB Software Library (FSL), Pipeline System for Octave and Matlab (PSOM), Diffusion Toolkit and MRIcron, were employed in PANDA. Using any number of raw dMRI datasets from different subjects, in either DICOM or NIfTI format, PANDA can automatically perform a series of steps to process DICOM/NIfTI to diffusion metrics [e.g., fractional anisotropy (FA) and mean diffusivity (MD)] that are ready for statistical analysis at the voxel-level, the atlas-level and the Tract-Based Spatial Statistics (TBSS)-level and can finish the construction of anatomical brain networks for all subjects. In particular, PANDA can process different subjects in parallel, using multiple cores either in a single computer or in a distributed computing environment, thus greatly reducing the time cost when dealing with a large number of datasets. In addition, PANDA has a friendly graphical user interface (GUI), allowing the user to be interactive and to adjust the input/output settings, as well as the processing parameters. As an open-source package, PANDA is freely available at http://www.nitrc.org/projects/panda/. This novel toolbox is expected to substantially simplify the image processing of dMRI datasets and facilitate human structural connectome studies. PMID:23439846

  10. GAPIT: genome association and prediction integrated tool.

    PubMed

    Lipka, Alexander E; Tian, Feng; Wang, Qishan; Peiffer, Jason; Li, Meng; Bradbury, Peter J; Gore, Michael A; Buckler, Edward S; Zhang, Zhiwu

    2012-09-15

    Software programs that conduct genome-wide association studies and genomic prediction and selection need to use methodologies that maximize statistical power, provide high prediction accuracy and run in a computationally efficient manner. We developed an R package called Genome Association and Prediction Integrated Tool (GAPIT) that implements advanced statistical methods including the compressed mixed linear model (CMLM) and CMLM-based genomic prediction and selection. The GAPIT package can handle large datasets in excess of 10 000 individuals and 1 million single-nucleotide polymorphisms with minimal computational time, while providing user-friendly access and concise tables and graphs to interpret results. http://www.maizegenetics.net/GAPIT. zhiwu.zhang@cornell.edu Supplementary data are available at Bioinformatics online.

  11. Data article on the effectiveness of entrepreneurship curriculum contents on entrepreneurial interest and knowledge of Nigerian university students.

    PubMed

    Olokundun, Maxwell; Iyiola, Oluwole; Ibidunni, Stephen; Ogbari, Mercy; Falola, Hezekiah; Salau, Odunayo; Peter, Fred; Borishade, Taiye

    2018-06-01

    The article presented data on the effectiveness of entrepreneurship curriculum contents on university students' entrepreneurial interest and knowledge. The study focused on the perceptions of Nigerian university students, with emphasis on the first four universities in Nigeria to offer a degree programme in entrepreneurship. The study adopted a quantitative approach with a descriptive research design to establish trends related to the objective of the study. A survey was used as the quantitative research method. The population of the study included all students in the selected universities. Data were analyzed using the Statistical Package for the Social Sciences (SPSS), with the mean score as the statistical tool of analysis. The field data set is made widely accessible to enable critical or more comprehensive investigation.

  12. Integrated data management for clinical studies: automatic transformation of data models with semantic annotations for principal investigators, data managers and statisticians.

    PubMed

    Dugas, Martin; Dugas-Breit, Susanne

    2014-01-01

    Design, execution and analysis of clinical studies involves several stakeholders with different professional backgrounds. Typically, principal investigators are familiar with standard office tools, data managers apply electronic data capture (EDC) systems, and statisticians work with statistics software. Case report forms (CRFs) specify the data model of study subjects, evolve over time, and consist of hundreds to thousands of data items per study. To avoid error-prone manual transformation work, a converting tool for different representations of study data models was designed. It can convert between office, EDC and statistics formats. In addition, it supports semantic annotations, which enable precise definitions of data items. A reference implementation is available as the open-source package ODMconverter at http://cran.r-project.org.

  13. User's manual for the coupled rotor/airframe vibration analysis graphic package

    NASA Technical Reports Server (NTRS)

    Studwell, R. E.

    1982-01-01

    User instructions for a graphics package for coupled rotor/airframe vibration analysis are presented. Responses to plot package messages which the user must make to activate plot package operations and options are described. Installation instructions required to set up the program on the CDC system are included. The plot package overlay structure and subroutines which have to be modified for the CDC system are also described. Operating instructions for CDC applications are included.

  14. Extended local similarity analysis (eLSA) of microbial community and other time series data with replicates.

    PubMed

    Xia, Li C; Steele, Joshua A; Cram, Jacob A; Cardon, Zoe G; Simmons, Sheri L; Vallino, Joseph J; Fuhrman, Jed A; Sun, Fengzhu

    2011-01-01

    The increasing availability of time series microbial community data from metagenomics and other molecular biological studies has enabled the analysis of large-scale microbial co-occurrence and association networks. Among the many analytical techniques available, the Local Similarity Analysis (LSA) method is unique in that it captures local and potentially time-delayed co-occurrence and association patterns in time series data that cannot otherwise be identified by ordinary correlation analysis. However, LSA, as originally developed, does not consider time series data with replicates, which hinders the full exploitation of available information. With replicates, it is possible to understand the variability of the local similarity (LS) score and to obtain its confidence interval. We extended our LSA technique to time series data with replicates and termed it extended LSA, or eLSA. Simulations showed the capability of eLSA to capture subinterval and time-delayed associations. We implemented the eLSA technique into an easy-to-use analytic software package. The software pipeline integrates data normalization, statistical correlation calculation, statistical significance evaluation, and association network construction steps. We applied the eLSA technique to microbial community and gene expression datasets, where unique time-dependent associations were identified. The extended LSA analysis technique was demonstrated to reveal statistically significant local and potentially time-delayed association patterns in replicated time series data beyond those of ordinary correlation analysis. These statistically significant associations can provide insights into the real dynamics of biological systems. The newly designed eLSA software efficiently streamlines the analysis and is freely available from the eLSA homepage, which can be accessed at http://meta.usc.edu/softs/lsa.
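    The core LS computation has a simple dynamic-programming form: z-score each series, then, for every candidate delay, take the maximum positive running sum of pairwise products (a Kadane-style scan), normalized by the series length. The sketch below is a simplified, replicate-free illustration of that idea, not the eLSA pipeline itself:

```python
import math

def _zscore(xs):
    m = sum(xs) / len(xs)
    sd = (sum((x - m) ** 2 for x in xs) / len(xs)) ** 0.5
    return [(x - m) / sd for x in xs]

def local_similarity(x, y, max_delay=3):
    """Return (LS score, best delay): the strongest local run of agreement
    between the z-scored series over all delays up to max_delay."""
    zx, zy = _zscore(x), _zscore(y)
    n = len(x)
    best, best_delay = 0.0, 0
    for d in range(-max_delay, max_delay + 1):
        running = 0.0
        for i in range(n):
            j = i + d
            if 0 <= j < n:
                # Kadane-style reset: drop any prefix with negative sum.
                running = max(0.0, running + zx[i] * zy[j])
                if running > best:
                    best, best_delay = running, d
    return best / n, best_delay

# A series compared against its own one-step-lagged copy: the association
# is perfect at delay 1, which ordinary (zero-lag) correlation would miss.
series = [math.sin(2 * math.pi * k / 10) for k in range(41)]
a, b = series[1:], series[:-1]
```

    Extending this scan to replicated series, with significance evaluation, is what eLSA adds.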

  15. Extended local similarity analysis (eLSA) of microbial community and other time series data with replicates

    PubMed Central

    2011-01-01

    Background The increasing availability of time series microbial community data from metagenomics and other molecular biological studies has enabled the analysis of large-scale microbial co-occurrence and association networks. Among the many analytical techniques available, the Local Similarity Analysis (LSA) method is unique in that it captures local and potentially time-delayed co-occurrence and association patterns in time series data that cannot otherwise be identified by ordinary correlation analysis. However, LSA, as originally developed, does not consider time series data with replicates, which hinders the full exploitation of available information. With replicates, it is possible to understand the variability of the local similarity (LS) score and to obtain its confidence interval. Results We extended our LSA technique to time series data with replicates and termed it extended LSA, or eLSA. Simulations showed the capability of eLSA to capture subinterval and time-delayed associations. We implemented the eLSA technique into an easy-to-use analytic software package. The software pipeline integrates data normalization, statistical correlation calculation, statistical significance evaluation, and association network construction steps. We applied the eLSA technique to microbial community and gene expression datasets, where unique time-dependent associations were identified. Conclusions The extended LSA analysis technique was demonstrated to reveal statistically significant local and potentially time-delayed association patterns in replicated time series data beyond those of ordinary correlation analysis. These statistically significant associations can provide insights into the real dynamics of biological systems. The newly designed eLSA software efficiently streamlines the analysis and is freely available from the eLSA homepage, which can be accessed at http://meta.usc.edu/softs/lsa. PMID:22784572

  16. Application of mixsep software package: Performance verification of male-mixed DNA analysis

    PubMed Central

    HU, NA; CONG, BIN; GAO, TAO; CHEN, YU; SHEN, JUNYI; LI, SHUJIN; MA, CHUNLING

    2015-01-01

    An experimental model of male-mixed DNA (n=297) was constructed according to the mixed DNA construction principle. This comprised the use of the Applied Biosystems (ABI) 7500 quantitative polymerase chain reaction system, with scientific validation of mixture proportion (Mx; root-mean-square error ≤0.02). Statistical analysis was performed on locus separation accuracy using mixsep, a DNA mixture separation R-package, and the analytical performance of mixsep was assessed by examining the data distribution pattern of different mixed gradients, short tandem repeat (STR) loci and mixed DNA types. The results showed that locus separation accuracy had a negative linear correlation with the mixed gradient (R2=−0.7121). With increasing mixed gradient imbalance, locus separation accuracy first increased and then decreased, with the highest value detected at a gradient of 1:3 (≥90%). The mixed gradient, which is the theoretical Mx, was one of the primary factors that influenced the success of mixed DNA analysis. Among the 16 STR loci detected by Identifiler®, the separation accuracy was relatively high (>88%) for loci D5S818, D8S1179 and FGA, whereas the median separation accuracy value was lowest for the D7S820 locus. STR loci with relatively large numbers of allelic drop-out (ADO; >15) were all located in the yellow and red channels, including loci D18S51, D19S433, FGA, TPOX and vWA. These five loci featured low allele peak heights, which was consistent with the low sensitivity of the ABI 3130xl Genetic Analyzer to yellow and red fluorescence. The locus separation accuracy of the mixsep package was substantially different with and without the inclusion of ADO loci; inclusion of ADO significantly reduced the analytical performance of the mixsep package, which was consistent with the lack of an ADO functional module in this software. The present study demonstrated that the mixsep software had a number of advantages and was recommended for analysis of mixed DNA. This software was easy to operate and produced understandable results with a degree of controllability. PMID:25936428

  17. A survey of tools for the analysis of quantitative PCR (qPCR) data.

    PubMed

    Pabinger, Stephan; Rödiger, Stefan; Kriegner, Albert; Vierlinger, Klemens; Weinhäusel, Andreas

    2014-09-01

    Real-time quantitative polymerase chain reaction (qPCR) is a standard technique used in most laboratories for various applications in basic research. Analysis of qPCR data is a crucial part of the entire experiment, which has led to the development of a plethora of methods. The released tools either cover specific parts of the workflow or provide complete analysis solutions. Here, we surveyed 27 open-access software packages and tools for the analysis of qPCR data. The survey includes 8 Microsoft Windows tools, 5 web-based tools, 9 R-based tools, and 5 tools for other platforms. Reviewed packages and tools support the analysis of different qPCR applications, such as RNA quantification, DNA methylation, genotyping, identification of copy number variations, and digital PCR. We report an overview of the functionality, features and specific requirements of the individual software tools, such as data exchange formats, availability of a graphical user interface, included procedures for graphical data presentation, and offered statistical methods. In addition, we provide an overview of quantification strategies, and report various applications of qPCR. Our comprehensive survey showed that most tools use their own file format and only a fraction of the currently existing tools support the standardized data exchange format RDML. To allow a more streamlined and comparable analysis of qPCR data, more vendors and tools need to adopt the standardized format to encourage the exchange of data between instrument software, analysis tools, and researchers.

  18. Event time analysis of longitudinal neuroimage data.

    PubMed

    Sabuncu, Mert R; Bernal-Rusiel, Jorge L; Reuter, Martin; Greve, Douglas N; Fischl, Bruce

    2014-08-15

    This paper presents a method for the statistical analysis of the associations between longitudinal neuroimaging measurements, e.g., of cortical thickness, and the timing of a clinical event of interest, e.g., disease onset. The proposed approach consists of two steps, the first of which employs a linear mixed effects (LME) model to capture temporal variation in serial imaging data. The second step utilizes the extended Cox regression model to examine the relationship between time-dependent imaging measurements and the timing of the event of interest. We demonstrate the proposed method both for the univariate analysis of image-derived biomarkers, e.g., the volume of a structure of interest, and the exploratory mass-univariate analysis of measurements contained in maps, such as cortical thickness and gray matter density. The mass-univariate method employs a recently developed spatial extension of the LME model. We applied our method to analyze structural measurements computed using FreeSurfer, a widely used brain Magnetic Resonance Image (MRI) analysis software package. We provide a quantitative and objective empirical evaluation of the statistical performance of the proposed method on longitudinal data from subjects suffering from Mild Cognitive Impairment (MCI) at baseline. Copyright © 2014 Elsevier Inc. All rights reserved.

  19. clusterProfiler: an R package for comparing biological themes among gene clusters.

    PubMed

    Yu, Guangchuang; Wang, Li-Gen; Han, Yanyan; He, Qing-Yu

    2012-05-01

    The increasing amounts of quantitative data generated by transcriptomics and proteomics require integrative strategies for analysis. Here, we present an R package, clusterProfiler, that automates the process of biological-term classification and the enrichment analysis of gene clusters. The analysis module and visualization module were combined into a reusable workflow. Currently, clusterProfiler supports three species: humans, mice, and yeast. Methods provided in this package can be easily extended to other species and ontologies. The clusterProfiler package is released under the Artistic-2.0 License within the Bioconductor project. The source code and vignette are freely available at http://bioconductor.org/packages/release/bioc/html/clusterProfiler.html.
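    At its core, the enrichment analysis such packages automate is a hypergeometric over-representation test applied per biological term. A stdlib-only sketch of that test (an illustration of the statistic, not clusterProfiler's code; the argument names are ours):

```python
from math import comb

def enrichment_pvalue(universe, annotated, selected, overlap):
    """P[X >= overlap] under the hypergeometric null: the probability that
    a random draw of `selected` genes from a `universe` containing
    `annotated` term members hits at least `overlap` of them."""
    total = comb(universe, selected)
    upper = min(annotated, selected)
    return sum(comb(annotated, i) * comb(universe - annotated, selected - i)
               for i in range(overlap, upper + 1)) / total
```

    A full tool repeats this test across every term in an ontology and then corrects the resulting p-values for multiple testing.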

  20. 75 FR 65060 - Privacy Act of 1974; System of Records

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-10-21

    ... documented using the Statistical Package for the Social Sciences (SPSS). Information collected in SPSS... information will be used for statistical reports for the purpose of evaluating the need for development of... . All comments received will be available for public inspection in the Office of Regulation Policy and...

  1. Poppr: an R package for genetic analysis of populations with mixed (clonal/sexual) reproduction

    USDA-ARS?s Scientific Manuscript database

    Poppr is an R package for analysis of population genetic data. It extends the adegenet package and provides several novel tools, particularly with regard to analysis of data from admixed, clonal, and/or sexual populations. Currently, poppr can be used for dominant/codominant and haploid/diploid gene...

  2. Three-dimensional analysis of scoliosis surgery using stereophotogrammetry

    NASA Astrophysics Data System (ADS)

    Jang, Stanley B.; Booth, Kellogg S.; Reilly, Chris W.; Sawatzky, Bonita J.; Tredwell, Stephen J.

    1994-04-01

    A new stereophotogrammetric analysis and 3D visualization allow accurate assessment of the scoliotic spine during instrumentation. Stereophoto pairs taken at each stage of the operation and robust statistical techniques are used to compute 3D transformations of the vertebrae between stages. These determine rotation, translation, goodness of fit, and overall spinal contour. A polygonal model of the spine, built in a commercial 3D modeling package, is used to produce an animation sequence of the transformation. The visualizations have provided some important observations. Correction of the scoliosis is achieved largely through vertebral translation and coronal plane rotation, contrary to claims that large axial rotations are required. The animations provide valuable qualitative information for surgeons assessing the results of scoliotic correction.

  3. Integration and global analysis of isothermal titration calorimetry data for studying macromolecular interactions.

    PubMed

    Brautigam, Chad A; Zhao, Huaying; Vargas, Carolyn; Keller, Sandro; Schuck, Peter

    2016-05-01

    Isothermal titration calorimetry (ITC) is a powerful and widely used method to measure the energetics of macromolecular interactions by recording a thermogram of differential heating power during a titration. However, traditional ITC analysis is limited by stochastic thermogram noise and by the limited information content of a single titration experiment. Here we present a protocol for bias-free thermogram integration based on automated shape analysis of the injection peaks, followed by combination of isotherms from different calorimetric titration experiments into a global analysis, statistical analysis of binding parameters and graphical presentation of the results. This is performed using the integrated public-domain software packages NITPIC, SEDPHAT and GUSSI. The recently developed low-noise thermogram integration approach and global analysis allow for more precise parameter estimates and more reliable quantification of multisite and multicomponent cooperative and competitive interactions. Titration experiments typically take 1-2.5 h each, and global analysis usually takes 10-20 min.
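    At the center of any ITC global analysis is fitting a binding model to the titration isotherm. The following is a deliberately minimal sketch of that step only, with synthetic noise-free data and a brute-force grid search over the dissociation constant Kd; NITPIC/SEDPHAT perform far more sophisticated global fits across multiple experiments. All concentrations and the 1:1 model are assumptions for illustration.

```python
# 1:1 binding isotherm fit by grid search over Kd (toy, synthetic data).
from math import sqrt

def bound(R, L, Kd):
    """Complex concentration for total receptor R and total ligand L (1:1)."""
    s = R + L + Kd
    return (s - sqrt(s * s - 4.0 * R * L)) / 2.0

R_tot = 10.0                                # uM in the cell (hypothetical)
ligand = [2.0 * i for i in range(1, 11)]    # injected ligand totals (uM)
true_Kd = 5.0
data = [bound(R_tot, L, true_Kd) for L in ligand]

# Pick the Kd on a 0.1-20.0 uM grid that minimizes the sum of squared errors.
best_Kd = min((k * 0.1 for k in range(1, 201)),
              key=lambda Kd: sum((bound(R_tot, L, Kd) - d) ** 2
                                 for L, d in zip(ligand, data)))
print(round(best_Kd, 1))  # -> 5.0
```

    A real analysis would fit enthalpy and stoichiometry simultaneously and propagate the thermogram-integration uncertainties into the parameter estimates.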

  4. Model-based clustering for RNA-seq data.

    PubMed

    Si, Yaqing; Liu, Peng; Li, Pinghua; Brutnell, Thomas P

    2014-01-15

    RNA-seq technology has been widely adopted as an attractive alternative to microarray-based methods to study global gene expression. However, robust statistical tools to analyze these complex datasets are still lacking. By grouping genes with similar expression profiles across treatments, cluster analysis provides insight into gene functions and networks, and hence is an important technique for RNA-seq data analysis. In this manuscript, we derive clustering algorithms based on appropriate probability models for RNA-seq data. An expectation-maximization (EM) algorithm and two stochastic variants of it are described. In addition, a strategy for initialization based on likelihood is proposed to improve the clustering algorithms. Moreover, we present a model-based hybrid-hierarchical clustering method to generate a tree structure that allows visualization of relationships among clusters as well as flexibility in choosing the number of clusters. Results from both simulation studies and analysis of a maize RNA-seq dataset show that our proposed methods provide better clustering results than alternative methods such as the K-means algorithm and hierarchical clustering methods that are not based on probability models. An R package, MBCluster.Seq, has been developed to implement our proposed algorithms. This R package provides fast computation and is publicly available at http://www.r-project.org
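    To make the model-based idea concrete, here is a hedged sketch (not MBCluster.Seq itself) of EM for a two-component Poisson mixture, the simplest kind of count model one might use to cluster expression values; the counts and starting values are invented:

```python
# EM for a two-component Poisson mixture over read counts (toy example).
from math import exp, factorial

def poisson_pmf(k, lam):
    return exp(-lam) * lam ** k / factorial(k)

counts = [2, 3, 1, 2, 40, 38, 45, 41]   # two visibly distinct expression groups
lam = [1.0, 10.0]                       # initial component means (guesses)
pi = [0.5, 0.5]                         # initial mixing weights

for _ in range(50):
    # E-step: posterior responsibility of component 1 for each count.
    r = []
    for k in counts:
        w0 = pi[0] * poisson_pmf(k, lam[0])
        w1 = pi[1] * poisson_pmf(k, lam[1])
        r.append(w1 / (w0 + w1))
    # M-step: update mixing weights and component means.
    n1 = sum(r)
    pi = [1 - n1 / len(counts), n1 / len(counts)]
    lam = [sum((1 - ri) * k for ri, k in zip(r, counts)) / (len(counts) - n1),
           sum(ri * k for ri, k in zip(r, counts)) / n1]

print([round(l, 1) for l in lam])  # component means settle near 2 and 41
```

    The paper's models additionally handle treatment structure and overdispersion, which a plain Poisson mixture ignores.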

  5. Performance evaluation of WAVEWATCH III model in the Persian Gulf using different wind resources

    NASA Astrophysics Data System (ADS)

    Kazeminezhad, Mohammad Hossein; Siadatmousavi, Seyed Mostafa

    2017-07-01

    The third-generation wave model, WAVEWATCH III, was employed to simulate bulk wave parameters in the Persian Gulf using three different wind sources: ERA-Interim, CCMP, and GFS-Analysis. Different formulations for the whitecapping term and the energy transfer from wind to waves were used, namely the Tolman and Chalikov (J Phys Oceanogr 26:497-518, 1996), WAM cycle 4 (BJA and WAM4), and Ardhuin et al. (J Phys Oceanogr 40(9):1917-1941, 2010) (TEST405 and TEST451 parameterizations) source term packages. The results of the numerical simulations were compared to altimeter-derived significant wave heights and measured wave parameters at two stations in the northern part of the Persian Gulf through statistical indicators and the Taylor diagram. Comparison of the bulk wave parameters with measured values showed underestimation of wave height for all wind sources. However, the performance of the model was best when GFS-Analysis wind data were used. In general, when wind veered from southeast to northwest and wind speed was high during the rotation, the model's underestimation of wave height was severe. Except for the Tolman and Chalikov (J Phys Oceanogr 26:497-518, 1996) source term package, which severely underestimated the bulk wave parameters during stormy conditions, the performances of the other formulations were practically similar. However, in terms of statistics, the Ardhuin et al. (J Phys Oceanogr 40(9):1917-1941, 2010) source terms with the TEST405 parameterization were the most successful formulation in the Persian Gulf when compared to in situ and altimeter-derived observations.
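    The statistical indicators used in validations like this are typically simple aggregates of model-minus-observation differences. A sketch of three common ones, with invented wave heights (the abstract does not state which indicators were computed, so these definitions are assumptions):

```python
# Common model-validation statistics for significant wave height (toy data).

def bias(model, obs):
    """Mean error; negative values indicate model underestimation."""
    return sum(m - o for m, o in zip(model, obs)) / len(obs)

def rmse(model, obs):
    """Root-mean-square error."""
    return (sum((m - o) ** 2 for m, o in zip(model, obs)) / len(obs)) ** 0.5

def scatter_index(model, obs):
    """RMSE normalized by the mean of the observations."""
    return rmse(model, obs) / (sum(obs) / len(obs))

hs_model = [1.1, 1.8, 2.4, 0.9]   # modelled significant wave height (m)
hs_obs   = [1.3, 2.0, 2.9, 1.0]   # buoy observations (m)
print(round(bias(hs_model, hs_obs), 2))  # -> -0.25, i.e. underestimation
```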

  6. Flexible Adaptive Paradigms for fMRI Using a Novel Software Package ‘Brain Analysis in Real-Time’ (BART)

    PubMed Central

    Hellrung, Lydia; Hollmann, Maurice; Zscheyge, Oliver; Schlumm, Torsten; Kalberlah, Christian; Roggenhofer, Elisabeth; Okon-Singer, Hadas; Villringer, Arno; Horstmann, Annette

    2015-01-01

    In this work we present a new open source software package offering a unified framework for the real-time adaptation of fMRI stimulation procedures. The software provides a straightforward setup and a highly flexible approach to adapting fMRI paradigms while the experiment is running. The general framework comprises the inclusion of parameters of the subject's compliance, such as directing gaze to visually presented stimuli, and of physiological fluctuations, like blood pressure or pulse. Additionally, this approach makes it possible to investigate complex scientific questions, for example the influence of EEG rhythms or of the fMRI signal itself. To prove the concept of this approach, we used our software in a usability example for an fMRI experiment where the presentation of emotional pictures was dependent on the subject's gaze position. Gaze position can have a significant impact on the results. So far, when this is taken into account during fMRI data analysis, it is commonly done by the post-hoc removal of erroneous trials. Here, we propose an a priori adaptation of the paradigm during the experiment's runtime. Our fMRI findings clearly show the benefits of an adapted paradigm in terms of statistical power and higher effect sizes in emotion-related brain regions. This can be of special interest for all experiments with low statistical power due to a limited number of subjects, a limited amount of time, costs or available data to analyze, as is the case with real-time fMRI. PMID:25837719

  7. MoleculaRnetworks: an integrated graph theoretic and data mining tool to explore solvent organization in molecular simulation.

    PubMed

    Mooney, Barbara Logan; Corrales, L René; Clark, Aurora E

    2012-03-30

    This work discusses scripts for processing molecular simulations data written using the software package R: A Language and Environment for Statistical Computing. These scripts, named moleculaRnetworks, are intended for the geometric and solvent network analysis of aqueous solutes and can be extended to other H-bonded solvents. New algorithms, several of which are based on graph theory, that interrogate the solvent environment about a solute are presented and described. This includes a novel method for identifying the geometric shape adopted by the solvent in the immediate vicinity of the solute and an exploratory approach for describing H-bonding, both based on the PageRank algorithm of Google search fame. The moleculaRnetworks codes include a preprocessor, which distills simulation trajectories into physicochemical data arrays, and an interactive analysis script that enables statistical, trend, and correlation analysis, and other data mining. The goal of these scripts is to increase access to the wealth of structural and dynamical information that can be obtained from molecular simulations. Copyright © 2012 Wiley Periodicals, Inc.
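    The graph-theoretic ingredient the abstract highlights, PageRank, is easy to sketch: a power iteration over a directed graph that distributes each node's rank along its outgoing edges. The toy solvent graph below is invented and the implementation is a generic textbook version, not the moleculaRnetworks code:

```python
# Minimal PageRank power iteration over an adjacency-list graph.
def pagerank(adj, d=0.85, iters=100):
    """adj maps each node to its list of out-neighbors; d is the damping factor."""
    n = len(adj)
    rank = {v: 1.0 / n for v in adj}
    for _ in range(iters):
        new = {v: (1 - d) / n for v in adj}
        for v, outs in adj.items():
            share = rank[v] / len(outs)
            for w in outs:
                new[w] += d * share
        rank = new
    return rank

# Hypothetical H-bond network: two waters coordinating a solute.
graph = {"solute": ["w1", "w2"], "w1": ["solute"], "w2": ["solute", "w1"]}
ranks = pagerank(graph)
print(max(ranks, key=ranks.get))  # -> solute (the most "central" node)
```

    In the solvent-network setting, high-rank nodes flag molecules that organize the local H-bond topology around the solute.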

  8. Statistical tools for analysis and modeling of cosmic populations and astronomical time series: CUDAHM and TSE

    NASA Astrophysics Data System (ADS)

    Loredo, Thomas; Budavari, Tamas; Scargle, Jeffrey D.

    2018-01-01

    This presentation provides an overview of open-source software packages addressing two challenging classes of astrostatistics problems. (1) CUDAHM is a C++ framework for hierarchical Bayesian modeling of cosmic populations, leveraging graphics processing units (GPUs) to enable applying this computationally challenging paradigm to large datasets. CUDAHM is motivated by measurement error problems in astronomy, where density estimation and linear and nonlinear regression must be addressed for populations of thousands to millions of objects whose features are measured with possibly complex uncertainties, potentially including selection effects. An example calculation demonstrates accurate GPU-accelerated luminosity function estimation for simulated populations of 10^6 objects in about two hours using a single NVIDIA Tesla K40c GPU. (2) Time Series Explorer (TSE) is a collection of software in Python and MATLAB for exploratory analysis and statistical modeling of astronomical time series. It comprises a library of stand-alone functions and classes, as well as an application environment for interactive exploration of time series data. The presentation will summarize key capabilities of this emerging project, including new algorithms for analysis of irregularly-sampled time series.

  9. New Mexico Play Fairway Analysis: Particle Tracking ArcGIS Map Packages

    DOE Data Explorer

    Jeff Pepin

    2015-11-15

    These are map packages used to visualize geochemical particle-tracking analysis results in ArcGIS. The collection includes individual map packages for several regions of New Mexico: Acoma, Rincon, Gila, Las Cruces, Socorro, and Truth or Consequences.

  10. Analysis of Stakeholder's Behaviours for an Improved Management of an Agricultural Coastal Region in Oman

    NASA Astrophysics Data System (ADS)

    Al Khatri, Ayisha; Grundmann, Jens; van der Weth, Rüdiger; Schütze, Niels

    2015-04-01

    Al Batinah coastal area is the main agricultural region in Oman. Agriculture is concentrated there because of more fertile soils and easier access to water, in the form of groundwater, than in other administrative areas of the country. The region now faces a problem resulting from over-abstraction of fresh groundwater for irrigation from the main aquifer along the coast. This drives seawater intrusion into the coastal aquifer and causes salinization of the groundwater. As a consequence, the groundwater is no longer suitable for irrigation, which affects the social and economic situation of farmers as well as the environment. The existing situation therefore generates conflicts between different stakeholders regarding water availability, sustainable aquifer management, and profitable agricultural production in the Al Batinah region. Several management measures to maintain the groundwater aquifer in the region were implemented by the government; however, these solutions showed only limited success. The aim of this study is to evaluate the implementation potential of several management interventions and their combinations by analysing the opinions and responses of all relevant stakeholders in the region. This is done in order to identify potential conflicts among stakeholders ahead of a participatory process within the frame of integrated water resources management, and to support decision makers in taking more informed decisions. Questionnaires were designed to collect data from different groups of stakeholders, e.g. water professionals, farmers from the study area, and decision makers of different organizations and ministries. These data were analysed statistically for each group separately, as well as for relations among groups, using the SPSS (Statistical Package for the Social Sciences) software. Results show that the need to improve the situation is supported by all groups. However, significant differences exist between groups on how to achieve this improvement: farmers prefer management interventions operating on the water resources side, while decision makers support measures for better management on the water demand side. Furthermore, opinions within single groups are sometimes contradictory for several management interventions. More advanced statistical methods, such as discriminant analysis or Bayesian networks, allow factors and drivers to be identified that explain these differences. Both approaches will help to understand stakeholders' behaviours and to evaluate the implementation potential of several management interventions. Keywords: IWRM, stakeholder participation, field survey, statistical analysis, Oman

  11. Association between Stereotactic Radiotherapy and Death from Brain Metastases of Epithelial Ovarian Cancer: a Gliwice Data Re-Analysis with Penalization

    PubMed

    Tukiendorf, Andrzej; Mansournia, Mohammad Ali; Wydmański, Jerzy; Wolny-Rokicka, Edyta

    2017-04-01

    Background: Clinical datasets for epithelial ovarian cancer patients with brain metastases are usually small. When adequate case numbers are lacking, the resulting estimates of regression coefficients may be biased. One direct approach to reducing such sparse-data bias is penalized estimation. Methods: A re-analysis of formerly reported hazard ratios in diagnosed patients was performed using penalized Cox regression with a popular SAS package, with additional software code for the statistical computational procedure provided. Results: The penalized approach can readily diminish sparse-data artefacts and radically reduce the magnitude of estimated regression coefficients. Conclusions: It was confirmed that classical statistical approaches may exaggerate regression estimates or distort study interpretations and conclusions. The results support the thesis that penalization via weakly informative priors and data augmentation are the safest approaches to shrinking the sparse-data artefacts that frequently occur in epidemiological research. Creative Commons Attribution License
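    The shrinkage principle behind penalization can be shown on the simplest possible case. This sketch uses a one-predictor linear model rather than the Cox model of the paper (all data invented): the ridge-penalized slope divides the OLS numerator by Sxx plus the penalty, pulling the estimate toward zero, which is exactly the behavior that tames sparse-data artefacts.

```python
# Ridge shrinkage on a single predictor (toy data, not the paper's SAS code).
x = [0.0, 1.0, 2.0, 3.0]
y = [0.1, 1.9, 4.2, 5.8]

mx, my = sum(x) / len(x), sum(y) / len(y)
sxx = sum((xi - mx) ** 2 for xi in x)
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))

beta_ols = sxy / sxx                  # unpenalized slope estimate
beta_ridge = sxy / (sxx + 2.0)        # penalty strength lambda = 2 (assumed)
print(beta_ridge < beta_ols)          # -> True: the estimate is shrunk
```

    In the Cox setting the same idea is applied to the partial likelihood, and the penalty can equivalently be read as a weakly informative prior on the log hazard ratios.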

  12. ProteoSign: an end-user online differential proteomics statistical analysis platform.

    PubMed

    Efstathiou, Georgios; Antonakis, Andreas N; Pavlopoulos, Georgios A; Theodosiou, Theodosios; Divanach, Peter; Trudgian, David C; Thomas, Benjamin; Papanikolaou, Nikolas; Aivaliotis, Michalis; Acuto, Oreste; Iliopoulos, Ioannis

    2017-07-03

    Profiling of proteome dynamics is crucial for understanding cellular behavior in response to intrinsic and extrinsic stimuli and maintenance of homeostasis. Over the last 20 years, mass spectrometry (MS) has emerged as the most powerful tool for large-scale identification and characterization of proteins. Bottom-up proteomics, the most common MS-based proteomics approach, has always been challenging in terms of data management, processing, analysis and visualization, with modern instruments capable of producing several gigabytes of data from a single experiment. Here, we present ProteoSign, a freely available web application dedicated to allowing users to perform proteomics differential expression/abundance analysis in a user-friendly and self-explanatory way. Although several non-commercial standalone tools have been developed for post-quantification statistical analysis of proteomics data, most of them are not appealing to end users, as they often require the installation of programming environments and third-party software packages, and sometimes further scripting or computer programming. To avoid this bottleneck, we have developed a user-friendly software platform accessible via a web interface in order to enable proteomics laboratories and core facilities to statistically analyse quantitative proteomics data sets in a resource-efficient manner. ProteoSign is available at http://bioinformatics.med.uoc.gr/ProteoSign and the source code at https://github.com/yorgodillo/ProteoSign. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  13. Identifying drought response of semi-arid aeolian systems using near-surface luminescence profiles and changepoint analysis, Nebraska Sandhills.

    NASA Astrophysics Data System (ADS)

    Buckland, Catherine; Bailey, Richard; Thomas, David

    2017-04-01

    Two billion people living in drylands are affected by land degradation. Sediment erosion by wind and water removes fertile soil and destabilises landscapes. Vegetation disturbance is a key driver of dryland erosion, caused by both natural and human forcings: drought, fire, land use, grazing pressure. A quantified understanding of vegetation cover sensitivities, and of the resulting surface change, is needed if the vegetation and landscape response to future climate change and human pressure is to be better predicted. Using quartz luminescence dating and statistical changepoint analysis (Killick & Eckley, 2014), this study demonstrates the ability to identify step-changes in the depositional age of near-surface sediments. Lx/Tx luminescence profiles coupled with statistical analysis show the value of near-surface sediments in providing a high-resolution record of recent system response and of aeolian system thresholds. This research determines how the environment has recorded and retained sedimentary evidence of drought response and land-use disturbance over the last two hundred years, across both individual landforms and the wider Nebraska Sandhills. Identifying surface deposition and comparing it with records of climate, fire and land use change allows us to assess the sensitivity and stability of the surface sediment to a range of forcing factors. Killick, R and Eckley, IA. (2014) "changepoint: An R Package for Changepoint Analysis." Journal of Statistical Software, (58) 1-19.
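    The changepoint idea can be sketched in a few lines. The cited R package implements efficient algorithms such as PELT for multiple changepoints; the hedged toy below finds a single change in mean by brute force, minimizing the total within-segment sum of squared errors, on an invented down-profile sequence of luminescence ages:

```python
# Brute-force single changepoint in the mean of a sequence.
def best_changepoint(xs):
    def sse(seg):
        m = sum(seg) / len(seg)
        return sum((v - m) ** 2 for v in seg)
    # Try every split point; return the one minimizing total segment error.
    return min(range(1, len(xs)), key=lambda t: sse(xs[:t]) + sse(xs[t:]))

# Hypothetical burial ages (years) down a near-surface profile: old stable
# deposits overlain by much younger, recently reworked sand.
ages = [180, 175, 182, 178, 20, 25, 18, 22]
print(best_changepoint(ages))  # -> 4: the deposition regime shifts at index 4
```

    A detected step like this is the statistical signature of the drought- or disturbance-driven reactivation events the study is after.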

  14. PresenceAbsence: An R package for presence absence analysis

    Treesearch

    Elizabeth A. Freeman; Gretchen Moisen

    2008-01-01

    The PresenceAbsence package for R provides a set of functions useful when evaluating the results of presence-absence analysis, for example, models of species distribution or the analysis of diagnostic tests. The package provides a toolkit for selecting the optimal threshold for translating a probability surface into presence-absence maps specifically tailored to their...
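    Threshold selection of the kind described above can be illustrated with one common criterion, maximizing Youden's J (sensitivity + specificity - 1). This is a hedged Python sketch with invented probabilities; the R package offers several optimization criteria, and J is just one plausible choice:

```python
# Pick the probability threshold maximizing Youden's J = sens + spec - 1.
def youden_threshold(probs, observed, steps=101):
    best_t, best_j = 0.0, -1.0
    for i in range(steps):
        t = i / (steps - 1)
        tp = sum(1 for p, o in zip(probs, observed) if p >= t and o)
        fn = sum(1 for p, o in zip(probs, observed) if p < t and o)
        tn = sum(1 for p, o in zip(probs, observed) if p < t and not o)
        fp = sum(1 for p, o in zip(probs, observed) if p >= t and not o)
        sens = tp / (tp + fn) if tp + fn else 0.0
        spec = tn / (tn + fp) if tn + fp else 0.0
        if sens + spec - 1 > best_j:
            best_t, best_j = t, sens + spec - 1
    return best_t

probs    = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]   # modelled occurrence probabilities
observed = [1, 1, 1, 0, 1, 0]               # field presence/absence records
t = youden_threshold(probs, observed)
print(0.0 < t < 1.0)  # -> True
```

    Applying the chosen threshold to a probability surface yields the presence-absence map the abstract refers to.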

  15. European consumer attitudes on the associated health benefits of neutraceutical-containing processed meats using Co-enzyme Q10 as a sample functional ingredient.

    PubMed

    Tobin, Brian D; O'Sullivan, Maurice G; Hamill, Ruth; Kerry, Joseph P

    2014-06-01

    This study gathered European consumer attitudes towards processed meats and their use as functional foods. A survey was set up using an online web application to collect information on consumer perception of processed meats as well as of nutraceutical-containing processed meats. 548 responses were obtained and statistical analysis was carried out using a statistical software package. Data were summarized as frequencies for each question, and statistical differences were analyzed using the chi-square test with a significance level of 5% (P<0.05). Consumer attitudes largely indicate that processed meats are seen as unhealthy products. Most respondents believe that processed meats contain large quantities of harmful chemicals, fat and salt. Consumers were found to be strongly in favour of bioactive compounds in yogurt-style products but unsure of their feelings about meat-based products, likely due to a lack of familiarity with such products. Many of the respondents were willing to consume meat-based functional foods but were not willing to pay more for them. Copyright © 2014 Elsevier Ltd. All rights reserved.
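    The chi-square test of independence mentioned above is straightforward to compute by hand. A hedged sketch on a hypothetical 2x2 cross-tabulation (willing to consume a meat-based functional food vs. willing to pay more for it; the cell counts are invented, not the study's data):

```python
# Chi-square statistic for a 2x2 contingency table [[a, b], [c, d]].
def chi_square_2x2(a, b, c, d):
    n = a + b + c + d
    expected = [(a + b) * (a + c) / n, (a + b) * (b + d) / n,
                (c + d) * (a + c) / n, (c + d) * (b + d) / n]
    observed = [a, b, c, d]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical split of the 548 respondents.
stat = chi_square_2x2(220, 120, 80, 128)
# Critical value for 1 degree of freedom at the 5% level is 3.841.
print(stat > 3.841)  # -> True: reject independence at P < 0.05
```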

  16. Statistical software applications used in health services research: analysis of published studies in the U.S

    PubMed Central

    2011-01-01

    Background This study aims to identify the statistical software applications most commonly employed for data analysis in health services research (HSR) studies in the U.S. The study also examines the extent to which information describing the specific analytical software utilized is provided in published articles reporting on HSR studies. Methods Data were extracted from a sample of 1,139 articles (including 877 original research articles) published between 2007 and 2009 in three U.S. HSR journals that were considered to be representative of the field based upon a set of selection criteria. Descriptive analyses were conducted to categorize patterns in statistical software usage in those articles. The data were stratified by calendar year to detect trends in software use over time. Results Only 61.0% of original research articles in prominent U.S. HSR journals identified the particular type of statistical software application used for data analysis. Stata and SAS were overwhelmingly the most commonly used software applications (in 46.0% and 42.6% of articles, respectively). However, SAS use grew considerably during the study period compared to other applications. Stratification of the data revealed that the type of statistical software used varied considerably by whether authors were from the U.S. or from other countries. Conclusions The findings highlight a need for HSR investigators to identify more consistently the specific analytical software used in their studies. That information can be important, because different software packages might produce varying results owing to differences in their underlying estimation methods. PMID:21977990

  17. MODEL 9977 B(M)F-96 SAFETY ANALYSIS REPORT FOR PACKAGING

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Abramczyk, G; Blanton, P; Eberl, K

    2006-05-18

    This Safety Analysis Report for Packaging (SARP) documents the analysis and testing performed on and for the 9977 Shipping Package, referred to as the General Purpose Fissile Package (GPFP). The performance evaluation presented in this SARP documents the compliance of the 9977 package with the regulatory safety requirements for Type B packages. Per 10 CFR 71.59, for the 9977 packages evaluated in this SARP, the value of ''N'' is 50, and the Transport Index based on nuclear criticality control is 1.0. The 9977 package is designed with a high degree of single containment. The 9977 complies with 10 CFR 71 (2002), Department of Energy (DOE) Order 460.1B, DOE Order 460.2, and 10 CFR 20 (2003) for As Low As Reasonably Achievable (ALARA) principles. The 9977 also satisfies the requirements of the Regulations for the Safe Transport of Radioactive Material--1996 Edition (Revised)--Requirements. IAEA Safety Standards, Safety Series No. TS-R-1 (ST-1, Rev.), International Atomic Energy Agency, Vienna, Austria (2000). The 9977 package is designed, analyzed and fabricated in accordance with Section III of the American Society of Mechanical Engineers (ASME) Boiler and Pressure Vessel (B&PV) Code, 1992 edition.

  18. Astrophysical properties of star clusters in the Magellanic Clouds homogeneously estimated by ASteCA

    NASA Astrophysics Data System (ADS)

    Perren, G. I.; Piatti, A. E.; Vázquez, R. A.

    2017-06-01

    Aims: We seek to produce a homogeneous catalog of astrophysical parameters of 239 resolved star clusters, located in the Small and Large Magellanic Clouds, observed in the Washington photometric system. Methods: The cluster sample was processed with the recently introduced Automated Stellar Cluster Analysis (ASteCA) package, which ensures both an automatized and a fully reproducible treatment, together with a statistically based analysis of their fundamental parameters and associated uncertainties. The fundamental parameters determined for each cluster with this tool, via a color-magnitude diagram (CMD) analysis, are metallicity, age, reddening, distance modulus, and total mass. Results: We generated a homogeneous catalog of structural and fundamental parameters for the studied cluster sample and performed a detailed internal error analysis along with a thorough comparison with values taken from 26 published articles. We studied the distribution of cluster fundamental parameters in both Clouds and obtained their age-metallicity relationships. Conclusions: The ASteCA package can be applied to an unsupervised determination of fundamental cluster parameters, which is a task of increasing relevance as more data becomes available through upcoming surveys. A table with the estimated fundamental parameters for the 239 clusters analyzed is only available at the CDS via anonymous ftp to http://cdsarc.u-strasbg.fr (http://130.79.128.5) or via http://cdsarc.u-strasbg.fr/viz-bin/qcat?J/A+A/602/A89

  19. Statistical analysis of the factors that influenced the mechanical properties improvement of cassava starch films

    NASA Astrophysics Data System (ADS)

    Monteiro, Mayra; Oliveira, Victor; Santos, Francisco; Barros Neto, Eduardo; Silva, Karyn; Silva, Rayane; Henrique, João; Chibério, Abimaelle

    2017-08-01

    In order to obtain cassava starch films with mechanical properties improved relative to the synthetic polymers used in packaging production, a complete 2^3 factorial design was carried out to investigate which factors significantly influence the tensile strength of the biofilm. The factors investigated were the cassava starch, glycerol and modified clay contents. Modified bentonite clay was used as the filler material of the biofilm, and glycerol was the plasticizer used to thermoplasticize the cassava starch. The factorial analysis suggested a regression model capable of predicting the optimal mechanical property of the cassava starch film from the maximization of the tensile strength. The reliability of the regression model was tested against the experimental data by means of a Pareto chart. The modified clay was the factor of greatest statistical significance for the observed response variable, being the factor that contributed most to the improvement of the mechanical property of the starch film. The factorial experiments showed that the interaction of glycerol with both modified clay and cassava starch was significant for the reduction of biofilm ductility. Modified clay and cassava starch contributed to the maximization of biofilm ductility, while glycerol contributed to its minimization.
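    Effect estimation in a 2^3 full factorial design reduces to simple averaging: a factor's main effect is the mean response at its high level minus the mean at its low level. A hedged sketch with hypothetical tensile-strength responses, chosen so that the first factor (standing in for the modified clay content) dominates, consistent with the abstract's finding:

```python
# Main effects in a 2^3 full factorial design (hypothetical responses).
import itertools

# All 8 runs with factors (clay, glycerol, starch) coded -1 (low) / +1 (high).
levels = list(itertools.product([-1, 1], repeat=3))
tensile = [3.1, 3.3, 2.6, 2.8, 5.0, 5.4, 4.4, 4.9]   # assumed strengths (MPa)

def main_effect(factor):
    hi = [y for lv, y in zip(levels, tensile) if lv[factor] == 1]
    lo = [y for lv, y in zip(levels, tensile) if lv[factor] == -1]
    return sum(hi) / len(hi) - sum(lo) / len(lo)

effects = [main_effect(f) for f in range(3)]
print([round(e, 2) for e in effects])  # the first (clay) effect dominates
```

    Interaction effects are computed the same way after multiplying the coded columns elementwise, which is how the glycerol-clay and glycerol-starch interactions in the study would be quantified.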

  20. Minfi: a flexible and comprehensive Bioconductor package for the analysis of Infinium DNA methylation microarrays

    PubMed Central

    Aryee, Martin J.; Jaffe, Andrew E.; Corrada-Bravo, Hector; Ladd-Acosta, Christine; Feinberg, Andrew P.; Hansen, Kasper D.; Irizarry, Rafael A.

    2014-01-01

    Motivation: The recently released Infinium HumanMethylation450 array (the ‘450k’ array) provides a high-throughput assay to quantify DNA methylation (DNAm) at ∼450 000 loci across a range of genomic features. Although less comprehensive than high-throughput sequencing-based techniques, this product is more cost-effective and promises to be the most widely used DNAm high-throughput measurement technology over the next several years. Results: Here we describe a suite of computational tools that incorporate state-of-the-art statistical techniques for the analysis of DNAm data. The software is structured to easily adapt to future versions of the technology. We include methods for preprocessing, quality assessment and detection of differentially methylated regions from the kilobase to the megabase scale. We show how our software provides a powerful and flexible development platform for future methods. We also illustrate how our methods empower the technology to make discoveries previously thought to be possible only with sequencing-based methods. Availability and implementation: http://bioconductor.org/packages/release/bioc/html/minfi.html. Contact: khansen@jhsph.edu; rafa@jimmy.harvard.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:24478339

  1. A 125 year history of topographic mapping and GIS in the U.S. Geological Survey 1884-2009, part 2: 1980-2009

    USGS Publications Warehouse

    Usery, E. Lynn; Varanka, Dalia; Finn, Michael P.

    2009-01-01

    The United States Geological Survey (USGS) entered the mainstream of developments in computer-assisted technology for mapping during the 1970s. The introduction by USGS of digital line graphs (DLGs), digital elevation models (DEMs), and land use data analysis (LUDA) nationwide land-cover data provided a base for the rapid expansion of the use of GIS in the 1980s. Whereas USGS had developed the topologically structured DLG data and the Geographic Information Retrieval and Analysis System (GIRAS) for land-cover data, the Map Overlay Statistical System (MOSS), a nontopologically structured GIS software package developed by Autometric, Inc., under contract to the U.S. Fish and Wildlife Service, dominated the use of GIS by federal agencies in the 1970s. Thus, USGS data was used in MOSS, but the topological structure, which later became a requirement for GIS vector datasets, was not used in early GIS applications. The introduction of Esri's ARC/INFO in 1982 changed that, and by the end of the 1980s, topological structure for vector data was essential, and ARC/INFO was the dominant GIS software package used by federal agencies.

  2. GOTHiC, a probabilistic model to resolve complex biases and to identify real interactions in Hi-C data.

    PubMed

    Mifsud, Borbala; Martincorena, Inigo; Darbo, Elodie; Sugar, Robert; Schoenfelder, Stefan; Fraser, Peter; Luscombe, Nicholas M

    2017-01-01

    Hi-C is one of the main methods for investigating spatial co-localisation of DNA in the nucleus. However, the raw sequencing data obtained from Hi-C experiments suffer from large biases and spurious contacts, making it difficult to identify true interactions. Existing methods use complex models to account for biases and do not provide a significance threshold for detecting interactions. Here we introduce a simple binomial probabilistic model that resolves complex biases and distinguishes between true and false interactions. The model corrects biases of known and unknown origin and yields a p-value for each interaction, providing a reliable threshold based on significance. We demonstrate this experimentally by testing the method against a random ligation dataset. Our method outperforms previous methods and provides a statistical framework for further data analysis, such as comparisons of Hi-C interactions between different conditions. GOTHiC is available as a BioConductor package (http://www.bioconductor.org/packages/release/bioc/html/GOTHiC.html).
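    The binomial test at the heart of a model like this is easy to sketch: given n read pairs in total and a per-pair probability p that a pair links two fragments by chance, the p-value for observing k or more contacts is the upper binomial tail. The counts and chance rate below are invented; GOTHiC additionally estimates p per fragment pair from coverage.

```python
# Upper-tail binomial p-value P(X >= k) for an interaction count (toy numbers).
from math import comb

def binom_upper_tail(k, n, p):
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Hypothetical fragment pair: 12 contacts out of 1000 informative read pairs,
# with a chance contact rate of 0.5% (expected count = 5).
pval = binom_upper_tail(12, 1000, 0.005)
print(pval < 0.01)  # -> True under this (assumed) significance threshold
```

    Across a whole Hi-C map, these per-pair p-values would then be corrected for the very large number of fragment pairs tested.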

  3. Fabrication and analysis of microfiber array platform for optogenetics with cellular resolution

    PubMed Central

    Chen, Jian-Hong; Chou, Ming-Yi; Pan, Chien-Yuan; Wang, Lon A.

    2016-01-01

Optogenetics has emerged as a revolutionary technology, especially for neuroscience, and has advanced continuously over the past decade. Conventional approaches for patterned in vivo optical illumination are limited by the size of the implanted device and the achievable spatio-temporal resolution. In this work, we developed a fabrication process for a microfiber array platform. Arrayed poly(methyl methacrylate) (PMMA) microfibers were drawn from a polymer solution and packaged with polydimethylsiloxane (PDMS). The exposed end face of a packaged microfiber was tuned to a size corresponding to a single cell. To demonstrate its capability for single-cell optogenetics, HEK293T cells expressing channelrhodopsin-2 (ChR2) were cultured on the platform and excited with a UV laser. We then observed an elevation in intracellular Ca2+ concentration due to the influx of Ca2+ through the activated ChR2 into the cytosol. The statistical and simulation results indicate that the proposed microfiber array platform can be used for single-cell optogenetic applications. PMID:27895984

  4. Evaluating Dense 3d Reconstruction Software Packages for Oblique Monitoring of Crop Canopy Surface

    NASA Astrophysics Data System (ADS)

    Brocks, S.; Bareth, G.

    2016-06-01

Crop Surface Models (CSMs) are 2.5D raster surfaces representing absolute plant canopy height. Using multiple CSMs generated from data acquired at multiple time steps enables crop surface monitoring. This makes it possible to monitor crop growth over time and to track in-field crop growth variability, which is useful in the context of high-throughput phenotyping. This study aims to evaluate several software packages for dense 3D reconstruction from multiple overlapping RGB images at field and plot scale. A summer barley field experiment located at the Campus Klein-Altendorf of the University of Bonn was observed by acquiring stereo images from an oblique angle using consumer-grade smart cameras. Two such cameras were mounted at an elevation of 10 m and acquired images for a period of two months during the 2014 growing period. The field experiment consisted of nine barley cultivars that were cultivated in multiple repetitions and nitrogen treatments. Manual plant height measurements were carried out at four dates during the observation period. The software packages Agisoft PhotoScan, VisualSfM with CMVS/PMVS2, and SURE are investigated. The point clouds are georeferenced through a set of ground control points. Where adequate results are reached, a statistical analysis is performed.

  5. Survival modeling for the estimation of transition probabilities in model-based economic evaluations in the absence of individual patient data: a tutorial.

    PubMed

    Diaby, Vakaramoko; Adunlin, Georges; Montero, Alberto J

    2014-02-01

    Survival modeling techniques are increasingly being used as part of decision modeling for health economic evaluations. As many models are available, it is imperative for interested readers to know about the steps in selecting and using the most suitable ones. The objective of this paper is to propose a tutorial for the application of appropriate survival modeling techniques to estimate transition probabilities, for use in model-based economic evaluations, in the absence of individual patient data (IPD). An illustration of the use of the tutorial is provided based on the final progression-free survival (PFS) analysis of the BOLERO-2 trial in metastatic breast cancer (mBC). An algorithm was adopted from Guyot and colleagues, and was then run in the statistical package R to reconstruct IPD, based on the final PFS analysis of the BOLERO-2 trial. It should be emphasized that the reconstructed IPD represent an approximation of the original data. Afterwards, we fitted parametric models to the reconstructed IPD in the statistical package Stata. Both statistical and graphical tests were conducted to verify the relative and absolute validity of the findings. Finally, the equations for transition probabilities were derived using the general equation for transition probabilities used in model-based economic evaluations, and the parameters were estimated from fitted distributions. The results of the application of the tutorial suggest that the log-logistic model best fits the reconstructed data from the latest published Kaplan-Meier (KM) curves of the BOLERO-2 trial. Results from the regression analyses were confirmed graphically. An equation for transition probabilities was obtained for each arm of the BOLERO-2 trial. In this paper, a tutorial was proposed and used to estimate the transition probabilities for model-based economic evaluation, based on the results of the final PFS analysis of the BOLERO-2 trial in mBC. 
The results of our study can serve as a basis for any model (Markov) that needs the parameterization of transition probabilities, and only has summary KM plots available.
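The tutorial's final step, turning a fitted parametric survival function into transition probabilities for a Markov model, uses the standard relation tp(t) = 1 - S(t)/S(t - u), where u is the cycle length. A minimal sketch, assuming a Weibull fit with hypothetical parameters (in practice lam and gamma come from the fits described above):

```python
from math import exp

def weibull_surv(t, lam, gamma):
    """Weibull survival function S(t) = exp(-lam * t**gamma);
    lam and gamma would come from the fitted parametric model."""
    return exp(-lam * t ** gamma)

def transition_prob(t, cycle, lam, gamma):
    """Time-dependent transition probability for a Markov cycle of length
    `cycle` ending at time t: tp(t) = 1 - S(t) / S(t - cycle)."""
    return 1.0 - weibull_surv(t, lam, gamma) / weibull_surv(t - cycle, lam, gamma)
```

With gamma = 1 the Weibull reduces to an exponential, and the transition probability becomes constant across cycles, which is a quick sanity check on any implementation.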

  6. Study of the structure changes caused by volcanic activity in Mexico applying the lineament analysis to the Aster (Terra) satellite data.

    NASA Astrophysics Data System (ADS)

    Arellano-Baeza, A. A.; Garcia, R. V.; Trejo-Soto, M.; Molina-Sauceda, E.

Mexico is one of the most volcanically active regions in North America. Volcanic activity in central Mexico is associated with the subduction of the Cocos and Rivera plates beneath the North American plate. Periods of enhanced microseismic activity associated with the volcanic activity of the Colima and Popocatépetl volcanoes are compared to periods of low microseismic activity. Applying lineament analysis to a temporal sequence of high-resolution satellite images of both volcanoes, we detected changes in the number and orientation of lineaments associated with the microseismic activity. Multispectral images at 15 m resolution provided by the ASTER VNIR instrument were used. The Lineament Extraction and Stripes Statistic Analysis (LESSA) software package was employed for the lineament extraction.

  7. SigTree: A Microbial Community Analysis Tool to Identify and Visualize Significantly Responsive Branches in a Phylogenetic Tree.

    PubMed

    Stevens, John R; Jones, Todd R; Lefevre, Michael; Ganesan, Balasubramanian; Weimer, Bart C

    2017-01-01

Microbial community analysis experiments to assess the effect of a treatment intervention (or environmental change) on the relative abundance levels of multiple related microbial species (or operational taxonomic units) simultaneously using high throughput genomics are becoming increasingly common. Within the framework of the evolutionary phylogeny of all species considered in the experiment, this translates to a statistical need to identify the phylogenetic branches that exhibit a significant consensus response (in terms of operational taxonomic unit abundance) to the intervention. We present the R software package SigTree, a collection of flexible tools that make use of meta-analysis methods and regular expressions to identify and visualize significantly responsive branches in a phylogenetic tree, while appropriately adjusting for multiple comparisons.

  8. Emerging Personnel Requirements in Academic Libraries as Reflected in Recent Position Announcements.

    ERIC Educational Resources Information Center

    Block, David

    This study of the personnel requirements and hiring patterns of academic libraries draws on data collected from academic library position announcements issued nationwide during the fourth quarter of 1980. Data on 224 announcements were analyzed using the Statistical Package for the Social Sciences, and the resulting statistics are interpreted as a…

  9. Using Data Mining to Teach Applied Statistics and Correlation

    ERIC Educational Resources Information Center

    Hartnett, Jessica L.

    2016-01-01

    This article describes two class activities that introduce the concept of data mining and very basic data mining analyses. Assessment data suggest that students learned some of the conceptual basics of data mining, understood some of the ethical concerns related to the practice, and were able to perform correlations via the Statistical Package for…

  10. Effect of structural parameters on burning behavior of polyester fabrics having flame retardancy property

    NASA Astrophysics Data System (ADS)

    Çeven, E. K.; Günaydın, G. K.

    2017-10-01

The aim of this study is to fill a gap in the literature by investigating the effect of yarn and fabric structural parameters on the burning behavior of polyester fabrics. According to the experimental design, three different fabric types, three different weft densities, and two different weave types were selected, and a total of eighteen different polyester drapery fabrics were produced. All statistical procedures were conducted using the SPSS statistical software package. The results of the analysis of variance (ANOVA) tests indicated that there were statistically significant (5% significance level) differences between the mass loss ratios (%) in the weft direction and in the warp direction of the different fabrics, calculated after the flammability test. The Student-Newman-Keuls (SNK) results for mass loss ratios (%) in both weft and warp directions revealed that the mass loss ratios (%) of fabrics containing Trevira CS type polyester were lower than those of polyester fabrics subjected to washing treatment and flame retardancy treatment.

  11. Additive hazards regression and partial likelihood estimation for ecological monitoring data across space.

    PubMed

    Lin, Feng-Chang; Zhu, Jun

    2012-01-01

    We develop continuous-time models for the analysis of environmental or ecological monitoring data such that subjects are observed at multiple monitoring time points across space. Of particular interest are additive hazards regression models where the baseline hazard function can take on flexible forms. We consider time-varying covariates and take into account spatial dependence via autoregression in space and time. We develop statistical inference for the regression coefficients via partial likelihood. Asymptotic properties, including consistency and asymptotic normality, are established for parameter estimates under suitable regularity conditions. Feasible algorithms utilizing existing statistical software packages are developed for computation. We also consider a simpler additive hazards model with homogeneous baseline hazard and develop hypothesis testing for homogeneity. A simulation study demonstrates that the statistical inference using partial likelihood has sound finite-sample properties and offers a viable alternative to maximum likelihood estimation. For illustration, we analyze data from an ecological study that monitors bark beetle colonization of red pines in a plantation of Wisconsin.

  12. Fundamentals of poly(lactic acid) microstructure, crystallization behavior, and properties

    NASA Astrophysics Data System (ADS)

    Kang, Shuhui

    Poly(lactic acid) is an environmentally-benign biodegradable and sustainable thermoplastic material, which has found broad applications as food packaging films and as non-woven fibers. The crystallization and deformation mechanisms of the polymer are largely determined by the distribution of conformation and configuration. Knowledge of these mechanisms is needed to understand the mechanical and thermal properties on which processing conditions mainly depend. In conjunction with laser light scattering, Raman spectroscopy and normal coordinate analysis are used in this thesis to elucidate these properties. Vibrational spectroscopic theory, Flory's rotational isomeric state (RIS) theory, Gaussian chain statistics and statistical mechanics are used to relate experimental data to molecular chain structure. A refined RIS model is proposed, chain rigidity recalculated and chain statistics discussed. A Raman spectroscopic characterization method for crystalline and amorphous phase orientation has been developed. A shrinkage model is also proposed to interpret the dimensional stability for fibers and uni- or biaxially stretched films. A study of stereocomplexation formed by poly(l-lactic acid) and poly(d-lactic acid) is also presented.

  13. LV software support for supersonic flow analysis

    NASA Technical Reports Server (NTRS)

    Bell, W. A.; Lepicovsky, J.

    1992-01-01

    The software for configuring an LV counter processor system has been developed using structured design. The LV system includes up to three counter processors and a rotary encoder. The software for configuring and testing the LV system has been developed, tested, and included in an overall software package for data acquisition, analysis, and reduction. Error handling routines respond to both operator and instrument errors which often arise in the course of measuring complex, high-speed flows. The use of networking capabilities greatly facilitates the software development process by allowing software development and testing from a remote site. In addition, high-speed transfers allow graphics files or commands to provide viewing of the data from a remote site. Further advances in data analysis require corresponding advances in procedures for statistical and time series analysis of nonuniformly sampled data.

  14. LV software support for supersonic flow analysis

    NASA Technical Reports Server (NTRS)

    Bell, William A.

    1992-01-01

    The software for configuring a Laser Velocimeter (LV) counter processor system was developed using structured design. The LV system includes up to three counter processors and a rotary encoder. The software for configuring and testing the LV system was developed, tested, and included in an overall software package for data acquisition, analysis, and reduction. Error handling routines respond to both operator and instrument errors which often arise in the course of measuring complex, high-speed flows. The use of networking capabilities greatly facilitates the software development process by allowing software development and testing from a remote site. In addition, high-speed transfers allow graphics files or commands to provide viewing of the data from a remote site. Further advances in data analysis require corresponding advances in procedures for statistical and time series analysis of nonuniformly sampled data.

  15. Retail colour stability of lamb meat is influenced by breed type, muscle, packaging and iron concentration.

    PubMed

    Warner, R D; Kearney, G; Hopkins, D L; Jacob, R H

    2017-07-01

The longissimus lumborum (LL) and semimembranosus (SM) muscles from 391 lamb carcasses, derived from various breed types, were used to investigate the effect of animal/muscle factors, packaging type [over-wrap (OW) or high-oxygen modified atmosphere packaging (MAP O2)] and duration of display on redness of meat during simulated retail display. Using statistical models, the time required (in days) for redness to reach a threshold value of 3.5 (below which it is unacceptable) was predicted. High levels of iron in the SM, but not the LL, reduced the time for redness to reach 3.5 by 2-2.6 days in MAP O2 and 0.5-0.8 days in OW. The greater the proportion of Merino breed type, the shorter the time for redness to reach the value of 3.5, an effect consistent across muscles and packaging types. In summary, breed type, packaging format, muscle and muscle iron levels had a significant impact on the colour stability of sheep meat in oxygen-available packaging systems. Copyright © 2017. Published by Elsevier Ltd.

  16. Does introducing an immunization package of services for migrant children improve the coverage, service quality and understanding? An evidence from an intervention study among 1548 migrant children in eastern China.

    PubMed

    Hu, Yu; Luo, Shuying; Tang, Xuewen; Lou, Linqiao; Chen, Yaping; Guo, Jing; Zhang, Bing

    2015-07-15

    An EPI (Expanded Program on Immunization) intervention package was implemented from October 2011 to May 2014 among migrant children in Yiwu, east China. This study aimed to evaluate its impacts on vaccination coverage, maternal understanding of EPI and the local immunization service performance. A pre- and post-test design was used. The EPI intervention package included: (1) extending the EPI service time and increasing the frequency of vaccination service; (2) training program for vaccinators; (3) developing a screening tool to identify vaccination demands among migrant clinic attendants; (4) Social mobilization for immunization. Data were obtained from random sampling investigations, vaccination service statistics and qualitative interviews with vaccinators and mothers of migrant children. The analysis of quantitative data was based on a "before and after" evaluation and qualitative data were analyzed using content analysis. The immunization registration (records kept by immunization clinics) rate increased from 87.4 to 91.9% (P = 0.016) after implementation of the EPI intervention package and the EPI card holding (EPI card kept by caregivers) rate increased from 90.9 to 95.6% (P = 0.003). The coverage of fully immunized increased from 71.5 to 88.6% for migrant children aged 1-4 years (P < 0.001) and increased from 42.2 to 80.5% for migrant children aged 2-4 years (P < 0.001). The correct response rates on valid doses and management of adverse events among vaccinators were over 90% after training. The correct response rates on immunization among mothers of migrant children were 86.8-99.3% after interventions. Our study showed a substantial improvement in vaccination coverage among migrant children in Yiwu after implementation of the EPI intervention package. 
Further studies are needed to evaluate the cost-effectiveness of the interventions, to identify the individual interventions that make the biggest contribution to coverage, and to examine the sustainability of the interventions within the existing vaccination service delivery system in larger-scale settings or over a longer term.

  17. An application of principal component analysis to the clavicle and clavicle fixation devices.

    PubMed

    Daruwalla, Zubin J; Courtis, Patrick; Fitzpatrick, Clare; Fitzpatrick, David; Mullett, Hannan

    2010-03-26

Principal component analysis (PCA) enables the building of statistical shape models of bones and joints. This has been used in conjunction with computer-assisted surgery in the past. However, PCA of the clavicle has not been performed. Using PCA, we present a novel method that examines the major modes of size and three-dimensional shape variation in male and female clavicles and suggests a method of grouping the clavicle into size and shape categories. Twenty-one high-resolution computerized tomography scans of the clavicle were reconstructed and analyzed using a specifically developed statistical software package. After performing statistical shape analysis, PCA was applied to study the factors that account for anatomical variation. The first principal component, representing size, accounted for 70.5 percent of anatomical variation. The addition of a further three principal components accounted for almost 87 percent. Using statistical shape analysis, clavicles in males have a greater lateral depth and are longer, wider and thicker than in females. However, the sternal angle in females is larger than in males. PCA confirmed these differences between genders but also noted that men exhibit greater variance, and it classified clavicles into five morphological groups. This unique approach is the first to standardize a clavicular orientation. It provides information that is useful to both the biomedical engineer and the clinician. Other applications include implant design with regard to modifying current or designing future clavicle fixation devices. Our findings support the need for further development of clavicle fixation devices and raise the question of whether gender-specific devices are necessary.
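The core computation the abstract describes, extracting principal modes of variation from a sample of specimens, reduces to PCA on a specimens-by-coordinates matrix once the scans are reconstructed and aligned. A minimal sketch via the SVD (hypothetical input layout; a real statistical shape model would first require Procrustes alignment of the landmark coordinates):

```python
import numpy as np

def pca_modes(shapes):
    """PCA of a shape sample.

    shapes: (n_specimens, n_coords) array of aligned coordinates.
    Returns the mean shape, the principal modes (rows of Vt), and the
    fraction of total variance each mode explains."""
    mean = shapes.mean(axis=0)
    X = shapes - mean                       # center the sample
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    var = s ** 2 / (len(shapes) - 1)        # variance along each mode
    return mean, Vt, var / var.sum()
```

A finding like "the first principal component accounted for 70.5 percent of anatomical variation" corresponds to the first entry of the returned explained-variance vector.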

  18. DESIGN ANALYSIS FOR THE DEFENSE HIGH-LEVEL WASTE DISPOSAL CONTAINER

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    G. Radulesscu; J.S. Tang

The purpose of the ''Design Analysis for the Defense High-Level Waste Disposal Container'' analysis is to technically define the defense high-level waste (DHLW) disposal container/waste package using the Waste Package Department's (WPD) design methods, as documented in the ''Waste Package Design Methodology Report'' (CRWMS M&O [Civilian Radioactive Waste Management System Management and Operating Contractor] 2000a). The DHLW disposal container is intended for disposal of commercial high-level waste (HLW) and DHLW (including immobilized plutonium waste forms), placed within disposable canisters. The U.S. Department of Energy (DOE)-managed spent nuclear fuel (SNF) in disposable canisters may also be placed in a DHLW disposal container along with HLW forms. The objective of this analysis is to demonstrate that the DHLW disposal container/waste package satisfies the project requirements, as embodied in the Defense High Level Waste Disposal Container System Description Document (SDD) (CRWMS M&O 1999a), and additional criteria, as identified in the Waste Package Design Sensitivity Report (CRWMS M&O 2000b, Table 4). The analysis briefly describes the analytical methods appropriate for the design of the DHLW disposal container/waste package and summarizes the results of the calculations that illustrate the analytical methods. However, the analysis is limited to the calculations selected for the DHLW disposal container in support of the Site Recommendation (SR) (CRWMS M&O 2000b, Section 7). The scope of this analysis is restricted to the design of the codisposal waste package of the Savannah River Site (SRS) DHLW glass canisters and the Training, Research, Isotopes General Atomics (TRIGA) SNF loaded in a short 18-in.-outer-diameter (OD) DOE standardized SNF canister. This waste package is representative of the waste packages that consist of the DHLW disposal container, the DHLW/HLW glass canisters, and the DOE-managed SNF in disposable canisters.
The intended use of this analysis is to support Site Recommendation reports and to assist in the development of WPD drawings. Activities described in this analysis were conducted in accordance with the Development Plan ''Design Analysis for the Defense High-Level Waste Disposal Container'' (CRWMS M&O 2000c) with no deviations from the plan.

  19. Statistical testing and power analysis for brain-wide association study.

    PubMed

    Gong, Weikang; Wan, Lin; Lu, Wenlian; Ma, Liang; Cheng, Fan; Cheng, Wei; Grünewald, Stefan; Feng, Jianfeng

    2018-04-05

The identification of connexel-wise associations, which involves examining functional connectivities between pairwise voxels across the whole brain, is both statistically and computationally challenging. Although such a connexel-wise methodology has recently been adopted by brain-wide association studies (BWAS) to identify connectivity changes in several mental disorders, such as schizophrenia, autism and depression, multiple-correction and power analysis methods designed specifically for connexel-wise analysis are still lacking. Therefore, we herein report the development of a rigorous statistical framework for connexel-wise significance testing based on Gaussian random field theory. It includes controlling the family-wise error rate (FWER) of multiple hypothesis tests using topological inference methods, and calculating power and sample size for a connexel-wise study. Our theoretical framework can control the false-positive rate accurately, as validated empirically using two resting-state fMRI datasets. Compared with Bonferroni correction and the false discovery rate (FDR), it can reduce the false-positive rate and increase statistical power by appropriately utilizing the spatial information of fMRI data. Importantly, our method bypasses the need for non-parametric permutation testing to correct for multiple comparisons, so it can efficiently handle large datasets with high-resolution fMRI images. The utility of our method is shown in a case-control study: our approach can identify altered functional connectivities in a major depressive disorder dataset, whereas existing methods fail. A software package is available at https://github.com/weikanggong/BWAS. Copyright © 2018 Elsevier B.V. All rights reserved.
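Of the two standard corrections the abstract benchmarks against, the false discovery rate one (Benjamini-Hochberg) is simple enough to sketch. The step-up procedure below is a generic illustration, not code from the BWAS package:

```python
def benjamini_hochberg(pvals, alpha=0.05):
    """Indices of hypotheses rejected by the Benjamini-Hochberg step-up
    procedure at FDR level alpha."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):   # ranks 1..m, smallest p first
        if pvals[i] <= alpha * rank / m:
            k = rank                            # largest rank passing the test
    return sorted(order[:k])                    # reject the k smallest p-values
```

Both this and Bonferroni treat the tests as exchangeable and ignore the spatial smoothness of fMRI data, which is exactly the information the Gaussian random field approach exploits.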

  20. scoringRules - A software package for probabilistic model evaluation

    NASA Astrophysics Data System (ADS)

    Lerch, Sebastian; Jordan, Alexander; Krüger, Fabian

    2016-04-01

Models in the geosciences are generally surrounded by uncertainty, and being able to quantify this uncertainty is key to good decision making. Accordingly, probabilistic forecasts in the form of predictive distributions have become popular over the last decades. With the proliferation of probabilistic models arises the need for decision-theoretically principled tools to evaluate the appropriateness of models and forecasts in a generalized way. Various scoring rules have been developed over the past decades to address this demand. Proper scoring rules are functions S(F,y) which evaluate the accuracy of a forecast distribution F, given that an outcome y was observed. As such, they allow comparison of alternative models, a crucial ability given the variety of theories, data sources and statistical specifications available in many situations. This poster presents the software package scoringRules for the statistical programming language R, which contains functions to compute popular scoring rules, such as the continuous ranked probability score, for a variety of distributions F that come up in applied work. Two main classes are parametric distributions, such as normal, t, or gamma distributions, and distributions that are not known analytically but are indirectly described through a sample of simulation draws. For example, Bayesian forecasts produced via Markov chain Monte Carlo take this form. Thereby, the scoringRules package provides a framework for generalized model evaluation that includes both Bayesian and classical parametric models. The scoringRules package aims to be a convenient dictionary-like reference for computing scoring rules. We offer state-of-the-art implementations of several known (but not routinely applied) formulas, and implement closed-form expressions that were previously unavailable. Whenever more than one implementation variant exists, we offer statistically principled default choices.
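scoringRules is an R package, but the kind of closed-form expression the abstract mentions is language-neutral. For a Gaussian forecast N(mu, sigma^2) the CRPS has the well-known analytic form sigma * (z(2Phi(z) - 1) + 2phi(z) - 1/sqrt(pi)) with z = (y - mu)/sigma, sketched here in Python:

```python
from math import erf, exp, pi, sqrt

def crps_normal(y, mu, sigma):
    """Closed-form continuous ranked probability score of the forecast
    N(mu, sigma**2) against the observed outcome y (lower is better)."""
    z = (y - mu) / sigma
    pdf = exp(-z * z / 2) / sqrt(2 * pi)        # standard normal density
    cdf = 0.5 * (1 + erf(z / sqrt(2)))          # standard normal CDF
    return sigma * (z * (2 * cdf - 1) + 2 * pdf - 1 / sqrt(pi))
```

For sample-based forecasts, the package's second class of inputs, the CRPS is instead estimated from the empirical distribution of the simulation draws.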

  1. Evaluation of probabilistic forecasts with the scoringRules package

    NASA Astrophysics Data System (ADS)

    Jordan, Alexander; Krüger, Fabian; Lerch, Sebastian

    2017-04-01

Over the last decades, probabilistic forecasts in the form of predictive distributions have become popular in many scientific disciplines. With the proliferation of probabilistic models arises the need for decision-theoretically principled tools to evaluate the appropriateness of models and forecasts in a generalized way, in order to better understand sources of prediction error and to improve the models. Proper scoring rules are functions S(F,y) which evaluate the accuracy of a forecast distribution F, given that an outcome y was observed. In coherence with decision-theoretic principles, they allow comparison of alternative models, a crucial ability given the variety of theories, data sources and statistical specifications available in many situations. This contribution presents the software package scoringRules for the statistical programming language R, which provides functions to compute popular scoring rules, such as the continuous ranked probability score, for a variety of distributions F that come up in applied work. For univariate variables, two main classes are parametric distributions, such as normal, t, or gamma distributions, and distributions that are not known analytically but are indirectly described through a sample of simulation draws. For example, ensemble weather forecasts take this form. The scoringRules package aims to be a convenient dictionary-like reference for computing scoring rules. We offer state-of-the-art implementations of several known (but not routinely applied) formulas, and implement closed-form expressions that were previously unavailable. Whenever more than one implementation variant exists, we offer statistically principled default choices. Recent developments include the addition of scoring rules for evaluating multivariate forecast distributions. The use of the scoringRules package is illustrated with an example on post-processing ensemble forecasts of temperature.

  2. SVAw - a web-based application tool for automated surrogate variable analysis of gene expression studies

    PubMed Central

    2013-01-01

Background Surrogate variable analysis (SVA) is a powerful method to identify, estimate, and utilize the components of gene expression heterogeneity due to unknown and/or unmeasured technical, genetic, environmental, or demographic factors. These sources of heterogeneity are common in gene expression studies, and failing to incorporate them into the analysis can obscure results. Using SVA increases the biological accuracy and reproducibility of gene expression studies by identifying these sources of heterogeneity and correctly accounting for them in the analysis. Results Here we have developed a web application called SVAw (Surrogate Variable Analysis Web app) that provides a user-friendly interface for SVA analyses of genome-wide expression studies. The software has been developed based on the open-source Bioconductor SVA package. In our software, we have extended the SVA program's functionality in three respects: (i) SVAw performs a fully automated and user-friendly analysis workflow; (ii) it calculates probe/gene statistics for both pre- and post-SVA analysis and provides a table of results for the regression of gene expression on the primary variable of interest before and after correcting for surrogate variables; and (iii) it generates a comprehensive report file, including a graphical comparison of the outcome for the user. Conclusions SVAw is a freely accessible web-based solution for the surrogate variable analysis of high-throughput datasets and facilitates removing unwanted and unknown sources of variation. It is freely available for use at http://psychiatry.igm.jhmi.edu/sva. The executable packages for both the web and standalone applications, and instructions for installation, can be downloaded from our web site. PMID:23497726

  3. The expression of full length Gp91-phox protein is associated with reduced amphotropic retroviral production.

    PubMed

    Bellantuono, I; Lashford, L S; Rafferty, J A; Fairbairn, L J

    2000-05-01

As a single gene defect in mature bone marrow cells, chronic granulomatous disease (X-CGD) represents a disorder which may be amenable to gene therapy by the transfer of the missing subunit into hemopoietic stem cells. In the majority of cases, lack of Gp91-phox causes the disease. So far, studies involving transfer of Gp91-phox cDNA, including a phase I clinical trial, have yielded disappointing results. Most often, low titers of virus have been reported. In the present study we investigated the possible reasons for low-titer amphotropic viral production. To investigate the effect of Gp91 cDNA on the efficiency of retroviral production from the packaging cell line GP+envAm12, we constructed vectors containing either the native cDNA, truncated versions of the cDNA, or a mutated form (LATG) in which the natural translational start codon was changed to a stop codon. Following derivation of clonal packaging cell lines, these were assessed for viral titer by RNA slot blot and analyzed by non-parametric statistical analysis (Mann-Whitney U-test). An improvement in viral titer of just over two-fold was found in packaging cells containing the start-codon mutant of Gp91, and no evidence of truncated viral RNA was seen in these cells. Further analysis revealed the presence of rearranged forms of the provirus in Gp91-expressing cells, and the production of truncated, unpackaged viral RNA. Protein analysis revealed that LATG-transduced cells did not express full-length Gp91-phox, whereas those containing the wild-type cDNA did. However, a truncated protein was seen in ATG-transduced cells which was also present in wild-type cells. No evidence for the presence of a negative transcriptional regulatory element was found from studies with the deletion mutants. A statistically significant effect of protein production on the production of virus from Gp91-expressing cells was found.
Our data point to a need to restrict expression of the Gp91-phox protein and its derivatives in order to enhance retroviral production and suggest that improvements in current vectors for CGD gene therapy may need to include controlled, directed expression only in mature neutrophils.
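The Mann-Whitney U statistic used here to compare titers across clonal lines is simple to sketch from first principles. A minimal stdlib Python version, with hypothetical titer values rather than data from the study:

```python
def mann_whitney_u(xs, ys):
    """Mann-Whitney U statistics for two independent samples.

    U1 counts, over all (x, y) pairs, how often x exceeds y
    (ties contribute 0.5); U2 is its complement.
    """
    u1 = sum(1.0 if x > y else 0.5 if x == y else 0.0
             for x in xs for y in ys)
    u2 = len(xs) * len(ys) - u1
    return u1, u2

# Hypothetical relative titers: start-codon mutant (LATG) vs. wild-type clones
latg      = [2.1, 1.9, 2.4, 2.0]
wild_type = [1.0, 0.8, 1.2, 0.9]

u1, u2 = mann_whitney_u(latg, wild_type)
print(u1, u2)  # → 16.0 0.0 (every LATG clone exceeds every wild-type clone)
```

With U at its extreme (0 or n1*n2), the samples are completely separated, which is the situation a significant Mann-Whitney result flags for small clonal panels like these.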

  4. Evaluation of sequence alignments and oligonucleotide probes with respect to three-dimensional structure of ribosomal RNA using ARB software package

    PubMed Central

    Kumar, Yadhu; Westram, Ralf; Kipfer, Peter; Meier, Harald; Ludwig, Wolfgang

    2006-01-01

Background Availability of high-resolution RNA crystal structures for the 30S and 50S ribosomal subunits and the subsequent validation of comparative secondary structure models have prompted biologists to use the three-dimensional structure of ribosomal RNA (rRNA) for evaluating sequence alignments of rRNA genes. Furthermore, the secondary and tertiary structural features of rRNA are highly useful and successfully employed in designing rRNA-targeted oligonucleotide probes intended for in situ hybridization experiments. RNA3D, a program to combine sequence alignment information with the three-dimensional structure of rRNA, was developed. Integration into the ARB software package, which is used extensively by the scientific community for phylogenetic analysis and molecular probe design, has substantially extended the functionality of the ARB software suite with a 3D environment. Results The three-dimensional structure of rRNA is visualized in an OpenGL 3D environment with the ability to change the display and overlay information onto the molecule dynamically. Phylogenetic information derived from the multiple sequence alignments can be overlaid onto the molecule structure in real time. Superimposition of both statistical and non-statistical sequence-associated information onto the rRNA 3D structure can be done using a customizable color scheme, which is also applied to a textual sequence alignment for reference. Oligonucleotide probes designed by ARB probe design tools can be mapped onto the 3D structure along with the probe accessibility models for evaluation with respect to secondary and tertiary structural conformations of rRNA. Conclusion Visualization of the three-dimensional structure of rRNA in an intuitive display provides biologists with greater possibilities to carry out structure-based phylogenetic analysis. Coupled with secondary structure models of rRNA, the RNA3D program aids in validating the sequence alignments of rRNA genes and evaluating probe target sites. 
Superimposition of the information derived from the multiple sequence alignment onto the molecule dynamically allows researchers to observe any sequence-inherited characteristics (phylogenetic information) in a real-time environment. The extended ARB software package is made freely available for the scientific community via . PMID:16672074

  5. The use of open source bioinformatics tools to dissect transcriptomic data.

    PubMed

    Nitsche, Benjamin M; Ram, Arthur F J; Meyer, Vera

    2012-01-01

Microarrays are a valuable technology to study fungal physiology on a transcriptomic level. Various microarray platforms are available, comprising both single- and two-channel arrays. Despite different technologies, preprocessing of microarray data generally includes quality control, background correction, normalization, and summarization of probe-level data. Subsequently, depending on the experimental design, diverse statistical analyses can be performed, including the identification of differentially expressed genes and the construction of gene coexpression networks. We describe how Bioconductor, a collection of open source and open development packages for the statistical programming language R, can be used for dissecting microarray data. We provide fundamental details that facilitate the process of getting started with R and Bioconductor. Using two publicly available microarray datasets from Aspergillus niger, we give detailed protocols on how to identify differentially expressed genes and how to construct gene coexpression networks.
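A minimal illustration of the per-gene test that underlies differential expression calls (Bioconductor tools such as limma use moderated, empirical-Bayes variants of this idea); the expression values below are hypothetical:

```python
from statistics import mean, variance

def welch_t(a, b):
    """Welch's two-sample t statistic for one gene's expression values.

    Large |t| flags the gene as a differential expression candidate;
    the Bioconductor pipeline adds variance moderation and multiple
    testing correction on top of this basic idea.
    """
    va, vb = variance(a) / len(a), variance(b) / len(b)
    return (mean(a) - mean(b)) / (va + vb) ** 0.5

# Hypothetical log2 expression values for one gene in two conditions
treated = [7.8, 8.1, 7.9, 8.0]
control = [5.1, 5.3, 5.0, 5.2]
print(round(welch_t(treated, control), 1))  # → 30.7
```

In a real analysis this statistic is computed for every probe set, and the resulting p-values are adjusted (e.g. Benjamini-Hochberg) before genes are declared differentially expressed.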

  6. Building the Community Online Resource for Statistical Seismicity Analysis (CORSSA)

    NASA Astrophysics Data System (ADS)

    Michael, A. J.; Wiemer, S.; Zechar, J. D.; Hardebeck, J. L.; Naylor, M.; Zhuang, J.; Steacy, S.; Corssa Executive Committee

    2010-12-01

    Statistical seismology is critical to the understanding of seismicity, the testing of proposed earthquake prediction and forecasting methods, and the assessment of seismic hazard. Unfortunately, despite its importance to seismology - especially to those aspects with great impact on public policy - statistical seismology is mostly ignored in the education of seismologists, and there is no central repository for the existing open-source software tools. To remedy these deficiencies, and with the broader goal to enhance the quality of statistical seismology research, we have begun building the Community Online Resource for Statistical Seismicity Analysis (CORSSA). CORSSA is a web-based educational platform that is authoritative, up-to-date, prominent, and user-friendly. We anticipate that the users of CORSSA will range from beginning graduate students to experienced researchers. More than 20 scientists from around the world met for a week in Zurich in May 2010 to kick-start the creation of CORSSA: the format and initial table of contents were defined; a governing structure was organized; and workshop participants began drafting articles. CORSSA materials are organized with respect to six themes, each containing between four and eight articles. The CORSSA web page, www.corssa.org, officially unveiled on September 6, 2010, debuts with an initial set of approximately 10 to 15 articles available online for viewing and commenting with additional articles to be added over the coming months. Each article will be peer-reviewed and will present a balanced discussion, including illustrative examples and code snippets. Topics in the initial set of articles will include: introductions to both CORSSA and statistical seismology, basic statistical tests and their role in seismology; understanding seismicity catalogs and their problems; basic techniques for modeling seismicity; and methods for testing earthquake predictability hypotheses. 
A special article will compare and review available statistical seismology software packages.

  7. Environmental assessment of packaging: Sense and sensibility

    NASA Astrophysics Data System (ADS)

    Kooijman, Jan M.

    1993-09-01

The functions of packaging are derived from product requirements; thus, for insight into the environmental effects of packaging, the actual combination of product and package has to be evaluated along the production and distribution system. This extension to all related environmental aspects adds realism to the environmental analysis and provides guidance for design while preventing an overly detailed investigation of parts of the production system. This approach is contrary to current environmental studies, where packaging is always treated as an independent object, neglecting the more important environmental effects of the product that are influenced by packaging. The general analysis and quantification stages for this approach are described, and the currently available methods for the assessment of environmental effects are reviewed. To limit the workload involved in an environmental assessment, a step-by-step analysis and the use of feedback are recommended. First, the dominant environmental effects of a particular product and its production and distribution are estimated. Then, on the basis of these preliminary results, the appropriate system boundaries are chosen and the need for further or more detailed environmental analysis is determined. For typical food and drink applications, the effect of different system boundaries on the outcome of environmental assessments and the advantage of the step-by-step analysis of the food supply system are shown. It appears that, depending on the consumer group, different advice for reduction of environmental effects has to be given. Furthermore, because of the interrelated environmental effects of the food supply system, the continuing quest for more detailed and accurate analysis of the package components is not necessary for improved management of the environmental effects of packaging.

  8. ACCOUNTING FOR CALIBRATION UNCERTAINTIES IN X-RAY ANALYSIS: EFFECTIVE AREAS IN SPECTRAL FITTING

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lee, Hyunsook; Kashyap, Vinay L.; Drake, Jeremy J.

    2011-04-20

While considerable advances have been made to account for statistical uncertainties in astronomical analyses, systematic instrumental uncertainties have been generally ignored. This can be crucial to a proper interpretation of analysis results because instrumental calibration uncertainty is a form of systematic uncertainty. Ignoring it can underestimate error bars and introduce bias into the fitted values of model parameters. Accounting for such uncertainties currently requires extensive case-specific simulations if using existing analysis packages. Here, we present general statistical methods that incorporate calibration uncertainties into spectral analysis of high-energy data. We first present a method based on multiple imputation that can be applied with any fitting method, but is necessarily approximate. We then describe a more exact Bayesian approach that works in conjunction with a Markov chain Monte Carlo based fitting. We explore methods for improving computational efficiency, and in particular detail a method of summarizing calibration uncertainties with a principal component analysis of samples of plausible calibration files. This method is implemented using recently codified Chandra effective area uncertainties for low-resolution spectral analysis and is verified using both simulated and actual Chandra data. Our procedure for incorporating effective area uncertainty is easily generalized to other types of calibration uncertainties.
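The multiple-imputation idea amounts to refitting the spectrum once per plausible calibration realization and pooling the fits. A stdlib sketch of Rubin's combining rules with hypothetical fitted values (not the paper's data):

```python
from statistics import mean, variance

def rubin_pool(estimates, variances):
    """Pool m fits obtained under m plausible calibration files.

    Rubin's rules: pooled estimate is the mean of the per-fit estimates;
    total variance = within-fit variance + (1 + 1/m) * between-fit variance,
    so the calibration-induced spread inflates the final error bar.
    """
    m = len(estimates)
    qbar = mean(estimates)          # pooled point estimate
    within = mean(variances)        # average statistical variance per fit
    between = variance(estimates)   # spread induced by calibration files
    total = within + (1 + 1 / m) * between
    return qbar, total

# Hypothetical photon-index fits under 5 sampled effective-area curves
fits = [1.92, 1.88, 1.95, 1.90, 1.93]
stat_vars = [0.0004, 0.0005, 0.0004, 0.0005, 0.0004]
est, var_total = rubin_pool(fits, stat_vars)
```

The pooled variance is necessarily larger than the average statistical variance alone, which is precisely the underestimated-error-bar effect the abstract warns about.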

  9. [Propensity score matching in SPSS].

    PubMed

    Huang, Fuqiang; DU, Chunlin; Sun, Menghui; Ning, Bing; Luo, Ying; An, Shengli

    2015-11-01

To realize propensity score matching in the PS Matching module of SPSS and interpret the analysis results. The R software, a plug-in linking R with the corresponding version of SPSS, and a propensity score matching package were installed. A PS Matching module was added to the SPSS interface, and its use was demonstrated with test data. Propensity score estimation and nearest-neighbor matching were achieved with the PS Matching module, and the results of qualitative and quantitative statistical description and evaluation were presented graphically. Propensity score matching can be accomplished conveniently using SPSS software.
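The core of nearest-neighbor matching on the propensity score is a short greedy loop. A stdlib sketch (the scores and the caliper are hypothetical; production tools add options such as matching with or without replacement and optimal rather than greedy pairing):

```python
def greedy_nn_match(treated, control, caliper=0.05):
    """Greedy 1:1 nearest-neighbor matching on propensity scores.

    treated/control: dicts mapping unit id -> propensity score.
    Each control unit is used at most once; pairs farther apart than
    the caliper are left unmatched.
    """
    pool = dict(control)
    pairs = []
    for t_id, t_ps in sorted(treated.items(), key=lambda kv: kv[1]):
        if not pool:
            break
        c_id = min(pool, key=lambda c: abs(pool[c] - t_ps))
        if abs(pool[c_id] - t_ps) <= caliper:
            pairs.append((t_id, c_id))
            del pool[c_id]   # match without replacement
    return pairs

# Hypothetical propensity scores
treated = {"t1": 0.31, "t2": 0.62}
control = {"c1": 0.30, "c2": 0.59, "c3": 0.90}
print(greedy_nn_match(treated, control))  # → [('t1', 'c1'), ('t2', 'c2')]
```

After matching, covariate balance between the paired groups is checked (the graphical output the abstract mentions) before estimating the treatment effect.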

  10. Supracondylar fracture in children. Rehabilitation in occupational therapy. Yes or no?

    NASA Astrophysics Data System (ADS)

    Costa, Maria J.; Pires, Mafalda; Neves, Cassiano; Tavares, Delfin; Quintas, Alexandra M.; Ferreira, Ana I.; Espirito Santo, M. J.; Castro, Alexandra; Cabral, M. Salomé; João Gomes, J. F.

    2013-10-01

The aim of this study was to evaluate the recovery time of elbow range of motion after treatment of Gartland type II and III supracondylar fractures of the distal humerus in children who attended a program of occupational therapy (OT). A randomized control design (RCD) was used to compare the two groups (OT group and control group), and several statistical methodologies were applied. In all cases the results point to a faster recovery in the OT group. All analyses were performed using R version 3.0.1.

  11. Tolerancing aspheres based on manufacturing knowledge

    NASA Astrophysics Data System (ADS)

    Wickenhagen, S.; Kokot, S.; Fuchs, U.

    2017-10-01

A standard way of tolerancing optical elements or systems is to perform a Monte Carlo based analysis within a common optical design software package. Although different weightings and distributions are assumed, these methods all rely on statistics, which usually means several hundred or thousands of simulated systems for reliable results. Thus, employing these methods for small batch sizes is unreliable, especially when aspheric surfaces are involved. The large measurement database of asphericon was used to investigate the correlation between given tolerance values and measured data sets. The resulting probability distributions of these measured data were analyzed, aiming for a robust optical tolerancing process.
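The Monte Carlo tolerancing loop the abstract refers to is, in essence: perturb each parameter within its tolerance band, evaluate a merit function for the perturbed system, and count the fraction that still meets specification. A sketch with a hypothetical merit function (uniform distributions assumed here; design packages also offer normal and end-point distributions):

```python
import random

def monte_carlo_yield(n_trials, tolerances, merit, threshold, seed=0):
    """Estimate the fraction of as-built systems meeting a merit threshold.

    tolerances: {parameter: half-width of the tolerance band}.
    Each trial draws every parameter uniformly within its band.
    """
    rng = random.Random(seed)
    passed = 0
    for _ in range(n_trials):
        perturbed = {p: rng.uniform(-hw, hw) for p, hw in tolerances.items()}
        if merit(perturbed) <= threshold:
            passed += 1
    return passed / n_trials

# Hypothetical merit: RMS wavefront error growing quadratically with
# surface irregularity (fringes) and lens decenter (mm)
def merit(d):
    return 0.01 + 0.2 * d["irregularity"] ** 2 + 50 * d["decenter"] ** 2

tol = {"irregularity": 0.5, "decenter": 0.02}
print(monte_carlo_yield(10000, tol, merit, threshold=0.05))
```

The statistical weakness the abstract points out is visible here: the yield estimate is only as good as the assumed input distributions, which for small production batches of aspheres may look nothing like the uniform (or normal) defaults.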

  12. Toolkit for testing scientific CCD cameras

    NASA Astrophysics Data System (ADS)

    Uzycki, Janusz; Mankiewicz, Lech; Molak, Marcin; Wrochna, Grzegorz

    2006-03-01

The CCD Toolkit (1) is a software tool for testing CCD cameras which allows measurement of important characteristics of a camera such as readout noise, total gain, dark current, 'hot' pixels, useful area, etc. The application performs a statistical analysis of images saved in files in FITS format, commonly used in astronomy. The graphical interface is based on the ROOT package, which offers high functionality and flexibility. The program was developed in a way that ensures future compatibility with different operating systems: Windows and Linux. The CCD Toolkit was created for the "Pi of the Sky" project collaboration (2).
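One standard way to estimate the total gain such a tool measures is the photon-transfer (mean-variance) method; a stdlib sketch of the arithmetic (a generic technique, not necessarily the CCD Toolkit's exact algorithm; frames are flattened to 1-D pixel lists and assumed bias-subtracted):

```python
from statistics import mean, pvariance

def photon_transfer_gain(flat1, flat2):
    """Estimate camera gain (e-/ADU) from two identical flat-field frames.

    Differencing the two frames cancels fixed-pattern noise; for
    shot-noise-limited flats, var(f1 - f2) = 2 * signal / gain, hence
    gain = (mean1 + mean2) / var(f1 - f2).
    """
    diff = [a - b for a, b in zip(flat1, flat2)]
    return (mean(flat1) + mean(flat2)) / pvariance(diff)
```

Repeating this at several illumination levels and fitting variance against signal also yields the readout noise (the intercept), which is how the camera characteristics listed above are typically extracted from FITS test frames.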

  13. The R package 'icosa' for coarse resolution global triangular and penta-hexagonal gridding

    NASA Astrophysics Data System (ADS)

    Kocsis, Adam T.

    2017-04-01

With the development of the internet and the computational power of personal computers, open source programming environments have become indispensable for science in the past decade. This includes the increase of the GIS capacity of the free R environment, which was originally developed for statistical analyses. The flexibility of R has made it a preferred programming tool in a multitude of disciplines across the biological and geological sciences. Many of these subdisciplines operate with incidence (occurrence) data that in a large number of cases must be grained before further analyses can be conducted. This graining is mostly executed by gridding data to cells of a Gaussian grid of various resolutions to increase the density of data in a single unit of the analyses. This method has obvious shortcomings despite the ease of its application: well-known systematic biases are induced in cell sizes and shapes that can interfere with the results of statistical procedures, especially if the number of incidence points influences the metrics in question. The 'icosa' package employs a common method to overcome this obstacle by implementing grids with roughly equal cell sizes and shapes that are based on tessellated icosahedra. These grid objects are essentially polyhedra with xyz Cartesian vertex data that are linked to tables of faces and edges. At its current developmental stage, the package uses a single method of tessellation which balances grid cell size and shape distortions, but its structure allows the implementation of various other tessellation algorithms. The resolution of the grids can be set by the number of breakpoints inserted into a segment forming an edge of the original icosahedron. Both the triangular and the inverted penta-hexagonal grids are available for creation with the package. The package also incorporates functions to look up coordinates in the grid very efficiently and data containers to link data to the grid structure. 
The classes defined in the package communicate with classes of the 'sp' and 'raster' packages, and functions are supplied that allow resolution changes and type conversions. Three-dimensional rendering is made available with the 'rgl' package, and two-dimensional projections can be calculated using 'sp' and 'rgdal'. The package was developed as part of a project funded by the Deutsche Forschungsgemeinschaft (KO - 5382/1-1).
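The cell counts follow directly from the tessellation frequency described above; illustrative arithmetic for the standard icosahedral subdivision (the function name is hypothetical and not part of the 'icosa' API):

```python
def icosa_grid_counts(breakpoints):
    """Cell counts for a tessellated icosahedron.

    Inserting k breakpoints into each original edge gives frequency
    n = k + 1 (each edge split into n segments). The triangular grid
    then has 20*n^2 faces; its dual penta-hexagonal grid has
    10*n^2 + 2 cells, exactly 12 of which are always pentagons.
    """
    n = breakpoints + 1
    triangles = 20 * n * n
    penta_hex_cells = 10 * n * n + 2
    return triangles, penta_hex_cells

print(icosa_grid_counts(0))  # plain icosahedron → (20, 12)
print(icosa_grid_counts(3))  # → (320, 162)
```

The 12 fixed pentagons are the price of wrapping a hexagonal lattice onto a sphere; all remaining dual cells are hexagons, which is why such grids keep cell size and shape distortion low compared with latitude-longitude gridding.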

  14. Safety analysis report -- Packages LP-50 tritium package (Packaging of fissile and other radioactive materials)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gates, A.A.; McCarthy, P.G.; Edl, J.W.

    1975-05-01

    Elemental tritium is shipped at low pressure in a stainless steel container (LP-50) surrounded by an aluminum vessel and Celotex insulation at least 4 in. thick in a steel drum. Each package contains a large quantity (greater than a Type A quantity) of nonfissile material, as defined in AECM 0529. This report provides the details of the safety analysis performed for this type container.

  15. FIT: statistical modeling tool for transcriptome dynamics under fluctuating field conditions

    PubMed Central

    Iwayama, Koji; Aisaka, Yuri; Kutsuna, Natsumaro

    2017-01-01

    Abstract Motivation: Considerable attention has been given to the quantification of environmental effects on organisms. In natural conditions, environmental factors are continuously changing in a complex manner. To reveal the effects of such environmental variations on organisms, transcriptome data in field environments have been collected and analyzed. Nagano et al. proposed a model that describes the relationship between transcriptomic variation and environmental conditions and demonstrated the capability to predict transcriptome variation in rice plants. However, the computational cost of parameter optimization has prevented its wide application. Results: We propose a new statistical model and efficient parameter optimization based on the previous study. We developed and released FIT, an R package that offers functions for parameter optimization and transcriptome prediction. The proposed method achieves comparable or better prediction performance within a shorter computational time than the previous method. The package will facilitate the study of the environmental effects on transcriptomic variation in field conditions. Availability and Implementation: Freely available from CRAN (https://cran.r-project.org/web/packages/FIT/). Contact: anagano@agr.ryukoku.ac.jp Supplementary information: Supplementary data are available at Bioinformatics online PMID:28158396

  16. A reduction package for cross-dispersed echelle spectrograph data in IDL

    NASA Astrophysics Data System (ADS)

    Hall, Jeffrey C.; Neff, James E.

    1992-12-01

    We have written in IDL a data reduction package that performs reduction and extraction of cross-dispersed echelle spectrograph data. The present package includes a complete set of tools for extracting data from any number of spectral orders with arbitrary tilt and curvature. Essential elements include debiasing and flatfielding of the raw CCD image, removal of scattered light background, either nonoptimal or optimal extraction of data, and wavelength calibration and continuum normalization of the extracted orders. A growing set of support routines permits examination of the frame being processed to provide continuing checks on the statistical properties of the data and on the accuracy of the extraction. We will display some sample reductions and discuss the algorithms used. The inherent simplicity and user-friendliness of the IDL interface make this package a useful tool for spectroscopists. We will provide an email distribution list for those interested in receiving the package, and further documentation will be distributed at the meeting.

  17. [Relationship between the frequency of work-related stress and prevalence of functional dyspepsia in Lima Geriatric Army Hospital].

    PubMed

    Valenzuela Narváez, Daniel Raúl; Gayoso Cervantes, Milagros

    2017-01-01

To determine the relationship between the frequency of work-related stress and the prevalence of functional dyspepsia in a sample of 218 military personnel older than 50 years, studied in 2010 at the Lima Geriatric Military Hospital. This was a descriptive and explanatory study; for data collection on stress, the Holmes-Rahe Vital Events Scale was used, together with clinical records of clinical examination and upper endoscopy meeting the Rome III criteria for functional dyspepsia. The statistical software package SPSS (Statistical Package for the Social Sciences) was used for data processing and analysis. All of the military personnel showed some level of work stress during the study year: 36.7% had a high level, 31.2% a medium or moderate level, and 32.1% a low level; medium and high stress levels together accounted for 67.9%. These results establish that job stress is a common complaint in the study population (tabulated chi2 = 3.841, observed chi2 = 27.908). A functional dyspepsia prevalence of 37.2% was determined, indicating that it is a common condition among these military personnel (tabulated Z = 1.96, observed Z = 9.163). There is a significant relationship between the frequency of work-related stress and the prevalence of functional dyspepsia in active military personnel older than 50 years (tabulated chi2 = 5.991, observed chi2 = 28.878, contingency coefficient = 0.342).
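The chi-square test of independence behind these comparisons is short to compute by hand. A stdlib sketch with a hypothetical 2x2 stress-by-dyspepsia table (the abstract does not report the full contingency table, so these counts are illustrative, chosen only to sum to the study's n = 218):

```python
def chi2_independence(table):
    """Pearson chi-square statistic for an r x c contingency table:
    sum over cells of (observed - expected)^2 / expected, where
    expected = row_total * column_total / grand_total."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    return sum((obs - rt * ct / n) ** 2 / (rt * ct / n)
               for r, rt in zip(table, row_tot)
               for obs, ct in zip(r, col_tot))

# Hypothetical counts: rows = high vs. low/medium stress,
# columns = dyspepsia yes/no
table = [[50, 30],
         [31, 107]]
chi2 = chi2_independence(table)
print(round(chi2, 2))  # well above the 3.841 critical value for df = 1
```

When the observed statistic exceeds the tabulated critical value, as in the study's reported comparisons, the independence hypothesis is rejected.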

  18. "Suicide shall cease to be a crime": suicide and undetermined death trends 1970-2000 before and after the decriminalization of suicide in Ireland 1993.

    PubMed

    Osman, Mugtaba; Parnell, Andrew C; Haley, Clifford

    2017-02-01

Suicide is criminalized in more than 100 countries around the world. A dearth of research exists into the effect of suicide legislation on suicide rates, and the available statistics are mixed. This study investigates 10,353 suicide deaths in Ireland that took place between 1970 and 2000. Irish 1970-2000 annual suicide data were obtained from the Central Statistics Office and modelled via a negative binomial regression approach. We examined the effect of suicide legislation on different age groups and on both sexes. We used Bonferroni correction for multiple modelling. Statistical analysis was performed using the R statistical package version 3.1.2. The coefficient for the effect of the suicide act on overall suicide deaths was -9.094 (95% confidence interval (CI) -34.086 to 15.899), statistically non-significant (p = 0.476). The coefficient for the effect of the suicide act on undetermined deaths was statistically significant (p < 0.001) and was estimated to be -644.4 (95% CI -818.6 to -469.9). The results of our study indicate that decriminalization of suicide is not associated with a significant increase in subsequent suicide deaths. However, undetermined death verdict rates dropped significantly following decriminalization.

  19. DOE Office of Scientific and Technical Information (OSTI.GOV)

P-Mart was designed specifically to allow cancer researchers to perform robust statistical processing of publicly available cancer proteomic datasets. To date, no other online statistical processing suite for proteomics exists. The P-Mart software is designed to allow statistical programmers to utilize these algorithms through packages in the R programming language, and it also offers a web-based interface using Azure cloud technology. The Azure cloud technology also allows the release of the software via Docker containers.

  20. SPA- STATISTICAL PACKAGE FOR TIME AND FREQUENCY DOMAIN ANALYSIS

    NASA Technical Reports Server (NTRS)

    Brownlow, J. D.

    1994-01-01

The need for statistical analysis often arises when data is in the form of a time series. This type of data is usually a collection of numerical observations made at specified time intervals. Two kinds of analysis may be performed on the data. First, the time series may be treated as a set of independent observations using a time domain analysis to derive the usual statistical properties, including the mean, variance, and distribution form. Secondly, the order and time intervals of the observations may be used in a frequency domain analysis to examine the time series for periodicities. In almost all practical applications, the collected data is actually a mixture of the desired signal and a noise signal which is collected over a finite time period with a finite precision. Therefore, any statistical calculations and analyses are actually estimates. The Spectrum Analysis (SPA) program was developed to perform a wide range of statistical estimation functions. SPA can provide the data analyst with a rigorous tool for performing time and frequency domain studies. In a time domain statistical analysis the SPA program will compute the mean, variance, standard deviation, mean square, and root mean square. It also lists the data maximum, data minimum, and the number of observations included in the sample. In addition, a histogram of the time domain data is generated, a normal curve is fit to the histogram, and a goodness-of-fit test is performed. These time domain calculations may be performed on both raw and filtered data. For a frequency domain statistical analysis the SPA program computes the power spectrum, cross spectrum, coherence, phase angle, amplitude ratio, and transfer function. The estimates of the frequency domain parameters may be smoothed with the use of Hann-Tukey, Hamming, Bartlett, or moving average windows. Various digital filters are available to isolate data frequency components. 
Frequency components with periods longer than the data collection interval are removed by least-squares detrending. As many as ten channels of data may be analyzed at one time. Both tabular and plotted output may be generated by the SPA program. This program is written in FORTRAN IV and has been implemented on a CDC 6000 series computer with a central memory requirement of approximately 142K (octal) of 60 bit words. This core requirement can be reduced by segmentation of the program. The SPA program was developed in 1978.
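The time-domain estimates SPA reports are all standard quantities; a compact stdlib sketch of that part of the analysis (the dictionary layout is illustrative, not SPA's output format):

```python
from math import sqrt
from statistics import mean, pstdev

def time_domain_stats(x):
    """The basic time-domain estimates for a sampled signal:
    mean, variance, standard deviation, mean square, RMS,
    extrema, and sample count."""
    ms = mean(v * v for v in x)          # mean square
    sd = pstdev(x)                       # population standard deviation
    return {
        "mean": mean(x),
        "variance": sd ** 2,
        "std_dev": sd,
        "mean_square": ms,
        "rms": sqrt(ms),
        "min": min(x),
        "max": max(x),
        "n": len(x),
    }

stats = time_domain_stats([1.0, -2.0, 3.0, -4.0])
print(round(stats["rms"], 4))  # sqrt(30/4) → 2.7386
```

Note the relation the two families of estimates share: mean square = variance + mean^2, so for a zero-mean (detrended) signal the RMS and the standard deviation coincide, which is why SPA detrends before its frequency-domain stage.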
