SimHap GUI: an intuitive graphical user interface for genetic association analysis.
Carter, Kim W; McCaskie, Pamela A; Palmer, Lyle J
2008-12-25
Researchers wishing to conduct genetic association analysis involving single nucleotide polymorphisms (SNPs) or haplotypes are often confronted with the lack of user-friendly graphical analysis tools, requiring sophisticated statistical and informatics expertise to perform relatively straightforward tasks. Tools, such as the SimHap package for the R statistics language, provide the necessary statistical operations to conduct sophisticated genetic analysis, but lack a graphical user interface that allows anyone but a professional statistician to effectively utilise the tool. We have developed SimHap GUI, a cross-platform integrated graphical analysis tool for conducting epidemiological, single SNP and haplotype-based association analysis. SimHap GUI features a novel workflow interface that guides the user through each logical step of the analysis process, making it accessible to both novice and advanced users. This tool provides a seamless interface to the SimHap R package, while providing enhanced functionality such as sophisticated data checking, automated data conversion, and real-time estimations of haplotype simulation progress. SimHap GUI provides a novel, easy-to-use, cross-platform solution for conducting a range of genetic and non-genetic association analyses. It provides a free alternative to commercial statistics packages that is specifically designed for genetic association analysis.
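As a concrete illustration of the kind of single-SNP association test such tools automate, the following is a minimal base-R sketch; it is not the SimHap or SimHap GUI interface, and the data frame and column names are hypothetical simulated placeholders.

```r
# Hedged sketch only: a generic single-SNP case/control association test in
# base R, not SimHap's own API. All data below are simulated placeholders.
set.seed(1)
n <- 500
dat <- data.frame(
  snp1   = rbinom(n, 2, 0.3),   # additive coding: count of minor alleles (0/1/2)
  age    = rnorm(n, 50, 10),    # non-genetic covariate
  status = rbinom(n, 1, 0.4)    # case/control outcome
)
# Logistic regression of disease status on genotype, adjusted for age
fit <- glm(status ~ snp1 + age, data = dat, family = binomial)
summary(fit)$coefficients["snp1", ]   # per-allele estimate, SE, z, p-value
exp(coef(fit)["snp1"])                # odds ratio per copy of the minor allele
```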
A Data Warehouse Architecture for DoD Healthcare Performance Measurements.
1999-09-01
With the DoD healthcare ... framework, this thesis defines a methodology to design, develop, implement, and apply statistical analysis and data mining tools to a Data Warehouse of healthcare metrics.
Chang, Cheng; Xu, Kaikun; Guo, Chaoping; Wang, Jinxia; Yan, Qi; Zhang, Jian; He, Fuchu; Zhu, Yunping
2018-05-22
Compared with the numerous software tools developed for identification and quantification of -omics data, there remains a lack of suitable tools for both downstream analysis and data visualization. To help researchers better understand the biological meanings in their -omics data, we present an easy-to-use tool, named PANDA-view, for both statistical analysis and visualization of quantitative proteomics data and other -omics data. PANDA-view contains various kinds of analysis methods such as normalization, missing value imputation, statistical tests, clustering and principal component analysis, as well as the most commonly-used data visualization methods including an interactive volcano plot. Additionally, it provides user-friendly interfaces for protein-peptide-spectrum representation of the quantitative proteomics data. PANDA-view is freely available at https://sourceforge.net/projects/panda-view/. Contact: 1987ccpacer@163.com and zhuyunping@gmail.com. Supplementary data are available at Bioinformatics online.
The Shock and Vibration Digest. Volume 15, Number 7
1983-07-01
An important analytical tool for systems noise, the statistical energy analysis method, has been the subject of the reviewed literature, including the experimental determination of vibration parameters required in the statistical energy analysis method, coupling loss factors for statistical energy analysis of sound transmission, and sound intensity as a measurement tool.
On the blind use of statistical tools in the analysis of globular cluster stars
NASA Astrophysics Data System (ADS)
D'Antona, Francesca; Caloi, Vittoria; Tailo, Marco
2018-04-01
As with most data analysis methods, the Bayesian method must be handled with care. We show that its application to determine stellar evolution parameters within globular clusters can lead to paradoxical results if used without the necessary precautions. This is a cautionary tale on the use of statistical tools for big data analysis.
The Statistical Consulting Center for Astronomy (SCCA)
NASA Technical Reports Server (NTRS)
Akritas, Michael
2001-01-01
The process by which raw astronomical data acquisition is transformed into scientifically meaningful results and interpretation typically involves many statistical steps. Traditional astronomy limits itself to a narrow range of old and familiar statistical methods: means and standard deviations; least-squares methods like chi-squared minimization; and simple nonparametric procedures such as the Kolmogorov-Smirnov tests. These tools are often inadequate for the complex problems and datasets under investigation, and recent years have witnessed an increased usage of maximum-likelihood, survival analysis, multivariate analysis, wavelet and advanced time-series methods. The Statistical Consulting Center for Astronomy (SCCA) assisted astronomers with the use of sophisticated tools and matched these tools with specific problems. The SCCA operated with two professors of statistics and a professor of astronomy working together. Questions were received by e-mail, and were discussed in detail with the questioner. Summaries of those questions and answers leading to new approaches were posted on the Web (www.state.psu.edu/ mga/SCCA). In addition to serving individual astronomers, the SCCA established a Web site for general use that provides hypertext links to selected on-line public-domain statistical software and services. The StatCodes site (www.astro.psu.edu/statcodes) provides over 200 links in the areas of: Bayesian statistics; censored and truncated data; correlation and regression; density estimation and smoothing; general statistics packages and information; image analysis; interactive Web tools; multivariate analysis; multivariate clustering and classification; nonparametric analysis; software written by astronomers; spatial statistics; statistical distributions; time series analysis; and visualization tools. StatCodes has received a remarkably high and constant hit rate of 250 hits/week (over 10,000/year) since its inception in mid-1997. It is of interest to scientists both within and outside of astronomy. The most popular sections are multivariate techniques, image analysis, and time series analysis. Hundreds of copies of the ASURV, SLOPES and CENS-TAU codes developed by SCCA scientists were also downloaded from the StatCodes site. In addition to formal SCCA duties, SCCA scientists continued a variety of related activities in astrostatistics, including refereeing of statistically oriented papers submitted to the Astrophysical Journal, talks in meetings including Feigelson's talk to science journalists entitled "The reemergence of astrostatistics" at the American Association for the Advancement of Science meeting, and published papers of astrostatistical content.
Bamidis, P D; Lithari, C; Konstantinidis, S T
2010-01-01
With the number of scientific papers published in journals, conference proceedings, and international literature ever increasing, authors and reviewers are not only facilitated with an abundance of information, but unfortunately continuously confronted with risks associated with the erroneous copying of another's material. In parallel, Information Communication Technology (ICT) tools provide researchers with novel and continuously more effective ways to analyze and present their work. Software tools for statistical analysis offer scientists the chance to validate their work and enhance the quality of published papers. Moreover, from the reviewers' and the editor's perspective, it is now possible to ensure the (text-content) originality of a scientific article with automated software tools for plagiarism detection. In this paper, we provide a step-by-step demonstration of two categories of tools, namely, statistical analysis and plagiarism detection. The aim is not to come up with a specific tool recommendation, but rather to provide useful guidelines on the proper use and efficiency of either category of tools. In the context of this special issue, this paper offers a useful tutorial on specific problems concerned with scientific writing and review discourse. A specific neuroscience experimental case example is utilized to illustrate the young researcher's statistical analysis burden, while a test scenario is purpose-built using open access journal articles to exemplify the use and comparative outputs of seven pieces of plagiarism detection software. PMID:21487489
Han, Seong Kyu; Lee, Dongyeop; Lee, Heetak; Kim, Donghyo; Son, Heehwa G; Yang, Jae-Seong; Lee, Seung-Jae V; Kim, Sanguk
2016-08-30
Online application for survival analysis (OASIS) has served as a popular and convenient platform for the statistical analysis of various survival data, particularly in the field of aging research. With the recent advances in the fields of aging research that deal with complex survival data, we noticed a need for updates to the current version of OASIS. Here, we report OASIS 2 (http://sbi.postech.ac.kr/oasis2), which provides extended statistical tools for survival data and an enhanced user interface. In particular, OASIS 2 enables the statistical comparison of maximal lifespans, which is potentially useful for determining key factors that limit the lifespan of a population. Furthermore, OASIS 2 provides statistical and graphical tools that compare values in different conditions and times. That feature is useful for comparing age-associated changes in physiological activities, which can be used as indicators of "healthspan." We believe that OASIS 2 will serve as a standard platform for survival analysis with advanced and user-friendly statistical tools for experimental biologists in the field of aging research.
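To make the survival comparison concrete, here is a minimal sketch using the standard R survival package; it is not the OASIS 2 code, and the two cohorts are simulated placeholders.

```r
# Minimal sketch (not OASIS/OASIS 2): the kind of survival comparison OASIS 2
# automates, done here with the standard 'survival' package on simulated data.
library(survival)
set.seed(2)
lifespan <- c(rweibull(60, shape = 4, scale = 20),   # control cohort
              rweibull(60, shape = 4, scale = 24))   # long-lived cohort
group  <- rep(c("control", "mutant"), each = 60)
status <- rep(1, 120)                                # 1 = death observed (no censoring)
# Log-rank test for a difference in survival curves
survdiff(Surv(lifespan, status) ~ group)
# Kaplan-Meier estimates, e.g. to compare median lifespans
fit <- survfit(Surv(lifespan, status) ~ group)
summary(fit)$table[, "median"]
```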
DAnTE: a statistical tool for quantitative analysis of -omics data
DOE Office of Scientific and Technical Information (OSTI.GOV)
Polpitiya, Ashoka D.; Qian, Weijun; Jaitly, Navdeep
2008-05-03
DAnTE (Data Analysis Tool Extension) is a statistical tool designed to address challenges unique to quantitative bottom-up, shotgun proteomics data. This tool has also been demonstrated for microarray data and can easily be extended to other high-throughput data types. DAnTE features selected normalization methods, missing value imputation algorithms, peptide to protein rollup methods, an extensive array of plotting functions, and a comprehensive ANOVA scheme that can handle unbalanced data and random effects. The Graphical User Interface (GUI) is designed to be very intuitive and user friendly.
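The general style of analysis DAnTE wraps in a GUI can be sketched in a few lines of R; this is an illustrative stand-in, not DAnTE itself, and the abundance matrix below is simulated.

```r
# Hedged sketch, not DAnTE: median normalization followed by a one-way ANOVA
# per protein, the general style of analysis described above.
set.seed(3)
abund <- matrix(rnorm(200 * 12, mean = 20, sd = 2), nrow = 200,
                dimnames = list(paste0("prot", 1:200), paste0("s", 1:12)))
group <- factor(rep(c("A", "B", "C"), each = 4))
# Median centering of each sample (one simple normalization option)
abund_norm <- sweep(abund, 2, apply(abund, 2, median, na.rm = TRUE), "-")
# One-way ANOVA p-value for each protein across the three groups
pvals <- apply(abund_norm, 1, function(x) anova(lm(x ~ group))[["Pr(>F)"]][1])
head(sort(pvals))
```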
Application of Statistics in Engineering Technology Programs
ERIC Educational Resources Information Center
Zhan, Wei; Fink, Rainer; Fang, Alex
2010-01-01
Statistics is a critical tool for robustness analysis, measurement system error analysis, test data analysis, probabilistic risk assessment, and many other fields in the engineering world. Traditionally, however, statistics is not extensively used in undergraduate engineering technology (ET) programs, resulting in a major disconnect from industry…
NIRS-SPM: statistical parametric mapping for near infrared spectroscopy
NASA Astrophysics Data System (ADS)
Tak, Sungho; Jang, Kwang Eun; Jung, Jinwook; Jang, Jaeduck; Jeong, Yong; Ye, Jong Chul
2008-02-01
Even though there exists a powerful statistical parametric mapping (SPM) tool for fMRI, similar public domain tools are not available for near infrared spectroscopy (NIRS). In this paper, we describe a new public domain statistical toolbox called NIRS-SPM for quantitative analysis of NIRS signals. Specifically, NIRS-SPM statistically analyzes the NIRS data using the general linear model (GLM) and makes inference in terms of the excursion probability of a random field interpolated from the sparse measurements. In order to obtain correct inference, NIRS-SPM offers pre-coloring and pre-whitening methods for temporal correlation estimation. For simultaneous recording of NIRS signals with fMRI, the spatial mapping between the fMRI image and real coordinates from a 3-D digitizer is estimated using Horn's algorithm. These powerful tools allow super-resolution localization of brain activation, which is not possible using conventional NIRS analysis tools.
TRAPR: R Package for Statistical Analysis and Visualization of RNA-Seq Data.
Lim, Jae Hyun; Lee, Soo Youn; Kim, Ju Han
2017-03-01
High-throughput transcriptome sequencing, also known as RNA sequencing (RNA-Seq), is a standard technology for measuring gene expression with unprecedented accuracy. Numerous Bioconductor packages have been developed for the statistical analysis of RNA-Seq data. However, these tools focus on specific aspects of the data analysis pipeline, and are difficult to appropriately integrate with one another due to their disparate data structures and processing methods. They also lack visualization methods to confirm the integrity of the data and the process. In this paper, we propose an R-based RNA-Seq analysis pipeline called TRAPR, an integrated tool that facilitates the statistical analysis and visualization of RNA-Seq expression data. TRAPR provides various functions for data management, the filtering of low-quality data, normalization, transformation, statistical analysis, data visualization, and result visualization that allow researchers to build customized analysis pipelines.
DOE Office of Scientific and Technical Information (OSTI.GOV)
White, Amanda M.; Daly, Don S.; Willse, Alan R.
The Automated Microarray Image Analysis (AMIA) Toolbox for MATLAB is a flexible, open-source microarray image analysis tool that allows the user to customize analysis of sets of microarray images. This tool provides several methods for identifying and quantifying spot statistics, as well as extensive diagnostic statistics and images to identify poor data quality or processing. The open nature of this software allows researchers to understand the algorithms used to provide intensity estimates and to modify them easily if desired.
Mayo, Charles; Conners, Steve; Warren, Christopher; Miller, Robert; Court, Laurence; Popple, Richard
2013-01-01
Purpose: With the emergence of clinical outcomes databases as tools utilized routinely within institutions comes the need for software tools to support automated statistical analysis of these large data sets and intrainstitutional exchange from independent federated databases to support data pooling. In this paper, the authors present a design approach and analysis methodology that addresses both issues. Methods: A software application was constructed to automate analysis of patient outcomes data using a wide range of statistical metrics, by combining use of C#.Net and R code. The accuracy and speed of the code was evaluated using benchmark data sets. Results: The approach provides data needed to evaluate combinations of statistical measurements for ability to identify patterns of interest in the data. Through application of the tools to a benchmark data set for dose-response threshold and to SBRT lung data sets, an algorithm was developed that uses receiver operator characteristic curves to identify a threshold value and combines use of contingency tables, Fisher exact tests, Welch t-tests, and Kolmogorov-Smirnov tests to filter the large data set to identify values demonstrating dose-response. Kullback-Leibler divergences were used to provide additional confirmation. Conclusions: The work demonstrates the viability of the design approach and the software tool for analysis of large data sets. PMID:24320426
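A rough R sketch of the kind of threshold-and-test filtering described in the Results is given below; it is not the authors' C#.Net/R application, and the dose and outcome vectors are simulated placeholders.

```r
# Illustrative sketch of a threshold-and-test screen (not the authors' code):
# choose a dose threshold via the ROC curve, then test for a difference in
# event rates above vs. below it. Variable names are hypothetical.
set.seed(4)
dose  <- runif(300, 0, 60)                        # e.g. a dose metric per patient
event <- rbinom(300, 1, plogis(-4 + 0.08 * dose)) # simulated dose-response outcome
# Candidate threshold maximizing Youden's index
ths <- sort(unique(dose))
youden <- sapply(ths, function(t) {
  sens <- mean(dose[event == 1] > t)
  spec <- mean(dose[event == 0] <= t)
  sens + spec - 1
})
thr <- ths[which.max(youden)]
above <- dose > thr
# Tests used to screen for a dose-response signal at this threshold
fisher.test(table(above, event))                  # contingency table test
t.test(dose ~ event)                              # Welch t-test (R's default)
ks.test(dose[event == 1], dose[event == 0])       # Kolmogorov-Smirnov test
```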
Network Meta-Analysis Using R: A Review of Currently Available Automated Packages
Neupane, Binod; Richer, Danielle; Bonner, Ashley Joel; Kibret, Taddele; Beyene, Joseph
2014-01-01
Network meta-analysis (NMA) – a statistical technique that allows comparison of multiple treatments in the same meta-analysis simultaneously – has become increasingly popular in the medical literature in recent years. The statistical methodology underpinning this technique and software tools for implementing the methods are evolving. Both commercial and freely available statistical software packages have been developed to facilitate the statistical computations using NMA with varying degrees of functionality and ease of use. This paper aims to introduce the reader to three R packages, namely, gemtc, pcnetmeta, and netmeta, which are freely available software tools implemented in R. Each automates the process of performing NMA so that users can perform the analysis with minimal computational effort. We present, compare and contrast the availability and functionality of different important features of NMA in these three packages so that clinical investigators and researchers can determine which R packages to implement depending on their analysis needs. Four summary tables detailing (i) data input and network plotting, (ii) modeling options, (iii) assumption checking and diagnostic testing, and (iv) inference and reporting tools, are provided, along with an analysis of a previously published dataset to illustrate the outputs available from each package. We demonstrate that each of the three packages provides a useful set of tools, and combined provide users with nearly all functionality that might be desired when conducting a NMA. PMID:25541687
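For orientation, a minimal call to one of the three packages (netmeta) is sketched below; the argument names follow the package's documented interface, but the contrast data are made up for illustration.

```r
# Minimal illustrative example for the netmeta package; all values are invented.
library(netmeta)
dat <- data.frame(
  TE      = c(-0.30, -0.10, 0.15, -0.25),  # treatment effects (e.g. log odds ratios)
  seTE    = c(0.12, 0.15, 0.10, 0.18),     # their standard errors
  treat1  = c("A", "A", "B", "A"),
  treat2  = c("B", "C", "C", "C"),
  studlab = c("Study1", "Study2", "Study3", "Study4")
)
nma <- netmeta(TE, seTE, treat1, treat2, studlab, data = dat, sm = "OR")
summary(nma)      # relative effects and heterogeneity statistics
netgraph(nma)     # network plot
```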
STATWIZ - AN ELECTRONIC STATISTICAL TOOL (ABSTRACT)
StatWiz is a web-based, interactive, and dynamic statistical tool for researchers. It will allow researchers to input information and/or data and then receive experimental design options, or outputs from data analysis. StatWiz is envisioned as an expert system that will walk rese...
Using Quality Management Tools to Enhance Feedback from Student Evaluations
ERIC Educational Resources Information Center
Jensen, John B.; Artz, Nancy
2005-01-01
Statistical tools found in the service quality assessment literature--the T² statistic combined with factor analysis--can enhance the feedback instructors receive from student ratings. T² examines variability across multiple sets of ratings to isolate individual respondents with aberrant response…
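The idea behind the T² screen can be sketched with base R's Mahalanobis distance, on which the statistic is built; this is an illustrative stand-in, not the authors' procedure, and the ratings matrix is simulated.

```r
# Sketch of the underlying idea (not the authors' procedure): flag raters whose
# multivariate response pattern is far from the class mean, using the squared
# Mahalanobis distance on which the T^2 statistic is based.
set.seed(5)
ratings <- matrix(sample(1:5, 40 * 6, replace = TRUE), nrow = 40)  # 40 students, 6 items
colnames(ratings) <- paste0("item", 1:6)
d2 <- mahalanobis(ratings, center = colMeans(ratings), cov = cov(ratings))
# Under approximate normality, d2 is roughly chi-squared with 6 degrees of freedom
which(d2 > qchisq(0.99, df = 6))   # respondents with aberrant response patterns
```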
A statistical software tool, Stream Fish Community Predictor (SFCP), based on EMAP stream sampling in the mid-Atlantic Highlands, was developed to predict stream fish communities using stream and watershed characteristics. Step one in the tool development was a cluster analysis t...
The Math Problem: Advertising Students' Attitudes toward Statistics
ERIC Educational Resources Information Center
Fullerton, Jami A.; Kendrick, Alice
2013-01-01
This study used the Students' Attitudes toward Statistics Scale (STATS) to measure attitude toward statistics among a national sample of advertising students. A factor analysis revealed four underlying factors make up the attitude toward statistics construct--"Interest & Future Applicability," "Confidence," "Statistical Tools," and "Initiative."…
Meta-Analysis for Primary and Secondary Data Analysis: The Super-Experiment Metaphor.
ERIC Educational Resources Information Center
Jackson, Sally
1991-01-01
Considers the relation between meta-analysis statistics and analysis of variance statistics. Discusses advantages and disadvantages as a primary data analysis tool. Argues that the two approaches are partial paraphrases of one another. Advocates an integrative approach that introduces the best of meta-analytic thinking into primary analysis…
Lamart, Stephanie; Griffiths, Nina M; Tchitchek, Nicolas; Angulo, Jaime F; Van der Meeren, Anne
2017-03-01
The aim of this work was to develop a computational tool that integrates several statistical analysis features for biodistribution data from internal contamination experiments. These data represent actinide levels in biological compartments as a function of time and are derived from activity measurements in tissues and excreta. These experiments aim at assessing the influence of different contamination conditions (e.g. intake route or radioelement) on the biological behavior of the contaminant. The ever-increasing number of datasets and diversity of experimental conditions make the handling and analysis of biodistribution data difficult. This work sought to facilitate the statistical analysis of a large number of datasets and the comparison of results from diverse experimental conditions. Functional modules were developed using the open-source programming language R to facilitate specific operations: descriptive statistics, visual comparison, curve fitting, and implementation of biokinetic models. In addition, the structure of the datasets was harmonized using the same table format. Analysis outputs can be written to text files and updated data can be written in the consistent table format. Hence, a data repository is built progressively, which is essential for the optimal use of animal data. Graphical representations can be automatically generated and saved as image files. The resulting computational tool was applied using data derived from wound contamination experiments conducted under different conditions. In facilitating biodistribution data handling and statistical analyses, this computational tool ensures faster analyses and better reproducibility compared with the use of multiple office software applications. Furthermore, re-analysis of archival data and comparison of data from different sources is made much easier. Hence this tool will help to better understand the influence of contamination characteristics on actinide biokinetics. Our approach can aid the optimization of treatment protocols and therefore contribute to the improvement of the medical response after internal contamination with actinides.
GARNET--gene set analysis with exploration of annotation relations.
Rho, Kyoohyoung; Kim, Bumjin; Jang, Youngjun; Lee, Sanghyun; Bae, Taejeong; Seo, Jihae; Seo, Chaehwa; Lee, Jihyun; Kang, Hyunjung; Yu, Ungsik; Kim, Sunghoon; Lee, Sanghyuk; Kim, Wan Kyu
2011-02-15
Gene set analysis is a powerful method of deducing biological meaning for an a priori defined set of genes. Numerous tools have been developed to test statistical enrichment or depletion in specific pathways or gene ontology (GO) terms. Major difficulties towards biological interpretation are integrating diverse types of annotation categories and exploring the relationships between annotation terms of similar information. GARNET (Gene Annotation Relationship NEtwork Tools) is an integrative platform for gene set analysis with many novel features. It includes tools for retrieval of genes from annotation database, statistical analysis & visualization of annotation relationships, and managing gene sets. In an effort to allow access to a full spectrum of amassed biological knowledge, we have integrated a variety of annotation data that include the GO, domain, disease, drug, chromosomal location, and custom-defined annotations. Diverse types of molecular networks (pathways, transcription and microRNA regulations, protein-protein interaction) are also included. The pair-wise relationship between annotation gene sets was calculated using kappa statistics. GARNET consists of three modules--gene set manager, gene set analysis and gene set retrieval, which are tightly integrated to provide virtually automatic analysis for gene sets. A dedicated viewer for annotation network has been developed to facilitate exploration of the related annotations. GARNET (gene annotation relationship network tools) is an integrative platform for diverse types of gene set analysis, where complex relationships among gene annotations can be easily explored with an intuitive network visualization tool (http://garnet.isysbio.org/ or http://ercsb.ewha.ac.kr/garnet/).
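The pair-wise kappa calculation mentioned above can be illustrated with a short base-R sketch; this is not GARNET's code, and the gene universe and annotation sets are invented for the example.

```r
# Small sketch of the pair-wise kappa idea (not GARNET's implementation):
# agreement between two annotation terms, treated as membership vectors over a
# common gene universe. Gene identifiers are hypothetical.
set.seed(6)
universe <- paste0("gene", 1:1000)
setA <- sample(universe, 120)                                      # genes with term A
setB <- c(sample(setA, 80), sample(setdiff(universe, setA), 50))   # overlapping term B
inA <- universe %in% setA
inB <- universe %in% setB
po <- mean(inA == inB)                                   # observed agreement
pe <- mean(inA) * mean(inB) + mean(!inA) * mean(!inB)    # chance agreement
(po - pe) / (1 - pe)                                     # Cohen's kappa
```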
The Importance of Statistical Modeling in Data Analysis and Inference
ERIC Educational Resources Information Center
Rollins, Derrick, Sr.
2017-01-01
Statistical inference simply means to draw a conclusion based on information that comes from data. Error bars are the most commonly used tool for data analysis and inference in chemical engineering data studies. This work demonstrates, using common types of data collection studies, the importance of specifying the statistical model for sound…
Noise Reduction in High-Throughput Gene Perturbation Screens
USDA-ARS?s Scientific Manuscript database
Motivation: Accurate interpretation of perturbation screens is essential for a successful functional investigation. However, the screened phenotypes are often distorted by noise, and their analysis requires specialized statistical analysis tools. The number and scope of statistical methods available...
MetaGenyo: a web tool for meta-analysis of genetic association studies.
Martorell-Marugan, Jordi; Toro-Dominguez, Daniel; Alarcon-Riquelme, Marta E; Carmona-Saez, Pedro
2017-12-16
Genetic association studies (GAS) aim to evaluate the association between genetic variants and phenotypes. In the last few years, the number of such studies has increased exponentially, but the results are not always reproducible due to experimental designs, low sample sizes and other methodological errors. In this field, meta-analysis techniques are becoming very popular tools to combine results across studies to increase statistical power and to resolve discrepancies in genetic association studies. A meta-analysis summarizes research findings, increases statistical power and enables the identification of genuine associations between genotypes and phenotypes. Meta-analysis techniques are increasingly used in GAS, but the number of published meta-analyses containing errors is also increasing. Although there are several software packages that implement meta-analysis, none of them are specifically designed for genetic association studies and in most cases their use requires advanced programming or scripting expertise. We have developed MetaGenyo, a web tool for meta-analysis in GAS. MetaGenyo implements a complete and comprehensive workflow that can be executed in an easy-to-use environment without programming knowledge. MetaGenyo has been developed to guide users through the main steps of a GAS meta-analysis, covering the Hardy-Weinberg test, statistical association for different genetic models, analysis of heterogeneity, testing for publication bias, subgroup analysis and robustness testing of the results. MetaGenyo is a useful tool for conducting comprehensive genetic association meta-analyses. The application is freely available at http://bioinfo.genyo.es/metagenyo/.
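Two of the workflow steps (the Hardy-Weinberg test and the pooling of per-study effects) can be sketched in base R as follows; this is illustrative only, not MetaGenyo's implementation, and all counts and effect sizes are made up.

```r
# Hedged sketch of two steps in a GAS meta-analysis workflow (not MetaGenyo's code).
# 1) Hardy-Weinberg chi-squared test from control genotype counts (AA, Aa, aa)
obs <- c(AA = 410, Aa = 480, aa = 110)
p <- (2 * obs["AA"] + obs["Aa"]) / (2 * sum(obs))        # major allele frequency
exp_counts <- sum(obs) * c(p^2, 2 * p * (1 - p), (1 - p)^2)
hwe_chisq <- sum((obs - exp_counts)^2 / exp_counts)
pchisq(hwe_chisq, df = 1, lower.tail = FALSE)            # HWE p-value
# 2) Fixed-effect (inverse-variance) pooling of per-study log odds ratios
logOR <- c(0.25, 0.10, 0.40, 0.05)
se    <- c(0.12, 0.20, 0.15, 0.10)
w     <- 1 / se^2
pooled    <- sum(w * logOR) / sum(w)
pooled_se <- sqrt(1 / sum(w))
exp(pooled + c(-1.96, 0, 1.96) * pooled_se)              # 95% CI and pooled OR
```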
RADSS: an integration of GIS, spatial statistics, and network service for regional data mining
NASA Astrophysics Data System (ADS)
Hu, Haitang; Bao, Shuming; Lin, Hui; Zhu, Qing
2005-10-01
Regional data mining, which aims at the discovery of knowledge about spatial patterns, clusters or associations between regions, has wide applications nowadays in the social sciences, such as sociology, economics, epidemiology and crime research. Many applications in the regional or other social sciences are more concerned with the spatial relationship than with the precise geographical location. Based on the spatial continuity rule derived from Tobler's first law of geography--observations at two sites tend to be more similar to each other if the sites are close together than if far apart--spatial statistics, as an important means of spatial data mining, allows users to extract interesting and useful information such as spatial pattern, spatial structure, spatial association, spatial outliers and spatial interaction from vast amounts of spatial or non-spatial data. Therefore, by integrating spatial statistical methods, geographical information systems become more powerful in gaining further insight into the nature of the spatial structure of regional systems, and help researchers to be more careful when selecting appropriate models. However, the lack of such tools holds back the application of spatial data analysis techniques and the development of new methods and models (e.g., spatio-temporal models). Herein, we make an attempt to develop such an integrated software package and apply it to complex system analysis for the Poyang Lake Basin. This paper presents a framework for integrating GIS, spatial statistics and network services in regional data mining, as well as its implementation. After discussing the spatial statistics methods involved in regional complex system analysis, we introduce RADSS (Regional Analysis and Decision Support System), our new regional data mining tool, which integrates GIS, spatial statistics and network services. RADSS includes functions for spatial data visualization, exploratory spatial data analysis, and spatial statistics. The tool also includes fundamental spatial and non-spatial databases on regional population and environment, which can be updated from external databases via CD or network. Utilizing this data mining and exploratory analytical tool, users can easily and quickly analyse the huge amount of interrelated regional data and better understand the spatial patterns and trends of regional development, so as to make credible and scientific decisions. Moreover, it can be used as an educational tool for spatial data analysis and environmental studies. In this paper, we also present a case study on the Poyang Lake Basin as an application of the tool and spatial data mining in complex environmental studies. Finally, several concluding remarks are discussed.
Open Source Tools for Seismicity Analysis
NASA Astrophysics Data System (ADS)
Powers, P.
2010-12-01
The spatio-temporal analysis of seismicity plays an important role in earthquake forecasting and is integral to research on earthquake interactions and triggering. For instance, the third version of the Uniform California Earthquake Rupture Forecast (UCERF), currently under development, will use Epidemic Type Aftershock Sequences (ETAS) as a model for earthquake triggering. UCERF will be a "living" model and therefore requires robust, tested, and well-documented ETAS algorithms to ensure transparency and reproducibility. Likewise, as earthquake aftershock sequences unfold, real-time access to high quality hypocenter data makes it possible to monitor the temporal variability of statistical properties such as the parameters of the Omori Law and the Gutenberg Richter b-value. Such statistical properties are valuable as they provide a measure of how much a particular sequence deviates from expected behavior and can be used when assigning probabilities of aftershock occurrence. To address these demands and provide public access to standard methods employed in statistical seismology, we present well-documented, open-source JavaScript and Java software libraries for the on- and off-line analysis of seismicity. The Javascript classes facilitate web-based asynchronous access to earthquake catalog data and provide a framework for in-browser display, analysis, and manipulation of catalog statistics; implementations of this framework will be made available on the USGS Earthquake Hazards website. The Java classes, in addition to providing tools for seismicity analysis, provide tools for modeling seismicity and generating synthetic catalogs. These tools are extensible and will be released as part of the open-source OpenSHA Commons library.
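As an example of one statistical property mentioned above, the Gutenberg-Richter b-value can be estimated from catalog magnitudes with Aki's maximum-likelihood formula; the R sketch below is illustrative only, uses a synthetic catalog, and is not part of the libraries described.

```r
# Illustrative sketch: Gutenberg-Richter b-value by Aki's maximum-likelihood
# estimator for magnitudes above a completeness threshold Mc (continuous
# magnitudes; binned catalogs usually subtract half the bin width from Mc).
set.seed(7)
mags <- 2.0 + rexp(500, rate = log(10) * 1.0)    # synthetic catalog, true b = 1
Mc <- 2.0                                        # magnitude of completeness
m <- mags[mags >= Mc]
b_hat <- log10(exp(1)) / (mean(m) - Mc)          # Aki's MLE of the b-value
b_se  <- b_hat / sqrt(length(m))                 # approximate (Aki) standard error
c(b = b_hat, se = b_se)
```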
SWToolbox: A surface-water tool-box for statistical analysis of streamflow time series
Kiang, Julie E.; Flynn, Kate; Zhai, Tong; Hummel, Paul; Granato, Gregory
2018-03-07
This report is a user guide for the low-flow analysis methods provided with version 1.0 of the Surface Water Toolbox (SWToolbox) computer program. The software combines functionality from two software programs—U.S. Geological Survey (USGS) SWSTAT and U.S. Environmental Protection Agency (EPA) DFLOW. Both of these programs have been used primarily for computation of critical low-flow statistics. The main analysis methods are the computation of hydrologic frequency statistics such as the 7-day minimum flow that occurs on average only once every 10 years (7Q10), computation of design flows including biologically based flows, and computation of flow-duration curves and duration hydrographs. Other annual, monthly, and seasonal statistics can also be computed. The interface facilitates retrieval of streamflow discharge data from the USGS National Water Information System and outputs text reports for a record of the analysis. Tools for graphing data and screening tests are available to assist the analyst in conducting the analysis.
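A simplified empirical sketch of the 7Q10 idea is shown below in R; SWToolbox itself estimates such statistics from a fitted frequency distribution of the annual 7-day minima rather than the simple empirical quantile used here, and the daily flows are synthetic.

```r
# Simplified illustration of the 7Q10 concept (not the SWToolbox algorithm):
# annual minima of the 7-day moving-average flow, then a rough empirical
# estimate of the value with a 10-year recurrence interval.
set.seed(8)
flows <- data.frame(
  year = rep(1991:2020, each = 365),
  q    = exp(rnorm(30 * 365, mean = log(10), sd = 0.6))   # synthetic daily discharge
)
ann_min7 <- tapply(seq_len(nrow(flows)), flows$year, function(idx) {
  q7 <- stats::filter(flows$q[idx], rep(1 / 7, 7), sides = 1)  # trailing 7-day mean
  min(q7, na.rm = TRUE)
})
# Empirical 7Q10: annual 7-day minimum not exceeded in roughly 1 year out of 10
quantile(ann_min7, probs = 0.10, type = 6)
```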
Blattmann, Peter; Heusel, Moritz; Aebersold, Ruedi
2016-01-01
SWATH-MS is an acquisition and analysis technique of targeted proteomics that enables measuring several thousand proteins with high reproducibility and accuracy across many samples. OpenSWATH is popular open-source software for peptide identification and quantification from SWATH-MS data. For downstream statistical and quantitative analysis there exist different tools such as MSstats, mapDIA and aLFQ. However, the transfer of data from OpenSWATH to the downstream statistical tools is currently technically challenging. Here we introduce the R/Bioconductor package SWATH2stats, which allows convenient processing of the data into a format directly readable by the downstream analysis tools. In addition, SWATH2stats allows annotation, analyzing the variation and the reproducibility of the measurements, FDR estimation, and advanced filtering before submitting the processed data to downstream tools. These functionalities are important to quickly analyze the quality of the SWATH-MS data. Hence, SWATH2stats is a new open-source tool that summarizes several practical functionalities for analyzing, processing, and converting SWATH-MS data and thus facilitates the efficient analysis of large-scale SWATH/DIA datasets.
Lü, Yiran; Hao, Shuxin; Zhang, Guoqing; Liu, Jie; Liu, Yue; Xu, Dongqun
2018-01-01
To implement an online statistical analysis function in the information system for air pollution and health impact monitoring, and to obtain data analysis results in real time. Descriptive statistics, time-series analysis and multivariate regression analysis were implemented online using SQL and visual tools built on the database software. The system generates basic statistical tables and summary tables of air pollution exposure and health impact data online; generates trend charts of each data part online with interactive connection to the database; and generates export sheets that can be loaded directly into R, SAS and SPSS. The information system for air pollution and health impact monitoring implements the statistical analysis function online, and can provide real-time analysis results to its users.
Research of Extension of the Life Cycle of Helicopter Rotor Blade in Hungary
2003-02-01
Radiography (DXR), and (iii) Vibration Diagnostics (VD) with Statistical Energy Analysis (SEA) were semi-simultaneously applied [1]. Parallel to the NDT measurements, Statistical Energy Analysis (SEA) was applied as a vibration diagnostic tool; the measured noises were analysed with a dual-channel real-time frequency analyser (BK2035) in addition to the Statistical Energy Analysis measurement.
Carroll, Adam J; Badger, Murray R; Harvey Millar, A
2010-07-14
Standardization of analytical approaches and reporting methods via community-wide collaboration can work synergistically with web-tool development to result in rapid community-driven expansion of online data repositories suitable for data mining and meta-analysis. In metabolomics, the inter-laboratory reproducibility of gas-chromatography/mass-spectrometry (GC/MS) makes it an obvious target for such development. While a number of web-tools offer access to datasets and/or tools for raw data processing and statistical analysis, none of these systems are currently set up to act as a public repository by easily accepting, processing and presenting publicly submitted GC/MS metabolomics datasets for public re-analysis. Here, we present MetabolomeExpress, a new File Transfer Protocol (FTP) server and web-tool for the online storage, processing, visualisation and statistical re-analysis of publicly submitted GC/MS metabolomics datasets. Users may search a quality-controlled database of metabolite response statistics from publicly submitted datasets by a number of parameters (eg. metabolite, species, organ/biofluid etc.). Users may also perform meta-analysis comparisons of multiple independent experiments or re-analyse public primary datasets via user-friendly tools for t-test, principal components analysis, hierarchical cluster analysis and correlation analysis. They may interact with chromatograms, mass spectra and peak detection results via an integrated raw data viewer. Researchers who register for a free account may upload (via FTP) their own data to the server for online processing via a novel raw data processing pipeline. MetabolomeExpress https://www.metabolome-express.org provides a new opportunity for the general metabolomics community to transparently present online the raw and processed GC/MS data underlying their metabolomics publications. Transparent sharing of these data will allow researchers to assess data quality and draw their own insights from published metabolomics datasets.
NASA Astrophysics Data System (ADS)
Whittaker, Kara A.; McShane, Dan
2013-02-01
A large storm event in southwest Washington State triggered over 2500 landslides and provided an opportunity to assess two slope stability screening tools. The statistical analysis conducted demonstrated that both screening tools are effective at predicting where landslides were likely to take place (Whittaker and McShane, 2012). Here we reply to two discussions of this article related to the development of the slope stability screening tools and the accuracy and scale of the spatial data used. Neither of the discussions address our statistical analysis or results. We provide greater detail on our sampling criteria and also elaborate on the policy and management implications of our findings and how they complement those of a separate investigation of landslides resulting from the same storm. The conclusions made in Whittaker and McShane (2012) stand as originally published unless future analysis indicates otherwise.
PRANAS: A New Platform for Retinal Analysis and Simulation.
Cessac, Bruno; Kornprobst, Pierre; Kraria, Selim; Nasser, Hassan; Pamplona, Daniela; Portelli, Geoffrey; Viéville, Thierry
2017-01-01
The retina encodes visual scenes by trains of action potentials that are sent to the brain via the optic nerve. In this paper, we describe new freely accessible end-user software allowing this coding to be better understood. It is called PRANAS (https://pranas.inria.fr), standing for Platform for Retinal ANalysis And Simulation. PRANAS targets neuroscientists and modelers by providing a unique set of retina-related tools. PRANAS integrates a retina simulator allowing large-scale simulations while keeping strong biological plausibility, and a toolbox for the analysis of spike train population statistics. The statistical method (entropy maximization under constraints) takes into account both spatial and temporal correlations as constraints, allowing analysis of the effects of memory on statistics. PRANAS also integrates a tool for computing and representing receptive fields in 3D (time-space). All these tools are accessible through a friendly graphical user interface. The most CPU-costly of them have been implemented to run in parallel.
Two Paradoxes in Linear Regression Analysis.
Feng, Ge; Peng, Jing; Tu, Dongke; Zheng, Julia Z; Feng, Changyong
2016-12-25
Regression is one of the favorite tools in applied statistics. However, misuse and misinterpretation of results from regression analysis are common in biomedical research. In this paper we use statistical theory and simulation studies to clarify some paradoxes around this popular statistical method. In particular, we show that a widely used model selection procedure employed in many publications in top medical journals is wrong. Formal procedures based on solid statistical theory should be used in model selection.
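A small simulation in the spirit of the paper's argument (a generic illustration, not the authors' exact study) shows how stepwise selection retains pure-noise predictors far more often than the nominal error rate suggests.

```r
# Generic simulation sketch: backward stepwise selection applied to an outcome
# unrelated to any predictor still "discovers" predictors in most runs.
set.seed(9)
n_selected <- replicate(200, {
  dat <- as.data.frame(matrix(rnorm(100 * 10), nrow = 100))
  names(dat) <- paste0("x", 1:10)
  dat$y <- rnorm(100)                                  # outcome unrelated to all x's
  full <- lm(y ~ ., data = dat)
  chosen <- step(full, direction = "backward", trace = 0)
  length(coef(chosen)) - 1                             # noise predictors retained
})
mean(n_selected > 0)     # proportion of runs keeping at least one spurious predictor
mean(n_selected)         # average number of spurious predictors retained
```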
A Study on Predictive Analytics Application to Ship Machinery Maintenance
2013-09-01
Looking at the nature of the time series forecasting method, it would be better applied to offline analysis. The application for real-time online ... other system attributes in future. Two techniques of statistical analysis, mainly time series models and cumulative sum control charts, are discussed in ... statistical tool employed for the two techniques of statistical analysis. Both time series forecasting as well as CUSUM control charts are shown to be ...
Statistical quality control through overall vibration analysis
NASA Astrophysics Data System (ADS)
Carnero, M. a. Carmen; González-Palma, Rafael; Almorza, David; Mayorga, Pedro; López-Escobar, Carlos
2010-05-01
The present study introduces the concept of statistical quality control in automotive wheel bearing manufacturing processes. Defects in the products under analysis can have a direct influence on passengers' safety and comfort. At present, the use of vibration analysis on machine tools for quality control purposes is not very extensive in manufacturing facilities. Noise and vibration are common quality problems in bearings. These failure modes likely occur under certain operating conditions and do not require high vibration amplitudes but relate to certain vibration frequencies. The vibration frequencies are affected by the type of surface problems (chattering) of ball races that are generated through grinding processes. The purpose of this paper is to identify grinding process variables that affect the quality of bearings by using statistical principles in the field of machine tools. In addition, the quality results of the finished parts under different combinations of process variables are assessed. This paper intends to establish the foundations to predict the quality of the products through the analysis of self-induced vibrations during the contact between the grinding wheel and the parts. To achieve this goal, the overall self-induced vibration readings under different combinations of process variables are analysed using statistical tools. The analysis of data and design of experiments follows a classical approach, considering all potential interactions between variables. The analysis of data is conducted through analysis of variance (ANOVA) for data sets that meet normality and homoscedasticity criteria. This paper utilizes different statistical tools to support the conclusions, such as the chi-squared, Shapiro-Wilk, symmetry, kurtosis, Cochran, Bartlett, Hartley and Kruskal-Wallis tests. The analysis presented is the starting point to extend the use of predictive techniques (vibration analysis) for quality control. This paper demonstrates the existence of predictive variables (high-frequency vibration displacements) that are sensitive to the process setup and the quality of the products obtained. Based on the results of this overall vibration analysis, a second paper will analyse self-induced vibration spectra in order to define limit vibration bands, controllable every cycle or connected to permanent vibration-monitoring systems able to adjust sensitive process variables identified by ANOVA once the vibration readings exceed established quality limits.
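The test sequence described (normality and homoscedasticity checks before ANOVA, with a non-parametric fallback) can be sketched in base R; the process variable and vibration readings below are made-up placeholders, not the study's data.

```r
# Generic sketch of the described test sequence; all data are placeholders.
set.seed(10)
vib <- data.frame(
  wheel_speed  = factor(rep(c("low", "mid", "high"), each = 20)),
  displacement = c(rnorm(20, 1.0, 0.2), rnorm(20, 1.1, 0.2), rnorm(20, 1.4, 0.2))
)
shapiro.test(residuals(aov(displacement ~ wheel_speed, data = vib)))  # normality check
bartlett.test(displacement ~ wheel_speed, data = vib)                 # homoscedasticity check
summary(aov(displacement ~ wheel_speed, data = vib))                  # ANOVA if assumptions hold
kruskal.test(displacement ~ wheel_speed, data = vib)                  # non-parametric fallback
```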
HydroClimATe: hydrologic and climatic analysis toolkit
Dickinson, Jesse; Hanson, Randall T.; Predmore, Steven K.
2014-01-01
The potential consequences of climate variability and climate change have been identified as major issues for the sustainability and availability of the worldwide water resources. Unlike global climate change, climate variability represents deviations from the long-term state of the climate over periods of a few years to several decades. Currently, rich hydrologic time-series data are available, but the combination of data preparation and statistical methods developed by the U.S. Geological Survey as part of the Groundwater Resources Program is relatively unavailable to hydrologists and engineers who could benefit from estimates of climate variability and its effects on periodic recharge and water-resource availability. This report documents HydroClimATe, a computer program for assessing the relations between variable climatic and hydrologic time-series data. HydroClimATe was developed for a Windows operating system. The software includes statistical tools for (1) time-series preprocessing, (2) spectral analysis, (3) spatial and temporal analysis, (4) correlation analysis, and (5) projections. The time-series preprocessing tools include spline fitting, standardization using a normal or gamma distribution, and transformation by a cumulative departure. The spectral analysis tools include discrete Fourier transform, maximum entropy method, and singular spectrum analysis. The spatial and temporal analysis tool is empirical orthogonal function analysis. The correlation analysis tools are linear regression and lag correlation. The projection tools include autoregressive time-series modeling and generation of many realizations. These tools are demonstrated in four examples that use stream-flow discharge data, groundwater-level records, gridded time series of precipitation data, and the Multivariate ENSO Index.
eSACP - a new Nordic initiative towards developing statistical climate services
NASA Astrophysics Data System (ADS)
Thorarinsdottir, Thordis; Thejll, Peter; Drews, Martin; Guttorp, Peter; Venälainen, Ari; Uotila, Petteri; Benestad, Rasmus; Mesquita, Michel d. S.; Madsen, Henrik; Fox Maule, Cathrine
2015-04-01
The Nordic research council NordForsk has recently announced its support for a new 3-year research initiative on "statistical analysis of climate projections" (eSACP). eSACP will focus on developing e-science tools and services based on statistical analysis of climate projections for the purpose of helping decision-makers and planners in the face of expected future challenges in regional climate change. The motivation behind the project is the growing recognition in our society that forecasts of future climate change are associated with various sources of uncertainty, and that any long-term planning and decision-making dependent on a changing climate must account for this. At the same time there is an obvious gap between scientists from different fields and between practitioners in terms of understanding how climate information relates to different parts of the "uncertainty cascade". In eSACP we will develop generic e-science tools and statistical climate services to facilitate the use of climate projections by decision-makers and scientists from all fields for climate impact analyses and for the development of robust adaptation strategies, which properly (in a statistical sense) account for the inherent uncertainty. The new tool will be publicly available and will include functionality to utilize the extensive and dynamically growing repositories of data, using state-of-the-art statistical techniques to quantify the uncertainty and innovative approaches to visualize the results. Such a tool will not only be valuable for future assessments and underpin the development of dedicated climate services, but will also assist the scientific community in making its case more clearly to policy makers and the general public on the consequences of our changing climate. The eSACP project is led by Thordis Thorarinsdottir, Norwegian Computing Center, and also includes the Finnish Meteorological Institute, the Norwegian Meteorological Institute, the Technical University of Denmark and the Bjerknes Centre for Climate Research, Norway. This poster will present details of focus areas in the project and show some examples of the expected analysis tools.
Combining the Bourne-Shell, sed and awk in the UNIX Environment for Language Analysis.
ERIC Educational Resources Information Center
Schmitt, Lothar M.; Christianson, Kiel T.
This document describes how to construct tools for language analysis in research and teaching using the Bourne-shell, sed, and awk, three search tools, in the UNIX operating system. Applications include: searches for words, phrases, grammatical patterns, and phonemic patterns in text; statistical analysis of text in regard to such searches,…
ERIC Educational Resources Information Center
Meletiou-Mavrotheris, Maria
2004-01-01
While technology has become an integral part of introductory statistics courses, the programs typically employed are professional packages designed primarily for data analysis rather than for learning. Findings from several studies suggest that use of such software in the introductory statistics classroom may not be very effective in helping…
Analysis tools for discovering strong parity violation at hadron colliders
NASA Astrophysics Data System (ADS)
Backović, Mihailo; Ralston, John P.
2011-07-01
Several arguments suggest parity violation may be observable in high energy strong interactions. We introduce new analysis tools to describe the azimuthal dependence of multiparticle distributions, or “azimuthal flow.” Analysis uses the representations of the orthogonal group O(2) and dihedral groups DN necessary to define parity completely in two dimensions. Classification finds that collective angles used in event-by-event statistics represent inequivalent tensor observables that cannot generally be represented by a single “reaction plane.” Many new parity-violating observables exist that have never been measured, while many parity-conserving observables formerly lumped together are now distinguished. We use the concept of “event-shape sorting” to suggest separating right- and left-handed events, and we discuss the effects of transverse and longitudinal spin. The analysis tools are statistically robust, and can be applied equally to low or high multiplicity events at the Tevatron, RHIC or RHIC Spin, and the LHC.
Osen, Hayley; Chang, David; Choo, Shelly; Perry, Henry; Hesse, Afua; Abantanga, Francis; McCord, Colin; Chrouser, Kristin; Abdullah, Fizan
2011-03-01
The World Health Organization (WHO) Tool for Situational Analysis to Assess Emergency and Essential Surgical Care (hereafter called the WHO Tool) has been used in more than 25 countries and is the largest effort to assess surgical care in the world. However, it has not yet been independently validated. Test-retest reliability is one way to validate the degree to which test instruments are free from random error. The aim of the present field study was to determine the test-retest reliability of the WHO Tool. The WHO Tool was mailed to 10 district hospitals in Ghana. Written instructions were provided along with a letter from the Ghana Health Services requesting the hospital administrator to complete the survey tool. After ensuring delivery and completion of the forms, the study team readministered the WHO Tool at the time of an on-site visit less than 1 month later. The results of the two tests were compared to calculate kappa statistics for each of the 152 questions in the WHO Tool. The kappa statistic is a statistical measure of the degree of agreement above what would be expected based on chance alone. Ten hospitals were surveyed twice over a short interval (i.e., less than 1 month). Weighted and unweighted kappa statistics were calculated for 152 questions. The median unweighted kappa for the entire survey was 0.43 (interquartile range 0-0.84). The infrastructure section (24 questions) had a median kappa of 0.81; the human resources section (13 questions) had a median kappa of 0.77; the surgical procedures section (67 questions) had a median kappa of 0.00; and the emergency surgical equipment section (48 questions) had a median kappa of 0.81. Hospital capacity survey questions related to infrastructure characteristics had high reliability. However, questions related to process of care had poor reliability and may benefit from supplemental data gathered by direct observation. Limitations to the study include the small sample size: 10 district hospitals in a single country. Consistent and high correlations calculated from the field testing within the present analysis suggest that the WHO Tool for Situational Analysis is a reliable tool where it measures structure and setting, but it should be revised for measuring process of care.
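The kappa computation behind these reliability figures is straightforward to reproduce; a minimal Python sketch using scikit-learn, with hypothetical answers from the two survey administrations, might look like:

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical paired responses for one WHO Tool question across 10 hospitals:
# first administration (mailed survey) vs. second (on-site re-administration).
first  = ["yes", "yes", "no", "partial", "yes", "no", "no",  "yes", "partial", "yes"]
second = ["yes", "no",  "no", "partial", "yes", "no", "yes", "yes", "yes",     "yes"]

# Unweighted kappa treats every disagreement equally; a linearly weighted kappa
# penalises "near misses" less when the categories are ordered.
print("unweighted kappa:", cohen_kappa_score(first, second))
print("weighted kappa:  ", cohen_kappa_score(first, second, weights="linear"))
```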
Application of spatial technology in malaria research & control: some new insights.
Saxena, Rekha; Nagpal, B N; Srivastava, Aruna; Gupta, S K; Dash, A P
2009-08-01
Geographical Information System (GIS) has emerged as the core of spatial technology, integrating a wide range of datasets available from different sources, including Remote Sensing (RS) and Global Positioning System (GPS). Literature published during the decade 1998-2007 has been compiled and grouped into six categories according to the usage of the technology in malaria epidemiology. Different GIS modules, such as spatial data sources, mapping and geo-processing tools, distance calculation, digital elevation models (DEM), buffer zones and geo-statistical analysis, have been investigated in detail and illustrated with examples of the derived results. These GIS tools have contributed immensely to understanding the epidemiological processes of malaria, and the examples drawn show that GIS is now widely used for research and decision making in malaria control. Statistical data analysis is currently the most consistent and established set of tools for analyzing spatial datasets. The desired future development of GIS lies in the utilization of geo-statistical tools which, combined with high-quality data, have the capability to provide new insight into malaria epidemiology and the complexity of its transmission potential in endemic areas.
WebArray: an online platform for microarray data analysis
Xia, Xiaoqin; McClelland, Michael; Wang, Yipeng
2005-01-01
Background Many cutting-edge microarray analysis tools and algorithms, including the commonly used limma and affy packages in Bioconductor, require sophisticated knowledge of mathematics, statistics and computing for implementation. Commercially available software can provide a user-friendly interface at considerable cost. To facilitate the use of these tools for microarray data analysis on an open platform we developed an online microarray data analysis platform, WebArray, for bench biologists to explore data from single/dual color microarray experiments. Results The currently implemented functions are based on the limma and affy packages from Bioconductor, the spacings LOESS histogram (SPLOSH) method, a PCA-assisted normalization method and a genome mapping method. WebArray incorporates these packages and provides a user-friendly interface for accessing a wide range of key functions of limma and others, such as spot quality weighting, background correction, graphical plotting, normalization, linear modeling, empirical Bayes statistical analysis, false discovery rate (FDR) estimation, and chromosomal mapping for genome comparison. Conclusion WebArray offers a convenient platform for bench biologists to access several cutting-edge microarray data analysis tools. The website is freely available online. It runs on a Linux server with Apache and MySQL. PMID:16371165
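The false discovery rate step mentioned above is commonly the Benjamini-Hochberg procedure; as a rough illustration in Python (not WebArray's exact implementation), the adjustment can be reproduced with statsmodels on hypothetical per-gene p-values:

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

# Hypothetical per-gene p-values from a differential-expression test
pvals = np.array([0.0002, 0.0011, 0.0090, 0.0300, 0.0470, 0.1200, 0.4500, 0.8800])

# Benjamini-Hochberg control of the false discovery rate at 5%
reject, qvals, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")
for p, q, r in zip(pvals, qvals, reject):
    print(f"p = {p:.4f}  FDR-adjusted = {q:.4f}  significant = {r}")
```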
ToNER: A tool for identifying nucleotide enrichment signals in feature-enriched RNA-seq data.
Promworn, Yuttachon; Kaewprommal, Pavita; Shaw, Philip J; Intarapanich, Apichart; Tongsima, Sissades; Piriyapongsa, Jittima
2017-01-01
Biochemical methods are available for enriching 5' ends of RNAs in prokaryotes, which are employed in the differential RNA-seq (dRNA-seq) and the more recent Cappable-seq protocols. Computational methods are needed to locate RNA 5' ends from these data by statistical analysis of the enrichment. Although statistical-based analysis methods have been developed for dRNA-seq, they may not be suitable for Cappable-seq data. The more efficient enrichment method employed in Cappable-seq compared with dRNA-seq could affect data distribution and thus algorithm performance. We present Transformation of Nucleotide Enrichment Ratios (ToNER), a tool for statistical modeling of enrichment from RNA-seq data obtained from enriched and unenriched libraries. The tool calculates nucleotide enrichment scores and determines the global transformation for fitting to the normal distribution using the Box-Cox procedure. From the transformed distribution, sites of significant enrichment are identified. To increase the power of detection, meta-analysis across experimental replicates is offered. We tested the tool on Cappable-seq and dRNA-seq data for identifying Escherichia coli transcript 5' ends and compared the results with those from the TSSAR tool, which is designed for analyzing dRNA-seq data. When combining results across Cappable-seq replicates, ToNER detects more known transcript 5' ends than TSSAR. In general, the transcript 5' ends detected by ToNER but not TSSAR occur in regions which cannot be locally modeled by TSSAR. ToNER uses a simple yet robust statistical modeling approach, which can be used for detecting RNA 5' ends from Cappable-seq data, in particular when combining information from experimental replicates. The ToNER tool could potentially be applied for analyzing other RNA-seq datasets in which enrichment for other structural features of RNA is employed. The program is freely available for download at the ToNER webpage (http://www4a.biotec.or.th/GI/tools/toner) and GitHub repository (https://github.com/PavitaKae/ToNER).
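The core idea of Box-Cox-transforming enrichment ratios and then calling sites in the upper tail of the fitted normal can be sketched generically in Python; this is an illustrative simplification with synthetic data, not the ToNER code:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Hypothetical per-nucleotide enrichment ratios: enriched / unenriched read counts
ratios = rng.lognormal(mean=0.0, sigma=0.6, size=10_000)
ratios[:20] *= 8          # a few genuinely enriched positions (e.g. transcript 5' ends)

# Box-Cox transform toward normality, then z-score the transformed values
transformed, lam = stats.boxcox(ratios)
z = (transformed - transformed.mean()) / transformed.std(ddof=1)

# Call sites whose upper-tail probability is small (one-sided p < 1e-4 here)
cutoff = stats.norm.isf(1e-4)
candidates = np.flatnonzero(z > cutoff)
print(f"lambda = {lam:.3f}, candidate enriched sites: {candidates[:10]} ...")
```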
Software Tools on the Peregrine System | High-Performance Computing | NREL
A listing of software tools available on NREL's Peregrine high-performance computing system, including debugger and performance-analysis tools for understanding the behavior of MPI applications, Intel VTune, an environment for statistical computing and graphics, and VirtualGL/TurboVNC for remote visualization and analytics.
Rasch Model Based Analysis of the Force Concept Inventory
ERIC Educational Resources Information Center
Planinic, Maja; Ivanjek, Lana; Susac, Ana
2010-01-01
The Force Concept Inventory (FCI) is an important diagnostic instrument which is widely used in the field of physics education research. It is therefore very important to evaluate and monitor its functioning using different tools for statistical analysis. One such tool is the stochastic Rasch model, which enables construction of linear…
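For readers unfamiliar with it, the dichotomous Rasch model expresses the probability of a correct answer as a logistic function of the difference between person ability and item difficulty; a minimal Python sketch with made-up parameter values:

```python
import numpy as np

def rasch_prob(theta, b):
    """Probability of a correct response under the dichotomous Rasch model:
    theta = person ability, b = item difficulty (both on the same logit scale)."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

# A person of average ability facing an easy, a matched, and a hard FCI-like item
print(rasch_prob(0.0, np.array([-1.0, 0.0, 1.5])))   # -> approx. [0.73, 0.50, 0.18]
```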
Larry J. Gangi
2006-01-01
The FIREMON Analysis Tools program is designed to let the user perform grouped or ungrouped summary calculations of single measurement plot data, or statistical comparisons of grouped or ungrouped plot data taken at different sampling periods. The program allows the user to create reports and graphs, save and print them, or cut and paste them into a word processor....
Page, Grier P; Coulibaly, Issa
2008-01-01
Microarrays are a very powerful tool for quantifying the amount of RNA in samples; however, their ability to query essentially every gene in a genome, which can number in the tens of thousands, presents analytical and interpretative problems. As a result, a variety of software and web-based tools have been developed to help with these issues. This article highlights and reviews some of the tools for the first steps in the analysis of a microarray study. We have tried for a balance between free and commercial systems. We have organized the tools by topics including image processing tools (Section 2), power analysis tools (Section 3), image analysis tools (Section 4), database tools (Section 5), databases of functional information (Section 6), annotation tools (Section 7), statistical and data mining tools (Section 8), and dissemination tools (Section 9).
Tools for Assessing Readability of Statistics Teaching Materials
ERIC Educational Resources Information Center
Lesser, Lawrence; Wagler, Amy
2016-01-01
This article provides tools and rationale for instructors in math and science to make their assessment and curriculum materials (more) readable for students. The tools discussed (MSWord, LexTutor, Coh-Metrix TEA) are readily available linguistic analysis applications that are grounded in current linguistic theory, but present output that can…
RooStatsCms: A tool for analysis modelling, combination and statistical studies
NASA Astrophysics Data System (ADS)
Piparo, D.; Schott, G.; Quast, G.
2010-04-01
RooStatsCms is an object oriented statistical framework based on the RooFit technology. Its scope is to allow the modelling, statistical analysis and combination of multiple search channels for new phenomena in High Energy Physics. It provides a variety of methods described in literature implemented as classes, whose design is oriented to the execution of multiple CPU intensive jobs on batch systems or on the Grid.
Upgrade Summer Severe Weather Tool
NASA Technical Reports Server (NTRS)
Watson, Leela
2011-01-01
The goal of this task was to upgrade the existing severe weather database by adding observations from the 2010 warm season, update the verification dataset with results from the 2010 warm season, apply statistical logistic regression analysis to the database, and develop a new forecast tool. The AMU analyzed 7 stability parameters that showed the possibility of providing guidance in forecasting severe weather, calculated verification statistics for the Total Threat Score (TTS), and calculated warm season verification statistics for the 2010 season. The AMU also performed statistical logistic regression analysis on the 22-year severe weather database. The results indicated that the logistic regression equation did not show an increase in skill over the previously developed TTS. The equation showed less accuracy than TTS at predicting severe weather, little ability to distinguish between severe and non-severe weather days, and worse standard categorical accuracy measures and skill scores than TTS.
Temporal Comparisons of Internet Topology
2014-06-01
Number CAIDA Cooperative Association of Internet Data Analysis CDN Content Delivery Network CI Confidence Interval DoS denial of service GMT Greenwich...the CAIDA data. Our methods include analysis of graph theoretical measures as well as complex network and statistical measures that will quantify the...tool that probes the Internet for topology analysis and performance [26]. Scamper uses network diagnostic tools, such as traceroute and ping, to probe
Ji, Jun; Ling, Jeffrey; Jiang, Helen; Wen, Qiaojun; Whitin, John C; Tian, Lu; Cohen, Harvey J; Ling, Xuefeng B
2013-03-23
Mass spectrometry (MS) has evolved to become the primary high-throughput tool for proteomics-based biomarker discovery. To date, multiple challenges in protein MS data analysis remain: large-scale and complex data set management; MS peak identification and indexing; and high-dimensional peak differential analysis with concurrent statistical tests and false discovery rate (FDR) control. "Turnkey" solutions are needed for biomarker investigations to rapidly process MS data sets and identify statistically significant peaks for subsequent validation. Here we present an efficient and effective solution, which provides experimental biologists easy access to "cloud" computing capabilities to analyze MS data. The web portal can be accessed at http://transmed.stanford.edu/ssa/. The web application presented here supports online uploading and analysis of large-scale MS data through a simple user interface. This bioinformatic tool will facilitate the discovery of potential protein biomarkers using MS.
GeneTools--application for functional annotation and statistical hypothesis testing.
Beisvag, Vidar; Jünge, Frode K R; Bergum, Hallgeir; Jølsum, Lars; Lydersen, Stian; Günther, Clara-Cecilie; Ramampiaro, Heri; Langaas, Mette; Sandvik, Arne K; Laegreid, Astrid
2006-10-24
Modern biology has shifted from "one gene" approaches to methods for genomic-scale analysis like microarray technology, which allow simultaneous measurement of thousands of genes. This has created a need for tools facilitating interpretation of biological data in "batch" mode. However, such tools often leave the investigator with large volumes of apparently unorganized information. To meet this interpretation challenge, gene-set, or cluster, testing has become a popular analytical tool. Many gene-set testing methods and software packages are now available, most of which use a variety of statistical tests to assess the genes in a set for biological information. However, the field is still evolving, and there is a great need for "integrated" solutions. GeneTools is a web-service providing access to a database that brings together information from a broad range of resources. The annotation data are updated weekly, guaranteeing that users get the most recently available data. Data submitted by the user are stored in the database, where they can easily be updated, shared between users and exported in various formats. GeneTools provides three different tools: i) NMC Annotation Tool, which offers annotations from several databases like UniGene, Entrez Gene, SwissProt and GeneOntology, in both single- and batch-search mode. ii) GO Annotator Tool, where users can add new gene ontology (GO) annotations to genes of interest. These user-defined GO annotations can be used in further analysis or exported for public distribution. iii) eGOn, a tool for visualization and statistical hypothesis testing of GO category representation. As the first GO tool, eGOn supports hypothesis testing for three different situations (master-target situation, mutually exclusive target-target situation and intersecting target-target situation). An important additional function is an evidence-code filter that allows users to select the GO annotations for the analysis. GeneTools is the first "all in one" annotation tool, providing users with rapid extraction of highly relevant gene annotation data for, e.g., thousands of genes or clones at once. It allows a user to define and archive new GO annotations and it supports hypothesis testing related to GO category representations. GeneTools is freely available through www.genetools.no
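Hypothesis testing for GO category representation of the "master-target" kind typically reduces to a 2x2 contingency test; a generic Python sketch with hypothetical counts (not eGOn's exact algorithm):

```python
from scipy.stats import fisher_exact

# Hypothetical 2x2 table: is GO category X over-represented in the target gene list
# relative to the rest of the master (reference) list?
target_in_go, target_not_go = 30, 270      # target genes in / not in the category
rest_in_go,   rest_not_go   = 400, 14600   # remaining master-list genes in / not in it

odds_ratio, p_value = fisher_exact([[target_in_go, target_not_go],
                                    [rest_in_go, rest_not_go]],
                                   alternative="greater")
print(f"odds ratio = {odds_ratio:.2f}, one-sided p = {p_value:.3g}")
```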
Paleomagnetism.org: An online multi-platform open source environment for paleomagnetic data analysis
NASA Astrophysics Data System (ADS)
Koymans, Mathijs R.; Langereis, Cor G.; Pastor-Galán, Daniel; van Hinsbergen, Douwe J. J.
2016-08-01
This contribution provides an overview of Paleomagnetism.org, an open-source, multi-platform online environment for paleomagnetic data analysis. Paleomagnetism.org provides an interactive environment where paleomagnetic data can be interpreted, evaluated, visualized, and exported. The Paleomagnetism.org application is split into an interpretation portal, a statistics portal, and a portal for miscellaneous paleomagnetic tools. In the interpretation portal, principal component analysis can be performed on visualized demagnetization diagrams. Interpreted directions and great circles can be combined to find great circle solutions. These directions can be used in the statistics portal, or exported as data and figures. The tools in the statistics portal cover standard Fisher statistics for directions and VGPs, including other statistical parameters used as reliability criteria. Other available tools include an eigenvector-approach foldtest, two reversal tests, including a Monte Carlo simulation on mean directions, and a coordinate bootstrap on the original data. An implementation is included for the detection and correction of inclination shallowing in sediments following TK03.GAD. Finally we provide a module to visualize VGPs and expected paleolatitudes, declinations, and inclinations relative to widely used global apparent polar wander path models in coordinates of major continent-bearing plates. The tools in the miscellaneous portal include a net tectonic rotation (NTR) analysis to restore a body to its paleo-vertical and a bootstrapped oroclinal test using linear regressive techniques, including a modified foldtest around a vertical axis. Paleomagnetism.org provides an integrated approach for researchers to work with visualized (e.g. hemisphere projections, Zijderveld diagrams) paleomagnetic data. The application constructs a custom exportable file that can be shared freely and included in public databases. This exported file contains all data and can later be imported to the application by other researchers. The accessibility and simplicity through which paleomagnetic data can be interpreted, analyzed, visualized, and shared makes Paleomagnetism.org of interest to the community.
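The Fisher (1953) statistics underlying the statistics portal are compact enough to sketch in Python; assuming directions are given as declination/inclination pairs in degrees, the mean direction, precision parameter k and alpha-95 follow as:

```python
import numpy as np

def fisher_mean(decs_deg, incs_deg):
    """Fisher (1953) statistics for directions given as declination/inclination (degrees)."""
    d, i = np.radians(decs_deg), np.radians(incs_deg)
    # Unit vectors (x = north, y = east, z = down)
    x, y, z = np.cos(i) * np.cos(d), np.cos(i) * np.sin(d), np.sin(i)
    n = len(d)
    rx, ry, rz = x.sum(), y.sum(), z.sum()
    R = np.sqrt(rx**2 + ry**2 + rz**2)            # resultant vector length
    mean_dec = np.degrees(np.arctan2(ry, rx)) % 360.0
    mean_inc = np.degrees(np.arcsin(rz / R))
    k = (n - 1) / (n - R)                         # Fisher precision parameter
    a95 = np.degrees(np.arccos(1 - (n - R) / R * ((1 / 0.05) ** (1 / (n - 1)) - 1)))
    return mean_dec, mean_inc, k, a95

# Hypothetical site directions
print(fisher_mean([350, 5, 12, 358, 2], [40, 45, 38, 42, 47]))
```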
van Rhee, Henk; Hak, Tony
2017-01-01
We present a new tool for meta‐analysis, Meta‐Essentials, which is free of charge and easy to use. In this paper, we introduce the tool and compare its features to other tools for meta‐analysis. We also provide detailed information on the validation of the tool. Although free of charge and simple, Meta‐Essentials automatically calculates effect sizes from a wide range of statistics and can be used for a wide range of meta‐analysis applications, including subgroup analysis, moderator analysis, and publication bias analyses. The confidence interval of the overall effect is automatically based on the Knapp‐Hartung adjustment of the DerSimonian‐Laird estimator. However, more advanced meta‐analysis methods such as meta‐analytical structural equation modelling and meta‐regression with multiple covariates are not available. In summary, Meta‐Essentials may prove a valuable resource for meta‐analysts, including researchers, teachers, and students. PMID:28801932
Probability and Statistics in Sensor Performance Modeling
2010-12-01
language software program is called Environmental Awareness for Sensor and Emitter Employment. Some important numerical issues in the implementation...3 Statistical analysis for measuring sensor performance...complementary cumulative distribution function cdf cumulative distribution function DST decision-support tool EASEE Environmental Awareness of
Statistical Analysis Tools for Learning in Engineering Laboratories.
ERIC Educational Resources Information Center
Maher, Carolyn A.
1990-01-01
Described are engineering programs that have used automated data acquisition systems to implement data collection and analyze experiments. Applications include a biochemical engineering laboratory, heat transfer performance, engineering materials testing, mechanical system reliability, statistical control laboratory, thermo-fluid laboratory, and a…
pROC: an open-source package for R and S+ to analyze and compare ROC curves.
Robin, Xavier; Turck, Natacha; Hainard, Alexandre; Tiberti, Natalia; Lisacek, Frédérique; Sanchez, Jean-Charles; Müller, Markus
2011-03-17
Receiver operating characteristic (ROC) curves are useful tools to evaluate classifiers in biomedical and bioinformatics applications. However, conclusions are often reached through inconsistent use or insufficient statistical analysis. To support researchers in their ROC curves analysis we developed pROC, a package for R and S+ that contains a set of tools displaying, analyzing, smoothing and comparing ROC curves in a user-friendly, object-oriented and flexible interface. With data previously imported into the R or S+ environment, the pROC package builds ROC curves and includes functions for computing confidence intervals, statistical tests for comparing total or partial area under the curve or the operating points of different classifiers, and methods for smoothing ROC curves. Intermediary and final results are visualised in user-friendly interfaces. A case study based on published clinical and biomarker data shows how to perform a typical ROC analysis with pROC. pROC is a package for R and S+ specifically dedicated to ROC analysis. It proposes multiple statistical tests to compare ROC curves, and in particular partial areas under the curve, allowing proper ROC interpretation. pROC is available in two versions: in the R programming language or with a graphical user interface in the S+ statistical software. It is accessible at http://expasy.org/tools/pROC/ under the GNU General Public License. It is also distributed through the CRAN and CSAN public repositories, facilitating its installation.
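The kind of ROC construction and confidence-interval work pROC automates can be illustrated in Python, purely as a conceptual parallel to the R/S+ package and with synthetic biomarker data:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

rng = np.random.default_rng(0)
# Hypothetical biomarker values for 40 controls and 40 cases
y = np.r_[np.zeros(40), np.ones(40)].astype(int)
score = np.r_[rng.normal(0, 1, 40), rng.normal(1, 1, 40)]

fpr, tpr, thresholds = roc_curve(y, score)
auc = roc_auc_score(y, score)

# Bootstrap percentile confidence interval for the AUC
boots = []
for _ in range(2000):
    idx = rng.integers(0, len(y), len(y))
    if len(np.unique(y[idx])) < 2:        # need both classes in the resample
        continue
    boots.append(roc_auc_score(y[idx], score[idx]))
lo, hi = np.percentile(boots, [2.5, 97.5])
print(f"AUC = {auc:.3f} (95% bootstrap CI {lo:.3f}-{hi:.3f})")
```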
Statistical power analysis of cardiovascular safety pharmacology studies in conscious rats.
Bhatt, Siddhartha; Li, Dingzhou; Flynn, Declan; Wisialowski, Todd; Hemkens, Michelle; Steidl-Nichols, Jill
2016-01-01
Cardiovascular (CV) toxicity and related attrition are a major challenge for novel therapeutic entities and identifying CV liability early is critical for effective derisking. CV safety pharmacology studies in rats are a valuable tool for early investigation of CV risk. Thorough understanding of data analysis techniques and statistical power of these studies is currently lacking and is imperative for enabling sound decision-making. Data from 24 crossover and 12 parallel design CV telemetry rat studies were used for statistical power calculations. Average values of telemetry parameters (heart rate, blood pressure, body temperature, and activity) were logged every 60 s (from 1 h predose to 24 h post-dose) and reduced to 15 min mean values. These data were subsequently binned into super intervals for statistical analysis. A repeated measure analysis of variance was used for statistical analysis of crossover studies and a repeated measure analysis of covariance was used for parallel studies. Statistical power analysis was performed to generate power curves and establish relationships between detectable CV (blood pressure and heart rate) changes and statistical power. Additionally, data from a crossover CV study with phentolamine at 4, 20 and 100 mg/kg are reported as a representative example of data analysis methods. Phentolamine produced a CV profile characteristic of alpha adrenergic receptor antagonism, evidenced by a dose-dependent decrease in blood pressure and reflex tachycardia. Detectable blood pressure changes at 80% statistical power for crossover studies (n=8) were 4-5 mmHg. For parallel studies (n=8), detectable changes at 80% power were 6-7 mmHg. Detectable heart rate changes for both study designs were 20-22 bpm. Based on our results, the conscious rat CV model is a sensitive tool to detect and mitigate CV risk in early safety studies. Furthermore, these results will enable informed selection of appropriate models and study design for early stage CV studies. Copyright © 2016 Elsevier Inc. All rights reserved.
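A rough way to reproduce the flavour of these detectable-change calculations, under the simplifying assumption of a paired t-test rather than the repeated-measures models actually used, is with statsmodels' power routines; the standard deviation below is a made-up placeholder:

```python
from statsmodels.stats.power import TTestPower

# Hypothetical within-animal SD of the blood-pressure change (mmHg); n = 8 rats,
# two-sided alpha = 0.05, target power = 0.80, paired/crossover-style comparison.
sd_mmHg = 4.0
d = TTestPower().solve_power(effect_size=None, nobs=8, alpha=0.05,
                             power=0.80, alternative="two-sided")
print(f"detectable standardized effect d = {d:.2f} "
      f"-> detectable change ~ {d * sd_mmHg:.1f} mmHg")
```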
NASA Astrophysics Data System (ADS)
Jebur, M. N.; Pradhan, B.; Shafri, H. Z. M.; Yusof, Z.; Tehrany, M. S.
2014-10-01
Modeling and classification difficulties are fundamental issues in natural hazard assessment. A geographic information system (GIS) is a domain that requires users to use various tools to perform different types of spatial modeling. Bivariate statistical analysis (BSA) assists in hazard modeling. To perform this analysis, several calculations are required and the user has to transfer data from one format to another. Most researchers perform these calculations manually by using Microsoft Excel or other programs. This process is time consuming and carries a degree of uncertainty. The lack of proper tools to implement BSA in a GIS environment prompted this study. In this paper, a user-friendly tool, BSM (bivariate statistical modeler), for BSA technique is proposed. Three popular BSA techniques such as frequency ratio, weights-of-evidence, and evidential belief function models are applied in the newly proposed ArcMAP tool. This tool is programmed in Python and is created by a simple graphical user interface, which facilitates the improvement of model performance. The proposed tool implements BSA automatically, thus allowing numerous variables to be examined. To validate the capability and accuracy of this program, a pilot test area in Malaysia is selected and all three models are tested by using the proposed program. Area under curve is used to measure the success rate and prediction rate. Results demonstrate that the proposed program executes BSA with reasonable accuracy. The proposed BSA tool can be used in numerous applications, such as natural hazard, mineral potential, hydrological, and other engineering and environmental applications.
NASA Astrophysics Data System (ADS)
Jebur, M. N.; Pradhan, B.; Shafri, H. Z. M.; Yusoff, Z. M.; Tehrany, M. S.
2015-03-01
Modelling and classification difficulties are fundamental issues in natural hazard assessment. A geographic information system (GIS) is a domain that requires users to use various tools to perform different types of spatial modelling. Bivariate statistical analysis (BSA) assists in hazard modelling. To perform this analysis, several calculations are required and the user has to transfer data from one format to another. Most researchers perform these calculations manually by using Microsoft Excel or other programs. This process is time-consuming and carries a degree of uncertainty. The lack of proper tools to implement BSA in a GIS environment prompted this study. In this paper, a user-friendly tool, bivariate statistical modeler (BSM), for BSA technique is proposed. Three popular BSA techniques, such as frequency ratio, weight-of-evidence (WoE), and evidential belief function (EBF) models, are applied in the newly proposed ArcMAP tool. This tool is programmed in Python and created by a simple graphical user interface (GUI), which facilitates the improvement of model performance. The proposed tool implements BSA automatically, thus allowing numerous variables to be examined. To validate the capability and accuracy of this program, a pilot test area in Malaysia is selected and all three models are tested by using the proposed program. Area under curve (AUC) is used to measure the success rate and prediction rate. Results demonstrate that the proposed program executes BSA with reasonable accuracy. The proposed BSA tool can be used in numerous applications, such as natural hazard, mineral potential, hydrological, and other engineering and environmental applications.
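Of the three methods, the frequency ratio is the simplest to state: the share of hazard occurrences falling in a factor class divided by the share of the study area occupied by that class. A small Python sketch with hypothetical pixel counts (an illustration of the technique, not the BSM tool itself):

```python
import numpy as np

# Hypothetical counts per class of one conditioning factor (e.g. slope classes)
class_pixels  = np.array([50_000, 120_000, 80_000, 30_000])   # pixels in each class
hazard_pixels = np.array([    40,     310,    420,    230])   # hazard (e.g. landslide) pixels

fr = (hazard_pixels / hazard_pixels.sum()) / (class_pixels / class_pixels.sum())
for c, value in enumerate(fr, start=1):
    print(f"class {c}: frequency ratio = {value:.2f}")   # >1 means hazard is over-represented
```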
Measuring, Understanding, and Responding to Covert Social Networks: Passive and Active Tomography
2017-11-29
Methods for generating a random sample of networks with desired properties are important tools for the analysis of social , biological, and information...on Theoretical Foundations for Statistical Network Analysis at the Isaac Newton Institute for Mathematical Sciences at Cambridge U. (organized by...Approach SOCIAL SCIENCES STATISTICS EECS Problems span three disciplines Scientific focus is needed at the interfaces
ERIC Educational Resources Information Center
Rimbey, Kimberly
2008-01-01
Created by teachers for teachers, the Math Academy tools and activities included in this booklet were designed to create hands-on activities and a fun learning environment for the teaching of mathematics to the students. This booklet contains the "Math Academy--Play Ball! Explorations in Data Analysis & Statistics," which teachers can use to…
Virtual tool mark generation for efficient striation analysis in forensic science
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ekstrand, Laura
In 2009, a National Academy of Sciences report called for investigation into the scientific basis behind tool mark comparisons (National Academy of Sciences, 2009). Answering this call, Chumbley et al. (2010) attempted to prove or disprove the hypothesis that tool marks are unique to a single tool. They developed a statistical algorithm that could, in most cases, discern matching and non-matching tool marks made at different angles by sequentially numbered screwdriver tips. Moreover, in the cases where the algorithm misinterpreted a pair of marks, an experienced forensics examiner could discern the correct outcome. While this research served to confirm the basic assumptions behind tool mark analysis, it also suggested that statistical analysis software could help to reduce the examiner's workload. This led to a new tool mark analysis approach, introduced in this thesis, that relies on 3D scans of screwdriver tip and marked plate surfaces at the micrometer scale from an optical microscope. These scans are carefully cleaned to remove noise from the data acquisition process and assigned a coordinate system that mathematically defines angles and twists in a natural way. The marking process is then simulated by using a 3D graphics software package to impart rotations to the tip and take the projection of the tip's geometry in the direction of tool travel. The edge of this projection, retrieved from the 3D graphics software, becomes a virtual tool mark. Using this method, virtual marks are made at increments of 5 degrees and compared to a scan of the evidence mark. The previously developed statistical package from Chumbley et al. (2010) performs the comparison, comparing the similarity of the geometry of both marks to the similarity that would occur due to random chance. The resulting statistical measure of the likelihood of the match informs the examiner of the angle of the best matching virtual mark, allowing the examiner to focus his/her mark analysis on a smaller range of angles. Preliminary results are quite promising. In a study with both sides of 6 screwdriver tips and 34 corresponding marks, the method distinguished known matches from known non-matches with zero false positive matches and only two matches mistaken for non-matches. For matches, it could predict the correct marking angle within 5-10 degrees. Moreover, on a standard desktop computer, the virtual marking software is capable of cleaning 3D tip and plate scans in minutes and producing a virtual mark and comparing it to a real mark in seconds. These results support several of the professional conclusions of the tool mark analysis community, including the idea that marks produced by the same tool only match if they are made at similar angles. The method also displays the potential to automate part of the comparison process, freeing the examiner to focus on other tasks, which is important in busy, backlogged crime labs. Finally, the method offers the unique chance to directly link an evidence mark to the tool that produced it while reducing potential damage to the evidence.
Geospatial methods and data analysis for assessing distribution of grazing livestock
USDA-ARS?s Scientific Manuscript database
Free-ranging livestock research must begin with a well conceived problem statement and employ appropriate data acquisition tools and analytical techniques to accomplish the research objective. These requirements are especially critical in addressing animal distribution. Tools and statistics used t...
Smith, Joseph M.; Mather, Martha E.
2012-01-01
Ecological indicators are science-based tools used to assess how human activities have impacted environmental resources. For monitoring and environmental assessment, existing species assemblage data can be used to make these comparisons through time or across sites. An impediment to using assemblage data, however, is that these data are complex and need to be simplified in an ecologically meaningful way. Because multivariate statistics are mathematical relationships, statistical groupings may not make ecological sense and will not have utility as indicators. Our goal was to define a process to select defensible and ecologically interpretable statistical simplifications of assemblage data in which researchers and managers can have confidence. For this, we chose a suite of statistical methods, compared the groupings that resulted from these analyses, identified convergence among groupings, then we interpreted the groupings using species and ecological guilds. When we tested this approach using a statewide stream fish dataset, not all statistical methods worked equally well. For our dataset, logistic regression (Log), detrended correspondence analysis (DCA), cluster analysis (CL), and non-metric multidimensional scaling (NMDS) provided consistent, simplified output. Specifically, the Log, DCA, CL-1, and NMDS-1 groupings were ≥60% similar to each other, overlapped with the fluvial-specialist ecological guild, and contained a common subset of species. Groupings based on number of species (e.g., Log, DCA, CL and NMDS) outperformed groupings based on abundance [e.g., principal components analysis (PCA) and Poisson regression]. Although the specific methods that worked on our test dataset have generality, here we are advocating a process (e.g., identifying convergent groupings with redundant species composition that are ecologically interpretable) rather than the automatic use of any single statistical tool. We summarize this process in step-by-step guidance for the future use of these commonly available ecological and statistical methods in preparing assemblage data for use in ecological indicators.
ToNER: A tool for identifying nucleotide enrichment signals in feature-enriched RNA-seq data
Promworn, Yuttachon; Kaewprommal, Pavita; Shaw, Philip J.; Intarapanich, Apichart; Tongsima, Sissades
2017-01-01
Background Biochemical methods are available for enriching 5′ ends of RNAs in prokaryotes, which are employed in the differential RNA-seq (dRNA-seq) and the more recent Cappable-seq protocols. Computational methods are needed to locate RNA 5′ ends from these data by statistical analysis of the enrichment. Although statistical-based analysis methods have been developed for dRNA-seq, they may not be suitable for Cappable-seq data. The more efficient enrichment method employed in Cappable-seq compared with dRNA-seq could affect data distribution and thus algorithm performance. Results We present Transformation of Nucleotide Enrichment Ratios (ToNER), a tool for statistical modeling of enrichment from RNA-seq data obtained from enriched and unenriched libraries. The tool calculates nucleotide enrichment scores and determines the global transformation for fitting to the normal distribution using the Box-Cox procedure. From the transformed distribution, sites of significant enrichment are identified. To increase power of detection, meta-analysis across experimental replicates is offered. We tested the tool on Cappable-seq and dRNA-seq data for identifying Escherichia coli transcript 5′ ends and compared the results with those from the TSSAR tool, which is designed for analyzing dRNA-seq data. When combining results across Cappable-seq replicates, ToNER detects more known transcript 5′ ends than TSSAR. In general, the transcript 5′ ends detected by ToNER but not TSSAR occur in regions which cannot be locally modeled by TSSAR. Conclusion ToNER uses a simple yet robust statistical modeling approach, which can be used for detecting RNA 5′ends from Cappable-seq data, in particular when combining information from experimental replicates. The ToNER tool could potentially be applied for analyzing other RNA-seq datasets in which enrichment for other structural features of RNA is employed. The program is freely available for download at ToNER webpage (http://www4a.biotec.or.th/GI/tools/toner) and GitHub repository (https://github.com/PavitaKae/ToNER). PMID:28542466
An Environmental Decision Support System for Spatial Assessment and Selective Remediation
Spatial Analysis and Decision Assistance (SADA) is a Windows freeware program that incorporates environmental assessment tools for effective problem-solving. The software integrates modules for GIS, visualization, geospatial analysis, statistical analysis, human health and ecolog...
Byrska-Bishop, Marta; Wallace, John; Frase, Alexander T; Ritchie, Marylyn D
2018-01-01
Abstract Motivation BioBin is an automated bioinformatics tool for the multi-level biological binning of sequence variants. Herein, we present a significant update to BioBin which expands the software to facilitate a comprehensive rare variant analysis and incorporates novel features and analysis enhancements. Results In BioBin 2.3, we extend our software tool by implementing statistical association testing, updating the binning algorithm, as well as incorporating novel analysis features providing for a robust, highly customizable, and unified rare variant analysis tool. Availability and implementation The BioBin software package is open source and freely available to users at http://www.ritchielab.com/software/biobin-download Contact mdritchie@geisinger.edu Supplementary information Supplementary data are available at Bioinformatics online. PMID:28968757
USDA-ARS?s Scientific Manuscript database
Characterizing population genetic structure across geographic space is a fundamental challenge in population genetics. Multivariate statistical analyses are powerful tools for summarizing genetic variability, but geographic information and accompanying metadata is not always easily integrated into t...
Su, Cheng; Zhou, Lei; Hu, Zheng; Weng, Winnie; Subramani, Jayanthi; Tadkod, Vineet; Hamilton, Kortney; Bautista, Ami; Wu, Yu; Chirmule, Narendra; Zhong, Zhandong Don
2015-10-01
Biotherapeutics can elicit immune responses, which can alter the exposure, safety, and efficacy of the therapeutics. A well-designed and robust bioanalytical method is critical for the detection and characterization of relevant anti-drug antibodies (ADA) and the success of an immunogenicity study. As a fundamental criterion in immunogenicity testing, assay cut points need to be statistically established with a risk-based approach to reduce subjectivity. This manuscript describes the development of a validated, web-based, multi-tier customized assay statistical tool (CAST) for assessing cut points of ADA assays. The tool provides an intuitive web interface that allows users to import experimental data generated from a standardized experimental design, select the assay factors, run the standardized analysis algorithms, and generate tables, figures, and listings (TFL). It allows bioanalytical scientists to perform complex statistical analysis at the click of a button and produce reliable assay parameters in support of immunogenicity studies. Copyright © 2015 Elsevier B.V. All rights reserved.
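CAST's algorithms are not described in detail in this abstract, but screening cut points in immunogenicity assays are commonly set as an upper quantile of drug-naive sample responses, often after log transformation; a simplified Python sketch of that general idea, with synthetic placeholder data:

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical assay signals from drug-naive (negative-control) samples
signals = rng.lognormal(mean=3.0, sigma=0.25, size=60)

log_s = np.log(signals)
# Parametric screening cut point targeting a ~5% false-positive rate
cut_parametric = np.exp(log_s.mean() + 1.645 * log_s.std(ddof=1))
# Non-parametric alternative: the empirical 95th percentile
cut_nonparam = np.quantile(signals, 0.95)
print(f"parametric cut point ~ {cut_parametric:.1f}, "
      f"non-parametric ~ {cut_nonparam:.1f}")
```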
The Development of Design Tools for Fault Tolerant Quantum Dot Cellular Automata Based Logic
NASA Technical Reports Server (NTRS)
Armstrong, Curtis D.; Humphreys, William M.
2003-01-01
We are developing software to explore the fault tolerance of quantum dot cellular automata gate architectures in the presence of manufacturing variations and device defects. The Topology Optimization Methodology using Applied Statistics (TOMAS) framework extends the capabilities of AQUINAS (A Quantum Interconnected Network Array Simulator) by adding front-end and back-end software and creating an environment that integrates all of these components. The front-end tools establish all simulation parameters, configure the simulation system, automate the Monte Carlo generation of simulation files, and execute the simulation of these files. The back-end tools perform automated data parsing, statistical analysis and report generation.
Graphical tools for network meta-analysis in STATA.
Chaimani, Anna; Higgins, Julian P T; Mavridis, Dimitris; Spyridonos, Panagiota; Salanti, Georgia
2013-01-01
Network meta-analysis synthesizes direct and indirect evidence in a network of trials that compare multiple interventions and has the potential to rank the competing treatments according to the studied outcome. Despite its usefulness network meta-analysis is often criticized for its complexity and for being accessible only to researchers with strong statistical and computational skills. The evaluation of the underlying model assumptions, the statistical technicalities and presentation of the results in a concise and understandable way are all challenging aspects in the network meta-analysis methodology. In this paper we aim to make the methodology accessible to non-statisticians by presenting and explaining a series of graphical tools via worked examples. To this end, we provide a set of STATA routines that can be easily employed to present the evidence base, evaluate the assumptions, fit the network meta-analysis model and interpret its results.
Graphical Tools for Network Meta-Analysis in STATA
Chaimani, Anna; Higgins, Julian P. T.; Mavridis, Dimitris; Spyridonos, Panagiota; Salanti, Georgia
2013-01-01
Network meta-analysis synthesizes direct and indirect evidence in a network of trials that compare multiple interventions and has the potential to rank the competing treatments according to the studied outcome. Despite its usefulness network meta-analysis is often criticized for its complexity and for being accessible only to researchers with strong statistical and computational skills. The evaluation of the underlying model assumptions, the statistical technicalities and presentation of the results in a concise and understandable way are all challenging aspects in the network meta-analysis methodology. In this paper we aim to make the methodology accessible to non-statisticians by presenting and explaining a series of graphical tools via worked examples. To this end, we provide a set of STATA routines that can be easily employed to present the evidence base, evaluate the assumptions, fit the network meta-analysis model and interpret its results. PMID:24098547
NASA Astrophysics Data System (ADS)
M, Vasu; Shivananda Nayaka, H.
2018-06-01
In this experimental work, a dry turning process was carried out on EN47 spring steel with a coated tungsten carbide tool insert of 0.8 mm nose radius and optimized using statistical techniques. Experiments were conducted at three cutting speeds (625, 796 and 1250 rpm), three feed rates (0.046, 0.062 and 0.093 mm/rev) and three depths of cut (0.2, 0.3 and 0.4 mm), following a 3^3 full factorial design (FFD) with three factors at three levels. Analysis of variance was used to identify the significant factors for each output response. The results reveal that feed rate is the most significant factor influencing cutting force, followed by depth of cut, with cutting speed having the least significance. The optimum machining condition for cutting force was obtained from the statistical analysis. Tool wear measurements were performed at the optimum condition of Vc = 796 rpm, ap = 0.2 mm, f = 0.046 mm/rev. The minimum tool wear observed was 0.086 mm after 5 min of machining. Tool wear was analyzed with a confocal microscope, and it was observed that tool wear increases with increasing cutting time.
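The analysis-of-variance step for a 3^3 full factorial like this can be reproduced generically in Python with statsmodels; the dataset below is synthetic, with column names chosen for illustration rather than taken from the authors' data:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(3)
# Hypothetical 3x3x3 full factorial: every combination of speed, feed and depth of cut
grid = pd.MultiIndex.from_product(
    [[625, 796, 1250], [0.046, 0.062, 0.093], [0.2, 0.3, 0.4]],
    names=["speed", "feed", "doc"]).to_frame(index=False)
# Synthetic cutting force dominated by feed and depth of cut, plus noise (N)
grid["force"] = (900 * grid["feed"] + 250 * grid["doc"]
                 - 0.01 * grid["speed"] + rng.normal(0, 5, len(grid)))

model = smf.ols("force ~ C(speed) + C(feed) + C(doc)", data=grid).fit()
print(anova_lm(model, typ=2))   # F and p values show which factors matter most
```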
Getting the big picture in community science: methods that capture context.
Luke, Douglas A
2005-06-01
Community science has a rich tradition of using theories and research designs that are consistent with its core value of contextualism. However, a survey of empirical articles published in the American Journal of Community Psychology shows that community scientists utilize a narrow range of statistical tools that are not well suited to assess contextual data. Multilevel modeling, geographic information systems (GIS), social network analysis, and cluster analysis are recommended as useful tools to address contextual questions in community science. An argument for increased methodological consilience is presented, where community scientists are encouraged to adopt statistical methodology that is capable of modeling a greater proportion of the data than is typical with traditional methods.
LENS: web-based lens for enrichment and network studies of human proteins
2015-01-01
Background Network analysis is a common approach for the study of the genetic basis of diseases and biological pathways. Typically, when a set of genes is identified to be of interest in relation to a disease, say through a genome-wide association study (GWAS) or a different gene expression study, these genes are analyzed in the context of their protein-protein interaction (PPI) networks. Further analysis is carried out to compute the enrichment of known pathways and disease associations in the network. Having tools for such analysis at the fingertips of biologists, without the requirement for computer programming or curation of data, would accelerate the characterization of genes of interest. Currently available tools do not integrate network and enrichment analysis and their visualizations, and most of them present results in formats not most conducive to human cognition. Results We developed the tool Lens for Enrichment and Network Studies of human proteins (LENS) that performs network, pathway, and disease enrichment analyses on genes of interest to users. The tool creates a visualization of the network, provides easy-to-read statistics on network connectivity, and displays Venn diagrams with statistical significance values of the network's association with drugs, diseases, pathways, and GWASs. We used the tool to analyze gene sets related to craniofacial development, autism, and schizophrenia. Conclusion LENS is a web-based tool that does not require any downloads or plugins to use. The tool is free, does not require login for use, and is available at http://severus.dbmi.pitt.edu/LENS. PMID:26680011
Leyrat, Clémence; Caille, Agnès; Foucher, Yohann; Giraudeau, Bruno
2016-01-22
Despite randomization, baseline imbalance and confounding bias may occur in cluster randomized trials (CRTs). Covariate imbalance may jeopardize the validity of statistical inferences if it occurs on prognostic factors. Thus, the diagnosis of such an imbalance is essential so that statistical analysis can be adjusted if required. We developed a tool based on the c-statistic of the propensity score (PS) model to detect global baseline covariate imbalance in CRTs and assess the risk of confounding bias. We performed a simulation study to assess the performance of the proposed tool and applied this method to analyze the data from 2 published CRTs. The proposed method had good performance for large sample sizes (n = 500 per arm) and when the number of unbalanced covariates was not too small compared with the total number of baseline covariates (≥40% of unbalanced covariates). We also provide a strategy for preselection of the covariates needed in the PS model to enhance imbalance detection. The proposed tool could be useful in deciding whether covariate adjustment is required before performing statistical analyses of CRTs.
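The c-statistic of the propensity score model is simply the area under the ROC curve of a model predicting treatment arm from baseline covariates; a generic Python sketch with simulated covariates (not the authors' implementation):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(4)
n = 600                                   # participants across all clusters
X = rng.normal(size=(n, 5))               # hypothetical baseline covariates
arm = rng.integers(0, 2, n)               # randomized arm (0/1)
X[arm == 1, 0] += 0.4                     # induce imbalance on one covariate

# Propensity score model: probability of arm membership given baseline covariates
ps_model = LogisticRegression(max_iter=1000).fit(X, arm)
c_stat = roc_auc_score(arm, ps_model.predict_proba(X)[:, 1])
print(f"PS model c-statistic = {c_stat:.3f}")   # ~0.5 = balanced; higher = imbalance
```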
Suurmond, Robert; van Rhee, Henk; Hak, Tony
2017-12-01
We present a new tool for meta-analysis, Meta-Essentials, which is free of charge and easy to use. In this paper, we introduce the tool and compare its features to other tools for meta-analysis. We also provide detailed information on the validation of the tool. Although free of charge and simple, Meta-Essentials automatically calculates effect sizes from a wide range of statistics and can be used for a wide range of meta-analysis applications, including subgroup analysis, moderator analysis, and publication bias analyses. The confidence interval of the overall effect is automatically based on the Knapp-Hartung adjustment of the DerSimonian-Laird estimator. However, more advanced meta-analysis methods such as meta-analytical structural equation modelling and meta-regression with multiple covariates are not available. In summary, Meta-Essentials may prove a valuable resource for meta-analysts, including researchers, teachers, and students. © 2017 The Authors. Research Synthesis Methods published by John Wiley & Sons Ltd.
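The random-effects machinery named here (the DerSimonian-Laird heterogeneity estimate with the Knapp-Hartung variance adjustment) can be written out compactly; a Python sketch under the usual inverse-variance setup, with made-up effect sizes rather than data from any study:

```python
import numpy as np
from scipy import stats

# Hypothetical per-study effect sizes (e.g. Hedges' g) and their variances
y = np.array([0.30, 0.55, 0.10, 0.42, 0.25])
v = np.array([0.02, 0.05, 0.03, 0.04, 0.06])
k = len(y)

# DerSimonian-Laird estimate of between-study variance tau^2
w = 1 / v
q = np.sum(w * (y - np.sum(w * y) / w.sum()) ** 2)
tau2 = max(0.0, (q - (k - 1)) / (w.sum() - np.sum(w**2) / w.sum()))

# Random-effects pooled estimate with Knapp-Hartung adjusted variance (t with k-1 df)
w_star = 1 / (v + tau2)
mu = np.sum(w_star * y) / w_star.sum()
var_kh = np.sum(w_star * (y - mu) ** 2) / ((k - 1) * w_star.sum())
half_width = stats.t.ppf(0.975, df=k - 1) * np.sqrt(var_kh)
print(f"pooled effect = {mu:.3f}, 95% CI = ({mu - half_width:.3f}, {mu + half_width:.3f})")
```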
A phylogenetic transform enhances analysis of compositional microbiota data.
Silverman, Justin D; Washburne, Alex D; Mukherjee, Sayan; David, Lawrence A
2017-02-15
Surveys of microbial communities (microbiota), typically measured as relative abundance of species, have illustrated the importance of these communities in human health and disease. Yet, statistical artifacts commonly plague the analysis of relative abundance data. Here, we introduce the PhILR transform, which incorporates microbial evolutionary models with the isometric log-ratio transform to allow off-the-shelf statistical tools to be safely applied to microbiota surveys. We demonstrate that analyses of community-level structure can be applied to PhILR transformed data with performance on benchmarks rivaling or surpassing standard tools. Additionally, by decomposing distance in the PhILR transformed space, we identified neighboring clades that may have adapted to distinct human body sites. Decomposing variance revealed that covariation of bacterial clades within human body sites increases with phylogenetic relatedness. Together, these findings illustrate how the PhILR transform combines statistical and phylogenetic models to overcome compositional data challenges and enable evolutionary insights relevant to microbial communities.
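The log-ratio foundations of this approach are easy to sketch; the example below shows a generic centered log-ratio and one standard orthonormal ilr basis (pivot-style coordinates), not the phylogeny-derived balances that PhILR actually constructs:

```python
import numpy as np

def clr(x):
    """Centered log-ratio transform of a strictly positive composition."""
    logx = np.log(x)
    return logx - logx.mean()

def ilr(x):
    """One standard isometric log-ratio basis (sequential 'pivot' coordinates)."""
    x = np.asarray(x, dtype=float)
    D = len(x)
    z = np.empty(D - 1)
    for i in range(1, D):
        gm = np.exp(np.mean(np.log(x[:i])))          # geometric mean of the first i parts
        z[i - 1] = np.sqrt(i / (i + 1)) * np.log(gm / x[i])
    return z

# Hypothetical relative abundances of 4 taxa (zero counts handled upstream, e.g. pseudocounts)
comp = np.array([0.50, 0.30, 0.15, 0.05])
print("clr:", clr(comp))
print("ilr:", ilr(comp))
```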
Kuretzki, Carlos Henrique; Campos, Antônio Carlos Ligocki; Malafaia, Osvaldo; Soares, Sandramara Scandelari Kusano de Paula; Tenório, Sérgio Bernardo; Timi, Jorge Rufino Ribas
2016-03-01
The use of information technology is often applied in healthcare. With regard to scientific research, the SINPE(c) - Integrated Electronic Protocols was created as a tool to support researchers, offering clinical data standardization. Until then, SINPE(c) lacked statistical tests obtained by automatic analysis. The aim was to add to SINPE(c) features for the automatic performance of the main statistical methods used in medicine. The study was divided into four topics: checking users' interest in the implementation of the tests; surveying the frequency of their use in healthcare; carrying out the implementation; and validating the results with researchers and their protocols. It was applied to a group of users of this software working on their stricto sensu master's and doctoral theses in a postgraduate program in surgery. To assess the reliability of the statistics, the data obtained automatically by SINPE(c) were compared with those calculated manually by a statistician experienced in this type of study. There was interest in the use of automatic statistical tests, with good acceptance. The chi-square, Mann-Whitney, Fisher and Student's t tests were identified as those most frequently used by participants in medical studies. These methods were implemented and subsequently validated as expected. The automatic statistical analysis incorporated into SINPE(c) proved to be reliable and equivalent to the manual analysis, validating its use as a tool for medical research.
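The four tests named can all be run from SciPy; a minimal Python sketch with made-up data, unrelated to SINPE(c) itself:

```python
import numpy as np
from scipy import stats

# Hypothetical 2x2 table (e.g. complication yes/no by treatment group)
table = np.array([[12, 28], [5, 35]])
chi2, p_chi2, _, _ = stats.chi2_contingency(table)
odds, p_fisher = stats.fisher_exact(table)

# Hypothetical continuous outcome in two groups
a = np.array([5.1, 6.3, 5.8, 7.0, 6.1, 5.5, 6.8])
b = np.array([4.2, 5.0, 4.8, 5.6, 4.9, 5.2, 4.5])
t_stat, p_t = stats.ttest_ind(a, b)           # Student's t-test
u_stat, p_u = stats.mannwhitneyu(a, b)        # Mann-Whitney U test

print(f"chi-square p={p_chi2:.3f}, Fisher p={p_fisher:.3f}, "
      f"t-test p={p_t:.3f}, Mann-Whitney p={p_u:.3f}")
```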
Parolini, Giuditta
2015-01-01
During the twentieth century statistical methods have transformed research in the experimental and social sciences. Qualitative evidence has largely been replaced by quantitative results and the tools of statistical inference have helped foster a new ideal of objectivity in scientific knowledge. The paper will investigate this transformation by considering the genesis of analysis of variance and experimental design, statistical methods nowadays taught in every elementary course of statistics for the experimental and social sciences. These methods were developed by the mathematician and geneticist R. A. Fisher during the 1920s, while he was working at Rothamsted Experimental Station, where agricultural research was in turn reshaped by Fisher's methods. Analysis of variance and experimental design required new practices and instruments in field and laboratory research, and imposed a redistribution of expertise among statisticians, experimental scientists and the farm staff. On the other hand the use of statistical methods in agricultural science called for a systematization of information management and made computing an activity integral to the experimental research done at Rothamsted, permanently integrating the statisticians' tools and expertise into the station research programme. Fisher's statistical methods did not remain confined within agricultural research and by the end of the 1950s they had come to stay in psychology, sociology, education, chemistry, medicine, engineering, economics, quality control, just to mention a few of the disciplines which adopted them.
Conditional Probability Analysis: A Statistical Tool for Environmental Analysis.
The use and application of environmental conditional probability analysis (CPA) is relatively recent. The first presentation using CPA was made in 2002 at the New England Association of Environmental Biologists Annual Meeting in Newport, Rhode Island. CPA has been used since the...
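In its simplest form, environmental CPA estimates the probability of an adverse biological condition given that a stressor exceeds a candidate threshold, traced across a range of thresholds; a minimal Python sketch with synthetic data:

```python
import numpy as np

rng = np.random.default_rng(5)
stressor = rng.lognormal(0, 0.8, 500)        # e.g. a stressor measured at 500 sites
# Synthetic impairment indicator that becomes more likely at high stressor levels
impaired = rng.random(500) < 1 / (1 + np.exp(-(stressor - 1.5)))

# P(impaired | stressor > x) as x sweeps across observed stressor values
thresholds = np.quantile(stressor, np.linspace(0.05, 0.95, 10))
for x in thresholds:
    mask = stressor > x
    print(f"x = {x:5.2f}  P(impaired | stressor > x) = {impaired[mask].mean():.2f}")
```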
NASA Astrophysics Data System (ADS)
Ryazanova, A. A.; Okladnikov, I. G.; Gordov, E. P.
2017-11-01
The frequency of occurrence and magnitude of precipitation and temperature extreme events show positive trends in several geographical regions. These events must be analyzed and studied in order to better understand their impact on the environment, predict their occurrence, and mitigate their effects. For this purpose, we augmented the web-GIS "CLIMATE" to include a dedicated statistical package developed in the R language. The web-GIS "CLIMATE" is a software platform for cloud storage, processing and visualization of distributed archives of spatial datasets. It is based on a combined use of web and GIS technologies with reliable procedures for searching, extracting, processing, and visualizing the spatial data archives. The system provides a set of thematic online tools for the complex analysis of current and future climate changes and their effects on the environment. The package includes powerful new methods of time-dependent statistics of extremes, quantile regression and a copula approach for the detailed analysis of various climate extreme events. In particular, the very promising copula approach allows one to obtain the structural connections between the extremes and various environmental characteristics. The new statistical methods integrated into the web-GIS "CLIMATE" can significantly facilitate and accelerate the complex analysis of climate extremes using only a desktop PC connected to the Internet.
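Of the methods listed, quantile regression is the most self-contained to demonstrate; in Python it is available in statsmodels (a generic sketch with synthetic precipitation data, not the R package integrated into "CLIMATE"):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(6)
years = np.arange(1950, 2020)
# Synthetic annual maximum daily precipitation with a stronger trend in the upper tail
precip = 30 + 0.05 * (years - 1950) + rng.gamma(2.0, 4.0 + 0.05 * (years - 1950))
df = pd.DataFrame({"year": years, "precip": precip})

# Compare the trend in the median with the trend in the 95th percentile
for q in (0.50, 0.95):
    fit = smf.quantreg("precip ~ year", df).fit(q=q)
    print(f"q = {q}: slope = {fit.params['year']:.3f} mm/yr")
```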
Network Analysis Tools: from biological networks to clusters and pathways.
Brohée, Sylvain; Faust, Karoline; Lima-Mendez, Gipsi; Vanderstocken, Gilles; van Helden, Jacques
2008-01-01
Network Analysis Tools (NeAT) is a suite of computer tools that integrate various algorithms for the analysis of biological networks: comparison between graphs, between clusters, or between graphs and clusters; network randomization; analysis of degree distribution; network-based clustering and path finding. The tools are interconnected to enable a stepwise analysis of the network through a complete analytical workflow. In this protocol, we present a typical case of utilization, where the tasks above are combined to decipher a protein-protein interaction network retrieved from the STRING database. The results returned by NeAT are typically subnetworks, networks enriched with additional information (i.e., clusters or paths) or tables displaying statistics. Typical networks comprising several thousands of nodes and arcs can be analyzed within a few minutes. The complete protocol can be read and executed in approximately 1 h.
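Comparable degree-distribution, clustering and path-finding summaries can be computed locally with standard libraries; a small Python/NetworkX sketch of the kind of statistics such a workflow returns (not the NeAT implementation itself, and with made-up interaction edges):

```python
import networkx as nx

# Hypothetical protein-protein interaction edges (e.g. parsed from a STRING export)
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"),
         ("D", "E"), ("E", "F"), ("D", "F"), ("F", "G")]
g = nx.Graph(edges)

print("nodes:", g.number_of_nodes(), "edges:", g.number_of_edges())
print("degree distribution:", nx.degree_histogram(g))       # count of nodes per degree
print("average clustering:", nx.average_clustering(g))
print("shortest A-G path:", nx.shortest_path(g, "A", "G"))   # simple path finding
```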
Application of Transformations in Parametric Inference
ERIC Educational Resources Information Center
Brownstein, Naomi; Pensky, Marianna
2008-01-01
The objective of the present paper is to provide a simple approach to statistical inference using the method of transformations of variables. We demonstrate the performance of this powerful tool on examples of the construction of various estimation procedures, hypothesis testing, Bayesian analysis and statistical inference for stress-strength systems.…
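As an aside, the general idea of transforming variables before applying a standard inferential procedure can be illustrated with a very small sketch. The example below is not taken from the paper; it simply shows a log transformation used to stabilize variance in skewed data before a two-sample t-test, with hypothetical data.

```python
# Minimal sketch (not the authors' method): a log transformation used to
# stabilize variance in right-skewed data before a standard two-sample t-test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
group_a = rng.lognormal(mean=0.0, sigma=0.8, size=50)   # hypothetical skewed samples
group_b = rng.lognormal(mean=0.3, sigma=0.8, size=50)

# On the original scale the t-test assumptions (normality, equal variance) are dubious.
t_raw, p_raw = stats.ttest_ind(group_a, group_b)

# After a log transformation the data are approximately normal, so the same test
# is better justified; the hypothesis now concerns the log-scale means.
t_log, p_log = stats.ttest_ind(np.log(group_a), np.log(group_b))

print(f"raw scale: t={t_raw:.2f}, p={p_raw:.3f}")
print(f"log scale: t={t_log:.2f}, p={p_log:.3f}")
```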
Bayesian Posterior Odds Ratios: Statistical Tools for Collaborative Evaluations
ERIC Educational Resources Information Center
Hicks, Tyler; Rodríguez-Campos, Liliana; Choi, Jeong Hoon
2018-01-01
To begin statistical analysis, Bayesians quantify their confidence in modeling hypotheses with priors. A prior describes the probability of a certain modeling hypothesis apart from the data. Bayesians should be able to defend their choice of prior to a skeptical audience. Collaboration between evaluators and stakeholders could make their choices…
2011-07-01
joined the project team in the statistical and research coordination role. Dr. Collin is an employee at the University of Pittsburgh. A successful...3. Submit to Ft. Detrick Completed Milestone: Statistical analysis planning 1. Review planned data metrics and data gathering tools...approach to performance assessment for continuous quality improvement. Analyzing data with modern statistical techniques to determine the
Shu, Jie; Dolman, G E; Duan, Jiang; Qiu, Guoping; Ilyas, Mohammad
2016-04-27
Colour is the most important feature used in quantitative immunohistochemistry (IHC) image analysis; IHC is used to provide information relating to aetiology and to confirm malignancy. Statistical modelling is a technique widely used for colour detection in computer vision. We have developed a statistical model of colour detection applicable to the detection of stain colour in digital IHC images. The model was first trained on a large set of colour pixels collected semi-automatically. To speed up the training and detection processes, we removed the luminance (Y) channel of the YCbCr colour space and chose 128 histogram bins, which was found to be the optimal number. A maximum likelihood classifier is used to classify pixels in digital slides into positively or negatively stained pixels automatically. The model-based tool was developed within ImageJ to quantify targets identified using IHC and histochemistry. The purpose of the evaluation was to compare the computer model with human evaluation. Several large datasets were prepared and obtained from human oesophageal cancer, colon cancer and liver cirrhosis with different colour stains. Experimental results demonstrated that the model-based tool achieves more accurate results than colour deconvolution and the CMYK model in the detection of brown colour, and is comparable to colour deconvolution in the detection of pink colour. We have also demonstrated that the proposed model shows little inter-dataset variation. A robust and effective statistical model is introduced in this paper. The model-based interactive tool in ImageJ, which can create a visual representation of the statistical model and detect a specified colour automatically, is easy to use and available freely at http://rsb.info.nih.gov/ij/plugins/ihc-toolbox/index.html . Testing of the tool by different users showed only minor inter-observer variations in results.
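The general recipe described here (drop the luminance channel, build per-class chrominance histograms, classify by maximum likelihood) can be sketched in a few lines. The sketch below is only an illustration of that idea with synthetic data; it is not the ImageJ plugin, and the training values are invented.

```python
# Illustrative sketch of a histogram-based maximum-likelihood colour classifier in
# the chrominance (Cb, Cr) plane; this mirrors the general idea described in the
# abstract, not the actual ImageJ plugin implementation.
import numpy as np

N_BINS = 128  # number of histogram bins per chrominance channel (as in the abstract)

def fit_histogram(cbcr_pixels, n_bins=N_BINS):
    """Estimate P(Cb, Cr | class) from training pixels as a normalized 2D histogram."""
    hist, _, _ = np.histogram2d(
        cbcr_pixels[:, 0], cbcr_pixels[:, 1],
        bins=n_bins, range=[[0, 256], [0, 256]])
    hist += 1e-9                      # avoid zero probabilities
    return hist / hist.sum()

def classify(cbcr_pixels, hist_pos, hist_neg, n_bins=N_BINS):
    """Label each pixel as positively (1) or negatively (0) stained by maximum likelihood."""
    idx = np.clip((cbcr_pixels / 256.0 * n_bins).astype(int), 0, n_bins - 1)
    lik_pos = hist_pos[idx[:, 0], idx[:, 1]]
    lik_neg = hist_neg[idx[:, 0], idx[:, 1]]
    return (lik_pos > lik_neg).astype(int)

# Hypothetical training data: (Cb, Cr) values for stained and unstained pixels.
rng = np.random.default_rng(1)
pos_train = rng.normal([140, 160], 10, size=(5000, 2))  # e.g. brown stain
neg_train = rng.normal([120, 120], 10, size=(5000, 2))  # background
model_pos, model_neg = fit_histogram(pos_train), fit_histogram(neg_train)
labels = classify(rng.normal([138, 158], 12, size=(100, 2)), model_pos, model_neg)
print("fraction classified as positively stained:", labels.mean())
```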
CADDIS Volume 4. Data Analysis: Download Software
Overview of the data analysis tools available for download on CADDIS. Provides instructions for downloading and installing CADStat, access to a Microsoft Excel macro for computing SSDs, and a brief overview of command-line use of R, a statistical software package.
NASA Technical Reports Server (NTRS)
Wolf, S. F.; Lipschutz, M. E.
1993-01-01
Multivariate statistical analysis techniques (linear discriminant analysis and logistic regression) can provide powerful discrimination tools which are generally unfamiliar to the planetary science community. Fall parameters were used to identify a group of 17 H chondrites (Cluster 1) that were part of a coorbital stream which intersected Earth's orbit in May, from 1855 - 1895, and can be distinguished from all other H chondrite falls. Using multivariate statistical techniques, it was demonstrated that, by a totally different criterion - labile trace element contents, and hence thermal histories - 13 Cluster 1 meteorites are distinguishable from 45 non-Cluster 1 H chondrites. Here, we focus upon the principles of multivariate statistical techniques and illustrate their application using non-meteoritic and meteoritic examples.
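For readers unfamiliar with these two discrimination tools, a generic sketch follows. It uses scikit-learn on synthetic data that merely stands in for trace-element measurements; it is not the authors' analysis or dataset.

```python
# Generic sketch of the two discrimination tools named in the abstract, run on
# synthetic data standing in for trace-element measurements of two groups.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
# Hypothetical features (e.g. labile trace-element contents) for two groups.
X = np.vstack([rng.normal(0.0, 1.0, size=(45, 4)),     # "non-cluster" samples
               rng.normal(0.8, 1.0, size=(13, 4))])    # "cluster" samples
y = np.array([0] * 45 + [1] * 13)

for name, clf in [("LDA", LinearDiscriminantAnalysis()),
                  ("logistic regression", LogisticRegression(max_iter=1000))]:
    acc = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{name}: 5-fold CV accuracy = {acc:.2f}")
```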
[Factor Analysis: Principles to Evaluate Measurement Tools for Mental Health].
Campo-Arias, Adalberto; Herazo, Edwin; Oviedo, Heidi Celina
2012-09-01
The validation of a measurement tool in mental health is a complex process that usually starts by estimating reliability, to later approach its validity. Factor analysis is a way to know the number of dimensions, domains or factors of a measuring tool, generally related to the construct validity of the scale. The analysis could be exploratory or confirmatory, and helps in the selection of the items with better performance. For an acceptable factor analysis, it is necessary to follow some steps and recommendations, conduct some statistical tests, and rely on a proper sample of participants. Copyright © 2012 Asociación Colombiana de Psiquiatría. Publicado por Elsevier España. All rights reserved.
Lee, L.; Helsel, D.
2005-01-01
Trace contaminants in water, including metals and organics, often are measured at sufficiently low concentrations to be reported only as values below the instrument detection limit. Interpretation of these "less thans" is complicated when multiple detection limits occur. Statistical methods for multiply censored, or multiple-detection limit, datasets have been developed for medical and industrial statistics, and can be employed to estimate summary statistics or model the distributions of trace-level environmental data. We describe S-language-based software tools that perform robust linear regression on order statistics (ROS). The ROS method has been evaluated as one of the most reliable procedures for developing summary statistics of multiply censored data. It is applicable to any dataset that has 0 to 80% of its values censored. These tools are a part of a software library, or add-on package, for the R environment for statistical computing. This library can be used to generate ROS models and associated summary statistics, plot modeled distributions, and predict exceedance probabilities of water-quality standards. © 2005 Elsevier Ltd. All rights reserved.
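The core ROS idea (regress the detected values, on a log scale, against normal quantiles of their plotting positions, then impute the censored observations from the fitted line) can be sketched briefly. The example below is a simplified single-detection-limit illustration with invented numbers; the actual R tools described above handle multiple detection limits with more refined plotting positions.

```python
# Simplified single-detection-limit sketch of regression on order statistics (ROS):
# detected values are regressed (on a log scale) against normal quantiles, and the
# fitted line is used to impute values below the detection limit. This is only an
# illustration, not the library described in the abstract.
import numpy as np
from scipy import stats

values = np.array([0.5, 0.5, 0.5, 0.7, 0.9, 1.2, 1.5, 2.3, 3.1, 4.8])  # hypothetical data
censored = np.array([True, True, True, False, False, False, False, False, False, False])

n = len(values)
order = np.argsort(values)
pp = np.arange(1, n + 1) / (n + 1)          # Weibull plotting positions for the sorted sample

detect_mask = ~censored[order]
z_detect = stats.norm.ppf(pp[detect_mask])
slope, intercept, *_ = stats.linregress(z_detect, np.log(values[order][detect_mask]))

# Impute censored observations from the fitted line at their plotting positions.
imputed = np.exp(intercept + slope * stats.norm.ppf(pp[~detect_mask]))
filled = np.where(detect_mask, values[order], 0.0)
filled[~detect_mask] = imputed

print("estimated mean:", filled.mean(), " estimated std:", filled.std(ddof=1))
```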
Webb-Vargas, Yenny; Chen, Shaojie; Fisher, Aaron; Mejia, Amanda; Xu, Yuting; Crainiceanu, Ciprian; Caffo, Brian; Lindquist, Martin A
2017-12-01
Big Data are of increasing importance in a variety of areas, especially in the biosciences. There is an emerging critical need for Big Data tools and methods, because of the potential impact of advancements in these areas. Importantly, statisticians and statistical thinking have a major role to play in creating meaningful progress in this arena. We would like to emphasize this point in this special issue, as it highlights both the dramatic need for statistical input for Big Data analysis and for a greater number of statisticians working on Big Data problems. We use the field of statistical neuroimaging to demonstrate these points. As such, this paper covers several applications and novel methodological developments of Big Data tools applied to neuroimaging data.
Choi, Hyungwon; Kim, Sinae; Fermin, Damian; Tsou, Chih-Chiang; Nesvizhskii, Alexey I
2015-11-03
We introduce QPROT, a statistical framework and computational tool for differential protein expression analysis using protein intensity data. QPROT is an extension of the QSPEC suite, originally developed for spectral count data, adapted for analysis using continuously measured protein-level intensity data. QPROT offers a new intensity normalization procedure and model-based differential expression analysis, both of which account for missing data. Determination of differential expression for each protein is based on a standardized Z-statistic computed from the posterior distribution of the log fold change parameter, guided by the false discovery rate estimated by a well-known empirical Bayes method. We evaluated the classification performance of QPROT using the quantification calibration data from the clinical proteomic technology assessment for cancer (CPTAC) study and a recently published Escherichia coli benchmark dataset, with evaluation of FDR accuracy in the latter. QPROT is a statistical framework with a computational software tool for comparative quantitative proteomics analysis. It features various extensions of the QSPEC method originally built for spectral count data analysis, including probabilistic treatment of missing values in protein intensity data. With the increasing popularity of label-free quantitative proteomics data, the proposed method and accompanying software suite will be immediately useful for many proteomics laboratories. This article is part of a Special Issue entitled: Computational Proteomics. Copyright © 2015 Elsevier B.V. All rights reserved.
Advantages of Social Network Analysis in Educational Research
ERIC Educational Resources Information Center
Ushakov, K. M.; Kukso, K. N.
2015-01-01
Currently, one of the main tools for large-scale studies of schools is statistical analysis. Although it is the most common method and offers the greatest opportunities for analysis, there are other quantitative methods for studying schools, such as network analysis. We discuss the potential advantages that network analysis has for educational…
A New Paradigm to Analyze Data Completeness of Patient Data.
Nasir, Ayan; Gurupur, Varadraj; Liu, Xinliang
2016-08-03
There is a need to develop a tool that will measure data completeness of patient records using sophisticated statistical metrics. Patient data integrity is important in providing timely and appropriate care. Completeness is an important step, with an emphasis on understanding the complex relationships between data fields and their relative importance in delivering care. This tool will not only help understand where data problems are but also help uncover the underlying issues behind them. Develop a tool that can be used alongside a variety of health care database software packages to determine the completeness of individual patient records as well as aggregate patient records across health care centers and subpopulations. The methodology of this project is encapsulated within the Data Completeness Analysis Package (DCAP) tool, with the major components including concept mapping, CSV parsing, and statistical analysis. The results from testing DCAP with Healthcare Cost and Utilization Project (HCUP) State Inpatient Database (SID) data show that this tool is successful in identifying relative data completeness at the patient, subpopulation, and database levels. These results also solidify a need for further analysis and call for hypothesis driven research to find underlying causes for data incompleteness. DCAP examines patient records and generates statistics that can be used to determine the completeness of individual patient data as well as the general thoroughness of record keeping in a medical database. DCAP uses a component that is customized to the settings of the software package used for storing patient data as well as a Comma Separated Values (CSV) file parser to determine the appropriate measurements. DCAP itself is assessed through a proof of concept exercise using hypothetical data as well as available HCUP SID patient data.
A New Paradigm to Analyze Data Completeness of Patient Data
Nasir, Ayan; Liu, Xinliang
2016-01-01
Summary Background There is a need to develop a tool that will measure data completeness of patient records using sophisticated statistical metrics. Patient data integrity is important in providing timely and appropriate care. Completeness is an important step, with an emphasis on understanding the complex relationships between data fields and their relative importance in delivering care. This tool will not only help understand where data problems are but also help uncover the underlying issues behind them. Objectives Develop a tool that can be used alongside a variety of health care database software packages to determine the completeness of individual patient records as well as aggregate patient records across health care centers and subpopulations. Methods The methodology of this project is encapsulated within the Data Completeness Analysis Package (DCAP) tool, with the major components including concept mapping, CSV parsing, and statistical analysis. Results The results from testing DCAP with Healthcare Cost and Utilization Project (HCUP) State Inpatient Database (SID) data show that this tool is successful in identifying relative data completeness at the patient, subpopulation, and database levels. These results also solidify a need for further analysis and call for hypothesis driven research to find underlying causes for data incompleteness. Conclusion DCAP examines patient records and generates statistics that can be used to determine the completeness of individual patient data as well as the general thoroughness of record keeping in a medical database. DCAP uses a component that is customized to the settings of the software package used for storing patient data as well as a Comma Separated Values (CSV) file parser to determine the appropriate measurements. DCAP itself is assessed through a proof of concept exercise using hypothetical data as well as available HCUP SID patient data. PMID:27484918
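The notion of per-field and per-record completeness described in these two records can be illustrated with a short sketch. The field names and weights below are hypothetical, and this is not the DCAP implementation; it only shows how weighted completeness statistics might be computed from a CSV-style table.

```python
# Illustrative computation of per-field and per-record completeness from tabular
# patient data, loosely inspired by the DCAP idea; field names and weights are
# hypothetical, and this is not the actual DCAP tool.
import numpy as np
import pandas as pd

records = pd.DataFrame({
    "patient_id": [1, 2, 3, 4],
    "age":        [54, np.nan, 61, 47],
    "diagnosis":  ["I10", "E11", None, "J45"],
    "discharge":  [None, "home", "home", None],
})

# Relative importance of each field for delivering care (hypothetical weights summing to 1).
weights = pd.Series({"patient_id": 0.4, "age": 0.2, "diagnosis": 0.3, "discharge": 0.1})

field_completeness = records.notna().mean()                     # per-field completeness
record_completeness = (records.notna() * weights).sum(axis=1)   # weighted, per record

print(field_completeness)
print("database-level weighted completeness:", round(record_completeness.mean(), 3))
```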
The use and misuse of statistical methodologies in pharmacology research.
Marino, Michael J
2014-01-01
Descriptive, exploratory, and inferential statistics are necessary components of hypothesis-driven biomedical research. Despite the ubiquitous need for these tools, the emphasis on statistical methods in pharmacology has become dominated by inferential methods, often chosen more for the availability of user-friendly software than for any understanding of the data set or the critical assumptions of the statistical tests. Such frank misuse of statistical methodology and the quest to reach the mystical α<0.05 criterion have hampered research via the publication of incorrect analyses driven by rudimentary statistical training. Perhaps more critically, a poor understanding of statistical tools limits the conclusions that may be drawn from a study by divorcing the investigator from their own data. The net result is a decrease in quality and confidence in research findings, fueling recent controversies over the reproducibility of high profile findings and effects that appear to diminish over time. The recent development of "omics" approaches leading to the production of massive higher dimensional data sets has amplified these issues, making it clear that new approaches are needed to appropriately and effectively mine this type of data. Unfortunately, statistical education in the field has not kept pace. This commentary provides a foundation for an intuitive understanding of statistics that fosters an exploratory approach and an appreciation for the assumptions of various statistical tests, which hopefully will increase the correct use of statistics, the application of exploratory data analysis, and the use of statistical study design, with the goal of increasing reproducibility and confidence in the literature. Copyright © 2013. Published by Elsevier Inc.
A new statistical methodology predicting chip failure probability considering electromigration
NASA Astrophysics Data System (ADS)
Sun, Ted
In this research thesis, we present a new approach to analyzing chip reliability subject to electromigration (EM); the fundamental causes of EM and its manifestation in different materials are presented in this thesis. This new approach utilizes the statistical nature of EM failure in order to assess overall EM risk. It includes within-die temperature variations from the chip's temperature map, extracted by an Electronic Design Automation (EDA) tool, to estimate the failure probability of a design. Both the power estimation and thermal analysis are performed in the EDA flow. We first used the traditional EM approach to analyze the design, which involves 6 metal and 5 via layers, with a single temperature across the entire chip. Next, we used the same traditional approach but with a realistic temperature map. The traditional EM analysis approach, the same approach coupled with a temperature map, and a comparison between the results obtained with and without the temperature map are presented in this research. The comparison confirms that using a temperature map yields a less pessimistic estimation of the chip's EM risk. Finally, we employed the statistical methodology we developed, considering a temperature map and different use-condition voltages and frequencies, to estimate the overall failure probability of the chip. The statistical model accounts for scaling through the traditional Black equation and four major use conditions. The statistical result comparisons are within our expectations. The results of this statistical analysis confirm that the chip-level failure probability is higher i) at higher use-condition frequencies for all use-condition voltages, and ii) when a single temperature instead of a temperature map across the chip is considered. In this thesis, I start with an overall review of current design types, common flows, and the necessary verification and reliability checking steps used in the IC design industry. Furthermore, the important concepts of "scripting automation", which was used to integrate the diverse EDA tools in this research, are described in detail with several examples, and the completed code is included in the appendix for reference. This structure should give readers a thorough understanding of the research work, from the automation of EDA tools to statistical data generation, from the nature of EM to the construction of the statistical model, and the comparison between the traditional and statistical EM analysis approaches.
Model Performance Evaluation and Scenario Analysis (MPESA) Tutorial
This tool consists of two parts: model performance evaluation and scenario analysis (MPESA). The model performance evaluation consists of two components: model performance evaluation metrics and model diagnostics. These metrics provide modelers with statistical goodness-of-fit m...
MAGMA: Generalized Gene-Set Analysis of GWAS Data
de Leeuw, Christiaan A.; Mooij, Joris M.; Heskes, Tom; Posthuma, Danielle
2015-01-01
By aggregating data for complex traits in a biologically meaningful way, gene and gene-set analysis constitute a valuable addition to single-marker analysis. However, although various methods for gene and gene-set analysis currently exist, they generally suffer from a number of issues. Statistical power for most methods is strongly affected by linkage disequilibrium between markers, multi-marker associations are often hard to detect, and the reliance on permutation to compute p-values tends to make the analysis computationally very expensive. To address these issues we have developed MAGMA, a novel tool for gene and gene-set analysis. The gene analysis is based on a multiple regression model, to provide better statistical performance. The gene-set analysis is built as a separate layer around the gene analysis for additional flexibility. This gene-set analysis also uses a regression structure to allow generalization to analysis of continuous properties of genes and simultaneous analysis of multiple gene sets and other gene properties. Simulations and an analysis of Crohn’s Disease data are used to evaluate the performance of MAGMA and to compare it to a number of other gene and gene-set analysis tools. The results show that MAGMA has significantly more power than other tools for both the gene and the gene-set analysis, identifying more genes and gene sets associated with Crohn’s Disease while maintaining a correct type 1 error rate. Moreover, the MAGMA analysis of the Crohn’s Disease data was found to be considerably faster as well. PMID:25885710
MAGMA: generalized gene-set analysis of GWAS data.
de Leeuw, Christiaan A; Mooij, Joris M; Heskes, Tom; Posthuma, Danielle
2015-04-01
By aggregating data for complex traits in a biologically meaningful way, gene and gene-set analysis constitute a valuable addition to single-marker analysis. However, although various methods for gene and gene-set analysis currently exist, they generally suffer from a number of issues. Statistical power for most methods is strongly affected by linkage disequilibrium between markers, multi-marker associations are often hard to detect, and the reliance on permutation to compute p-values tends to make the analysis computationally very expensive. To address these issues we have developed MAGMA, a novel tool for gene and gene-set analysis. The gene analysis is based on a multiple regression model, to provide better statistical performance. The gene-set analysis is built as a separate layer around the gene analysis for additional flexibility. This gene-set analysis also uses a regression structure to allow generalization to analysis of continuous properties of genes and simultaneous analysis of multiple gene sets and other gene properties. Simulations and an analysis of Crohn's Disease data are used to evaluate the performance of MAGMA and to compare it to a number of other gene and gene-set analysis tools. The results show that MAGMA has significantly more power than other tools for both the gene and the gene-set analysis, identifying more genes and gene sets associated with Crohn's Disease while maintaining a correct type 1 error rate. Moreover, the MAGMA analysis of the Crohn's Disease data was found to be considerably faster as well.
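The regression-based flavour of gene analysis described in these two records can be conveyed with a rough sketch: regress the phenotype on a few principal components of the SNPs in a gene and test the joint effect with an F-test. The code below is only an illustration of that idea on synthetic genotypes; it is not the MAGMA implementation.

```python
# Rough sketch of a regression-based gene-level test in the spirit of the abstract:
# the phenotype is regressed on principal components of the SNPs in a gene and the
# joint effect is assessed with an F-test. Illustration only, not MAGMA itself.
import numpy as np
import statsmodels.api as sm
from sklearn.decomposition import PCA

rng = np.random.default_rng(3)
n_subjects, n_snps = 500, 12
genotypes = rng.binomial(2, 0.3, size=(n_subjects, n_snps)).astype(float)  # hypothetical
phenotype = 0.2 * genotypes[:, 0] + rng.normal(size=n_subjects)

# Project the (possibly correlated) SNPs onto a few principal components.
pcs = PCA(n_components=5).fit_transform(genotypes)

full = sm.OLS(phenotype, sm.add_constant(pcs)).fit()
reduced = sm.OLS(phenotype, np.ones((n_subjects, 1))).fit()   # intercept-only model
f_stat, p_value, _ = full.compare_f_test(reduced)
print(f"gene-level F = {f_stat:.2f}, p = {p_value:.3g}")
```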
SOCR: Statistics Online Computational Resource
Dinov, Ivo D.
2011-01-01
The need for hands-on computer laboratory experience in undergraduate and graduate statistics education has been firmly established in the past decade. As a result a number of attempts have been undertaken to develop novel approaches for problem-driven statistical thinking, data analysis and result interpretation. In this paper we describe an integrated educational web-based framework for: interactive distribution modeling, virtual online probability experimentation, statistical data analysis, visualization and integration. Following years of experience in statistical teaching at all college levels using established licensed statistical software packages, like STATA, S-PLUS, R, SPSS, SAS, Systat, etc., we have attempted to engineer a new statistics education environment, the Statistics Online Computational Resource (SOCR). This resource performs many of the standard types of statistical analysis, much like other classical tools. In addition, it is designed in a plug-in object-oriented architecture and is completely platform independent, web-based, interactive, extensible and secure. Over the past 4 years we have tested, fine-tuned and reanalyzed the SOCR framework in many of our undergraduate and graduate probability and statistics courses and have evidence that SOCR resources build student’s intuition and enhance their learning. PMID:21451741
Rákosi, Csilla
2018-01-22
This paper proposes the use of the tools of statistical meta-analysis as a method of conflict resolution with respect to experiments in cognitive linguistics. With the help of statistical meta-analysis, the effect size of similar experiments can be compared, a well-founded and robust synthesis of the experimental data can be achieved, and possible causes of any divergence(s) in the outcomes can be revealed. This application of statistical meta-analysis offers a novel method of how diverging evidence can be dealt with. The workability of this idea is exemplified by a case study dealing with a series of experiments conducted as non-exact replications of Thibodeau and Boroditsky (PLoS ONE 6(2):e16782, 2011. https://doi.org/10.1371/journal.pone.0016782 ).
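The basic machinery of combining effect sizes across experiments is compact enough to show directly. The sketch below is a fixed-effect, inverse-variance combination with Cochran's Q as a heterogeneity check; the numbers are hypothetical and are not taken from the replications discussed in the paper.

```python
# Minimal fixed-effect, inverse-variance meta-analysis of effect sizes, with
# Cochran's Q as a heterogeneity check. The numbers are hypothetical.
import numpy as np
from scipy import stats

effects = np.array([0.42, 0.10, 0.25, -0.05])   # effect size from each experiment
ses = np.array([0.15, 0.12, 0.20, 0.18])        # their standard errors

w = 1.0 / ses**2                                 # inverse-variance weights
pooled = np.sum(w * effects) / np.sum(w)
pooled_se = np.sqrt(1.0 / np.sum(w))
z = pooled / pooled_se
p = 2 * stats.norm.sf(abs(z))

q = np.sum(w * (effects - pooled) ** 2)          # Cochran's Q
p_het = stats.chi2.sf(q, df=len(effects) - 1)

print(f"pooled effect = {pooled:.3f} +/- {pooled_se:.3f} (z = {z:.2f}, p = {p:.3f})")
print(f"heterogeneity: Q = {q:.2f}, p = {p_het:.3f}")
```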
Lightweight and Statistical Techniques for Petascale Debugging
DOE Office of Scientific and Technical Information (OSTI.GOV)
Miller, Barton
2014-06-30
This project investigated novel techniques for debugging scientific applications on petascale architectures. In particular, we developed lightweight tools that narrow the problem space when bugs are encountered. We also developed techniques that either limit the number of tasks and the code regions to which a developer must apply a traditional debugger or that apply statistical techniques to provide direct suggestions of the location and type of error. We extended previous work on the Stack Trace Analysis Tool (STAT), which has already demonstrated scalability to over one hundred thousand MPI tasks. We also extended statistical techniques developed to isolate programming errors in widely used sequential or threaded applications in the Cooperative Bug Isolation (CBI) project to large scale parallel applications. Overall, our research substantially improved productivity on petascale platforms through a tool set for debugging that complements existing commercial tools. Previously, Office of Science application developers relied either on primitive manual debugging techniques based on printf or they used tools, such as TotalView, that do not scale beyond a few thousand processors. However, bugs often arise at scale, and substantial effort and computation cycles are wasted in either reproducing the problem in a smaller run that can be analyzed with the traditional tools or in repeated runs at scale that use the primitive techniques. New techniques that work at scale and automate the process of identifying the root cause of errors were needed. These techniques significantly reduced the time spent debugging petascale applications, thus leading to a greater overall amount of time for application scientists to pursue the scientific objectives for which the systems are purchased. We developed a new paradigm for debugging at scale: techniques that reduced the debugging scenario to a scale suitable for traditional debuggers, e.g., by narrowing the search for the root-cause analysis to a small set of nodes or by identifying equivalence classes of nodes and sampling our debug targets from them. We implemented these techniques as lightweight tools that efficiently work on the full scale of the target machine. We explored four lightweight debugging refinements: generic classification parameters, such as stack traces; application-specific classification parameters, such as global variables; statistical data acquisition techniques; and machine learning based approaches to perform root cause analysis. Work done under this project can be divided into two categories: new algorithms and techniques for scalable debugging, and foundation infrastructure work on our MRNet multicast-reduction framework for scalability and the Dyninst binary analysis and instrumentation toolkits.
Davidson, Robert L; Weber, Ralf J M; Liu, Haoyu; Sharma-Oates, Archana; Viant, Mark R
2016-01-01
Metabolomics is increasingly recognized as an invaluable tool in the biological, medical and environmental sciences yet lags behind the methodological maturity of other omics fields. To achieve its full potential, including the integration of multiple omics modalities, the accessibility, standardization and reproducibility of computational metabolomics tools must be improved significantly. Here we present our end-to-end mass spectrometry metabolomics workflow in the widely used platform, Galaxy. Named Galaxy-M, our workflow has been developed for both direct infusion mass spectrometry (DIMS) and liquid chromatography mass spectrometry (LC-MS) metabolomics. The range of tools presented spans from processing of raw data, e.g. peak picking and alignment, through data cleansing, e.g. missing value imputation, to preparation for statistical analysis, e.g. normalization and scaling, and principal components analysis (PCA) with associated statistical evaluation. We demonstrate the ease of using these Galaxy workflows via the analysis of DIMS and LC-MS datasets, and provide PCA scores and associated statistics to help other users to ensure that they can accurately repeat the processing and analysis of these two datasets. Galaxy and data are all provided pre-installed in a virtual machine (VM) that can be downloaded from the GigaDB repository. Additionally, source code, executables and installation instructions are available from GitHub. The Galaxy platform has enabled us to produce an easily accessible and reproducible computational metabolomics workflow. More tools could be added by the community to expand its functionality. We recommend that Galaxy-M workflow files are included within the supplementary information of publications, enabling metabolomics studies to achieve greater reproducibility.
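The preprocessing sequence named in this abstract (missing-value imputation, scaling, then PCA) is easy to illustrate outside Galaxy. The sketch below uses scikit-learn on invented intensity data and only demonstrates the order of operations; it is not the Galaxy-M tool chain itself.

```python
# Compact sketch of the preprocessing steps named in the abstract (missing-value
# imputation, scaling, PCA) using scikit-learn; this illustrates the sequence of
# operations, not the actual Galaxy-M tool implementations.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(4)
intensities = rng.lognormal(mean=5, sigma=1, size=(60, 200))   # hypothetical peak table
mask = rng.random(intensities.shape) < 0.05
intensities[mask] = np.nan                                     # simulate missing peaks

pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),  # missing value imputation
    ("scale", StandardScaler()),                   # unit-variance scaling
    ("pca", PCA(n_components=2)),                  # principal components analysis
])
scores = pipeline.fit_transform(np.log(intensities))
print("explained variance ratio:", pipeline.named_steps["pca"].explained_variance_ratio_)
```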
The Power of 'Evidence': Reliable Science or a Set of Blunt Tools?
ERIC Educational Resources Information Center
Wrigley, Terry
2018-01-01
In response to the increasing emphasis on 'evidence-based teaching', this article examines the privileging of randomised controlled trials and their statistical synthesis (meta-analysis). It also pays particular attention to two third-level statistical syntheses: John Hattie's "Visible learning" project and the EEF's "Teaching and…
Kaneta, Tomohiro; Nakatsuka, Masahiro; Nakamura, Kei; Seki, Takashi; Yamaguchi, Satoshi; Tsuboi, Masahiro; Meguro, Kenichi
2016-01-01
SPECT is an important diagnostic tool for dementia. Recently, statistical analysis of SPECT has been commonly used for dementia research. In this study, we evaluated the accuracy of visual SPECT evaluation and/or statistical analysis for the diagnosis (Dx) of Alzheimer disease (AD) and other forms of dementia in our community-based study, "The Osaki-Tajiri Project." Eighty-nine consecutive outpatients with dementia were enrolled and underwent brain perfusion SPECT with 99mTc-ECD. The diagnostic accuracy of SPECT was tested using 3 methods: visual inspection (SPECT Dx), an automated diagnostic tool using statistical analysis with the easy Z-score imaging system (eZIS Dx), and visual inspection plus eZIS (integrated Dx). Integrated Dx showed the highest sensitivity, specificity, and accuracy, whereas eZIS was the second most accurate method. We also observed a higher-than-expected rate of SPECT images corresponding to false-negative cases of AD. Among these, 50% showed hypofrontality and were diagnosed as frontotemporal lobar degeneration. These cases typically showed regional "hot spots" in the primary sensorimotor cortex (ie, a sensorimotor hot spot sign), which we determined were associated with AD rather than frontotemporal lobar degeneration. We concluded that diagnostic ability was improved by the integrated use of visual assessment and statistical analysis. In addition, detection of a sensorimotor hot spot sign was useful for detecting AD when hypofrontality is present and improved the ability to properly diagnose AD.
A Tool for Estimating Variability in Wood Preservative Treatment Retention
Patricia K. Lebow; Adam M. Taylor; Timothy M. Young
2015-01-01
Composite sampling is standard practice for evaluation of preservative retention levels in preservative-treated wood. Current protocols provide an average retention value but no estimate of uncertainty. Here we describe a statistical method for calculating uncertainty estimates using the standard sampling regime with minimal additional chemical analysis. This tool can...
Automatic Rock Detection and Mapping from HiRISE Imagery
NASA Technical Reports Server (NTRS)
Huertas, Andres; Adams, Douglas S.; Cheng, Yang
2008-01-01
This system includes a C-code software program and a set of MATLAB software tools for statistical analysis and rock distribution mapping. The major functions include rock detection and rock detection validation. The rock detection code has evolved into a production tool that can be used by engineers and geologists with minor training.
Proposing a Mathematical Software Tool in Physics Secondary Education
ERIC Educational Resources Information Center
Baltzis, Konstantinos B.
2009-01-01
MathCad® is a very popular software tool for mathematical and statistical analysis in science and engineering. Its low cost, ease of use, extensive function library, and worksheet-like user interface distinguish it among other commercial packages. Its features are also well suited to educational process. The use of natural mathematical notation…
ERIC Educational Resources Information Center
Hicks, Catherine
2018-01-01
Purpose: This paper aims to explore predicting employee learning activity via employee characteristics and usage for two online learning tools. Design/methodology/approach: Statistical analysis focused on observational data collected from user logs. Data are analyzed via regression models. Findings: Findings are presented for over 40,000…
Study Designs and Statistical Analyses for Biomarker Research
Gosho, Masahiko; Nagashima, Kengo; Sato, Yasunori
2012-01-01
Biomarkers are becoming increasingly important for streamlining drug discovery and development. In addition, biomarkers are widely expected to be used as a tool for disease diagnosis, personalized medication, and surrogate endpoints in clinical research. In this paper, we highlight several important aspects related to study design and statistical analysis for clinical research incorporating biomarkers. We describe the typical and current study designs for exploring, detecting, and utilizing biomarkers. Furthermore, we introduce statistical issues such as confounding and multiplicity for statistical tests in biomarker research. PMID:23012528
Weidner, Christopher; Fischer, Cornelius; Sauer, Sascha
2014-12-01
We introduce PHOXTRACK (PHOsphosite-X-TRacing Analysis of Causal Kinases), a user-friendly freely available software tool for analyzing large datasets of post-translational modifications of proteins, such as phosphorylation, which are commonly gained by mass spectrometry detection. In contrast to other currently applied data analysis approaches, PHOXTRACK uses full sets of quantitative proteomics data and applies non-parametric statistics to calculate whether defined kinase-specific sets of phosphosite sequences indicate statistically significant concordant differences between various biological conditions. PHOXTRACK is an efficient tool for extracting post-translational information of comprehensive proteomics datasets to decipher key regulatory proteins and to infer biologically relevant molecular pathways. PHOXTRACK will be maintained over the next years and is freely available as an online tool for non-commercial use at http://phoxtrack.molgen.mpg.de. Users will also find a tutorial at this Web site and can additionally give feedback at https://groups.google.com/d/forum/phoxtrack-discuss. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
CORSSA: The Community Online Resource for Statistical Seismicity Analysis
Michael, Andrew J.; Wiemer, Stefan
2010-01-01
Statistical seismology is the application of rigorous statistical methods to earthquake science with the goal of improving our knowledge of how the earth works. Within statistical seismology there is a strong emphasis on the analysis of seismicity data in order to improve our scientific understanding of earthquakes and to improve the evaluation and testing of earthquake forecasts, earthquake early warning, and seismic hazards assessments. Given the societal importance of these applications, statistical seismology must be done well. Unfortunately, a lack of educational resources and available software tools make it difficult for students and new practitioners to learn about this discipline. The goal of the Community Online Resource for Statistical Seismicity Analysis (CORSSA) is to promote excellence in statistical seismology by providing the knowledge and resources necessary to understand and implement the best practices, so that the reader can apply these methods to their own research. This introduction describes the motivation for and vision of CORSSA. It also describes its structure and contents.
Jaiswara, Ranjana; Nandi, Diptarup; Balakrishnan, Rohini
2013-01-01
Traditional taxonomy based on morphology has often failed in accurate species identification owing to the occurrence of cryptic species, which are reproductively isolated but morphologically identical. Molecular data have thus been used to complement morphology in species identification. The sexual advertisement calls in several groups of acoustically communicating animals are species-specific and can thus complement molecular data as non-invasive tools for identification. Several statistical tools and automated identifier algorithms have been used to investigate the efficiency of acoustic signals in species identification. Despite a plethora of such methods, there is a general lack of knowledge regarding the appropriate usage of these methods in specific taxa. In this study, we investigated the performance of two commonly used statistical methods, discriminant function analysis (DFA) and cluster analysis, in identification and classification based on acoustic signals of field cricket species belonging to the subfamily Gryllinae. Using a comparative approach we evaluated the optimal number of species and calling song characteristics for both the methods that lead to most accurate classification and identification. The accuracy of classification using DFA was high and was not affected by the number of taxa used. However, a constraint in using discriminant function analysis is the need for a priori classification of songs. Accuracy of classification using cluster analysis, which does not require a priori knowledge, was maximum for 6-7 taxa and decreased significantly when more than ten taxa were analysed together. We also investigated the efficacy of two novel derived acoustic features in improving the accuracy of identification. Our results show that DFA is a reliable statistical tool for species identification using acoustic signals. Our results also show that cluster analysis of acoustic signals in crickets works effectively for species classification and identification.
Multiple comparison analysis testing in ANOVA.
McHugh, Mary L
2011-01-01
The Analysis of Variance (ANOVA) test has long been an important tool for researchers conducting studies on multiple experimental groups and one or more control groups. However, ANOVA cannot provide detailed information on differences among the various study groups, or on complex combinations of study groups. To fully understand group differences in an ANOVA, researchers must conduct tests of the differences between particular pairs of experimental and control groups. Tests conducted on subsets of data tested previously in another analysis are called post hoc tests. A class of post hoc tests that provide this type of detailed information for ANOVA results are called "multiple comparison analysis" tests. The most commonly used multiple comparison analysis statistics include the following tests: Tukey, Newman-Keuls, Scheffé, Bonferroni and Dunnett. These statistical tools each have specific uses, advantages and disadvantages. Some are best used for testing theory while others are useful in generating new theory. Selection of the appropriate post hoc test will provide researchers with the most detailed information while limiting Type 1 errors due to alpha inflation.
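As a concrete illustration of one such post hoc procedure, the sketch below runs a one-way ANOVA followed by Tukey's HSD on synthetic data using scipy and statsmodels; the group names and values are invented.

```python
# One-way ANOVA followed by Tukey's HSD post hoc test on synthetic data, using
# scipy and statsmodels; group names and values are hypothetical.
import numpy as np
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(5)
groups = {"control": rng.normal(10, 2, 30),
          "drug_a":  rng.normal(12, 2, 30),
          "drug_b":  rng.normal(10.5, 2, 30)}

f_stat, p_val = stats.f_oneway(*groups.values())
print(f"ANOVA: F = {f_stat:.2f}, p = {p_val:.4f}")

values = np.concatenate(list(groups.values()))
labels = np.repeat(list(groups.keys()), [len(v) for v in groups.values()])
print(pairwise_tukeyhsd(values, labels, alpha=0.05))  # all pairwise group comparisons
```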
Single-Case Time Series with Bayesian Analysis: A Practitioner's Guide.
ERIC Educational Resources Information Center
Jones, W. Paul
2003-01-01
This article illustrates a simplified time series analysis for use by the counseling researcher practitioner in single-case baseline plus intervention studies with a Bayesian probability analysis to integrate findings from replications. The C statistic is recommended as a primary analysis tool with particular relevance in the context of actual…
A κ-generalized statistical mechanics approach to income analysis
NASA Astrophysics Data System (ADS)
Clementi, F.; Gallegati, M.; Kaniadakis, G.
2009-02-01
This paper proposes a statistical mechanics approach to the analysis of income distribution and inequality. A new distribution function, having its roots in the framework of κ-generalized statistics, is derived that is particularly suitable for describing the whole spectrum of incomes, from the low-middle income region up to the high income Pareto power-law regime. Analytical expressions for the shape, moments and some other basic statistical properties are given. Furthermore, several well-known econometric tools for measuring inequality, which all exist in a closed form, are considered. A method for parameter estimation is also discussed. The model is shown to fit remarkably well the data on personal income for the United States, and the analysis of inequality performed in terms of its parameters is revealed as very powerful.
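For orientation, the κ-generalized distribution referred to here is usually written in terms of the κ-exponential function. The block below sketches its standard form as found in the κ-statistics literature; the exact notation and parameterization used by the authors may differ, so this should be read as background rather than as a quotation of the paper.

```latex
% Sketch of the standard form of the kappa-generalized distribution (the authors'
% exact notation may differ). The kappa-exponential is
\exp_\kappa(x) = \left(\sqrt{1+\kappa^2 x^2} + \kappa x\right)^{1/\kappa},
\qquad 0 \le \kappa < 1,
% and the complementary cumulative distribution of income X > 0 is
P(X > x) = \exp_\kappa\!\left(-\beta x^\alpha\right),
% which behaves like a Weibull law \exp(-\beta x^\alpha) for small incomes and like a
% Pareto power law proportional to x^{-\alpha/\kappa} in the high-income tail.
```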
On Designing Multicore-Aware Simulators for Systems Biology Endowed with OnLine Statistics
Calcagno, Cristina; Coppo, Mario
2014-01-01
The paper's arguments concern enabling methodologies for the design of a fully parallel, online, interactive tool aimed at supporting bioinformatics scientists. In particular, the features of these methodologies, supported by the FastFlow parallel programming framework, are shown on a simulation tool that performs the modeling, tuning, and sensitivity analysis of stochastic biological models. A stochastic simulation needs thousands of independent simulation trajectories, which turn into big data that should be analysed by statistical and data mining tools. In the considered approach the two stages are pipelined in such a way that the simulation stage streams the partial results of all simulation trajectories to the analysis stage, which immediately produces a partial result. The simulation-analysis workflow is validated for performance and for the effectiveness of the online analysis in capturing biological system behavior, on a multicore platform and representative proof-of-concept biological systems. The exploited methodologies include pattern-based parallel programming and data streaming, which provide key features to software designers such as performance portability and efficient in-memory (big) data management and movement. Two paradigmatic classes of biological systems exhibiting multistable and oscillatory behavior are used as a testbed. PMID:25050327
On designing multicore-aware simulators for systems biology endowed with OnLine statistics.
Aldinucci, Marco; Calcagno, Cristina; Coppo, Mario; Damiani, Ferruccio; Drocco, Maurizio; Sciacca, Eva; Spinella, Salvatore; Torquati, Massimo; Troina, Angelo
2014-01-01
The paper's arguments concern enabling methodologies for the design of a fully parallel, online, interactive tool aimed at supporting bioinformatics scientists. In particular, the features of these methodologies, supported by the FastFlow parallel programming framework, are shown on a simulation tool that performs the modeling, tuning, and sensitivity analysis of stochastic biological models. A stochastic simulation needs thousands of independent simulation trajectories, which turn into big data that should be analysed by statistical and data mining tools. In the considered approach the two stages are pipelined in such a way that the simulation stage streams the partial results of all simulation trajectories to the analysis stage, which immediately produces a partial result. The simulation-analysis workflow is validated for performance and for the effectiveness of the online analysis in capturing biological system behavior, on a multicore platform and representative proof-of-concept biological systems. The exploited methodologies include pattern-based parallel programming and data streaming, which provide key features to software designers such as performance portability and efficient in-memory (big) data management and movement. Two paradigmatic classes of biological systems exhibiting multistable and oscillatory behavior are used as a testbed.
Constructing and Modifying Sequence Statistics for relevent Using informR in 𝖱
Marcum, Christopher Steven; Butts, Carter T.
2015-01-01
The informR package greatly simplifies the analysis of complex event histories in 𝖱 by providing user friendly tools to build sufficient statistics for the relevent package. Historically, building sufficient statistics to model event sequences (of the form a→b) using the egocentric generalization of Butts’ (2008) relational event framework for modeling social action has been cumbersome. The informR package simplifies the construction of the complex list of arrays needed by the rem() model fitting for a variety of cases involving egocentric event data, multiple event types, and/or support constraints. This paper introduces these tools using examples from real data extracted from the American Time Use Survey. PMID:26185488
RipleyGUI: software for analyzing spatial patterns in 3D cell distributions
Hansson, Kristin; Jafari-Mamaghani, Mehrdad; Krieger, Patrik
2013-01-01
The true revolution in the age of digital neuroanatomy is the ability to extensively quantify anatomical structures and thus investigate structure-function relationships in great detail. To facilitate the quantification of neuronal cell patterns we have developed RipleyGUI, a MATLAB-based software that can be used to detect patterns in the 3D distribution of cells. RipleyGUI uses Ripley's K-function to analyze spatial distributions. In addition the software contains statistical tools to determine quantitative statistical differences, and tools for spatial transformations that are useful for analyzing non-stationary point patterns. The software has a graphical user interface making it easy to use without programming experience, and an extensive user manual explaining the basic concepts underlying the different statistical tools used to analyze spatial point patterns. The described analysis tool can be used for determining the spatial organization of neurons that is important for a detailed study of structure-function relationships. For example, neocortex that can be subdivided into six layers based on cell density and cell types can also be analyzed in terms of organizational principles distinguishing the layers. PMID:23658544
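The statistic at the heart of this tool is simple to state, and a bare-bones estimator is shown below. It computes Ripley's K for a 3D point pattern without any edge correction and compares it with the expectation under complete spatial randomness; RipleyGUI implements more complete variants, so this is only an illustration of the statistic itself on invented points.

```python
# Minimal estimator of Ripley's K-function for a 3D point pattern, without edge
# correction; illustration only, not the RipleyGUI implementation.
import numpy as np
from scipy.spatial.distance import pdist

def ripley_k_3d(points, radii, volume):
    """K(r) = V / (n(n-1)) * number of ordered pairs closer than r."""
    n = len(points)
    d = pdist(points)                                        # unordered pairwise distances
    counts = np.array([(d < r).sum() for r in radii]) * 2    # convert to ordered pairs
    return volume * counts / (n * (n - 1))

rng = np.random.default_rng(6)
cube = rng.random((500, 3)) * 100.0           # hypothetical cell positions in a 100^3 volume
radii = np.linspace(1, 20, 10)
k_obs = ripley_k_3d(cube, radii, volume=100.0 ** 3)
k_csr = 4.0 / 3.0 * np.pi * radii ** 3        # expectation under complete spatial randomness
print(np.column_stack([radii, k_obs, k_csr]))
```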
Robbins, L G
2000-01-01
Graduate school programs in genetics have become so full that courses in statistics have often been eliminated. In addition, typical introductory statistics courses for the "statistics user" rather than the nascent statistician are laden with methods for analysis of measured variables while genetic data are most often discrete numbers. These courses are often seen by students and genetics professors alike as largely irrelevant cookbook courses. The powerful methods of likelihood analysis, although commonly employed in human genetics, are much less often used in other areas of genetics, even though current computational tools make this approach readily accessible. This article introduces the MLIKELY.PAS computer program and the logic of do-it-yourself maximum-likelihood statistics. The program itself, course materials, and expanded discussions of some examples that are only summarized here are available at http://www.unisi.it/ricerca/dip/bio_evol/sitomlikely/mlikely.html. PMID:10628965
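The "do-it-yourself" likelihood approach advocated here is easy to demonstrate numerically. The sketch below estimates a recombination fraction from hypothetical testcross counts by minimizing the negative log-likelihood and then runs a likelihood-ratio test against free recombination; it is a generic illustration, not the MLIKELY.PAS program.

```python
# Small numerical maximum-likelihood illustration for discrete genetic counts:
# estimating a recombination fraction from hypothetical testcross data. This is
# in the spirit of do-it-yourself likelihood analysis, not the MLIKELY.PAS program.
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import chi2

counts = {"parental": 412, "recombinant": 88}   # hypothetical offspring counts

def neg_log_lik(r):
    # Testcross: each parental class has probability (1-r)/2, each recombinant r/2.
    return -(counts["parental"] * np.log((1 - r) / 2) +
             counts["recombinant"] * np.log(r / 2))

fit = minimize_scalar(neg_log_lik, bounds=(1e-6, 0.5), method="bounded")
r_hat = fit.x

# Likelihood-ratio test of free recombination (r = 0.5) against the MLE.
lr = 2 * (neg_log_lik(0.5) - neg_log_lik(r_hat))
print(f"MLE of recombination fraction: {r_hat:.3f}")
print(f"LR statistic vs r = 0.5: {lr:.1f}, p = {chi2.sf(lr, df=1):.2e}")
```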
Scala, Giovanni; Affinito, Ornella; Palumbo, Domenico; Florio, Ermanno; Monticelli, Antonella; Miele, Gennaro; Chiariotti, Lorenzo; Cocozza, Sergio
2016-11-25
CpG sites in an individual molecule may exist in a binary state (methylated or unmethylated) and each individual DNA molecule, containing a certain number of CpGs, is a combination of these states defining an epihaplotype. Classic quantification based approaches to study DNA methylation are intrinsically unable to fully represent the complexity of the underlying methylation substrate. Epihaplotype based approaches, on the other hand, allow methylation profiles of cell populations to be studied at the single molecule level. For such investigations, next-generation sequencing techniques can be used, both for quantitative and for epihaplotype analysis. Currently available tools for methylation analysis lack output formats that explicitly report CpG methylation profiles at the single molecule level and that have suited statistical tools for their interpretation. Here we present ampliMethProfiler, a python-based pipeline for the extraction and statistical epihaplotype analysis of amplicons from targeted deep bisulfite sequencing of multiple DNA regions. ampliMethProfiler tool provides an easy and user friendly way to extract and analyze the epihaplotype composition of reads from targeted bisulfite sequencing experiments. ampliMethProfiler is written in python language and requires a local installation of BLAST and (optionally) QIIME tools. It can be run on Linux and OS X platforms. The software is open source and freely available at http://amplimethprofiler.sourceforge.net .
Kim, Min-Uk; Moon, Kyong Whan; Sohn, Jong-Ryeul; Byeon, Sang-Hoon
2018-05-18
We studied which weather variables consequence analysis is sensitive to, for the case of chemical leaks, from the user side of offsite consequence analysis (OCA) tools. We used the OCA tools Korea Offsite Risk Assessment (KORA) and Areal Location of Hazardous Atmospheres (ALOHA) in South Korea and the United States, respectively. The chemicals used for this analysis were 28% ammonia (NH₃), 35% hydrogen chloride (HCl), 50% hydrofluoric acid (HF), and 69% nitric acid (HNO₃). The accident scenarios were based on leakage accidents in storage tanks. The weather variables were air temperature, wind speed, humidity, and atmospheric stability. Sensitivity analysis was performed using the Statistical Package for the Social Sciences (SPSS) program for dummy regression analysis. The sensitivity analysis showed that impact distance was not sensitive to humidity. Impact distance was most sensitive to atmospheric stability, and was also more sensitive to air temperature than to wind speed, according to both the KORA and ALOHA tools. Moreover, impact distance was more sensitive to the weather variables in rural conditions than in urban conditions, with the ALOHA tool being more influenced by weather variables than the KORA tool. Therefore, when using the ALOHA tool instead of the KORA tool in rural conditions, users should take care that input errors in the weather variables, particularly the most sensitive one, atmospheric stability, do not cause differences in impact distance.
White Matter Fiber-based Analysis of T1w/T2w Ratio Map.
Chen, Haiwei; Budin, Francois; Noel, Jean; Prieto, Juan Carlos; Gilmore, John; Rasmussen, Jerod; Wadhwa, Pathik D; Entringer, Sonja; Buss, Claudia; Styner, Martin
2017-02-01
To develop, test, evaluate and apply a novel tool for the white matter fiber-based analysis of T1w/T2w ratio maps quantifying myelin content. The cerebral white matter in the human brain develops from a mostly non-myelinated state to a nearly fully mature white matter myelination within the first few years of life. High resolution T1w/T2w ratio maps are believed to be effective in quantitatively estimating myelin content on a voxel-wise basis. We propose the use of a fiber-tract-based analysis of such T1w/T2w ratio data, as it allows us to separate fiber bundles that a common regional analysis imprecisely groups together, and to associate effects to specific tracts rather than large, broad regions. We developed an intuitive, open source tool to facilitate such fiber-based studies of T1w/T2w ratio maps. Via its Graphical User Interface (GUI) the tool is accessible to non-technical users. The framework uses calibrated T1w/T2w ratio maps and a prior fiber atlas as an input to generate profiles of T1w/T2w values. The resulting fiber profiles are used in a statistical analysis that performs along-tract functional statistical analysis. We applied this approach to a preliminary study of early brain development in neonates. We developed an open-source tool for the fiber based analysis of T1w/T2w ratio maps and tested it in a study of brain development.
White matter fiber-based analysis of T1w/T2w ratio map
NASA Astrophysics Data System (ADS)
Chen, Haiwei; Budin, Francois; Noel, Jean; Prieto, Juan Carlos; Gilmore, John; Rasmussen, Jerod; Wadhwa, Pathik D.; Entringer, Sonja; Buss, Claudia; Styner, Martin
2017-02-01
Purpose: To develop, test, evaluate and apply a novel tool for the white matter fiber-based analysis of T1w/T2w ratio maps quantifying myelin content. Background: The cerebral white matter in the human brain develops from a mostly non-myelinated state to a nearly fully mature white matter myelination within the first few years of life. High resolution T1w/T2w ratio maps are believed to be effective in quantitatively estimating myelin content on a voxel-wise basis. We propose the use of a fiber-tract-based analysis of such T1w/T2w ratio data, as it allows us to separate fiber bundles that a common regional analysis imprecisely groups together, and to associate effects to specific tracts rather than large, broad regions. Methods: We developed an intuitive, open source tool to facilitate such fiber-based studies of T1w/T2w ratio maps. Via its Graphical User Interface (GUI) the tool is accessible to non-technical users. The framework uses calibrated T1w/T2w ratio maps and a prior fiber atlas as an input to generate profiles of T1w/T2w values. The resulting fiber profiles are used in a statistical analysis that performs along-tract functional statistical analysis. We applied this approach to a preliminary study of early brain development in neonates. Results: We developed an open-source tool for the fiber based analysis of T1w/T2w ratio maps and tested it in a study of brain development.
2013-05-02
Statistical Relational Learning (SRL) as an Enabling Technology for Data Acquisition and Data Fusion in Video
...particular, it is important to reason about which portions of video require expensive analysis and storage. This project aims to make these...inferences using new and existing tools from Statistical Relational Learning (SRL). SRL is a recently emerging technology that enables the effective...
Velasco-Tapia, Fernando
2014-01-01
Magmatic processes have usually been identified and evaluated using qualitative or semiquantitative geochemical or isotopic tools based on a restricted number of variables. However, a more complete and quantitative view could be reached by applying multivariate analysis, mass balance techniques, and statistical tests. As an example, in this work a statistical and quantitative scheme is applied to analyze the geochemical features of the Sierra de las Cruces (SC) volcanic range (Mexican Volcanic Belt). In this locality, the volcanic activity (3.7 to 0.5 Ma) was dominantly dacitic, but the presence of spheroidal andesitic enclaves and/or diverse disequilibrium features in the majority of lavas confirms the operation of magma mixing/mingling. New discriminant-function-based multidimensional diagrams were used to discriminate tectonic setting. Statistical tests of discordancy and significance were applied to evaluate the influence of the subducting Cocos plate, which seems to be rather negligible for the SC magmas in relation to several major and trace elements. A cluster analysis following Ward's linkage rule was carried out to classify the SC volcanic rocks into geochemical groups. Finally, two mass-balance schemes were applied for the quantitative evaluation of the proportion of the end-member components (dacitic and andesitic magmas) in the comingled lavas (binary mixtures).
Davis, J.C.
2000-01-01
Geologists may feel that geological data are not amenable to statistical analysis, or at best require specialized approaches such as nonparametric statistics and geostatistics. However, there are many circumstances, particularly in systematic studies conducted for environmental or regulatory purposes, where traditional parametric statistical procedures can be beneficial. An example is the application of analysis of variance to data collected in an annual program of measuring groundwater levels in Kansas. Influences such as well conditions, operator effects, and use of the water can be assessed and wells that yield less reliable measurements can be identified. Such statistical studies have resulted in yearly improvements in the quality and reliability of the collected hydrologic data. Similar benefits may be achieved in other geological studies by the appropriate use of classical statistical tools.
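The kind of analysis described here, assessing how factors such as operator or well condition influence measured levels, can be sketched with a small two-factor ANOVA. The data and factor names below are invented for illustration and are not the Kansas groundwater dataset.

```python
# Small illustration of using analysis of variance to assess influences such as
# operator and well condition on measured water levels; the data and factor names
# are hypothetical, not the Kansas survey data described in the abstract.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

rng = np.random.default_rng(7)
df = pd.DataFrame({
    "operator":  np.repeat(["A", "B", "C"], 20),
    "condition": np.tile(np.repeat(["pumped", "static"], 10), 3),
})
operator_effect = {"A": 0.0, "B": 0.5, "C": -0.3}
df["level_ft"] = (df["operator"].map(operator_effect)
                  + np.where(df["condition"] == "pumped", 1.0, 0.0)
                  + rng.normal(0, 1, len(df)))

model = ols("level_ft ~ C(operator) + C(condition)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))   # which factors explain measurement variation?
```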
Zackay, Arie; Steinhoff, Christine
2010-12-15
Exploration of DNA methylation and its impact on various regulatory mechanisms has become a very active field of research. Simultaneously, there is an arising need for tools to process and analyse the data together with statistical investigation and visualisation. MethVisual is a new application that enables exploratory analysis and intuitive visualization of DNA methylation data as typically generated by bisulfite sequencing. The package allows the import of DNA methylation sequences, aligns them and performs quality control comparison. It comprises basic analysis steps such as lollipop visualization, co-occurrence display of methylation of neighbouring and distant CpG sites, summary statistics on methylation status, clustering and correspondence analysis. The package has been developed for methylation data but can also be used for other data types for which binary coding can be inferred. The application of the package, as well as a comparison to existing DNA methylation analysis tools and its workflow based on two datasets, is presented in this paper. The R package MethVisual offers various analysis procedures for data that can be binarized, in particular for bisulfite sequenced methylation data. R/Bioconductor has become one of the most important environments for statistical analysis of various types of biological and medical data. Therefore, any data analysis within R that allows the integration of various data types as provided from different technological platforms is convenient. It is the first and so far the only specific package for DNA methylation analysis, in particular for bisulfite sequenced data, available in the R/Bioconductor environment. The package is available for free at http://methvisual.molgen.mpg.de/ and from the Bioconductor Consortium http://www.bioconductor.org.
2010-01-01
Background Exploration of DNA methylation and its impact on various regulatory mechanisms has become a very active field of research. Simultaneously, there is an arising need for tools to process and analyse the data together with statistical investigation and visualisation. Findings MethVisual is a new application that enables exploratory analysis and intuitive visualization of DNA methylation data as typically generated by bisulfite sequencing. The package allows the import of DNA methylation sequences, aligns them and performs quality control comparison. It comprises basic analysis steps such as lollipop visualization, co-occurrence display of methylation of neighbouring and distant CpG sites, summary statistics on methylation status, clustering and correspondence analysis. The package has been developed for methylation data but can also be used for other data types for which binary coding can be inferred. The application of the package, as well as a comparison to existing DNA methylation analysis tools and its workflow based on two datasets, is presented in this paper. Conclusions The R package MethVisual offers various analysis procedures for data that can be binarized, in particular for bisulfite sequenced methylation data. R/Bioconductor has become one of the most important environments for statistical analysis of various types of biological and medical data. Therefore, any data analysis within R that allows the integration of various data types as provided from different technological platforms is convenient. It is the first and so far the only specific package for DNA methylation analysis, in particular for bisulfite sequenced data, available in the R/Bioconductor environment. The package is available for free at http://methvisual.molgen.mpg.de/ and from the Bioconductor Consortium http://www.bioconductor.org. PMID:21159174
A phylogenetic transform enhances analysis of compositional microbiota data
Silverman, Justin D; Washburne, Alex D; Mukherjee, Sayan; David, Lawrence A
2017-01-01
Surveys of microbial communities (microbiota), typically measured as relative abundance of species, have illustrated the importance of these communities in human health and disease. Yet, statistical artifacts commonly plague the analysis of relative abundance data. Here, we introduce the PhILR transform, which incorporates microbial evolutionary models with the isometric log-ratio transform to allow off-the-shelf statistical tools to be safely applied to microbiota surveys. We demonstrate that analyses of community-level structure can be applied to PhILR transformed data with performance on benchmarks rivaling or surpassing standard tools. Additionally, by decomposing distance in the PhILR transformed space, we identified neighboring clades that may have adapted to distinct human body sites. Decomposing variance revealed that covariation of bacterial clades within human body sites increases with phylogenetic relatedness. Together, these findings illustrate how the PhILR transform combines statistical and phylogenetic models to overcome compositional data challenges and enable evolutionary insights relevant to microbial communities. DOI: http://dx.doi.org/10.7554/eLife.21887.001 PMID:28198697
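The core of the compositional-data fix described above is an isometric log-ratio (ILR) transform. The sketch below uses a generic pivot-coordinate ILR basis; PhILR itself derives its balances from a phylogenetic tree, so this is only an illustration of the transform, with invented taxon counts.

```python
import numpy as np

def ilr_pivot(x):
    """Isometric log-ratio (pivot) coordinates of a strictly positive
    composition x. A generic orthonormal basis is used here; PhILR builds
    its basis (balances) from the phylogeny instead."""
    x = np.asarray(x, float)
    x = x / x.sum()                       # close the composition
    D = x.size
    z = np.empty(D - 1)
    for i in range(D - 1):
        rest = x[i + 1:]
        gmean_rest = np.exp(np.mean(np.log(rest)))
        z[i] = np.sqrt((D - i - 1) / (D - i)) * np.log(x[i] / gmean_rest)
    return z

counts = np.array([120, 30, 5, 45])       # hypothetical taxon counts
print(ilr_pivot(counts + 0.5))            # pseudo-count to avoid zeros
```

After such a transform, ordinary multivariate tools (PCA, regression, clustering) can be applied to the coordinates without the spurious correlations induced by the relative-abundance constraint.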
Teaching Students to Use Summary Statistics and Graphics to Clean and Analyze Data
ERIC Educational Resources Information Center
Holcomb, John; Spalsbury, Angela
2005-01-01
Textbooks and websites today abound with real data. One neglected issue is that statistical investigations often require a good deal of "cleaning" to ready data for analysis. The purpose of this dataset and exercise is to teach students to use exploratory tools to identify erroneous observations. This article discusses the merits of such…
Tools for Basic Statistical Analysis
NASA Technical Reports Server (NTRS)
Luz, Paul L.
2005-01-01
Statistical Analysis Toolset is a collection of eight Microsoft Excel spreadsheet programs, each of which performs calculations pertaining to an aspect of statistical analysis. These programs present input and output data in user-friendly, menu-driven formats, with automatic execution. The following types of calculations are performed: Descriptive statistics are computed for a set of data x(i) (i = 1, 2, 3 . . . ) entered by the user. Normal Distribution Estimates will calculate the statistical value that corresponds to cumulative probability values, given a sample mean and standard deviation of the normal distribution. Normal Distribution from Two Data Points will extend and generate a cumulative normal distribution for the user, given two data points and their associated probability values. Two programs perform two-way analysis of variance (ANOVA) with no replication or generalized ANOVA for two factors with four levels and three repetitions. Linear Regression-ANOVA will curve-fit data to the linear equation y = f(x) and will perform an ANOVA to check its significance.
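Two of the calculations listed above (descriptive statistics and a linear fit with a slope significance test) can be reproduced in a few lines outside Excel; the data below are invented and the spreadsheet programs themselves are not being mirrored exactly.

```python
import numpy as np
from scipy import stats

# Hypothetical data set
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 4.3, 5.8, 8.2, 9.9, 12.1])

# Descriptive statistics
print("mean =", np.mean(y), "std =", np.std(y, ddof=1))

# Linear regression with a significance test on the slope,
# analogous in spirit to the Linear Regression-ANOVA spreadsheet
res = stats.linregress(x, y)
print(f"y = {res.intercept:.2f} + {res.slope:.2f} x, p = {res.pvalue:.3g}")
```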
Pounds, Stan; Cao, Xueyuan; Cheng, Cheng; Yang, Jun; Campana, Dario; Evans, William E.; Pui, Ching-Hon; Relling, Mary V.
2010-01-01
Powerful methods for integrated analysis of multiple biological data sets are needed to maximize interpretation capacity and acquire meaningful knowledge. We recently developed Projection Onto the Most Interesting Statistical Evidence (PROMISE). PROMISE is a statistical procedure that incorporates prior knowledge about the biological relationships among endpoint variables into an integrated analysis of microarray gene expression data with multiple biological and clinical endpoints. Here, PROMISE is adapted to the integrated analysis of pharmacologic, clinical, and genome-wide genotype data, incorporating knowledge about the biological relationships among pharmacologic and clinical response data. An efficient permutation-testing algorithm is introduced so that statistical calculations are computationally feasible in this higher-dimension setting. The new method is applied to a pediatric leukemia data set. The results clearly indicate that PROMISE is a powerful statistical tool for identifying genomic features that exhibit a biologically meaningful pattern of association with multiple endpoint variables. PMID:21516175
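The idea of judging a pattern of association across several endpoints by permutation can be illustrated with a much simplified statistic. The signed average of correlations below is a stand-in, not the published PROMISE projection, and all data and variable names are simulated.

```python
import numpy as np

def composite_stat(feature, endpoints, signs):
    """Simplified composite statistic: signed average correlation of one
    genomic feature with several endpoints, where `signs` encodes the
    expected direction of each association."""
    corrs = [np.corrcoef(feature, e)[0, 1] for e in endpoints]
    return float(np.mean(np.asarray(signs, float) * np.asarray(corrs)))

def permutation_p(feature, endpoints, signs, n_perm=10000, seed=0):
    """Permute the genomic feature across subjects (keeping the endpoints,
    and hence their mutual correlation, intact) to build the null."""
    rng = np.random.default_rng(seed)
    obs = composite_stat(feature, endpoints, signs)
    null = np.empty(n_perm)
    for b in range(n_perm):
        perm = rng.permutation(len(feature))
        null[b] = composite_stat(feature[perm], endpoints, signs)
    return (np.sum(np.abs(null) >= abs(obs)) + 1) / (n_perm + 1)

# Hypothetical usage: one SNP dosage and two correlated response endpoints
rng = np.random.default_rng(1)
snp = rng.integers(0, 3, size=60).astype(float)
endpoints = [snp * 0.3 + rng.normal(size=60), -snp * 0.2 + rng.normal(size=60)]
print(permutation_p(snp, endpoints, signs=[+1, -1]))
```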
48 CFR 1852.223-76 - Federal Automotive Statistical Tool Reporting.
Code of Federal Regulations, 2010 CFR
2010-10-01
... Statistical Tool Reporting. 1852.223-76 Section 1852.223-76 Federal Acquisition Regulations System NATIONAL... Provisions and Clauses 1852.223-76 Federal Automotive Statistical Tool Reporting. As prescribed at 1823.271 and 1851.205, insert the following clause: Federal Automotive Statistical Tool Reporting (JUL 2003) If...
Navigating freely-available software tools for metabolomics analysis.
Spicer, Rachel; Salek, Reza M; Moreno, Pablo; Cañueto, Daniel; Steinbeck, Christoph
2017-01-01
The field of metabolomics has expanded greatly over the past two decades, both as an experimental science with applications in many areas and with regard to data standards and bioinformatics software tools. The diversity of experimental designs and instrumental technologies used for metabolomics has led to the need for distinct data analysis methods and the development of many software tools. The aim was to compile a comprehensive list of the most widely used freely available software and tools that are used primarily in metabolomics. The most widely used tools were selected for inclusion in the review by either ≥ 50 citations on Web of Science (as of 08/09/16) or the use of the tool being reported in the recent Metabolomics Society survey. Tools were then categorised by the type of instrumental data (i.e. LC-MS, GC-MS or NMR) and the functionality (i.e. pre- and post-processing, statistical analysis, workflow and other functions) they are designed for. A comprehensive list of the most used tools was compiled. Each tool is discussed within the context of its application domain and in relation to comparable tools of the same domain. An extended list including additional tools is available at https://github.com/RASpicer/MetabolomicsTools which is classified and searchable via a simple controlled vocabulary. This review presents the most widely used tools for metabolomics analysis, categorised based on their main functionality. As future work, we suggest a direct comparison of tools' abilities to perform specific data analysis tasks, e.g. peak picking.
ERIC Educational Resources Information Center
Aderibigbe, Semiyu Adejare; Ajasa, Folorunso Adekemi
2013-01-01
Purpose: The purpose of this paper is to explore the perceptions of college tutors on peer coaching as a tool for professional development to determine its formal institutionalisation. Design/methodology/approach: A survey questionnaire was used for data collection, while analysis of data was done using descriptive statistics. Findings: The…
A survey of tools for the analysis of quantitative PCR (qPCR) data.
Pabinger, Stephan; Rödiger, Stefan; Kriegner, Albert; Vierlinger, Klemens; Weinhäusel, Andreas
2014-09-01
Real-time quantitative polymerase chain reaction (qPCR) is a standard technique in most laboratories, used for various applications in basic research. Analysis of qPCR data is a crucial part of the entire experiment, which has led to the development of a plethora of methods. The released tools either cover specific parts of the workflow or provide complete analysis solutions. Here, we surveyed 27 open-access software packages and tools for the analysis of qPCR data. The survey includes 8 Microsoft Windows, 5 web-based, 9 R-based and 5 tools from other platforms. Reviewed packages and tools support the analysis of different qPCR applications, such as RNA quantification, DNA methylation, genotyping, identification of copy number variations, and digital PCR. We report an overview of the functionality, features and specific requirements of the individual software tools, such as data exchange formats, availability of a graphical user interface, included procedures for graphical data presentation, and offered statistical methods. In addition, we provide an overview of quantification strategies, and report various applications of qPCR. Our comprehensive survey showed that most tools use their own file format and only a fraction of the currently existing tools support the standardized data exchange format RDML. To allow a more streamlined and comparable analysis of qPCR data, more vendors and tools need to adopt the standardized format to encourage the exchange of data between instrument software, analysis tools, and researchers.
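One of the relative quantification strategies commonly implemented by such tools is the 2^-ΔΔCt method. The sketch below shows the arithmetic only; the Ct values are hypothetical and no particular surveyed package is being reproduced.

```python
def ddct_fold_change(ct_target_treated, ct_ref_treated,
                     ct_target_control, ct_ref_control):
    """Relative expression by the 2^-ΔΔCt method:
    ΔCt = Ct(target) - Ct(reference); ΔΔCt = ΔCt(treated) - ΔCt(control)."""
    d_ct_treated = ct_target_treated - ct_ref_treated
    d_ct_control = ct_target_control - ct_ref_control
    dd_ct = d_ct_treated - d_ct_control
    return 2.0 ** (-dd_ct)

# Hypothetical Ct values
print(ddct_fold_change(24.0, 18.0, 26.5, 18.2))  # ~4.9-fold up-regulation
```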
Gene ARMADA: an integrated multi-analysis platform for microarray data implemented in MATLAB.
Chatziioannou, Aristotelis; Moulos, Panagiotis; Kolisis, Fragiskos N
2009-10-27
The microarray data analysis realm is ever growing through the development of various tools, open source and commercial. However, there is an absence of predefined, rational algorithmic analysis workflows or batch standardized processing to incorporate all steps, from raw data import up to the derivation of significantly differentially expressed gene lists. This absence obfuscates the analytical procedure and obstructs the massive comparative processing of genomic microarray datasets. Moreover, the solutions provided heavily depend on the programming skills of the user, whereas in the case of GUI-embedded solutions, they do not provide direct support of various raw image analysis formats or a versatile and simultaneously flexible combination of signal processing methods. We describe here Gene ARMADA (Automated Robust MicroArray Data Analysis), a MATLAB-implemented platform with a Graphical User Interface. This suite integrates all steps of microarray data analysis including automated data import, noise correction and filtering, normalization, statistical selection of differentially expressed genes, clustering, classification and annotation. In its current version, Gene ARMADA fully supports two-colour cDNA and Affymetrix oligonucleotide arrays, plus custom arrays for which experimental details are given in tabular form (Excel spreadsheet, comma separated values, tab-delimited text formats). It also supports the analysis of already processed results through its versatile import editor. Besides being fully automated, Gene ARMADA incorporates numerous functionalities of the Statistics and Bioinformatics Toolboxes of MATLAB. In addition, it provides numerous visualization and exploration tools, plus customizable export data formats for seamless integration with other analysis tools or MATLAB for further processing. Gene ARMADA requires MATLAB 7.4 (R2007a) or higher and is also distributed as a stand-alone application with MATLAB Component Runtime. Gene ARMADA provides a highly adaptable, integrative, yet flexible tool which can be used for automated quality control, analysis, annotation and visualization of microarray data, constituting a starting point for further data interpretation and integration with numerous other tools.
On-Line Analysis of Southern FIA Data
Michael P. Spinney; Paul C. Van Deusen; Francis A. Roesch
2006-01-01
The Southern On-Line Estimator (SOLE) is a web-based FIA database analysis tool designed with an emphasis on modularity. The Java-based user interface is simple and intuitive to use and the R-based analysis engine is fast and stable. Each component of the program (data retrieval, statistical analysis and output) can be individually modified to accommodate major...
Lee, Myeongjun; Kim, Hyunjung; Shin, Donghee; Lee, Sangyun
2016-01-01
Harassment means systematic and repeated unethical acts. Research on workplace harassment has been conducted widely, and the NAQ-R has been widely used for such research. This tool, however, has limitations in revealing differences in sub-factors depending on culture and in reflecting the unique characteristics of Korean society. The workplace harassment questionnaire for Korean finance and service workers was therefore developed to assess the level of personal harassment at work. This study aims to develop a tool to assess the level of personal harassment at work and to test its validity and reliability while examining specific characteristics of workplace harassment against finance and service workers in Korea. The framework of the survey was established based on a literature review and focus-group interviews with Korean finance and service workers. To verify its reliability, Cronbach's alpha coefficient was calculated; and to verify its validity, items and factors of the tool were analyzed. The correlation matrix was examined to verify the tool's convergent validity and discriminant validity. Structural validity was verified by checking statistical significance in relation to the BDI-K. Cronbach's alpha coefficient of this survey was 0.93, which indicates a quite high level of reliability. To verify the appropriateness of the survey tool, its construct validity was examined through factor analysis. As a result of the factor analysis, 3 factors were extracted, explaining 56.5% of the total variance. The loading values and communalities of the 20 items were 0.85 to 0.48 and 0.71 to 0.46, respectively. The convergent validity and discriminant validity were analyzed, and the rate of item discriminant validity was 100%. Finally, for concurrent validity, we examined the relationship between the WHI-KFSW and psychosocial stress by examining the correlation with the BDI-K. The results of the chi-square test and multiple logistic analysis indicated that the correlation with the BDI-K was statistically significant. Workplace harassment in actual workplaces was investigated based on interviews, and the statistical analysis contributed to systematizing the types of actual workplace harassment. Using statistical methods, we developed the questionnaire, comprising 20 items in 3 categories.
Statistical Symbolic Execution with Informed Sampling
NASA Technical Reports Server (NTRS)
Filieri, Antonio; Pasareanu, Corina S.; Visser, Willem; Geldenhuys, Jaco
2014-01-01
Symbolic execution techniques have been proposed recently for the probabilistic analysis of programs. These techniques seek to quantify the likelihood of reaching program events of interest, e.g., assert violations. They have many promising applications but have scalability issues due to high computational demand. To address this challenge, we propose a statistical symbolic execution technique that performs Monte Carlo sampling of the symbolic program paths and uses the obtained information for Bayesian estimation and hypothesis testing with respect to the probability of reaching the target events. To speed up the convergence of the statistical analysis, we propose Informed Sampling, an iterative symbolic execution that first explores the paths that have high statistical significance, prunes them from the state space and guides the execution towards less likely paths. The technique combines Bayesian estimation with a partial exact analysis for the pruned paths, leading to provably improved convergence of the statistical analysis. We have implemented statistical symbolic execution with informed sampling in the Symbolic PathFinder tool. We show experimentally that informed sampling obtains more precise results and converges faster than a purely statistical analysis and may also be more efficient than an exact symbolic analysis. When the latter does not terminate, symbolic execution with informed sampling can give meaningful results under the same time and memory limits.
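The Bayesian estimation step can be sketched in isolation: given Monte Carlo samples of program paths, the probability of reaching the target event receives a Beta posterior. The prior parameters and sample counts below are illustrative; the actual tool couples this with symbolic path conditions and an exact analysis of pruned paths.

```python
from scipy import stats

def bayesian_event_probability(hits, total, alpha=1.0, beta=1.0):
    """Posterior over the probability of reaching a target event, given
    Monte Carlo path samples: Beta(alpha + hits, beta + misses).
    Returns the posterior mean and a 95% credible interval."""
    post = stats.beta(alpha + hits, beta + total - hits)
    return post.mean(), post.interval(0.95)

# Hypothetical sampling outcome: 37 of 1000 sampled paths hit the assertion
mean, (lo, hi) = bayesian_event_probability(37, 1000)
print(f"P(event) ~ {mean:.3f}, 95% credible interval [{lo:.3f}, {hi:.3f}]")
```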
Bayesian models based on test statistics for multiple hypothesis testing problems.
Ji, Yuan; Lu, Yiling; Mills, Gordon B
2008-04-01
We propose a Bayesian method for the problem of multiple hypothesis testing that is routinely encountered in bioinformatics research, such as the differential gene expression analysis. Our algorithm is based on modeling the distributions of test statistics under both null and alternative hypotheses. We substantially reduce the complexity of the process of defining posterior model probabilities by modeling the test statistics directly instead of modeling the full data. Computationally, we apply a Bayesian FDR approach to control the number of rejections of null hypotheses. To check if our model assumptions for the test statistics are valid for various bioinformatics experiments, we also propose a simple graphical model-assessment tool. Using extensive simulations, we demonstrate the performance of our models and the utility of the model-assessment tool. In the end, we apply the proposed methodology to an siRNA screening and a gene expression experiment.
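A common way to model test statistics directly, in the spirit described here (though not the authors' exact model), is a two-component mixture with a Bayesian FDR cut-off on posterior null probabilities. The sketch below uses a Gaussian mixture on simulated z-statistics; component labels and thresholds are illustrative.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def posterior_null_probabilities(z, seed=0):
    """Fit a two-component Gaussian mixture to test statistics and return,
    for each statistic, the posterior probability of the component whose
    mean is closest to zero (treated here as the 'null' component)."""
    z = np.asarray(z, float).reshape(-1, 1)
    gm = GaussianMixture(n_components=2, random_state=seed).fit(z)
    null_comp = int(np.argmin(np.abs(gm.means_.ravel())))
    return gm.predict_proba(z)[:, null_comp]

def bayesian_fdr_reject(p_null, fdr=0.05):
    """Reject the hypotheses with the smallest posterior null probabilities
    such that their average (the estimated Bayesian FDR) stays below `fdr`."""
    order = np.argsort(p_null)
    running_mean = np.cumsum(p_null[order]) / np.arange(1, len(p_null) + 1)
    k = int(np.sum(running_mean <= fdr))
    rejected = np.zeros(len(p_null), dtype=bool)
    rejected[order[:k]] = True
    return rejected

# Hypothetical usage on simulated z-statistics (10% true signals)
rng = np.random.default_rng(2)
z = np.concatenate([rng.normal(0, 1, 900), rng.normal(3, 1, 100)])
p0 = posterior_null_probabilities(z)
print(bayesian_fdr_reject(p0, fdr=0.05).sum(), "rejections")
```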
STAMPS: Software Tool for Automated MRI Post-processing on a supercomputer.
Bigler, Don C; Aksu, Yaman; Miller, David J; Yang, Qing X
2009-08-01
This paper describes a Software Tool for Automated MRI Post-processing (STAMP) of multiple types of brain MRIs on a workstation and for parallel processing on a supercomputer (STAMPS). This software tool enables the automation of nonlinear registration for a large image set and for multiple MR image types. The tool uses standard brain MRI post-processing tools (such as SPM, FSL, and HAMMER) for multiple MR image types in a pipeline fashion. It also contains novel MRI post-processing features. The STAMP image outputs can be used to perform brain analysis using Statistical Parametric Mapping (SPM) or single-/multi-image modality brain analysis using Support Vector Machines (SVMs). Since STAMPS is PBS-based, the supercomputer may be a multi-node computer cluster or one of the latest multi-core computers.
Li, Peter; Castrillo, Juan I; Velarde, Giles; Wassink, Ingo; Soiland-Reyes, Stian; Owen, Stuart; Withers, David; Oinn, Tom; Pocock, Matthew R; Goble, Carole A; Oliver, Stephen G; Kell, Douglas B
2008-08-07
There has been a dramatic increase in the amount of quantitative data derived from the measurement of changes at different levels of biological complexity during the post-genomic era. However, there are a number of issues associated with the use of computational tools employed for the analysis of such data. For example, computational tools such as R and MATLAB require prior knowledge of their programming languages in order to implement statistical analyses on data. Combining two or more tools in an analysis may also be problematic since data may have to be manually copied and pasted between separate user interfaces for each tool. Furthermore, this transfer of data may require a reconciliation step in order for there to be interoperability between computational tools. Developments in the Taverna workflow system have enabled pipelines to be constructed and enacted for generic and ad hoc analyses of quantitative data. Here, we present an example of such a workflow involving the statistical identification of differentially-expressed genes from microarray data followed by the annotation of their relationships to cellular processes. This workflow makes use of customised maxdBrowse web services, a system that allows Taverna to query and retrieve gene expression data from the maxdLoad2 microarray database. These data are then analysed by R to identify differentially-expressed genes using the Taverna RShell processor which has been developed for invoking this tool when it has been deployed as a service using the RServe library. In addition, the workflow uses Beanshell scripts to reconcile mismatches of data between services as well as to implement a form of user interaction for selecting subsets of microarray data for analysis as part of the workflow execution. A new plugin system in the Taverna software architecture is demonstrated by the use of renderers for displaying PDF files and CSV formatted data within the Taverna workbench. Taverna can be used by data analysis experts as a generic tool for composing ad hoc analyses of quantitative data by combining the use of scripts written in the R programming language with tools exposed as services in workflows. When these workflows are shared with colleagues and the wider scientific community, they provide an approach for other scientists wanting to use tools such as R without having to learn the corresponding programming language to analyse their own data.
Li, Peter; Castrillo, Juan I; Velarde, Giles; Wassink, Ingo; Soiland-Reyes, Stian; Owen, Stuart; Withers, David; Oinn, Tom; Pocock, Matthew R; Goble, Carole A; Oliver, Stephen G; Kell, Douglas B
2008-01-01
Background There has been a dramatic increase in the amount of quantitative data derived from the measurement of changes at different levels of biological complexity during the post-genomic era. However, there are a number of issues associated with the use of computational tools employed for the analysis of such data. For example, computational tools such as R and MATLAB require prior knowledge of their programming languages in order to implement statistical analyses on data. Combining two or more tools in an analysis may also be problematic since data may have to be manually copied and pasted between separate user interfaces for each tool. Furthermore, this transfer of data may require a reconciliation step in order for there to be interoperability between computational tools. Results Developments in the Taverna workflow system have enabled pipelines to be constructed and enacted for generic and ad hoc analyses of quantitative data. Here, we present an example of such a workflow involving the statistical identification of differentially-expressed genes from microarray data followed by the annotation of their relationships to cellular processes. This workflow makes use of customised maxdBrowse web services, a system that allows Taverna to query and retrieve gene expression data from the maxdLoad2 microarray database. These data are then analysed by R to identify differentially-expressed genes using the Taverna RShell processor which has been developed for invoking this tool when it has been deployed as a service using the RServe library. In addition, the workflow uses Beanshell scripts to reconcile mismatches of data between services as well as to implement a form of user interaction for selecting subsets of microarray data for analysis as part of the workflow execution. A new plugin system in the Taverna software architecture is demonstrated by the use of renderers for displaying PDF files and CSV formatted data within the Taverna workbench. Conclusion Taverna can be used by data analysis experts as a generic tool for composing ad hoc analyses of quantitative data by combining the use of scripts written in the R programming language with tools exposed as services in workflows. When these workflows are shared with colleagues and the wider scientific community, they provide an approach for other scientists wanting to use tools such as R without having to learn the corresponding programming language to analyse their own data. PMID:18687127
Configural Frequency Analysis as a Statistical Tool for Developmental Research.
ERIC Educational Resources Information Center
Lienert, Gustav A.; Oeveste, Hans Zur
1985-01-01
Configural frequency analysis (CFA) is suggested as a technique for longitudinal research in developmental psychology. Stability and change in answers to multiple choice and yes-no item patterns obtained with repeated measurements are identified by CFA and illustrated by developmental analysis of an item from Gorham's Proverb Test. (Author/DWH)
Coloc-stats: a unified web interface to perform colocalization analysis of genomic features.
Simovski, Boris; Kanduri, Chakravarthi; Gundersen, Sveinung; Titov, Dmytro; Domanska, Diana; Bock, Christoph; Bossini-Castillo, Lara; Chikina, Maria; Favorov, Alexander; Layer, Ryan M; Mironov, Andrey A; Quinlan, Aaron R; Sheffield, Nathan C; Trynka, Gosia; Sandve, Geir K
2018-06-05
Functional genomics assays produce sets of genomic regions as one of their main outputs. To biologically interpret such region-sets, researchers often use colocalization analysis, where the statistical significance of colocalization (overlap, spatial proximity) between two or more region-sets is tested. Existing colocalization analysis tools vary in the statistical methodology and analysis approaches, thus potentially providing different conclusions for the same research question. As the findings of colocalization analysis are often the basis for follow-up experiments, it is helpful to use several tools in parallel and to compare the results. We developed the Coloc-stats web service to facilitate such analyses. Coloc-stats provides a unified interface to perform colocalization analysis across various analytical methods and method-specific options (e.g. colocalization measures, resolution, null models). Coloc-stats helps the user to find a method that supports their experimental requirements and allows for a straightforward comparison across methods. Coloc-stats is implemented as a web server with a graphical user interface that assists users with configuring their colocalization analyses. Coloc-stats is freely available at https://hyperbrowser.uio.no/coloc-stats/.
Comparisons of non-Gaussian statistical models in DNA methylation analysis.
Ma, Zhanyu; Teschendorff, Andrew E; Yu, Hong; Taghia, Jalil; Guo, Jun
2014-06-16
As a key regulatory mechanism of gene expression, DNA methylation patterns are widely altered in many complex genetic diseases, including cancer. DNA methylation is naturally quantified by bounded support data; therefore, it is non-Gaussian distributed. In order to capture such properties, we introduce some non-Gaussian statistical models to perform dimension reduction on DNA methylation data. Afterwards, non-Gaussian statistical model-based unsupervised clustering strategies are applied to cluster the data. Comparisons and analysis of different dimension reduction strategies and unsupervised clustering methods are presented. Experimental results show that the non-Gaussian statistical model-based methods are superior to the conventional Gaussian distribution-based method. They are meaningful tools for DNA methylation analysis. Moreover, among several non-Gaussian methods, the one that captures the bounded nature of DNA methylation data reveals the best clustering performance.
Comparisons of Non-Gaussian Statistical Models in DNA Methylation Analysis
Ma, Zhanyu; Teschendorff, Andrew E.; Yu, Hong; Taghia, Jalil; Guo, Jun
2014-01-01
As a key regulatory mechanism of gene expression, DNA methylation patterns are widely altered in many complex genetic diseases, including cancer. DNA methylation is naturally quantified by bounded support data; therefore, it is non-Gaussian distributed. In order to capture such properties, we introduce some non-Gaussian statistical models to perform dimension reduction on DNA methylation data. Afterwards, non-Gaussian statistical model-based unsupervised clustering strategies are applied to cluster the data. Comparisons and analysis of different dimension reduction strategies and unsupervised clustering methods are presented. Experimental results show that the non-Gaussian statistical model-based methods are superior to the conventional Gaussian distribution-based method. They are meaningful tools for DNA methylation analysis. Moreover, among several non-Gaussian methods, the one that captures the bounded nature of DNA methylation data reveals the best clustering performance. PMID:24937687
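The bounded support the authors exploit is naturally handled by a Beta distribution. The sketch below, on simulated beta-values, contrasts the log-likelihood of a fitted Beta against a fitted Gaussian; it is only a one-dimensional illustration of why bounded-support models suit methylation data, not the dimension-reduction or clustering machinery of the paper.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Hypothetical methylation beta-values for one CpG site across 200 samples
beta_values = np.clip(rng.beta(2.0, 8.0, size=200), 1e-3, 1 - 1e-3)

# Fit a Beta distribution (bounded on [0, 1]) and, for contrast, a Gaussian
a, b, loc, scale = stats.beta.fit(beta_values, floc=0, fscale=1)
mu, sigma = stats.norm.fit(beta_values)

print(f"Beta(a={a:.2f}, b={b:.2f})  vs  Normal(mu={mu:.2f}, sigma={sigma:.2f})")
print("Beta log-likelihood:  ", np.sum(stats.beta.logpdf(beta_values, a, b)))
print("Normal log-likelihood:", np.sum(stats.norm.logpdf(beta_values, mu, sigma)))
```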
New Statistics for Testing Differential Expression of Pathways from Microarray Data
NASA Astrophysics Data System (ADS)
Siu, Hoicheong; Dong, Hua; Jin, Li; Xiong, Momiao
Exploring biological meaning from microarray data is very important but remains a great challenge. Here, we developed three new statistics: a linear combination test, a quadratic test and a de-correlation test, to identify differentially expressed pathways from gene expression profiles. We apply our statistics to two rheumatoid arthritis datasets. Notably, our results reveal three significant pathways and 275 genes in common between the two datasets. The pathways we found are meaningful for uncovering the disease mechanisms of rheumatoid arthritis, which implies that our statistics are a powerful tool in the functional analysis of gene expression data.
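A generic pathway-level "linear combination" statistic can be sketched as an unweighted mean of per-gene t statistics, assessed against a permutation null. The published tests use specific weightings derived from the covariance structure of the gene-level statistics, so this is only a stand-in on simulated data.

```python
import numpy as np
from scipy import stats

def pathway_linear_combination_test(expr, groups, n_perm=2000, seed=0):
    """Mean of per-gene two-sample t statistics for a pathway, with a
    p-value from permuting sample labels (a simplified sketch)."""
    rng = np.random.default_rng(seed)
    groups = np.asarray(groups, bool)

    def statistic(labels):
        t, _ = stats.ttest_ind(expr[:, labels], expr[:, ~labels], axis=1)
        return float(np.mean(t))

    obs = statistic(groups)
    null = np.array([statistic(rng.permutation(groups)) for _ in range(n_perm)])
    p = (np.sum(np.abs(null) >= abs(obs)) + 1) / (n_perm + 1)
    return obs, p

# Hypothetical pathway: 15 genes, 10 cases vs 10 controls
rng = np.random.default_rng(0)
expr = rng.normal(size=(15, 20))
expr[:, :10] += 0.5                      # shift cases upward
print(pathway_linear_combination_test(expr, [True] * 10 + [False] * 10))
```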
Translating statistical species-habitat models to interactive decision support tools
Wszola, Lyndsie S.; Simonsen, Victoria L.; Stuber, Erica F.; Gillespie, Caitlyn R.; Messinger, Lindsey N.; Decker, Karie L.; Lusk, Jeffrey J.; Jorgensen, Christopher F.; Bishop, Andrew A.; Fontaine, Joseph J.
2017-01-01
Understanding species-habitat relationships is vital to successful conservation, but the tools used to communicate species-habitat relationships are often poorly suited to the information needs of conservation practitioners. Here we present a novel method for translating a statistical species-habitat model, a regression analysis relating ring-necked pheasant abundance to landcover, into an interactive online tool. The Pheasant Habitat Simulator combines the analytical power of the R programming environment with the user-friendly Shiny web interface to create an online platform in which wildlife professionals can explore the effects of variation in local landcover on relative pheasant habitat suitability within spatial scales relevant to individual wildlife managers. Our tool allows users to virtually manipulate the landcover composition of a simulated space to explore how changes in landcover may affect pheasant relative habitat suitability, and guides users through the economic tradeoffs of landscape changes. We offer suggestions for development of similar interactive applications and demonstrate their potential as innovative science delivery tools for diverse professional and public audiences.
Translating statistical species-habitat models to interactive decision support tools.
Wszola, Lyndsie S; Simonsen, Victoria L; Stuber, Erica F; Gillespie, Caitlyn R; Messinger, Lindsey N; Decker, Karie L; Lusk, Jeffrey J; Jorgensen, Christopher F; Bishop, Andrew A; Fontaine, Joseph J
2017-01-01
Understanding species-habitat relationships is vital to successful conservation, but the tools used to communicate species-habitat relationships are often poorly suited to the information needs of conservation practitioners. Here we present a novel method for translating a statistical species-habitat model, a regression analysis relating ring-necked pheasant abundance to landcover, into an interactive online tool. The Pheasant Habitat Simulator combines the analytical power of the R programming environment with the user-friendly Shiny web interface to create an online platform in which wildlife professionals can explore the effects of variation in local landcover on relative pheasant habitat suitability within spatial scales relevant to individual wildlife managers. Our tool allows users to virtually manipulate the landcover composition of a simulated space to explore how changes in landcover may affect pheasant relative habitat suitability, and guides users through the economic tradeoffs of landscape changes. We offer suggestions for development of similar interactive applications and demonstrate their potential as innovative science delivery tools for diverse professional and public audiences.
Translating statistical species-habitat models to interactive decision support tools
Simonsen, Victoria L.; Stuber, Erica F.; Gillespie, Caitlyn R.; Messinger, Lindsey N.; Decker, Karie L.; Lusk, Jeffrey J.; Jorgensen, Christopher F.; Bishop, Andrew A.; Fontaine, Joseph J.
2017-01-01
Understanding species-habitat relationships is vital to successful conservation, but the tools used to communicate species-habitat relationships are often poorly suited to the information needs of conservation practitioners. Here we present a novel method for translating a statistical species-habitat model, a regression analysis relating ring-necked pheasant abundance to landcover, into an interactive online tool. The Pheasant Habitat Simulator combines the analytical power of the R programming environment with the user-friendly Shiny web interface to create an online platform in which wildlife professionals can explore the effects of variation in local landcover on relative pheasant habitat suitability within spatial scales relevant to individual wildlife managers. Our tool allows users to virtually manipulate the landcover composition of a simulated space to explore how changes in landcover may affect pheasant relative habitat suitability, and guides users through the economic tradeoffs of landscape changes. We offer suggestions for development of similar interactive applications and demonstrate their potential as innovative science delivery tools for diverse professional and public audiences. PMID:29236707
Web-based tools for modelling and analysis of multivariate data: California ozone pollution activity
Dinov, Ivo D.; Christou, Nicolas
2014-01-01
This article presents a hands-on web-based activity motivated by the relation between human health and ozone pollution in California. This case study is based on multivariate data collected monthly at 20 locations in California between 1980 and 2006. Several strategies and tools for data interrogation and exploratory data analysis, model fitting and statistical inference on these data are presented. All components of this case study (data, tools, activity) are freely available online at: http://wiki.stat.ucla.edu/socr/index.php/SOCR_MotionCharts_CAOzoneData. Several types of exploratory (motion charts, box-and-whisker plots, spider charts) and quantitative (inference, regression, analysis of variance (ANOVA)) data analyses tools are demonstrated. Two specific human health related questions (temporal and geographic effects of ozone pollution) are discussed as motivational challenges. PMID:24465054
Dinov, Ivo D; Christou, Nicolas
2011-09-01
This article presents a hands-on web-based activity motivated by the relation between human health and ozone pollution in California. This case study is based on multivariate data collected monthly at 20 locations in California between 1980 and 2006. Several strategies and tools for data interrogation and exploratory data analysis, model fitting and statistical inference on these data are presented. All components of this case study (data, tools, activity) are freely available online at: http://wiki.stat.ucla.edu/socr/index.php/SOCR_MotionCharts_CAOzoneData. Several types of exploratory (motion charts, box-and-whisker plots, spider charts) and quantitative (inference, regression, analysis of variance (ANOVA)) data analyses tools are demonstrated. Two specific human health related questions (temporal and geographic effects of ozone pollution) are discussed as motivational challenges.
Visual analytics in cheminformatics: user-supervised descriptor selection for QSAR methods.
Martínez, María Jimena; Ponzoni, Ignacio; Díaz, Mónica F; Vazquez, Gustavo E; Soto, Axel J
2015-01-01
The design of QSAR/QSPR models is a challenging problem, where the selection of the most relevant descriptors constitutes a key step of the process. Several feature selection methods that address this step concentrate on statistical associations among descriptors and target properties, whereas chemical knowledge is left out of the analysis. For this reason, the interpretability and generality of the QSAR/QSPR models obtained by these feature selection methods are drastically affected. Therefore, an approach for integrating domain experts' knowledge in the selection process is needed to increase confidence in the final set of descriptors. In this paper we propose a software tool, named Visual and Interactive DEscriptor ANalysis (VIDEAN), that combines statistical methods with interactive visualizations for choosing a set of descriptors for predicting a target property. Domain expertise can be added to the feature selection process by means of an interactive visual exploration of data, aided by statistical tools and metrics based on information theory. Coordinated visual representations are presented for capturing different relationships and interactions among descriptors, target properties and candidate subsets of descriptors. The capabilities of the proposed software were assessed through different scenarios. These scenarios reveal how an expert can use this tool to choose one subset of descriptors from a group of candidate subsets, or how to modify existing descriptor subsets and even incorporate new descriptors according to his or her own knowledge of the target property. The reported experiences showed the suitability of our software for selecting sets of descriptors with low cardinality, high interpretability, low redundancy and high statistical performance in a visual exploratory way. Therefore, it is possible to conclude that the resulting tool allows the integration of a chemist's expertise in the descriptor selection process with a low cognitive effort, in contrast with the alternative of an ad hoc manual analysis of the selected descriptors. Graphical abstract: VIDEAN allows the visual analysis of candidate subsets of descriptors for QSAR/QSPR. In the two panels on the top, users can interactively explore numerical correlations as well as co-occurrences in the candidate subsets through two interactive graphs.
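The kind of information-theoretic relevance metric such a tool can expose alongside plain correlation is illustrated below on an invented descriptor matrix; this is not VIDEAN's interface or its exact metrics, just the statistical ingredients.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
# Hypothetical descriptor matrix (100 molecules x 5 descriptors) and property
X = rng.normal(size=(100, 5))
y = 2.0 * X[:, 0] - 1.0 * X[:, 2] + rng.normal(scale=0.5, size=100)

# Statistical relevance of each descriptor to the target property
mi = mutual_info_regression(X, y, random_state=0)
corr = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
for j, (m, c) in enumerate(zip(mi, corr)):
    print(f"descriptor {j}: MI = {m:.2f}, |r| = {c:.2f}")
```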
Eijssen, Lars M T; Goelela, Varshna S; Kelder, Thomas; Adriaens, Michiel E; Evelo, Chris T; Radonjic, Marijana
2015-06-30
Illumina whole-genome expression bead arrays are a widely used platform for transcriptomics. Most of the tools available for the analysis of the resulting data are not easily applicable by less experienced users. ArrayAnalysis.org provides researchers with an easy-to-use and comprehensive interface to the functionality of R and Bioconductor packages for microarray data analysis. As a modular open source project, it allows developers to contribute modules that provide support for additional types of data or extend workflows. To enable data analysis of Illumina bead arrays for a broad user community, we have developed a module for ArrayAnalysis.org that provides a free and user-friendly web interface for quality control and pre-processing for these arrays. This module can be used together with existing modules for statistical and pathway analysis to provide a full workflow for Illumina gene expression data analysis. The module accepts data exported from Illumina's GenomeStudio, and provides the user with quality control plots and normalized data. The outputs are directly linked to the existing statistics module of ArrayAnalysis.org, but can also be downloaded for further downstream analysis in third-party tools. The Illumina bead arrays analysis module is available at http://www.arrayanalysis.org . A user guide, a tutorial demonstrating the analysis of an example dataset, and R scripts are available. The module can be used as a starting point for statistical evaluation and pathway analysis provided on the website or to generate processed input data for a broad range of applications in life sciences research.
Dinov, Ivo D; Sanchez, Juana; Christou, Nicolas
2008-01-01
Technology-based instruction represents a recent pedagogical paradigm that is rooted in the realization that new generations are much more comfortable with, and excited about, new technologies. The rapid technological advancement over the past decade has fueled an enormous demand for the integration of modern networking, informational and computational tools with classical pedagogical instruments. Consequently, teaching with technology typically involves utilizing a variety of IT and multimedia resources for online learning, course management, electronic course materials, and novel tools of communication, engagement, experimentation, critical thinking and assessment. The NSF-funded Statistics Online Computational Resource (SOCR) provides a number of interactive tools for enhancing instruction in various undergraduate and graduate courses in probability and statistics. These resources include online instructional materials, statistical calculators, interactive graphical user interfaces, computational and simulation applets, and tools for data analysis and visualization. The tools provided as part of SOCR include conceptual simulations and statistical computing interfaces, which are designed to bridge between the introductory and the more advanced computational and applied probability and statistics courses. In this manuscript, we describe our designs for utilizing SOCR technology in instruction in a recent study. In addition, we present results on the effectiveness of using SOCR tools at two different course intensity levels on three outcome measures: exam scores, student satisfaction and choice of technology to complete assignments. Learning styles assessment was completed at baseline. We used three very different designs for three different undergraduate classes. Each course included a treatment group, using the SOCR resources, and a control group, using classical instruction techniques. Our findings include marginal effects of the SOCR treatment within individual classes; however, pooling the results across all courses and sections, SOCR effects on the treatment groups were exceptionally robust and significant. Coupling these findings with a clear decrease in the variance of the quantitative examination measures in the treatment groups indicates that employing technology, like SOCR, in a sound pedagogical and scientific manner enhances students' overall understanding and suggests better long-term knowledge retention.
Dinov, Ivo D.; Sanchez, Juana; Christou, Nicolas
2009-01-01
Technology-based instruction represents a recent pedagogical paradigm that is rooted in the realization that new generations are much more comfortable with, and excited about, new technologies. The rapid technological advancement over the past decade has fueled an enormous demand for the integration of modern networking, informational and computational tools with classical pedagogical instruments. Consequently, teaching with technology typically involves utilizing a variety of IT and multimedia resources for online learning, course management, electronic course materials, and novel tools of communication, engagement, experimentation, critical thinking and assessment. The NSF-funded Statistics Online Computational Resource (SOCR) provides a number of interactive tools for enhancing instruction in various undergraduate and graduate courses in probability and statistics. These resources include online instructional materials, statistical calculators, interactive graphical user interfaces, computational and simulation applets, and tools for data analysis and visualization. The tools provided as part of SOCR include conceptual simulations and statistical computing interfaces, which are designed to bridge between the introductory and the more advanced computational and applied probability and statistics courses. In this manuscript, we describe our designs for utilizing SOCR technology in instruction in a recent study. In addition, we present results on the effectiveness of using SOCR tools at two different course intensity levels on three outcome measures: exam scores, student satisfaction and choice of technology to complete assignments. Learning styles assessment was completed at baseline. We used three very different designs for three different undergraduate classes. Each course included a treatment group, using the SOCR resources, and a control group, using classical instruction techniques. Our findings include marginal effects of the SOCR treatment within individual classes; however, pooling the results across all courses and sections, SOCR effects on the treatment groups were exceptionally robust and significant. Coupling these findings with a clear decrease in the variance of the quantitative examination measures in the treatment groups indicates that employing technology, like SOCR, in a sound pedagogical and scientific manner enhances students' overall understanding and suggests better long-term knowledge retention. PMID:19750185
Web-TCGA: an online platform for integrated analysis of molecular cancer data sets.
Deng, Mario; Brägelmann, Johannes; Schultze, Joachim L; Perner, Sven
2016-02-06
The Cancer Genome Atlas (TCGA) is a pool of molecular data sets publicly accessible and freely available to cancer researchers anywhere around the world. However, widespread use is limited since advanced knowledge of statistics and statistical software is required. In order to improve accessibility we created Web-TCGA, a web-based, freely accessible online tool, which can also be run in a private instance, for integrated analysis of molecular cancer data sets provided by TCGA. In contrast to already available tools, Web-TCGA utilizes different methods for analysis and visualization of TCGA data, allowing users to generate global molecular profiles across different cancer entities simultaneously. In addition to global molecular profiles, Web-TCGA offers highly detailed gene- and tumor-entity-centric analysis by providing interactive tables and views. As a supplement to other already available tools, such as cBioPortal (Sci Signal 6:pl1, 2013, Cancer Discov 2:401-4, 2012), Web-TCGA offers an analysis service, which does not require any installation or configuration, for molecular data sets available at the TCGA. Individual processing requests (queries) are generated by the user for mutation, methylation, expression and copy number variation (CNV) analyses. The user can focus analyses on results from single genes and cancer entities or perform a global analysis (multiple cancer entities and genes simultaneously).
Liem, Franziskus; Mérillat, Susan; Bezzola, Ladina; Hirsiger, Sarah; Philipp, Michel; Madhyastha, Tara; Jäncke, Lutz
2015-03-01
FreeSurfer is a tool to quantify cortical and subcortical brain anatomy automatically and noninvasively. Previous studies have reported reliability and statistical power analyses in relatively small samples or only selected one aspect of brain anatomy. Here, we investigated reliability and statistical power of cortical thickness, surface area, volume, and the volume of subcortical structures in a large sample (N=189) of healthy elderly subjects (64+ years). Reliability (intraclass correlation coefficient) of cortical and subcortical parameters is generally high (cortical: ICCs>0.87, subcortical: ICCs>0.95). Surface-based smoothing increases reliability of cortical thickness maps, while it decreases reliability of cortical surface area and volume. Nevertheless, statistical power of all measures benefits from smoothing. When aiming to detect a 10% difference between groups, the number of subjects required to test effects with sufficient power over the entire cortex varies between cortical measures (cortical thickness: N=39, surface area: N=21, volume: N=81; 10mm smoothing, power=0.8, α=0.05). For subcortical regions this number is between 16 and 76 subjects, depending on the region. We also demonstrate the advantage of within-subject designs over between-subject designs. Furthermore, we publicly provide a tool that allows researchers to perform a priori power analysis and sensitivity analysis to help evaluate previously published studies and to design future studies with sufficient statistical power. Copyright © 2014 Elsevier Inc. All rights reserved.
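A sense of the a priori power calculation offered by the released tool can be conveyed with a generic two-sample computation: convert an assumed 10% group difference into Cohen's d and solve for the per-group sample size. The mean and SD below are invented placeholders, not the empirical values from this study's N=189 sample.

```python
import numpy as np
from statsmodels.stats.power import TTestIndPower

# Hypothetical regional mean and standard deviation of cortical thickness
mean_thickness, sd_thickness = 2.5, 0.38       # mm, illustrative values
effect = 0.10 * mean_thickness / sd_thickness  # Cohen's d for a 10% difference

n_per_group = TTestIndPower().solve_power(effect_size=effect,
                                          alpha=0.05, power=0.8,
                                          alternative='two-sided')
print(f"d = {effect:.2f}, required n per group ~ {np.ceil(n_per_group):.0f}")
```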
Analysis of Two Methods to Evaluate Antioxidants
ERIC Educational Resources Information Center
Tomasina, Florencia; Carabio, Claudio; Celano, Laura; Thomson, Leonor
2012-01-01
This exercise is intended to introduce undergraduate biochemistry students to the analysis of antioxidants as a biotechnological tool. In addition, some statistical resources will also be used and discussed. Antioxidants play an important metabolic role, preventing oxidative stress-mediated cell and tissue injury. Knowing the antioxidant content…
Identifying currents in the gene pool for bacterial populations using an integrative approach.
Tang, Jing; Hanage, William P; Fraser, Christophe; Corander, Jukka
2009-08-01
The evolution of bacterial populations has recently become considerably better understood due to large-scale sequencing of population samples. It has become clear that DNA sequences from a multitude of genes, as well as a broad sample coverage of a target population, are needed to obtain a relatively unbiased view of its genetic structure and the patterns of ancestry connected to the strains. However, the traditional statistical methods for evolutionary inference, such as phylogenetic analysis, are associated with several difficulties under such an extensive sampling scenario, in particular when a considerable amount of recombination is anticipated to have taken place. To meet the needs of large-scale analyses of population structure for bacteria, we introduce here several statistical tools for the detection and representation of recombination between populations. Also, we introduce a model-based description of the shape of a population in sequence space, in terms of its molecular variability and affinity towards other populations. Extensive real data from the genus Neisseria are utilized to demonstrate the potential of an approach where these population genetic tools are combined with a phylogenetic analysis. The statistical tools introduced here are freely available in BAPS 5.2 software, which can be downloaded from http://web.abo.fi/fak/mnf/mate/jc/software/baps.html.
Effects of Cognitive Load on Trust
2013-10-01
... that may be affected by load; build a parsing tool to extract relevant features; statistical analysis of results (by load components). Achieved ... for a business application. Participants assessed potential job candidates and reviewed the applicants' virtual resume, which included standard ... substantially different from each other that would make any confounding problems or other issues. Some statistics of the Australian data collection are ...
Using Data Analysis to Explore Class Enrollment.
ERIC Educational Resources Information Center
Davis, Gretchen
1990-01-01
Describes classroom activities and shows that statistics is a practical tool for solving real problems. Presents a histogram, a stem plot, and a box plot to compare data involving class enrollments. (YP)
Khan, Haseeb Ahmad
2004-01-01
The massive surge in the production of microarray data poses a great challenge for proper analysis and interpretation. In recent years numerous computational tools have been developed to extract meaningful interpretation of microarray gene expression data. However, a convenient tool for two-group comparison of microarray data is still lacking, and users have to rely on commercial statistical packages that might be costly and require special skills, in addition to extra time and effort for transferring data from one platform to another. Various statistical methods, including the t-test, analysis of variance, Pearson test and Mann-Whitney U test, have been reported for comparing microarray data, whereas the utilization of the Wilcoxon signed-rank test, which is an appropriate test for two-group comparison of gene expression data, has largely been neglected in microarray studies. The aim of this investigation was to build an integrated tool, ArraySolver, for colour-coded graphical display and comparison of gene expression data using the Wilcoxon signed-rank test. The results of software validation showed similar outputs with ArraySolver and SPSS for large datasets, whereas the former program appeared to be more accurate for 25 or fewer pairs (n ≤ 25), suggesting its potential application in analysing molecular signatures that usually contain small numbers of genes. The main advantages of ArraySolver are easy data selection, convenient report format, accurate statistics and the familiar Excel platform.
2004-01-01
The massive surge in the production of microarray data poses a great challenge for proper analysis and interpretation. In recent years numerous computational tools have been developed to extract meaningful interpretation of microarray gene expression data. However, a convenient tool for two-group comparison of microarray data is still lacking, and users have to rely on commercial statistical packages that might be costly and require special skills, in addition to extra time and effort for transferring data from one platform to another. Various statistical methods, including the t-test, analysis of variance, Pearson test and Mann–Whitney U test, have been reported for comparing microarray data, whereas the utilization of the Wilcoxon signed-rank test, which is an appropriate test for two-group comparison of gene expression data, has largely been neglected in microarray studies. The aim of this investigation was to build an integrated tool, ArraySolver, for colour-coded graphical display and comparison of gene expression data using the Wilcoxon signed-rank test. The results of software validation showed similar outputs with ArraySolver and SPSS for large datasets, whereas the former program appeared to be more accurate for 25 or fewer pairs (n ≤ 25), suggesting its potential application in analysing molecular signatures that usually contain small numbers of genes. The main advantages of ArraySolver are easy data selection, convenient report format, accurate statistics and the familiar Excel platform. PMID:18629036
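The test at the core of ArraySolver, the Wilcoxon signed-rank test for paired two-group comparison, is straightforward to reproduce outside Excel; the paired expression values below are simulated and the example is not a re-implementation of the tool.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical paired expression values for one gene signature (n = 12 pairs)
control = rng.normal(8.0, 0.5, size=12)
treated = control + rng.normal(0.4, 0.3, size=12)

stat, p = stats.wilcoxon(treated, control)
print(f"Wilcoxon signed-rank: W = {stat:.1f}, p = {p:.4f}")
```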
Smith, Paul F.
2017-01-01
Effective inferential statistical analysis is essential for high quality studies in neuroscience. However, recently, neuroscience has been criticised for the poor use of experimental design and statistical analysis. Many of the statistical issues confronting neuroscience are similar to other areas of biology; however, there are some that occur more regularly in neuroscience studies. This review attempts to provide a succinct overview of some of the major issues that arise commonly in the analyses of neuroscience data. These include: the non-normal distribution of the data; inequality of variance between groups; extensive correlation in data for repeated measurements across time or space; excessive multiple testing; inadequate statistical power due to small sample sizes; pseudo-replication; and an over-emphasis on binary conclusions about statistical significance as opposed to effect sizes. Statistical analysis should be viewed as just another neuroscience tool, which is critical to the final outcome of the study. Therefore, it needs to be done well and it is a good idea to be proactive and seek help early, preferably before the study even begins. PMID:29371855
Smith, Paul F
2017-01-01
Effective inferential statistical analysis is essential for high quality studies in neuroscience. However, recently, neuroscience has been criticised for the poor use of experimental design and statistical analysis. Many of the statistical issues confronting neuroscience are similar to other areas of biology; however, there are some that occur more regularly in neuroscience studies. This review attempts to provide a succinct overview of some of the major issues that arise commonly in the analyses of neuroscience data. These include: the non-normal distribution of the data; inequality of variance between groups; extensive correlation in data for repeated measurements across time or space; excessive multiple testing; inadequate statistical power due to small sample sizes; pseudo-replication; and an over-emphasis on binary conclusions about statistical significance as opposed to effect sizes. Statistical analysis should be viewed as just another neuroscience tool, which is critical to the final outcome of the study. Therefore, it needs to be done well and it is a good idea to be proactive and seek help early, preferably before the study even begins.
Additional Support for the Information Systems Analyst Exam as a Valid Program Assessment Tool
ERIC Educational Resources Information Center
Carpenter, Donald A.; Snyder, Johnny; Slauson, Gayla Jo; Bridge, Morgan K.
2011-01-01
This paper presents a statistical analysis to support the notion that the Information Systems Analyst (ISA) exam can be used as a program assessment tool in addition to measuring student performance. It compares ISA exam scores earned by students in one particular Computer Information Systems program with scores earned by the same students on the…
EEG and MEG data analysis in SPM8.
Litvak, Vladimir; Mattout, Jérémie; Kiebel, Stefan; Phillips, Christophe; Henson, Richard; Kilner, James; Barnes, Gareth; Oostenveld, Robert; Daunizeau, Jean; Flandin, Guillaume; Penny, Will; Friston, Karl
2011-01-01
SPM is free and open-source software written in MATLAB (The MathWorks, Inc.). In addition to standard M/EEG preprocessing, we presently offer three main analysis tools: (i) statistical analysis of scalp-maps, time-frequency images, and volumetric 3D source reconstruction images based on the general linear model, with correction for multiple comparisons using random field theory; (ii) Bayesian M/EEG source reconstruction, including support for group studies, simultaneous EEG and MEG, and fMRI priors; (iii) dynamic causal modelling (DCM), an approach combining neural modelling with data analysis for which there are several variants dealing with evoked responses, steady state responses (power spectra and cross-spectra), induced responses, and phase coupling. SPM8 is integrated with the FieldTrip toolbox, making it possible for users to combine a variety of standard analysis methods with new schemes implemented in SPM and build custom analysis tools using powerful graphical user interface (GUI) and batching tools.
EEG and MEG Data Analysis in SPM8
Litvak, Vladimir; Mattout, Jérémie; Kiebel, Stefan; Phillips, Christophe; Henson, Richard; Kilner, James; Barnes, Gareth; Oostenveld, Robert; Daunizeau, Jean; Flandin, Guillaume; Penny, Will; Friston, Karl
2011-01-01
SPM is free and open-source software written in MATLAB (The MathWorks, Inc.). In addition to standard M/EEG preprocessing, we presently offer three main analysis tools: (i) statistical analysis of scalp-maps, time-frequency images, and volumetric 3D source reconstruction images based on the general linear model, with correction for multiple comparisons using random field theory; (ii) Bayesian M/EEG source reconstruction, including support for group studies, simultaneous EEG and MEG, and fMRI priors; (iii) dynamic causal modelling (DCM), an approach combining neural modelling with data analysis for which there are several variants dealing with evoked responses, steady state responses (power spectra and cross-spectra), induced responses, and phase coupling. SPM8 is integrated with the FieldTrip toolbox, making it possible for users to combine a variety of standard analysis methods with new schemes implemented in SPM and build custom analysis tools using powerful graphical user interface (GUI) and batching tools. PMID:21437221
Analysis of High-Throughput ELISA Microarray Data
DOE Office of Scientific and Technical Information (OSTI.GOV)
White, Amanda M.; Daly, Don S.; Zangar, Richard C.
Our research group develops analytical methods and software for the high-throughput analysis of quantitative enzyme-linked immunosorbent assay (ELISA) microarrays. ELISA microarrays differ from DNA microarrays in several fundamental aspects and most algorithms for analysis of DNA microarray data are not applicable to ELISA microarrays. In this review, we provide an overview of the steps involved in ELISA microarray data analysis and how the statistically sound algorithms we have developed provide an integrated software suite to address the needs of each data-processing step. The algorithms discussed are available in a set of open-source software tools (http://www.pnl.gov/statistics/ProMAT).
Statistical Tolerance and Clearance Analysis for Assembly
NASA Technical Reports Server (NTRS)
Lee, S.; Yi, C.
1996-01-01
Tolerance is inevitable because manufacturing exactly equal parts is known to be impossible. Furthermore, the specification of tolerances is an integral part of product design since tolerances directly affect the assemblability, functionality, manufacturability, and cost effectiveness of a product. In this paper, we present statistical tolerance and clearance analysis for the assembly. Our proposed work is expected to make the following contributions: (i) to help the designers to evaluate products for assemblability, (ii) to provide a new perspective to tolerance problems, and (iii) to provide a tolerance analysis tool which can be incorporated into a CAD or solid modeling system.
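As a rough illustration of the kind of statistical tolerance and clearance analysis described above, the following Python sketch runs a Monte Carlo stack-up of a one-dimensional assembly clearance; all nominal dimensions and tolerances are invented for the example and are not taken from the paper:

```python
# Minimal Monte Carlo stack-up: clearance = housing length - sum of three part lengths.
# Dimensions (mm) and tolerances are illustrative assumptions, treated as 3-sigma limits.
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
housing = rng.normal(30.00, 0.05 / 3, n)                 # nominal 30 mm, +/-0.05 mm
parts = [rng.normal(9.90, 0.03 / 3, n) for _ in range(3)]  # three stacked parts

clearance = housing - sum(parts)
print(f"mean clearance = {clearance.mean():.4f} mm, sigma = {clearance.std():.4f} mm")
print(f"estimated P(interference) = {(clearance < 0).mean():.2e}")
```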
Organization and Visualization for Initial Analysis of Forced-Choice Ipsative Data
ERIC Educational Resources Information Center
Cochran, Jill A.
2015-01-01
Forced-choice ipsative data are common in personality, philosophy and other preference-based studies. However, this type of data inherently contains dependencies that are challenging for usual statistical analysis. In order to utilize the structure of the data as a guide for analysis rather than as a challenge to manage, a visualisation tool was…
DOT National Transportation Integrated Search
2014-03-01
Recent research in highway safety has focused on the more advanced and statistically proven techniques of highway : safety analysis. This project focuses on the two most recent safety analysis tools, the Highway Safety Manual (HSM) : and SafetyAnalys...
Exploring Rating Quality in Rater-Mediated Assessments Using Mokken Scale Analysis
ERIC Educational Resources Information Center
Wind, Stefanie A.; Engelhard, George, Jr.
2016-01-01
Mokken scale analysis is a probabilistic nonparametric approach that offers statistical and graphical tools for evaluating the quality of social science measurement without placing potentially inappropriate restrictions on the structure of a data set. In particular, Mokken scaling provides a useful method for evaluating important measurement…
The Use of Modelling for Theory Building in Qualitative Analysis
ERIC Educational Resources Information Center
Briggs, Ann R. J.
2007-01-01
The purpose of this article is to exemplify and enhance the place of modelling as a qualitative process in educational research. Modelling is widely used in quantitative research as a tool for analysis, theory building and prediction. Statistical data lend themselves to graphical representation of values, interrelationships and operational…
Guidelines for the analysis of free energy calculations
Klimovich, Pavel V.; Shirts, Michael R.; Mobley, David L.
2015-01-01
Free energy calculations based on molecular dynamics (MD) simulations show considerable promise for applications ranging from drug discovery to prediction of physical properties and structure-function studies. But these calculations are still difficult and tedious to analyze, and best practices for analysis are not well defined or propagated. Essentially, each group analyzing these calculations needs to decide how to conduct the analysis and, usually, develop its own analysis tools. Here, we review and recommend best practices for analysis yielding reliable free energies from molecular simulations. Additionally, we provide a Python tool, alchemical-analysis.py, freely available on GitHub at https://github.com/choderalab/pymbar-examples, that implements the analysis practices reviewed here for several reference simulation packages, which can be adapted to handle data from other packages. Both this review and the tool cover analysis of alchemical calculations generally, including free energy estimates via both thermodynamic integration and free energy perturbation-based estimators. Our Python tool also handles output from multiple types of free energy calculations, including expanded ensemble and Hamiltonian replica exchange, as well as standard fixed ensemble calculations. We also survey a range of statistical and graphical ways of assessing the quality of the data and free energy estimates, and provide prototypes of these in our tool. We hope these tools and discussion will serve as a foundation for more standardization of and agreement on best practices for analysis of free energy calculations. PMID:25808134
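For orientation only, the sketch below shows the thermodynamic-integration side of such an analysis in its simplest form: averaging dU/dλ per λ window and integrating with the trapezoidal rule. The dU/dλ values and uncertainties are synthetic stand-ins, and this is not the alchemical-analysis.py implementation:

```python
# Illustrative thermodynamic-integration estimate over an alchemical path.
# The <dU/dlambda> values (kcal/mol) are synthetic stand-ins for simulation output.
import numpy as np

rng = np.random.default_rng(2)
lambdas = np.linspace(0.0, 1.0, 11)                          # 11 evenly spaced windows
dudl_mean = np.array([rng.normal(10.0 * (1 - lam), 0.2) for lam in lambdas])
dudl_sem = np.full_like(lambdas, 0.2 / np.sqrt(500))         # assumed standard errors

h = lambdas[1] - lambdas[0]
weights = np.full_like(lambdas, h)
weights[0] = weights[-1] = h / 2                             # trapezoidal-rule weights

delta_G = np.sum(weights * dudl_mean)                        # integral of <dU/dlambda>
delta_G_err = np.sqrt(np.sum((weights * dudl_sem) ** 2))     # simple error propagation
print(f"dG = {delta_G:.2f} +/- {delta_G_err:.2f} kcal/mol")
```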
Velasco-Tapia, Fernando
2014-01-01
Magmatic processes have usually been identified and evaluated using qualitative or semiquantitative geochemical or isotopic tools based on a restricted number of variables. However, a more complete and quantitative view could be reached by applying multivariate analysis, mass balance techniques, and statistical tests. As an example, in this work a statistical and quantitative scheme is applied to analyze the geochemical features of the Sierra de las Cruces (SC) volcanic range (Mexican Volcanic Belt). In this locality, the volcanic activity (3.7 to 0.5 Ma) was dominantly dacitic, but the presence of spheroidal andesitic enclaves and/or diverse disequilibrium features in the majority of lavas confirms the operation of magma mixing/mingling. New discriminant-function-based multidimensional diagrams were used to discriminate tectonic setting. Statistical tests of discordancy and significance were applied to evaluate the influence of the subducting Cocos plate, which seems to be rather negligible for the SC magmas in relation to several major and trace elements. A cluster analysis following Ward's linkage rule was carried out to classify the SC volcanic rocks into geochemical groups. Finally, two mass-balance schemes were applied for the quantitative evaluation of the proportion of the end-member components (dacitic and andesitic magmas) in the comingled lavas (binary mixtures). PMID:24737994
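A minimal sketch of the Ward's-linkage clustering step, using SciPy on standardized compositional variables, is shown below; the data matrix here is random stand-in data, not the Sierra de las Cruces geochemistry:

```python
# Ward's-linkage hierarchical clustering of samples by standardized composition.
# The 40 x 10 matrix is synthetic; in practice the columns would be oxides/trace elements.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.stats import zscore

rng = np.random.default_rng(3)
X = rng.normal(size=(40, 10))                      # 40 samples x 10 geochemical variables
Z = linkage(zscore(X, axis=0), method="ward")      # Ward's minimum-variance linkage
groups = fcluster(Z, t=3, criterion="maxclust")    # cut the dendrogram into 3 groups
print(groups)
```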
A Divergence Statistics Extension to VTK for Performance Analysis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pebay, Philippe Pierre; Bennett, Janine Camille
This report follows the series of previous documents ([PT08, BPRT09b, PT09, BPT09, PT10, PB13]), where we presented the parallel descriptive, correlative, multi-correlative, principal component analysis, contingency, k-means, order and auto-correlative statistics engines which we developed within the Visualization Tool Kit (VTK) as a scalable, parallel and versatile statistics package. We now report on a new engine which we developed for the calculation of divergence statistics, a concept which we hereafter explain and whose main goal is to quantify the discrepancy, in a statistical manner akin to measuring a distance, between an observed empirical distribution and a theoretical, "ideal" one. The ease of use of the new divergence statistics engine is illustrated by means of C++ code snippets. Although this new engine does not yet have a parallel implementation, it has already been applied to HPC performance analysis, of which we provide an example.
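To illustrate the underlying idea (not the VTK engine itself), the following Python sketch quantifies the discrepancy between an observed empirical distribution and a theoretical "ideal" one using the Kullback-Leibler divergence over a common set of bins; all data are synthetic:

```python
# Divergence between an empirical histogram and a theoretical reference distribution.
import numpy as np
from scipy.stats import entropy, norm

rng = np.random.default_rng(4)
samples = rng.normal(loc=0.2, scale=1.1, size=10_000)   # "observed" measurements

bins = np.linspace(-5, 5, 51)
observed, _ = np.histogram(samples, bins=bins)
observed = observed / observed.sum()

theoretical = np.diff(norm.cdf(bins))                   # standard-normal mass per bin
theoretical = theoretical / theoretical.sum()

# entropy(p, q) returns the KL divergence D(p || q); tiny offsets avoid log(0)
kl = entropy(observed + 1e-12, theoretical + 1e-12)
print(f"KL divergence = {kl:.4f} nats")
```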
Diagnosing alternative conceptions of Fermi energy among undergraduate students
NASA Astrophysics Data System (ADS)
Sharma, Sapna; Ahluwalia, Pardeep Kumar
2012-07-01
Physics education researchers have scientifically established the fact that the understanding of new concepts and interpretation of incoming information are strongly influenced by the preexisting knowledge and beliefs of students, called epistemological beliefs. This can lead to a gap between what students actually learn and what the teacher expects them to learn. In a classroom, as a teacher, it is desirable that one tries to bridge this gap at least on the key concepts of a particular field which is being taught. One such key concept which crops up in statistical physics/solid-state physics courses, and around which the behaviour of materials is described, is Fermi energy (εF). In this paper, we present the results which emerged about misconceptions on Fermi energy in the process of administering a diagnostic tool called the Statistical Physics Concept Survey developed by the authors. It deals with eight themes of basic importance in learning undergraduate solid-state physics and statistical physics. The question items of the tool were put through well-established sequential processes: definition of themes, Delphi study, interview with students, drafting questions, administration, validity and reliability of the tool. The tool was administered to a group of undergraduate students and postgraduate students, in a pre-test and post-test design. In this paper, we have taken one of the themes i.e. Fermi energy of the diagnostic tool for our analysis and discussion. Students’ responses and reasoning comments given during interview were analysed. This analysis helped us to identify prevailing misconceptions/learning gaps among students on this topic. How spreadsheets can be effectively used to remove the identified misconceptions and help appreciate the finer nuances while visualizing the behaviour of the system around Fermi energy, normally sidestepped both by the teachers and learners, is also presented in this paper.
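As an example of the kind of spreadsheet-style exploration the paper advocates, the sketch below evaluates the Fermi-Dirac occupation around an assumed Fermi energy at several temperatures; the value E_F = 5 eV is an illustrative assumption, not data from the study:

```python
# Fermi-Dirac occupation f(E) near the Fermi energy at a few temperatures.
import numpy as np

k_B = 8.617e-5                 # Boltzmann constant, eV/K
E_F = 5.0                      # assumed Fermi energy, eV (illustrative)
E = np.linspace(4.8, 5.2, 9)   # energies around E_F, eV

def fermi_dirac(E, T, E_F=E_F):
    """Occupation probability of a state at energy E (eV) and temperature T (K)."""
    return 1.0 / (np.exp((E - E_F) / (k_B * T)) + 1.0)

for T in (100, 300, 1000):
    print(T, np.round(fermi_dirac(E, T), 3))   # step at low T, smeared at high T
```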
The magnitude and effects of extreme solar particle events
NASA Astrophysics Data System (ADS)
Jiggens, Piers; Chavy-Macdonald, Marc-Andre; Santin, Giovanni; Menicucci, Alessandra; Evans, Hugh; Hilgers, Alain
2014-06-01
The solar energetic particle (SEP) radiation environment is an important consideration for spacecraft design, spacecraft mission planning and human spaceflight. Herein is presented an investigation into the likely severity of effects of a very large Solar Particle Event (SPE) on technology and humans in space. Fluences for SPEs derived using statistical models are compared to historical SPEs to verify their appropriateness for use in the analysis which follows. By combining environment tools with tools to model effects behind varying layers of spacecraft shielding, it is possible to predict what impact a large SPE would be likely to have on a spacecraft in Near-Earth interplanetary space or geostationary Earth orbit. Also presented is a comparison of results generated using the traditional method of inputting the environment spectra, determined using a statistical model, into effects tools and a new method developed as part of the ESA SEPEM Project allowing for the creation of an effect time series on which statistics, previously applied to the flux data, can be run directly. The SPE environment spectrum is determined and presented as energy-integrated proton fluence (cm^-2) as a function of particle energy (in MeV). This is input into the SHIELDOSE-2, MULASSIS, NIEL, GRAS and SEU effects tools to provide the output results. In the case of the new method for analysis, the flux time series is fed directly into the MULASSIS and GEMAT tools integrated into the SEPEM system. The output effect quantities include total ionising dose (in rads), non-ionising energy loss (MeV g^-1), single event upsets (upsets/bit) and the dose in humans, compared to established limits for stochastic (or cancer-causing) effects and tissue reactions (such as acute radiation sickness), given in gray-equivalent and sieverts respectively.
Skouroliakou, Maria; Soloupis, George; Gounaris, Antonis; Charitou, Antonia; Papasarantopoulos, Petros; Markantonis, Sophia L; Golna, Christina; Souliotis, Kyriakos
2008-07-28
This study assesses the results of implementation of a software program that allows for input of admission/discharge summary data (including cost) in a neonatal intensive care unit (NICU) in Greece, based on the establishment of a baseline statistical database for infants treated in a NICU and the statistical analysis of epidemiological and resource utilization data thus collected. A software tool was designed, developed, and implemented between April 2004 and March 2005 in the NICU of the LITO private maternity hospital in Athens, Greece, to allow for the first time for step-by-step collection and management of summary treatment data. Data collected over this period were subsequently analyzed using defined indicators as a basis to extract results related to treatment options, treatment duration, and relative resource utilization. Data for 499 babies were entered in the tool and processed. Information on medical costs (e.g., mean total cost ± SD of treatment was €310.44 ± 249.17 and €6704.27 ± 4079.53 for babies weighing more than 2500 g and 1000-1500 g respectively), incidence of complications or disease (e.g., 4.3 percent and 14.3 percent of study babies weighing 1,000 to 1,500 g suffered from cerebral bleeding [grade I] and bronchopulmonary dysplasia, respectively, while overall 6.0 percent had microbial infections), and medical statistics (e.g., perinatal mortality was 6.8 percent) was obtained in a quick and robust manner. The software tool allowed for collection and analysis of data traditionally maintained in paper medical records in the NICU with greater ease and accuracy. Data codification and analysis led to significant findings at the epidemiological, medical resource utilization, and respective hospital cost levels that allowed comparisons with literature findings for the first time in Greece. The tool thus contributed to a clearer understanding of treatment practices in the NICU and set the baseline for the assessment of the impact of future interventions at the policy or hospital level.
NASA Astrophysics Data System (ADS)
Nguyen, A.; Mueller, C.; Brooks, A. N.; Kislik, E. A.; Baney, O. N.; Ramirez, C.; Schmidt, C.; Torres-Perez, J. L.
2014-12-01
The Sierra Nevada is experiencing changes in hydrologic regimes, such as decreases in snowmelt and peak runoff, which affect forest health and the availability of water resources. Currently, the USDA Forest Service Region 5 is undergoing Forest Plan revisions to include climate change impacts into mitigation and adaptation strategies. However, there are few processes in place to conduct quantitative assessments of forest conditions in relation to mountain hydrology, while easily and effectively delivering that information to forest managers. To assist the USDA Forest Service, this study is the final phase of a three-term project to create a Decision Support System (DSS) to allow ease of access to historical and forecasted hydrologic, climatic, and terrestrial conditions for the entire Sierra Nevada. This data is featured within three components of the DSS: the Mapping Viewer, Statistical Analysis Portal, and Geospatial Data Gateway. Utilizing ArcGIS Online, the Sierra DSS Mapping Viewer enables users to visually analyze and locate areas of interest. Once the areas of interest are targeted, the Statistical Analysis Portal provides subbasin level statistics for each variable over time by utilizing a recently developed web-based data analysis and visualization tool called Plotly. This tool allows users to generate graphs and conduct statistical analyses for the Sierra Nevada without the need to download the dataset of interest. For more comprehensive analysis, users are also able to download datasets via the Geospatial Data Gateway. The third phase of this project focused on Python-based data processing, the adaptation of the multiple capabilities of ArcGIS Online and Plotly, and the integration of the three Sierra DSS components within a website designed specifically for the USDA Forest Service.
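A minimal sketch of the kind of Plotly chart the Statistical Analysis Portal exposes is shown below; the column names and snow-water-equivalent values are placeholders, not the Forest Service datasets:

```python
# Illustrative Plotly line chart of an annual subbasin statistic over time.
# The DataFrame contents are invented placeholders, not the Sierra DSS data.
import pandas as pd
import plotly.express as px

df = pd.DataFrame({
    "water_year": list(range(2000, 2015)),
    "april_swe_mm": [520, 480, 455, 600, 430, 410, 465, 390, 505, 370,
                     415, 540, 360, 300, 280],   # snow-water equivalent, mm (illustrative)
})
fig = px.line(df, x="water_year", y="april_swe_mm",
              title="Illustrative subbasin April SWE trend")
fig.show()
```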
GIS Tools For Improving Pedestrian & Bicycle Safety
DOT National Transportation Integrated Search
2000-07-01
Geographic Information System (GIS) software turns statistical data, such as accidents, and geographic data, such as roads and crash locations, into meaningful information for spatial analysis and mapping. In this project, GIS-based analytical techni...
SNP_tools: A compact tool package for analysis and conversion of genotype data for MS-Excel
Chen, Bowang; Wilkening, Stefan; Drechsel, Marion; Hemminki, Kari
2009-01-01
Background Single nucleotide polymorphism (SNP) genotyping is a major activity in biomedical research. Scientists prefer to have facile access to the results, which may require conversions between data formats. First-hand SNP data is often entered in or saved in the MS-Excel format, but this software lacks genetics- and epidemiology-related functions. A general tool to do basic genetic and epidemiological analysis and data conversion for MS-Excel is needed. Findings The SNP_tools package is prepared as an add-in for MS-Excel. The code is written in Visual Basic for Applications, embedded in the Microsoft Office package. This add-in is an easy-to-use tool for users with basic computer knowledge (and requirements for basic statistical analysis). Conclusion Our implementation for Microsoft Excel 2000-2007 in Microsoft Windows 2000, XP, Vista and Windows 7 beta can handle files in different formats and convert them into other formats. It is free software. PMID:19852806
SNP_tools: A compact tool package for analysis and conversion of genotype data for MS-Excel.
Chen, Bowang; Wilkening, Stefan; Drechsel, Marion; Hemminki, Kari
2009-10-23
Single nucleotide polymorphism (SNP) genotyping is a major activity in biomedical research. Scientists prefer to have facile access to the results, which may require conversions between data formats. First-hand SNP data is often entered in or saved in the MS-Excel format, but this software lacks genetics- and epidemiology-related functions. A general tool to do basic genetic and epidemiological analysis and data conversion for MS-Excel is needed. The SNP_tools package is prepared as an add-in for MS-Excel. The code is written in Visual Basic for Applications, embedded in the Microsoft Office package. This add-in is an easy-to-use tool for users with basic computer knowledge (and requirements for basic statistical analysis). Our implementation for Microsoft Excel 2000-2007 in Microsoft Windows 2000, XP, Vista and Windows 7 beta can handle files in different formats and convert them into other formats. It is free software.
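As an illustration of one "basic genetic analysis" such an add-in might perform, here is a Python sketch of a Hardy-Weinberg equilibrium chi-square test from genotype counts; this is not the add-in's VBA code, and the counts are made up:

```python
# Hardy-Weinberg equilibrium chi-square test from observed genotype counts (illustrative).
from scipy.stats import chi2

obs = {"AA": 298, "Aa": 489, "aa": 213}
n = sum(obs.values())
p = (2 * obs["AA"] + obs["Aa"]) / (2 * n)    # frequency of allele A
q = 1 - p

exp = {"AA": n * p**2, "Aa": 2 * n * p * q, "aa": n * q**2}
chi_sq = sum((obs[g] - exp[g]) ** 2 / exp[g] for g in obs)
p_value = chi2.sf(chi_sq, df=1)              # 1 degree of freedom for a biallelic SNP
print(f"chi2 = {chi_sq:.3f}, p = {p_value:.3f}")
```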
L.R. Iverson; A.M. Prasad; A. Liaw
2004-01-01
More and better machine learning tools are becoming available for landscape ecologists to aid in understanding species-environment relationships and to map probable species occurrence now and potentially into the future. To that end, we evaluated three statistical models: Regression Tree Analysis (RTA), Bagging Trees (BT) and Random Forest (RF) for their utility in...
James S. Han; Theodore Mianowski; Yi-yu Lin
1999-01-01
The efficacy of fiber length measurement techniques such as digitizing, the Kajaani procedure, and NIH Image are compared in order to determine the optimal tool. Kenaf bast fibers, aspen, and red pine fibers were collected from different anatomical parts, and the fiber lengths were compared using various analytical tools. A statistical analysis on the validity of the...
Jaiswara, Ranjana; Nandi, Diptarup; Balakrishnan, Rohini
2013-01-01
Traditional taxonomy based on morphology has often failed in accurate species identification owing to the occurrence of cryptic species, which are reproductively isolated but morphologically identical. Molecular data have thus been used to complement morphology in species identification. The sexual advertisement calls in several groups of acoustically communicating animals are species-specific and can thus complement molecular data as non-invasive tools for identification. Several statistical tools and automated identifier algorithms have been used to investigate the efficiency of acoustic signals in species identification. Despite a plethora of such methods, there is a general lack of knowledge regarding the appropriate usage of these methods in specific taxa. In this study, we investigated the performance of two commonly used statistical methods, discriminant function analysis (DFA) and cluster analysis, in identification and classification based on acoustic signals of field cricket species belonging to the subfamily Gryllinae. Using a comparative approach, we evaluated, for both methods, the optimal number of species and the calling song characteristics that lead to the most accurate classification and identification. The accuracy of classification using DFA was high and was not affected by the number of taxa used. However, a constraint in using discriminant function analysis is the need for a priori classification of songs. Accuracy of classification using cluster analysis, which does not require a priori knowledge, was maximum for 6–7 taxa and decreased significantly when more than ten taxa were analysed together. We also investigated the efficacy of two novel derived acoustic features in improving the accuracy of identification. Our results show that DFA is a reliable statistical tool for species identification using acoustic signals. Our results also show that cluster analysis of acoustic signals in crickets works effectively for species classification and identification. PMID:24086666
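A hedged sketch of the DFA step using scikit-learn's linear discriminant analysis is given below; the call features and species labels are synthetic stand-ins for the Gryllinae recordings:

```python
# Cross-validated linear discriminant classification of calls from acoustic features.
# Features (e.g. syllable period, carrier frequency) and labels are synthetic stand-ins.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(5)
n_per_species, n_species = 30, 5
X = np.vstack([rng.normal(loc=k, scale=0.5, size=(n_per_species, 4))
               for k in range(n_species)])           # 4 call features per recording
y = np.repeat(np.arange(n_species), n_per_species)   # a priori species labels

scores = cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=5)
print(f"cross-validated classification accuracy: {scores.mean():.2f}")
```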
SOCR Analyses - an Instructional Java Web-based Statistical Analysis Toolkit.
Chu, Annie; Cui, Jenny; Dinov, Ivo D
2009-03-01
The Statistical Online Computational Resource (SOCR) designs web-based tools for educational use in a variety of undergraduate courses (Dinov 2006). Several studies have demonstrated that these resources significantly improve students' motivation and learning experiences (Dinov et al. 2008). SOCR Analyses is a new component that concentrates on data modeling and analysis using parametric and non-parametric techniques supported with graphical model diagnostics. Currently implemented analyses include commonly used models in undergraduate statistics courses like linear models (Simple Linear Regression, Multiple Linear Regression, One-Way and Two-Way ANOVA). In addition, we implemented tests for sample comparisons, such as t-test in the parametric category; and Wilcoxon rank sum test, Kruskal-Wallis test, Friedman's test, in the non-parametric category. SOCR Analyses also include several hypothesis test models, such as Contingency tables, Friedman's test and Fisher's exact test. The code itself is open source (http://socr.googlecode.com/), hoping to contribute to the efforts of the statistical computing community. The code includes functionality for each specific analysis model and it has general utilities that can be applied in various statistical computing tasks. For example, concrete methods with API (Application Programming Interface) have been implemented in statistical summary, least square solutions of general linear models, rank calculations, etc. HTML interfaces, tutorials, source code, activities, and data are freely available via the web (www.SOCR.ucla.edu). Code examples for developers and demos for educators are provided on the SOCR Wiki website. In this article, the pedagogical utilization of the SOCR Analyses is discussed, as well as the underlying design framework. As the SOCR project is on-going and more functions and tools are being added to it, these resources are constantly improved. The reader is strongly encouraged to check the SOCR site for most updated information and newly added models.
NASA Astrophysics Data System (ADS)
Yepes-Calderon, Fernando; Brun, Caroline; Sant, Nishita; Thompson, Paul; Lepore, Natasha
2015-01-01
Tensor-Based Morphometry (TBM) is an increasingly popular method for group analysis of brain MRI data. The main steps in the analysis consist of a nonlinear registration to align each individual scan to a common space, and a subsequent statistical analysis to determine morphometric differences, or difference in fiber structure between groups. Recently, we implemented the Statistically-Assisted Fluid Registration Algorithm or SAFIRA,1 which is designed for tracking morphometric differences among populations. To this end, SAFIRA allows the inclusion of statistical priors extracted from the populations being studied as regularizers in the registration. This flexibility and degree of sophistication limit the tool to expert use, even more so considering that SAFIRA was initially implemented in command line mode. Here, we introduce a new, intuitive, easy to use, Matlab-based graphical user interface for SAFIRA's multivariate TBM. The interface also generates different choices for the TBM statistics, including both the traditional univariate statistics on the Jacobian matrix, and comparison of the full deformation tensors.2 This software will be freely disseminated to the neuroimaging research community.
ARTiiFACT: a tool for heart rate artifact processing and heart rate variability analysis.
Kaufmann, Tobias; Sütterlin, Stefan; Schulz, Stefan M; Vögele, Claus
2011-12-01
The importance of appropriate handling of artifacts in interbeat interval (IBI) data must not be underestimated. Even a single artifact may cause unreliable heart rate variability (HRV) results. Thus, a robust artifact detection algorithm and the option for manual intervention by the researcher form key components for confident HRV analysis. Here, we present ARTiiFACT, a software tool for processing electrocardiogram and IBI data. Both automated and manual artifact detection and correction are available in a graphical user interface. In addition, ARTiiFACT includes time- and frequency-based HRV analyses and descriptive statistics, thus offering the basic tools for HRV analysis. Notably, all program steps can be executed separately and allow for data export, thus offering high flexibility and interoperability with a whole range of applications.
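For context, the sketch below computes two standard time-domain HRV statistics (SDNN and RMSSD) from an interbeat-interval series; the IBI values are synthetic, and this is not ARTiiFACT's own code:

```python
# Two time-domain HRV statistics from an (assumed artifact-free) IBI series in milliseconds.
import numpy as np

rng = np.random.default_rng(6)
ibi_ms = 800 + rng.normal(0, 40, size=300)       # synthetic interbeat intervals, ms

sdnn = np.std(ibi_ms, ddof=1)                    # overall variability of the IBI series
rmssd = np.sqrt(np.mean(np.diff(ibi_ms) ** 2))   # short-term, beat-to-beat variability
print(f"SDNN = {sdnn:.1f} ms, RMSSD = {rmssd:.1f} ms")
```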
ERIC Educational Resources Information Center
Zheng, Henry Y.; Stewart, Alice A.
This study explores data envelopment analysis (DEA) as a tool for assessing and benchmarking the performance of public research universities. Using national databases such as those compiled by the National Science Foundation and the National Center for Education Statistics, DEA analysis was conducted on the research and instructional outcomes…
PCA as a practical indicator of OPLS-DA model reliability.
Worley, Bradley; Powers, Robert
Principal Component Analysis (PCA) and Orthogonal Projections to Latent Structures Discriminant Analysis (OPLS-DA) are powerful statistical modeling tools that provide insights into separations between experimental groups based on high-dimensional spectral measurements from NMR, MS or other analytical instrumentation. However, when used without validation, these tools may lead investigators to statistically unreliable conclusions. This danger is especially real for Partial Least Squares (PLS) and OPLS, which aggressively force separations between experimental groups. As a result, OPLS-DA is often used as an alternative method when PCA fails to expose group separation, but this practice is highly dangerous. Without rigorous validation, OPLS-DA can easily yield statistically unreliable group separation. A Monte Carlo analysis of PCA group separations and OPLS-DA cross-validation metrics was performed on NMR datasets with statistically significant separations in scores-space. A linearly increasing amount of Gaussian noise was added to each data matrix followed by the construction and validation of PCA and OPLS-DA models. With increasing added noise, the PCA scores-space distance between groups rapidly decreased and the OPLS-DA cross-validation statistics simultaneously deteriorated. A decrease in correlation between the estimated loadings (added noise) and the true (original) loadings was also observed. While the validity of the OPLS-DA model diminished with increasing added noise, the group separation in scores-space remained basically unaffected. Supported by the results of Monte Carlo analyses of PCA group separations and OPLS-DA cross-validation metrics, we provide practical guidelines and cross-validatory recommendations for reliable inference from PCA and OPLS-DA models.
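The Monte Carlo idea described above can be sketched in a few lines of Python: build two-group data with a genuine separation, add increasing Gaussian noise, and watch the PCA scores-space distance between group centroids shrink. The dimensions and noise levels below are illustrative assumptions, not the NMR datasets from the study:

```python
# PCA scores-space group separation under increasing added Gaussian noise (illustrative).
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(7)
n, p = 20, 100                                    # 20 samples per group, 100 variables
base = np.vstack([rng.normal(0.0, 1.0, (n, p)),
                  rng.normal(0.8, 1.0, (n, p))])  # group B shifted relative to group A
labels = np.repeat([0, 1], n)

for noise_sd in (0.0, 1.0, 2.0, 4.0):
    X = base + rng.normal(0.0, noise_sd, base.shape)
    scores = PCA(n_components=2).fit_transform(X)
    d = np.linalg.norm(scores[labels == 0].mean(0) - scores[labels == 1].mean(0))
    print(f"noise sd = {noise_sd:.1f}: centroid distance in PC space = {d:.2f}")
```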
Statistical parsimony networks and species assemblages in Cephalotrichid nemerteans (nemertea).
Chen, Haixia; Strand, Malin; Norenburg, Jon L; Sun, Shichun; Kajihara, Hiroshi; Chernyshev, Alexey V; Maslakova, Svetlana A; Sundberg, Per
2010-09-21
It has been suggested that statistical parsimony network analysis could be used to get an indication of species represented in a set of nucleotide data, and the approach has been used to discuss species boundaries in some taxa. Based on 635 base pairs of the mitochondrial protein-coding gene cytochrome c oxidase I (COI), we analyzed 152 nemertean specimens using statistical parsimony network analysis with the connection probability set to 95%. The analysis revealed 15 distinct networks together with seven singletons. Statistical parsimony yielded three networks supporting the species status of Cephalothrix rufifrons, C. major and C. spiralis as they currently have been delineated by morphological characters and geographical location. Many other networks contained haplotypes from nearby geographical locations. Cladistic structure by maximum likelihood analysis overall supported the network analysis, but indicated a false positive result where subnetworks should have been connected into one network/species. This probably is caused by undersampling of the intraspecific haplotype diversity. Statistical parsimony network analysis provides a rapid and useful tool for detecting possible undescribed/cryptic species among cephalotrichid nemerteans based on COI gene. It should be combined with phylogenetic analysis to get indications of false positive results, i.e., subnetworks that would have been connected with more extensive haplotype sampling.
The US EPA's ToxCast™ program seeks to combine advances in high-throughput screening technology with methodologies from statistics and computer science to develop high-throughput decision support tools for assessing chemical hazard and risk. To develop new methods of analysis of...
Interactive visual analysis promotes exploration of long-term ecological data
T.N. Pham; J.A. Jones; R. Metoyer; F.J. Swanson; R.J. Pabst
2013-01-01
Long-term ecological data are crucial in helping ecologists understand ecosystem function and environmental change. Nevertheless, these kinds of data sets are difficult to analyze because they are usually large, multivariate, and spatiotemporal. Although existing analysis tools such as statistical methods and spreadsheet software permit rigorous tests of pre-conceived...
Kang, Kyung-Nam; Ryu, Tae-Kyu; Lee, Yoon-Sik
2009-01-01
Background Concerns have recently been raised about the negative effects of patents on innovation. In this study, the effects of patents on innovations in the Korean biotech SMEs (small and medium-sized entrepreneurs) were examined using survey data and statistical analysis. Results The survey results of this study provided some evidence that restricted access problems have occurred even though their frequency was not high. Statistical analysis revealed that difficulties in accessing patented research tools were not negatively correlated with the level of innovation performance and attitudes toward the patent system. Conclusion On the basis of the results of this investigation in combination with those of previous studies, we concluded that although restricted access problems have occurred, this has not yet deterred innovation in Korea. However, potential problems do exist, and the effects of restricted access should be constantly scrutinized. PMID:19321013
Kang, Kyung-Nam; Ryu, Tae-Kyu; Lee, Yoon-Sik
2009-03-26
Concerns have recently been raised about the negative effects of patents on innovation. In this study, the effects of patents on innovations in the Korean biotech SMEs (small and medium-sized entrepreneurs) were examined using survey data and statistical analysis. The survey results of this study provided some evidence that restricted access problems have occurred even though their frequency was not high. Statistical analysis revealed that difficulties in accessing patented research tools were not negatively correlated with the level of innovation performance and attitudes toward the patent system. On the basis of the results of this investigation in combination with those of previous studies, we concluded that although restricted access problems have occurred, this has not yet deterred innovation in Korea. However, potential problems do exist, and the effects of restricted access should be constantly scrutinized.
Detection of Cutting Tool Wear using Statistical Analysis and Regression Model
NASA Astrophysics Data System (ADS)
Ghani, Jaharah A.; Rizal, Muhammad; Nuawi, Mohd Zaki; Haron, Che Hassan Che; Ramli, Rizauddin
2010-10-01
This study presents a new method for detecting cutting tool wear based on measured cutting force signals. A statistics-based method, the Integrated Kurtosis-based Algorithm for Z-Filter technique (I-kaz), was used to develop a regression model and a 3D graphic presentation of the I-kaz 3D coefficient during the machining process. The machining tests were carried out on a Colchester Master Tornado T4 CNC turning machine under dry cutting conditions. A Kistler 9255B dynamometer was used to measure the cutting force signals, which were transmitted, analyzed, and displayed in the DasyLab software. Various force signals from the machining operation were analyzed, and each has its own I-kaz 3D coefficient. This coefficient was examined and its relationship with flank wear lands (VB) was determined. A regression model was developed from this relationship, and its results show that the I-kaz 3D coefficient value decreases as tool wear increases. The result is then used for real-time tool wear monitoring.
Spectral Analysis of B Stars: An Application of Bayesian Statistics
NASA Astrophysics Data System (ADS)
Mugnes, J.-M.; Robert, C.
2012-12-01
To better understand the processes involved in stellar physics, it is necessary to obtain accurate stellar parameters (effective temperature, surface gravity, abundances…). Spectral analysis is a powerful tool for investigating stars, but it is also vital to reduce uncertainties at a decent computational cost. Here we present a spectral analysis method based on a combination of Bayesian statistics and grids of synthetic spectra obtained with TLUSTY. This method simultaneously constrains the stellar parameters by using all the lines accessible in observed spectra and thus greatly reduces uncertainties and improves the overall spectrum fitting. Preliminary results are shown using spectra from the Observatoire du Mont-Mégantic.
Software for Data Analysis with Graphical Models
NASA Technical Reports Server (NTRS)
Buntine, Wray L.; Roy, H. Scott
1994-01-01
Probabilistic graphical models are being used widely in artificial intelligence and statistics, for instance, in diagnosis and expert systems, as a framework for representing and reasoning with probabilities and independencies. They come with corresponding algorithms for performing statistical inference. This offers a unifying framework for prototyping and/or generating data analysis algorithms from graphical specifications. This paper illustrates the framework with an example and then presents some basic techniques for the task: problem decomposition and the calculation of exact Bayes factors. Other tools already developed, such as automatic differentiation, Gibbs sampling, and use of the EM algorithm, make this a broad basis for the generation of data analysis software.
Hafen, G M; Hurst, C; Yearwood, J; Smith, J; Dzalilov, Z; Robinson, P J
2008-10-05
Cystic fibrosis is the most common fatal genetic disorder in the Caucasian population. Scoring systems for assessment of Cystic fibrosis disease severity have been used for almost 50 years, without being adapted to the milder phenotype of the disease in the 21st century. The aim of this current project is to develop a new scoring system using a database and employing various statistical tools. This study protocol reports the development of the statistical tools in order to create such a scoring system. The evaluation is based on the Cystic Fibrosis database from the cohort at the Royal Children's Hospital in Melbourne. Initially, unsupervised clustering of all data records was performed using a range of clustering algorithms. In particular, incremental clustering algorithms were used. The clusters obtained were characterised using rules from decision trees and the results examined by clinicians. In order to obtain a clearer definition of classes, expert opinion of each individual's clinical severity was sought. After data preparation including expert opinion of an individual's clinical severity on a 3-point scale (mild, moderate and severe disease), two multivariate techniques were used throughout the analysis to establish a method that would have better success in feature selection and model derivation: 'Canonical Analysis of Principal Coordinates' and 'Linear Discriminant Analysis'. A 3-step procedure was performed with (1) selection of features, (2) extraction of 5 severity classes from the 3 classes defined by expert opinion, and (3) establishment of calibration datasets. (1) Feature selection: CAP has a more effective "modelling" focus than DA. (2) Extraction of 5 severity classes: after variables were identified as important in discriminating contiguous CF severity groups on the 3-point scale as mild/moderate and moderate/severe, Discriminant Function (DF) was used to determine the new groups: mild, intermediate-moderate, moderate, intermediate-severe and severe disease. (3) Generated confusion tables showed a misclassification rate of 19.1% for males and 16.5% for females, with a majority of misallocations into adjacent severity classes, particularly for males. Our preliminary data show that using CAP for feature selection and Linear DA to derive the actual model in a CF database might be helpful in developing a scoring system. However, there are several limitations; in particular, more data entry points are needed to finalize a score, and the statistical tools have to be further refined and validated by re-running the statistical methods on the larger dataset.
Clark, Neil R.; Szymkiewicz, Maciej; Wang, Zichen; Monteiro, Caroline D.; Jones, Matthew R.; Ma’ayan, Avi
2016-01-01
Gene set analysis of differential expression, which identifies collectively differentially expressed gene sets, has become an important tool for biology. The power of this approach lies in its reduction of the dimensionality of the statistical problem and its incorporation of biological interpretation by construction. Many approaches to gene set analysis have been proposed, but benchmarking their performance in the setting of real biological data is difficult due to the lack of a gold standard. In a previously published work we proposed a geometrical approach to differential expression which performed highly in benchmarking tests and compared well to the most popular methods of differential gene expression. As reported, this approach has a natural extension to gene set analysis which we call Principal Angle Enrichment Analysis (PAEA). PAEA employs dimensionality reduction and a multivariate approach for gene set enrichment analysis. However, the performance of this method has not been assessed nor its implementation as a web-based tool. Here we describe new benchmarking protocols for gene set analysis methods and find that PAEA performs highly. The PAEA method is implemented as a user-friendly web-based tool, which contains 70 gene set libraries and is freely available to the community. PMID:26848405
Clark, Neil R; Szymkiewicz, Maciej; Wang, Zichen; Monteiro, Caroline D; Jones, Matthew R; Ma'ayan, Avi
2015-11-01
Gene set analysis of differential expression, which identifies collectively differentially expressed gene sets, has become an important tool for biology. The power of this approach lies in its reduction of the dimensionality of the statistical problem and its incorporation of biological interpretation by construction. Many approaches to gene set analysis have been proposed, but benchmarking their performance in the setting of real biological data is difficult due to the lack of a gold standard. In a previously published work we proposed a geometrical approach to differential expression which performed highly in benchmarking tests and compared well to the most popular methods of differential gene expression. As reported, this approach has a natural extension to gene set analysis which we call Principal Angle Enrichment Analysis (PAEA). PAEA employs dimensionality reduction and a multivariate approach for gene set enrichment analysis. However, the performance of this method has not been assessed nor its implementation as a web-based tool. Here we describe new benchmarking protocols for gene set analysis methods and find that PAEA performs highly. The PAEA method is implemented as a user-friendly web-based tool, which contains 70 gene set libraries and is freely available to the community.
Al-Aziz, Jameel; Christou, Nicolas; Dinov, Ivo D.
2011-01-01
The amount, complexity and provenance of data have dramatically increased in the past five years. Visualization of observed and simulated data is a critical component of any social, environmental, biomedical or scientific quest. Dynamic, exploratory and interactive visualization of multivariate data, without preprocessing by dimensionality reduction, remains a nearly insurmountable challenge. The Statistics Online Computational Resource (www.SOCR.ucla.edu) provides portable online aids for probability and statistics education, technology-based instruction and statistical computing. We have developed a new Java-based infrastructure, SOCR Motion Charts, for discovery-based exploratory analysis of multivariate data. This interactive data visualization tool enables the visualization of high-dimensional longitudinal data. SOCR Motion Charts allows mapping of ordinal, nominal and quantitative variables onto time, 2D axes, size, colors, glyphs and appearance characteristics, which facilitates the interactive display of multidimensional data. We validated this new visualization paradigm using several publicly available multivariate datasets including Ice-Thickness, Housing Prices, Consumer Price Index, and California Ozone Data. SOCR Motion Charts is designed using object-oriented programming, implemented as a Java Web-applet and is available to the entire community on the web at www.socr.ucla.edu/SOCR_MotionCharts. It can be used as an instructional tool for rendering and interrogating high-dimensional data in the classroom, as well as a research tool for exploratory data analysis. PMID:21479108
Seismicity map tools for earthquake studies
NASA Astrophysics Data System (ADS)
Boucouvalas, Anthony; Kaskebes, Athanasios; Tselikas, Nikos
2014-05-01
We report on the development of a new online set of tools for use within Google Maps for earthquake research. We demonstrate this server-based online platform (developed with PHP, Javascript, MySQL) and its new tools using a database system with earthquake data. The platform allows us to carry out statistical and deterministic analysis on earthquake data using Google Maps and to plot various seismicity graphs. The tool box has been extended to draw line segments on the map, multiple straight lines horizontally and vertically, as well as multiple circles, including geodesic lines. The application is demonstrated using localized seismic data from the geographic region of Greece as well as other global earthquake data. The application also offers regional segmentation (NxN), which allows the study of earthquake clustering and earthquake cluster shift within the segments in space. The platform offers many filters, such as for plotting selected magnitude ranges or time periods. The plotting facility allows statistically based plots such as cumulative earthquake magnitude plots and earthquake magnitude histograms, calculation of the b-value, etc. What is novel about the platform is the additional deterministic tools. Using the newly developed horizontal and vertical line and circle tools we have studied the spatial distribution trends of many earthquakes, and we show here for the first time the link between Fibonacci numbers and the spatiotemporal location of some earthquakes. The new tools are valuable for examining and visualizing trends in earthquake research, as the platform allows calculation of statistics as well as deterministic precursors. We plan to show many new results based on our newly developed platform.
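As an example of one statistic such a platform reports, the sketch below estimates the Gutenberg-Richter b-value with the Aki maximum-likelihood formula from a synthetic magnitude catalogue; the catalogue and completeness magnitude are invented for illustration:

```python
# Aki (1965) maximum-likelihood b-value estimate from magnitudes above completeness Mc.
# The catalogue is synthetic, generated to have b close to 1.
import numpy as np

rng = np.random.default_rng(8)
Mc = 2.5
mags = Mc + rng.exponential(scale=1.0 / np.log(10), size=2000)   # b ~ 1 catalogue

b = np.log10(np.e) / (mags.mean() - Mc)   # maximum-likelihood b-value
b_err = b / np.sqrt(len(mags))            # first-order standard error
print(f"b-value = {b:.2f} +/- {b_err:.2f}")
```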
Statistical Methods for Rapid Aerothermal Analysis and Design Technology: Validation
NASA Technical Reports Server (NTRS)
DePriest, Douglas; Morgan, Carolyn
2003-01-01
The cost and safety goals for NASA s next generation of reusable launch vehicle (RLV) will require that rapid high-fidelity aerothermodynamic design tools be used early in the design cycle. To meet these requirements, it is desirable to identify adequate statistical models that quantify and improve the accuracy, extend the applicability, and enable combined analyses using existing prediction tools. The initial research work focused on establishing suitable candidate models for these purposes. The second phase is focused on assessing the performance of these models to accurately predict the heat rate for a given candidate data set. This validation work compared models and methods that may be useful in predicting the heat rate.
Visualizing statistical significance of disease clusters using cartograms.
Kronenfeld, Barry J; Wong, David W S
2017-05-15
Health officials and epidemiological researchers often use maps of disease rates to identify potential disease clusters. Because these maps exaggerate the prominence of low-density districts and hide potential clusters in urban (high-density) areas, many researchers have used density-equalizing maps (cartograms) as a basis for epidemiological mapping. However, we do not have existing guidelines for visual assessment of statistical uncertainty. To address this shortcoming, we develop techniques for visual determination of statistical significance of clusters spanning one or more districts on a cartogram. We developed the techniques within a geovisual analytics framework that does not rely on automated significance testing, and can therefore facilitate visual analysis to detect clusters that automated techniques might miss. On a cartogram of the at-risk population, the statistical significance of a disease cluster can be determined from the rate, area and shape of the cluster under standard hypothesis testing scenarios. We develop formulae to determine, for a given rate, the area required for statistical significance of a priori and a posteriori designated regions under certain test assumptions. Uniquely, our approach enables dynamic inference of aggregate regions formed by combining individual districts. The method is implemented in interactive tools that provide choropleth mapping, automated legend construction and dynamic search tools to facilitate cluster detection and assessment of the validity of tested assumptions. A case study of leukemia incidence analysis in California demonstrates the ability to visually distinguish between statistically significant and insignificant regions. The proposed geovisual analytics approach enables intuitive visual assessment of statistical significance of arbitrarily defined regions on a cartogram. Our research prompts a broader discussion of the role of geovisual exploratory analyses in disease mapping and the appropriate framework for visually assessing the statistical significance of spatial clusters.
Fernee, Christianne; Browne, Martin; Zakrzewski, Sonia
2017-01-01
This paper introduces statistical shape modelling (SSM) for use in osteoarchaeology research. SSM is a full field, multi-material analytical technique, and is presented as a supplementary geometric morphometric (GM) tool. Lower mandibular canines from two archaeological populations and one modern population were sampled, digitised using micro-CT, aligned, registered to a baseline and statistically modelled using principal component analysis (PCA). Sample material properties were incorporated as a binary enamel/dentin parameter. Results were assessed qualitatively and quantitatively using anatomical landmarks. Finally, the technique’s application was demonstrated for inter-sample comparison through analysis of the principal component (PC) weights. It was found that SSM could provide high detail qualitative and quantitative insight with respect to archaeological inter- and intra-sample variability. This technique has value for archaeological, biomechanical and forensic applications including identification, finite element analysis (FEA) and reconstruction from partial datasets. PMID:29216199
Implementation of statistical process control for proteomic experiments via LC MS/MS.
Bereman, Michael S; Johnson, Richard; Bollinger, James; Boss, Yuval; Shulman, Nick; MacLean, Brendan; Hoofnagle, Andrew N; MacCoss, Michael J
2014-04-01
Statistical process control (SPC) is a robust set of tools that aids in the visualization, detection, and identification of assignable causes of variation in any process that creates products, services, or information. A tool has been developed termed Statistical Process Control in Proteomics (SProCoP) which implements aspects of SPC (e.g., control charts and Pareto analysis) into the Skyline proteomics software. It monitors five quality control metrics in a shotgun or targeted proteomic workflow. None of these metrics require peptide identification. The source code, written in the R statistical language, runs directly from the Skyline interface, which supports the use of raw data files from several of the mass spectrometry vendors. It provides real time evaluation of the chromatographic performance (e.g., retention time reproducibility, peak asymmetry, and resolution), and mass spectrometric performance (targeted peptide ion intensity and mass measurement accuracy for high resolving power instruments) via control charts. Thresholds are experiment- and instrument-specific and are determined empirically from user-defined quality control standards that enable the separation of random noise and systematic error. Finally, Pareto analysis provides a summary of performance metrics and guides the user to metrics with high variance. The utility of these charts to evaluate proteomic experiments is illustrated in two case studies.
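A minimal Python sketch of the control-chart logic (not SProCoP's R code) is shown below: quality-control runs are flagged when a metric such as retention time drifts beyond mean ± 3σ of user-defined QC standards; all values are illustrative:

```python
# Simple control-chart check: flag runs whose retention time falls outside mean +/- 3 sigma
# of baseline QC-standard runs. Retention times (minutes) are illustrative.
import numpy as np

rng = np.random.default_rng(9)
baseline_rt = rng.normal(21.40, 0.05, size=20)     # QC standard runs used to set limits
new_rt = np.array([21.43, 21.38, 21.61, 21.44])    # subsequent runs to monitor

center, sigma = baseline_rt.mean(), baseline_rt.std(ddof=1)
lcl, ucl = center - 3 * sigma, center + 3 * sigma
for i, rt in enumerate(new_rt, 1):
    status = "OK" if lcl <= rt <= ucl else "OUT OF CONTROL"
    print(f"run {i}: RT = {rt:.2f} min -> {status}")
```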
GPUs for statistical data analysis in HEP: a performance study of GooFit on GPUs vs. RooFit on CPUs
NASA Astrophysics Data System (ADS)
Pompili, Alexis; Di Florio, Adriano; CMS Collaboration
2016-10-01
In order to test the computing capabilities of GPUs with respect to traditional CPU cores, a high-statistics toy Monte Carlo technique has been implemented in both the ROOT/RooFit and GooFit frameworks with the purpose of estimating the statistical significance of the structure observed by CMS close to the kinematical boundary of the J/ψϕ invariant mass in the three-body decay B+ → J/ψϕK+. GooFit is an open data analysis tool under development that interfaces ROOT/RooFit to the CUDA platform on nVidia GPUs. The optimized GooFit application running on GPUs hosted by servers in the Bari Tier2 provides striking speed-up performances with respect to the RooFit application parallelised on multiple CPUs by means of the PROOF-Lite tool. The resulting speed-up, which is considerable when comparing concurrent GooFit processes allowed by the CUDA Multi Process Service against a RooFit/PROOF-Lite process with multiple CPU workers, is presented and discussed in detail. By means of GooFit it has also been possible to explore the behaviour of a likelihood ratio test statistic in different situations, in which the Wilks theorem may or may not apply because its regularity conditions are not satisfied.
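A CPU-only numpy toy of the underlying statistical exercise (not GooFit or RooFit code): pseudo-experiments generated under a background-only hypothesis, a likelihood-ratio statistic for a simple Poisson counting model, and a comparison of its tail with the asymptotic form expected when the Wilks theorem applies; the background yield is hypothetical.

```python
# CPU-only toy illustrating the idea (not GooFit/RooFit): generate pseudo-experiments
# under the background-only hypothesis and compare the likelihood-ratio statistic
# with the chi-square form expected when Wilks' theorem applies.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(2)
background = 50.0                       # hypothetical expected background yield
n_toys = 100_000
n_obs = rng.poisson(background, n_toys)

# q0 = 2 * [ln L(n | mu_hat) - ln L(n | background)] for a Poisson counting model,
# with the signal constrained to be non-negative (q0 = 0 when n <= background).
with np.errstate(divide="ignore", invalid="ignore"):
    q0 = 2.0 * (n_obs * np.log(n_obs / background) - (n_obs - background))
q0 = np.where(n_obs > background, q0, 0.0)

# Tail probability from toys vs. the asymptotic 1/2 * chi2(1) approximation.
threshold = 9.0                         # roughly 3 sigma if the asymptotic form holds
p_toys = np.mean(q0 > threshold)
p_wilks = 0.5 * chi2.sf(threshold, df=1)
print(f"toy MC tail prob: {p_toys:.2e}   Wilks approximation: {p_wilks:.2e}")
```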
New software for statistical analysis of Cambridge Structural Database data
Sykes, Richard A.; McCabe, Patrick; Allen, Frank H.; Battle, Gary M.; Bruno, Ian J.; Wood, Peter A.
2011-01-01
A collection of new software tools is presented for the analysis of geometrical, chemical and crystallographic data from the Cambridge Structural Database (CSD). This software supersedes the program Vista. The new functionality is integrated into the program Mercury in order to provide statistical, charting and plotting options alongside three-dimensional structural visualization and analysis. The integration also permits immediate access to other information about specific CSD entries through the Mercury framework, a common requirement in CSD data analyses. In addition, the new software includes a range of more advanced features focused towards structural analysis such as principal components analysis, cone-angle correction in hydrogen-bond analyses and the ability to deal with topological symmetry that may be exhibited in molecular search fragments. PMID:22477784
Usadel, Björn; Nagel, Axel; Steinhauser, Dirk; Gibon, Yves; Bläsing, Oliver E; Redestig, Henning; Sreenivasulu, Nese; Krall, Leonard; Hannah, Matthew A; Poree, Fabien; Fernie, Alisdair R; Stitt, Mark
2006-12-18
Microarray technology has become a widely accepted and standardized tool in biology. The first microarray data analysis programs were developed to support pair-wise comparison. However, as microarray experiments have become more routine, large-scale experiments have become more common, which investigate multiple time points or sets of mutants or transgenics. To extract biological information from such high-throughput expression data, it is necessary to develop efficient analytical platforms, which combine manually curated gene ontologies with efficient visualization and navigation tools. Currently, most tools focus on a few limited biological aspects, rather than offering a holistic, integrated analysis. Here we introduce PageMan, a multiplatform, user-friendly, and stand-alone software tool that annotates, investigates, and condenses high-throughput microarray data in the context of functional ontologies. It includes a GUI tool to transform different ontologies into a suitable format, enabling the user to compare and choose between different ontologies. It is equipped with several statistical modules for data analysis, including over-representation analysis and Wilcoxon statistical testing. Results are exported in a graphical format for direct use, or for further editing in graphics programs. PageMan provides a fast overview of single treatments, and allows genome-level responses to be compared across several microarray experiments covering, for example, stress responses at multiple time points. This aids in searching for trait-specific changes in pathways using mutants or transgenics, analyzing development time-courses, and comparison between species. In a case study, we analyze the results of publicly available microarrays of multiple cold stress experiments using PageMan, and compare the results to a previously published meta-analysis. PageMan offers a complete user's guide, a web-based over-representation analysis as well as a tutorial, and is freely available at http://mapman.mpimp-golm.mpg.de/pageman/. PageMan allows multiple microarray experiments to be efficiently condensed into a single-page graphical display. The flexible interface allows data to be quickly and easily visualized, facilitating comparisons within experiments and to published experiments, thus enabling researchers to gain a rapid overview of the biological responses in the experiments.
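A minimal sketch of the kind of category-level Wilcoxon test PageMan applies, testing whether genes annotated to one functional bin respond differently from the remaining genes; the data are simulated and this is not PageMan's own implementation.

```python
# Minimal sketch of a category-level Wilcoxon rank-sum test of the kind PageMan
# applies: do genes in one ontology bin respond differently from all other genes?
import numpy as np
from scipy.stats import ranksums

rng = np.random.default_rng(3)
log_ratios = rng.normal(0.0, 1.0, 5000)                   # hypothetical log2 expression ratios
category_members = rng.choice(5000, 80, replace=False)    # genes annotated to one bin
log_ratios[category_members] += 0.7                       # simulate a coordinated up-shift

in_bin = log_ratios[category_members]
out_bin = np.delete(log_ratios, category_members)

stat, p_value = ranksums(in_bin, out_bin)
print(f"Wilcoxon rank-sum statistic {stat:.2f}, p = {p_value:.2e}")
```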
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jarocki, John Charles; Zage, David John; Fisher, Andrew N.
LinkShop is a software tool for applying the method of Linkography to the analysis of time-sequence data. LinkShop provides command line, web, and application programming interfaces (API) for input and processing of time-sequence data, abstraction models, and ontologies. The software creates graph representations of the abstraction model, ontology, and derived linkograph. Finally, the tool allows the user to perform statistical measurements of the linkograph and refine the ontology through direct manipulation of the linkograph.
Fall 2014 SEI Research Review Probabilistic Analysis of Time Sensitive Systems
2014-10-28
Osmosis SMC Tool. Osmosis is a tool for Statistical Model Checking (SMC) with Semantic Importance Sampling. The input model is written in a subset of C; ASSERT() statements in the model indicate conditions that must hold, and input probability distributions are defined by the user. Osmosis returns its result based on either a target relative error or a set number of simulations (http://dreal.cs.cmu.edu/).
ERIC Educational Resources Information Center
Kriston, Levente; Melchior, Hanne; Hergert, Anika; Bergelt, Corinna; Watzke, Birgit; Schulz, Holger; von Wolff, Alessa
2011-01-01
The aim of our study was to develop a graphical tool that can be used in addition to standard statistical criteria to support decisions on the number of classes in explorative categorical latent variable modeling for rehabilitation research. Data from two rehabilitation research projects were used. In the first study, a latent profile analysis was…
Guidelines for the analysis of free energy calculations.
Klimovich, Pavel V; Shirts, Michael R; Mobley, David L
2015-05-01
Free energy calculations based on molecular dynamics simulations show considerable promise for applications ranging from drug discovery to prediction of physical properties and structure-function studies. But these calculations are still difficult and tedious to analyze, and best practices for analysis are not well defined or propagated. Essentially, each group analyzing these calculations needs to decide how to conduct the analysis and, usually, develop its own analysis tools. Here, we review and recommend best practices for analysis yielding reliable free energies from molecular simulations. Additionally, we provide a Python tool, alchemical-analysis.py, freely available on GitHub as part of the pymbar package (located at http://github.com/choderalab/pymbar), that implements the analysis practices reviewed here for several reference simulation packages, which can be adapted to handle data from other packages. Both this review and the tool cover analysis of alchemical calculations generally, including free energy estimates via both thermodynamic integration and free energy perturbation-based estimators. Our Python tool also handles output from multiple types of free energy calculations, including expanded ensemble and Hamiltonian replica exchange, as well as standard fixed ensemble calculations. We also survey a range of statistical and graphical ways of assessing the quality of the data and free energy estimates, and provide prototypes of these in our tool. We hope this tool and discussion will serve as a foundation for more standardization of and agreement on best practices for analysis of free energy calculations.
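A bare-bones sketch of one of the estimators the review covers, thermodynamic integration, computed here as a trapezoidal quadrature of per-window ⟨dU/dλ⟩ averages; it is not alchemical-analysis.py itself, and the λ windows and samples are synthetic.

```python
# Bare-bones thermodynamic integration sketch (not alchemical-analysis.py itself):
# integrate the ensemble-averaged dU/dlambda over lambda with the trapezoidal rule.
import numpy as np

# Hypothetical input: dU/dlambda samples (kcal/mol) at each lambda window.
lambdas = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
rng = np.random.default_rng(4)
dudl_samples = [rng.normal(mu, 1.5, 2000) for mu in (12.0, 8.0, 5.0, 2.5, 1.0)]

means = np.array([s.mean() for s in dudl_samples])
sems = np.array([s.std(ddof=1) / np.sqrt(len(s)) for s in dudl_samples])

delta_g = np.trapz(means, lambdas)

# Crude uncertainty: propagate per-window SEMs through the trapezoid weights
# (assumes uncorrelated samples; real analyses should account for correlation times).
w = np.gradient(lambdas)
w[0] *= 0.5
w[-1] *= 0.5
delta_g_err = np.sqrt(np.sum((w * sems) ** 2))
print(f"Delta G (TI) = {delta_g:.2f} +/- {delta_g_err:.2f} kcal/mol")
```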
System engineering toolbox for design-oriented engineers
NASA Technical Reports Server (NTRS)
Goldberg, B. E.; Everhart, K.; Stevens, R.; Babbitt, N., III; Clemens, P.; Stout, L.
1994-01-01
This system engineering toolbox is designed to provide tools and methodologies to the design-oriented systems engineer. A tool is defined as a set of procedures to accomplish a specific function. A methodology is defined as a collection of tools, rules, and postulates to accomplish a purpose. For each concept addressed in the toolbox, the following information is provided: (1) description, (2) application, (3) procedures, (4) examples, if practical, (5) advantages, (6) limitations, and (7) bibliography and/or references. The scope of the document includes concept development tools, system safety and reliability tools, design-related analytical tools, graphical data interpretation tools, a brief description of common statistical tools and methodologies, so-called total quality management tools, and trend analysis tools. Both relationship to project phase and primary functional usage of the tools are also delineated. The toolbox also includes a case study for illustrative purposes. Fifty-five tools are delineated in the text.
ProteoSign: an end-user online differential proteomics statistical analysis platform.
Efstathiou, Georgios; Antonakis, Andreas N; Pavlopoulos, Georgios A; Theodosiou, Theodosios; Divanach, Peter; Trudgian, David C; Thomas, Benjamin; Papanikolaou, Nikolas; Aivaliotis, Michalis; Acuto, Oreste; Iliopoulos, Ioannis
2017-07-03
Profiling of proteome dynamics is crucial for understanding cellular behavior in response to intrinsic and extrinsic stimuli and maintenance of homeostasis. Over the last 20 years, mass spectrometry (MS) has emerged as the most powerful tool for large-scale identification and characterization of proteins. Bottom-up proteomics, the most common MS-based proteomics approach, has always been challenging in terms of data management, processing, analysis and visualization, with modern instruments capable of producing several gigabytes of data out of a single experiment. Here, we present ProteoSign, a freely available web application, dedicated to allowing users to perform proteomics differential expression/abundance analysis in a user-friendly and self-explanatory way. Although several non-commercial standalone tools have been developed for post-quantification statistical analysis of proteomics data, most of them are not appealing to end users as they often require very stringent installation of programming environments, third-party software packages and sometimes further scripting or computer programming. To avoid this bottleneck, we have developed a user-friendly software platform accessible via a web interface in order to enable proteomics laboratories and core facilities to statistically analyse quantitative proteomics data sets in a resource-efficient manner. ProteoSign is available at http://bioinformatics.med.uoc.gr/ProteoSign and the source code at https://github.com/yorgodillo/ProteoSign. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Using Network Analysis to Characterize Biogeographic Data in a Community Archive
NASA Astrophysics Data System (ADS)
Wellman, T. P.; Bristol, S.
2017-12-01
Informative measures are needed to evaluate and compare data from multiple providers in a community-driven data archive. This study explores insights from network theory and other descriptive and inferential statistics to examine data content and application across an assemblage of publicly available biogeographic data sets. The data are archived in ScienceBase, a collaborative catalog of scientific data supported by the U.S. Geological Survey to enhance scientific inquiry and acuity. In gaining understanding through this investigation and other scientific venues, our goal is to improve scientific insight and data use across a spectrum of scientific applications. Network analysis is a tool to reveal patterns of non-trivial topological features in the data that do not exhibit complete regularity or randomness. In this work, network analyses are used to explore shared events and dependencies between measures of data content and application derived from metadata and catalog information and measures relevant to biogeographic study. Descriptive statistical tools are used to explore relations between network analysis properties, while inferential statistics are used to evaluate the degree of confidence in these assessments. Network analyses have been used successfully in related fields to examine social awareness of scientific issues, taxonomic structures of biological organisms, and ecosystem resilience to environmental change. Use of network analysis also shows promising potential to identify relationships in biogeographic data that inform programmatic goals and scientific interests.
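A sketch of the descriptive network measures such a study might compute, using a small synthetic co-occurrence graph of data sets rather than the actual ScienceBase holdings.

```python
# Sketch of descriptive network measures for a dataset co-occurrence graph
# (synthetic edges; not the actual ScienceBase analysis pipeline).
import networkx as nx

# Hypothetical graph: nodes are biogeographic data sets, edges link data sets
# that share a taxon, provider, or spatial extent.
G = nx.Graph()
G.add_edges_from([
    ("bird_atlas", "wetland_survey"),
    ("bird_atlas", "migration_tags"),
    ("wetland_survey", "amphibian_counts"),
    ("migration_tags", "amphibian_counts"),
    ("amphibian_counts", "stream_chemistry"),
])

degree = nx.degree_centrality(G)
clustering = nx.average_clustering(G)
components = nx.number_connected_components(G)

print("degree centrality:", degree)
print(f"average clustering: {clustering:.3f}, connected components: {components}")
```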
GOEAST: a web-based software toolkit for Gene Ontology enrichment analysis.
Zheng, Qi; Wang, Xiu-Jie
2008-07-01
Gene Ontology (GO) analysis has become a commonly used approach for functional studies of large-scale genomic or transcriptomic data. Although many software tools with GO-related analysis functions are available, new tools are still needed to meet the requirements for data generated by newly developed technologies or for advanced analysis purposes. Here, we present a Gene Ontology Enrichment Analysis Software Toolkit (GOEAST), an easy-to-use web-based toolkit that identifies statistically overrepresented GO terms within given gene sets. Compared with available GO analysis tools, GOEAST has the following improved features: (i) GOEAST displays enriched GO terms in graphical format according to their relationships in the hierarchical tree of each GO category (biological process, molecular function and cellular component), and therefore provides better understanding of the correlations among enriched GO terms; (ii) GOEAST supports analysis for data from various sources (probe or probe set IDs of Affymetrix, Illumina, Agilent or customized microarrays, as well as different gene identifiers) and multiple species (about 60 prokaryote and eukaryote species); (iii) One unique feature of GOEAST is to allow cross comparison of the GO enrichment status of multiple experiments to identify functional correlations among them. GOEAST also provides rigorous statistical tests to enhance the reliability of analysis results. GOEAST is freely accessible at http://omicslab.genetics.ac.cn/GOEAST/
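The core over-representation test behind toolkits of this kind is a hypergeometric tail probability; the sketch below illustrates it with hypothetical counts and is not GOEAST's own code.

```python
# Minimal sketch of a GO over-representation test of the kind GOEAST performs:
# hypergeometric tail probability that a term is enriched in the query gene set.
from scipy.stats import hypergeom

genome_size = 20000        # total annotated genes (hypothetical)
term_genes = 300           # genes annotated with the GO term genome-wide
query_size = 150           # genes in the submitted list
term_in_query = 12         # overlap between the query list and the GO term

# P(X >= term_in_query) with X ~ Hypergeom(genome_size, term_genes, query_size)
p_value = hypergeom.sf(term_in_query - 1, genome_size, term_genes, query_size)
print(f"enrichment p-value: {p_value:.3e}")
# In practice p-values are computed for every GO term and corrected for
# multiple testing before enriched terms are reported.
```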
Application of the Statistical ICA Technique in the DANCE Data Analysis
NASA Astrophysics Data System (ADS)
Baramsai, Bayarbadrakh; Jandel, M.; Bredeweg, T. A.; Rusev, G.; Walker, C. L.; Couture, A.; Mosby, S.; Ullmann, J. L.; Dance Collaboration
2015-10-01
The Detector for Advanced Neutron Capture Experiments (DANCE) at the Los Alamos Neutron Science Center is used to improve our understanding of the neutron capture reaction. DANCE is a highly efficient 4π γ-ray detector array consisting of 160 BaF2 crystals, making it an ideal tool for neutron capture experiments. The (n, γ) reaction Q-value equals the sum energy of all γ-rays emitted in the de-excitation cascades from the excited capture state to the ground state. The total γ-ray energy is used to identify reactions on different isotopes as well as the background. However, it is challenging to identify contributions to the Esum spectra from different isotopes with similar Q-values. Recently, we have tested the applicability of modern statistical methods such as Independent Component Analysis (ICA) to identify and separate different (n, γ) reaction yields on different isotopes that are present in the target material. ICA is a recently developed computational tool for separating multidimensional data into statistically independent additive subcomponents. In this conference talk, we present some results of the application of ICA algorithms and their modifications for the DANCE experimental data analysis. This research is supported by the U. S. Department of Energy, Office of Science, Nuclear Physics under the Early Career Award No. LANL20135009.
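A generic FastICA sketch of the unmixing idea, separating two synthetic, independent source spectra from their observed mixtures; it is not the DANCE analysis code and the spectra are invented.

```python
# Generic FastICA sketch (not the DANCE analysis code): unmix observed spectra
# that are linear combinations of statistically independent source components.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(5)
n_bins = 512
# Two hypothetical independent source spectra (e.g. capture on two isotopes).
source_a = np.exp(-0.5 * ((np.arange(n_bins) - 180) / 15.0) ** 2)
source_b = np.exp(-0.5 * ((np.arange(n_bins) - 220) / 25.0) ** 2)
S = np.vstack([source_a, source_b])

# Observed spectra are different mixtures of the sources plus noise.
mixing = np.array([[0.8, 0.2], [0.35, 0.65], [0.5, 0.5]])
X = mixing @ S + rng.normal(0, 0.01, (3, n_bins))

ica = FastICA(n_components=2, random_state=0)
recovered = ica.fit_transform(X.T)       # shape (n_bins, 2): estimated source spectra
print(recovered.shape, ica.mixing_.shape)
```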
Data Analysis and Statistical Methods for the Assessment and Interpretation of Geochronologic Data
NASA Astrophysics Data System (ADS)
Reno, B. L.; Brown, M.; Piccoli, P. M.
2007-12-01
Ages are traditionally reported as a weighted mean with an uncertainty based on least squares analysis of analytical error on individual dates. This method does not take into account geological uncertainties, and cannot accommodate asymmetries in the data. In most instances, this method will understate uncertainty on a given age, which may lead to over-interpretation of age data. Geologic uncertainty is difficult to quantify, but is typically greater than analytical uncertainty. These factors make traditional statistical approaches inadequate to fully evaluate geochronologic data. We propose a protocol to assess populations within multi-event datasets and to calculate age and uncertainty from each population of dates interpreted to represent a single geologic event using robust and resistant statistical methods. To assess whether populations thought to represent different events are statistically separate, exploratory data analysis is undertaken using a box plot, where the range of the data is represented by a 'box' of length given by the interquartile range, divided at the median of the data, with 'whiskers' that extend to the furthest datapoint that lies within 1.5 times the interquartile range beyond the box. If the boxes representing the populations do not overlap, they are interpreted to represent statistically different sets of dates. Ages are calculated from statistically distinct populations using a robust tool such as the tanh method of Kelsey et al. (2003, CMP, 146, 326-340), which is insensitive to any assumptions about the underlying probability distribution from which the data are drawn. Therefore, this method takes into account the full range of data, and is not drastically affected by outliers. The interquartile range of each population of dates gives a first pass at expressing uncertainty, which accommodates asymmetry in the dataset; outliers have a minor effect on the uncertainty. To better quantify the uncertainty, a resistant tool that is insensitive to local misbehavior of data is preferred, such as the normalized median absolute deviations proposed by Powell et al. (2002, Chem Geol, 185, 191-204). We illustrate the method using a dataset of 152 monazite dates determined using EPMA chemical data from a single sample from the Neoproterozoic Brasília Belt, Brazil. Results are compared with ages and uncertainties calculated using traditional methods to demonstrate the differences. The dataset was manually culled into three populations representing discrete compositional domains within chemically-zoned monazite grains. The weighted mean ages and least squares uncertainties for these populations are 633±6 (2σ) Ma for a core domain, 614±5 (2σ) Ma for an intermediate domain and 595±6 (2σ) Ma for a rim domain. Probability distribution plots indicate asymmetric distributions of all populations, which cannot be accounted for with traditional statistical tools. These three domains record distinct ages outside the interquartile range for each population of dates, with the core domain lying in the subrange 642-624 Ma, the intermediate domain 617-609 Ma and the rim domain 606-589 Ma. The tanh estimator yields ages of 631±7 (2σ) Ma for the core domain, 616±7 (2σ) Ma for the intermediate domain and 601±8 (2σ) Ma for the rim domain.
Whereas the uncertainties derived using a resistant statistical tool are larger than those derived from traditional statistical tools, the method yields more realistic uncertainties that better address the spread in the dataset and account for asymmetry in the data.
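A sketch of the exploratory and resistant measures described above, box-plot quartiles for population separation and the normalized median absolute deviation as a resistant spread estimate; the tanh estimator itself is not reproduced, and the dates are simulated.

```python
# Sketch of the exploratory / resistant measures described above: interquartile
# range for box-plot style population separation and the normalized median
# absolute deviation as a resistant spread estimate (the tanh estimator itself
# is not reproduced here).
import numpy as np

rng = np.random.default_rng(6)
core_dates = rng.normal(633, 6, 60)           # hypothetical monazite dates (Ma)
rim_dates = rng.normal(595, 6, 50)

def iqr_box(dates):
    q1, med, q3 = np.percentile(dates, [25, 50, 75])
    return q1, med, q3

def normalized_mad(dates):
    med = np.median(dates)
    return 1.4826 * np.median(np.abs(dates - med))   # consistent with sigma for normal data

core_q1, core_med, core_q3 = iqr_box(core_dates)
rim_q1, rim_med, rim_q3 = iqr_box(rim_dates)
# Non-overlapping boxes suggest statistically distinct age populations.
print("core box:", core_q1, core_med, core_q3)
print("rim box: ", rim_q1, rim_med, rim_q3)
print("core spread (nMAD):", normalized_mad(core_dates))
```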
Applied Meteorology Unit (AMU) Quarterly Report Third Quarter FY-08
NASA Technical Reports Server (NTRS)
Bauman, William; Crawford, Winifred; Barrett, Joe; Watson, Leela; Dreher, Joseph
2008-01-01
This report summarizes the Applied Meteorology Unit (AMU) activities for the third quarter of Fiscal Year 2008 (April - June 2008). Tasks reported on are: Peak Wind Tool for User Launch Commit Criteria (LCC), Anvil Forecast Tool in AWIPS Phase II, Completion of the Edwards Air Force Base (EAFB) Statistical Guidance Wind Tool, Volume Averaged Height Integrated Radar Reflectivity (VAHIRR), Impact of Local Sensors, Radar Scan Strategies for the PAFB WSR-74C Replacement, VAHIRR Cost Benefit Analysis, and WRF Wind Sensitivity Study at Edwards Air Force Base.
He, W; Zhao, S; Liu, X; Dong, S; Lv, J; Liu, D; Wang, J; Meng, Z
2013-12-04
Large-scale next-generation sequencing (NGS)-based resequencing detects sequence variations, constructs evolutionary histories, and identifies phenotype-related genotypes. However, NGS-based resequencing studies generate extraordinarily large amounts of data, making computations difficult. Effective use and analysis of these data for NGS-based resequencing studies remains a difficult task for individual researchers. Here, we introduce ReSeqTools, a full-featured toolkit for NGS (Illumina sequencing)-based resequencing analysis, which processes raw data, interprets mapping results, and identifies and annotates sequence variations. ReSeqTools provides abundant scalable functions for routine resequencing analysis in different modules to facilitate customization of the analysis pipeline. ReSeqTools is designed to use compressed data files as input or output to save storage space and facilitates faster and more computationally efficient large-scale resequencing studies in a user-friendly manner. It offers abundant practical functions and generates useful statistics during the analysis pipeline, which significantly simplifies resequencing analysis. Its integrated algorithms and abundant sub-functions provide a solid foundation for special demands in resequencing projects. Users can combine these functions to construct their own pipelines for other purposes.
Statistical process control: separating signal from noise in emergency department operations.
Pimentel, Laura; Barrueto, Fermin
2015-05-01
Statistical process control (SPC) is a visually appealing and statistically rigorous methodology very suitable to the analysis of emergency department (ED) operations. We demonstrate that the control chart is the primary tool of SPC; it is constructed by plotting data measuring the key quality indicators of operational processes in rationally ordered subgroups such as units of time. Control limits are calculated using formulas reflecting the variation in the data points from one another and from the mean. SPC allows managers to determine whether operational processes are controlled and predictable. We review why the moving range chart is most appropriate for use in the complex ED milieu, how to apply SPC to ED operations, and how to determine when performance improvement is needed. SPC is an excellent tool for operational analysis and quality improvement for these reasons: 1) control charts make large data sets intuitively coherent by integrating statistical and visual descriptions; 2) SPC provides analysis of process stability and capability rather than simple comparison with a benchmark; 3) SPC allows distinction between special cause variation (signal), indicating an unstable process requiring action, and common cause variation (noise), reflecting a stable process; and 4) SPC keeps the focus of quality improvement on process rather than individual performance. Because data have no meaning apart from their context, and every process generates information that can be used to improve it, we contend that SPC should be seriously considered for driving quality improvement in emergency medicine. Copyright © 2015 Elsevier Inc. All rights reserved.
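A generic individuals/moving-range (XmR) control chart sketch for an ED operational metric; the data are synthetic, and 2.66 and 3.267 are the standard XmR chart constants rather than values from the article.

```python
# Generic individuals / moving-range (XmR) control chart sketch for an ED
# operational metric (synthetic data); 2.66 and 3.267 are standard XmR constants.
import numpy as np

rng = np.random.default_rng(7)
daily_median_los = rng.normal(185, 12, 30)     # hypothetical daily median length of stay (min)

moving_range = np.abs(np.diff(daily_median_los))
x_bar = daily_median_los.mean()
mr_bar = moving_range.mean()

ucl_x = x_bar + 2.66 * mr_bar        # individuals chart limits
lcl_x = x_bar - 2.66 * mr_bar
ucl_mr = 3.267 * mr_bar              # moving range chart upper limit

# Points beyond the limits flag special cause variation (signal, not noise).
signals = np.where((daily_median_los > ucl_x) | (daily_median_los < lcl_x))[0]
print(f"center {x_bar:.1f}, limits [{lcl_x:.1f}, {ucl_x:.1f}], special-cause days: {signals}")
```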
Crux: Rapid Open Source Protein Tandem Mass Spectrometry Analysis
2015-01-01
Efficiently and accurately analyzing big protein tandem mass spectrometry data sets requires robust software that incorporates state-of-the-art computational, machine learning, and statistical methods. The Crux mass spectrometry analysis software toolkit (http://cruxtoolkit.sourceforge.net) is an open source project that aims to provide users with a cross-platform suite of analysis tools for interpreting protein mass spectrometry data. PMID:25182276
NASA Astrophysics Data System (ADS)
Rose, Michael Benjamin
A novel trajectory and attitude control and navigation analysis tool for powered ascent is developed. The tool is capable of rapid trade-space analysis and is designed to ultimately reduce turnaround time for launch vehicle design, mission planning, and redesign work. It is streamlined to quickly determine trajectory and attitude control dispersions, propellant dispersions, orbit insertion dispersions, and navigation errors and their sensitivities to sensor errors, actuator execution uncertainties, and random disturbances. The tool is developed by applying both Monte Carlo and linear covariance analysis techniques to a closed-loop, launch vehicle guidance, navigation, and control (GN&C) system. The nonlinear dynamics and flight GN&C software models of a closed-loop, six-degree-of-freedom (6-DOF), Monte Carlo simulation are formulated and developed. The nominal reference trajectory (NRT) for the proposed lunar ascent trajectory is defined and generated. The Monte Carlo truth models and GN&C algorithms are linearized about the NRT, the linear covariance equations are formulated, and the linear covariance simulation is developed. The performance of the launch vehicle GN&C system is evaluated using both Monte Carlo and linear covariance techniques and their trajectory and attitude control dispersion, propellant dispersion, orbit insertion dispersion, and navigation error results are validated and compared. Statistical results from linear covariance analysis are generally within 10% of Monte Carlo results, and in most cases the differences are less than 5%. This is an excellent result given the many complex nonlinearities that are embedded in the ascent GN&C problem. Moreover, the real value of this tool lies in its speed, where the linear covariance simulation is 1036.62 times faster than the Monte Carlo simulation. Although the application and results presented are for a lunar, single-stage-to-orbit (SSTO), ascent vehicle, the tools, techniques, and mathematical formulations that are discussed are applicable to ascent on Earth or other planets as well as other rocket-powered systems such as sounding rockets and ballistic missiles.
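A toy example of why the two approaches can agree: propagating a dispersion covariance through a simple linear system analytically and comparing with Monte Carlo sample statistics; the dynamics and noise levels are invented and this is not the GN&C tool.

```python
# Toy illustration (not the GN&C tool): propagate a dispersion covariance through a
# linear discrete-time system and compare against Monte Carlo sample statistics.
import numpy as np

rng = np.random.default_rng(8)
F = np.array([[1.0, 0.1], [0.0, 1.0]])         # simple position/velocity transition
Q = np.diag([1e-4, 1e-3])                       # process noise (random disturbances)
P0 = np.diag([0.5, 0.05])                       # initial dispersion covariance

# Linear covariance propagation: P_{k+1} = F P_k F^T + Q
P = P0.copy()
for _ in range(100):
    P = F @ P @ F.T + Q

# Monte Carlo: propagate many dispersed trajectories with sampled noise.
n_mc = 5000
x = rng.multivariate_normal([0.0, 0.0], P0, size=n_mc)
for _ in range(100):
    w = rng.multivariate_normal([0.0, 0.0], Q, size=n_mc)
    x = x @ F.T + w

print("LinCov final covariance:\n", P)
print("Monte Carlo sample covariance:\n", np.cov(x, rowvar=False))
```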
Statistical strategy for anisotropic adventitia modelling in IVUS.
Gil, Debora; Hernández, Aura; Rodriguez, Oriol; Mauri, Josepa; Radeva, Petia
2006-06-01
Vessel plaque assessment by analysis of intravascular ultrasound sequences is a useful tool for cardiac disease diagnosis and intervention. Manual detection of luminal (inner) and media-adventitia (external) vessel borders is the main activity of physicians in the process of lumen narrowing (plaque) quantification. Difficult definition of vessel border descriptors, as well as shades, artifacts, and blurred signal response due to ultrasound physical properties, troubles automated adventitia segmentation. In order to efficiently approach such a complex problem, we propose blending advanced anisotropic filtering operators and statistical classification techniques into a vessel border modelling strategy. Our systematic statistical analysis shows that the reported adventitia detection achieves an accuracy in the range of interobserver variability regardless of plaque nature, vessel geometry, and incomplete vessel borders.
Holmes, Susan; Alekseyenko, Alexander; Timme, Alden; Nelson, Tyrrell; Pasricha, Pankaj Jay; Spormann, Alfred
2011-01-01
This article explains the statistical and computational methodology used to analyze species abundances collected using the LBNL Phylochip in a study of Irritable Bowel Syndrome (IBS) in rats. Some tools already available for the analysis of ordinary microarray data are useful in this type of statistical analysis. For instance, in correcting for multiple testing we use family-wise error rate control and step-down tests (available in the multtest package). Once the most significant species are chosen, we use the hypergeometric tests familiar from testing GO categories to test specific phyla and families. We provide examples of normalization, multivariate projections, batch effect detection and integration of phylogenetic covariation, as well as tree equalization and robustification methods.
Analysis of Student Activity in Web-Supported Courses as a Tool for Predicting Dropout
ERIC Educational Resources Information Center
Cohen, Anat
2017-01-01
Persistence in learning processes is perceived as a central value; therefore, dropouts from studies are a prime concern for educators. This study focuses on the quantitative analysis of data accumulated on 362 students in three academic course website log files in the disciplines of mathematics and statistics, in order to examine whether student…
ERIC Educational Resources Information Center
Kennedy, Kate; Peters, Mary; Thomas, Mike
2012-01-01
Value-added analysis is the most robust, statistically significant method available for helping educators quantify student progress over time. This powerful tool also reveals tangible strategies for improving instruction. Built around the work of Battelle for Kids, this book provides a field-tested continuous improvement model for using…
ERIC Educational Resources Information Center
Ammentorp, William
There is much to be gained by using systems analysis in educational administration. Most administrators, presently relying on classical statistical techniques restricted to problems having few variables, should be trained to use more sophisticated tools such as systems analysis. The systems analyst, interested in the basic processes of a group or…
A Content Analysis of Dissertations in the Field of Educational Technology: The Case of Turkey
ERIC Educational Resources Information Center
Durak, Gurhan; Cankaya, Serkan; Yunkul, Eyup; Misirli, Zeynel Abidin
2018-01-01
The present study aimed at conducting content analysis on dissertations carried out so far in the field of Educational Technology in Turkey. A total of 137 dissertations were examined to determine the key words, academic discipline, research areas, theoretical frameworks, research designs and models, statistical analyses, data collection tools,…
Biological Parametric Mapping: A Statistical Toolbox for Multi-Modality Brain Image Analysis
Casanova, Ramon; Ryali, Srikanth; Baer, Aaron; Laurienti, Paul J.; Burdette, Jonathan H.; Hayasaka, Satoru; Flowers, Lynn; Wood, Frank; Maldjian, Joseph A.
2006-01-01
In recent years multiple brain MR imaging modalities have emerged; however, analysis methodologies have mainly remained modality specific. In addition, when comparing across imaging modalities, most researchers have been forced to rely on simple region-of-interest type analyses, which do not allow the voxel-by-voxel comparisons necessary to answer more sophisticated neuroscience questions. To overcome these limitations, we developed a toolbox for multimodal image analysis called biological parametric mapping (BPM), based on a voxel-wise use of the general linear model. The BPM toolbox incorporates information obtained from other modalities as regressors in a voxel-wise analysis, thereby permitting investigation of more sophisticated hypotheses. The BPM toolbox has been developed in MATLAB with a user friendly interface for performing analyses, including voxel-wise multimodal correlation, ANCOVA, and multiple regression. It has a high degree of integration with the SPM (statistical parametric mapping) software relying on it for visualization and statistical inference. Furthermore, statistical inference for a correlation field, rather than a widely-used T-field, has been implemented in the correlation analysis for more accurate results. An example with in-vivo data is presented demonstrating the potential of the BPM methodology as a tool for multimodal image analysis. PMID:17070709
Inflammatory Cytokines as Preclinical Markers of Adverse Responses to Chemical Stressors
Abstract: The in vivo cytokine response to chemical stressors is a promising mainstream tool used to assess potential systemic inflammation and immune function changes. Notably, new instrumentation and statistical analysis provide the selectivity and sensitivity to rapidly diff...
Bumm, Klaus; Zheng, Mingzhong; Bailey, Clyde; Zhan, Fenghuang; Chiriva-Internati, M; Eddlemon, Paul; Terry, Julian; Barlogie, Bart; Shaughnessy, John D
2002-02-01
Clinical GeneOrganizer (CGO) is a novel windows-based archiving, organization and data mining software for the integration of gene expression profiling in clinical medicine. The program implements various user-friendly tools and extracts data for further statistical analysis. This software was written for Affymetrix GeneChip *.txt files, but can also be used for any other microarray-derived data. The MS-SQL server version acts as a data mart and links microarray data with clinical parameters of any other existing database and therefore represents a valuable tool for combining gene expression analysis and clinical disease characteristics.
EEG analysis using wavelet-based information tools.
Rosso, O A; Martin, M T; Figliola, A; Keller, K; Plastino, A
2006-06-15
Wavelet-based informational tools for quantitative electroencephalogram (EEG) record analysis are reviewed. Relative wavelet energies, wavelet entropies and wavelet statistical complexities are used in the characterization of scalp EEG records corresponding to secondary generalized tonic-clonic epileptic seizures. In particular, we show that the epileptic recruitment rhythm observed during seizure development is well described in terms of the relative wavelet energies. In addition, during the concomitant time-period the entropy diminishes while complexity grows. This is construed as evidence supporting the conjecture that an epileptic focus, for this kind of seizures, triggers a self-organized brain state characterized by both order and maximal complexity.
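A sketch of relative wavelet energies and a normalized wavelet (Shannon) entropy for a single synthetic epoch; it follows the general definitions mentioned above rather than the authors' processing pipeline.

```python
# Sketch of relative wavelet energies and normalized wavelet entropy for one EEG
# epoch (synthetic signal; not the authors' processing pipeline).
import numpy as np
import pywt

rng = np.random.default_rng(9)
fs = 256
t = np.arange(0, 4, 1 / fs)
epoch = np.sin(2 * np.pi * 10 * t) + 0.5 * rng.normal(size=t.size)   # hypothetical EEG epoch

coeffs = pywt.wavedec(epoch, "db4", level=5)        # [cA5, cD5, cD4, cD3, cD2, cD1]
energies = np.array([np.sum(c ** 2) for c in coeffs])
rel_energies = energies / energies.sum()

# Normalized Shannon wavelet entropy: low for one dominant band, near 1 for a flat spectrum.
entropy = -np.sum(rel_energies * np.log(rel_energies)) / np.log(len(rel_energies))
print("relative wavelet energies:", np.round(rel_energies, 3))
print(f"normalized wavelet entropy: {entropy:.3f}")
```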
Zhu, Yun; Fan, Ruzong; Xiong, Momiao
2017-01-01
Investigating the pleiotropic effects of genetic variants can increase statistical power, provide important information to achieve deep understanding of the complex genetic structures of disease, and offer powerful tools for designing effective treatments with fewer side effects. However, the current multiple phenotype association analysis paradigm lacks breadth (number of phenotypes and genetic variants jointly analyzed at the same time) and depth (hierarchical structure of phenotype and genotypes). A key issue for high-dimensional pleiotropic analysis is to effectively extract informative internal representation and features from high-dimensional genotype and phenotype data. To explore correlation information of genetic variants, effectively reduce data dimensions, and overcome critical barriers in advancing the development of novel statistical methods and computational algorithms for genetic pleiotropic analysis, we proposed a new statistical method for association analysis, referred to as quadratically regularized functional CCA (QRFCCA), which combines three approaches: (1) quadratically regularized matrix factorization, (2) functional data analysis and (3) canonical correlation analysis (CCA). Large-scale simulations show that the QRFCCA has a much higher power than that of the ten competing statistics while retaining the appropriate type I error rates. To further evaluate performance, the QRFCCA and ten other statistics are applied to the whole genome sequencing dataset from the TwinsUK study. We identify a total of 79 genes with rare variants and 67 genes with common variants significantly associated with the 46 traits using QRFCCA. The results show that the QRFCCA substantially outperforms the ten other statistics. PMID:29040274
2008 Homeland Security S and T Stakeholders Conference West-Volume 3 Tuesday
2008-01-16
Conference presentation material covering: a data architecture (PNNL SRS) with online data collection/entry, a data warehouse, on-demand analysis and reporting tools, and reports, charts and graphs; region-wide sustainability investment across PANYNJ business areas (2007-2016); computer statistical analysis (COMPSTAT, NYPD 1990s, personnel management); Coast Guard and public health expertise; and value-added capabilities including risk analysis and operations analysis.
Rock, Adam J.; Coventry, William L.; Morgan, Methuen I.; Loi, Natasha M.
2016-01-01
Generally, academic psychologists are mindful of the fact that, for many students, the study of research methods and statistics is anxiety provoking (Gal et al., 1997). Given the ubiquitous and distributed nature of eLearning systems (Nof et al., 2015), teachers of research methods and statistics need to cultivate an understanding of how to effectively use eLearning tools to inspire psychology students to learn. Consequently, the aim of the present paper is to discuss critically how using eLearning systems might engage psychology students in research methods and statistics. First, we critically appraise definitions of eLearning. Second, we examine numerous important pedagogical principles associated with effectively teaching research methods and statistics using eLearning systems. Subsequently, we provide practical examples of our own eLearning-based class activities designed to engage psychology students to learn statistical concepts such as Factor Analysis and Discriminant Function Analysis. Finally, we discuss general trends in eLearning and possible futures that are pertinent to teachers of research methods and statistics in psychology. PMID:27014147
Rock, Adam J; Coventry, William L; Morgan, Methuen I; Loi, Natasha M
2016-01-01
Generally, academic psychologists are mindful of the fact that, for many students, the study of research methods and statistics is anxiety provoking (Gal et al., 1997). Given the ubiquitous and distributed nature of eLearning systems (Nof et al., 2015), teachers of research methods and statistics need to cultivate an understanding of how to effectively use eLearning tools to inspire psychology students to learn. Consequently, the aim of the present paper is to discuss critically how using eLearning systems might engage psychology students in research methods and statistics. First, we critically appraise definitions of eLearning. Second, we examine numerous important pedagogical principles associated with effectively teaching research methods and statistics using eLearning systems. Subsequently, we provide practical examples of our own eLearning-based class activities designed to engage psychology students to learn statistical concepts such as Factor Analysis and Discriminant Function Analysis. Finally, we discuss general trends in eLearning and possible futures that are pertinent to teachers of research methods and statistics in psychology.
Managing complex research datasets using electronic tools: A meta-analysis exemplar
Brown, Sharon A.; Martin, Ellen E.; Garcia, Theresa J.; Winter, Mary A.; García, Alexandra A.; Brown, Adama; Cuevas, Heather E.; Sumlin, Lisa L.
2013-01-01
Meta-analyses of broad scope and complexity require investigators to organize many study documents and manage communication among several research staff. Commercially available electronic tools, e.g., EndNote, Adobe Acrobat Pro, Blackboard, Excel, and IBM SPSS Statistics (SPSS), are useful for organizing and tracking the meta-analytic process, as well as enhancing communication among research team members. The purpose of this paper is to describe the electronic processes we designed, using commercially available software, for an extensive quantitative model-testing meta-analysis we are conducting. Specific electronic tools improved the efficiency of (a) locating and screening studies, (b) screening and organizing studies and other project documents, (c) extracting data from primary studies, (d) checking data accuracy and analyses, and (e) communication among team members. The major limitation in designing and implementing a fully electronic system for meta-analysis was the requisite upfront time to: decide on which electronic tools to use, determine how these tools would be employed, develop clear guidelines for their use, and train members of the research team. The electronic process described here has been useful in streamlining the process of conducting this complex meta-analysis and enhancing communication and sharing documents among research team members. PMID:23681256
Managing complex research datasets using electronic tools: a meta-analysis exemplar.
Brown, Sharon A; Martin, Ellen E; Garcia, Theresa J; Winter, Mary A; García, Alexandra A; Brown, Adama; Cuevas, Heather E; Sumlin, Lisa L
2013-06-01
Meta-analyses of broad scope and complexity require investigators to organize many study documents and manage communication among several research staff. Commercially available electronic tools, for example, EndNote, Adobe Acrobat Pro, Blackboard, Excel, and IBM SPSS Statistics (SPSS), are useful for organizing and tracking the meta-analytic process as well as enhancing communication among research team members. The purpose of this article is to describe the electronic processes designed, using commercially available software, for an extensive, quantitative model-testing meta-analysis. Specific electronic tools improved the efficiency of (a) locating and screening studies, (b) screening and organizing studies and other project documents, (c) extracting data from primary studies, (d) checking data accuracy and analyses, and (e) communication among team members. The major limitation in designing and implementing a fully electronic system for meta-analysis was the requisite upfront time to decide on which electronic tools to use, determine how these tools would be used, develop clear guidelines for their use, and train members of the research team. The electronic process described here has been useful in streamlining the process of conducting this complex meta-analysis and enhancing communication and sharing documents among research team members.
Advanced Vibration Analysis Tool Developed for Robust Engine Rotor Designs
NASA Technical Reports Server (NTRS)
Min, James B.
2005-01-01
The primary objective of this research program is to develop vibration analysis tools, design tools, and design strategies to significantly improve the safety and robustness of turbine engine rotors. Bladed disks in turbine engines always feature small, random blade-to-blade differences, or mistuning. Mistuning can lead to a dramatic increase in blade forced-response amplitudes and stresses. Ultimately, this results in high-cycle fatigue, which is a major safety and cost concern. In this research program, the necessary steps will be taken to transform a state-of-the-art vibration analysis tool, the Turbo-Reduce forced-response prediction code, into an effective design tool by enhancing and extending the underlying modeling and analysis methods. Furthermore, novel techniques will be developed to assess the safety of a given design. In particular, a procedure will be established for using natural-frequency curve veerings to identify ranges of operating conditions (rotational speeds and engine orders) in which there is a great risk that the rotor blades will suffer high stresses. This work also will aid statistical studies of the forced response by reducing the necessary number of simulations. Finally, new strategies for improving the design of rotors will be pursued.
Schwämmle, Veit; León, Ileana Rodríguez; Jensen, Ole Nørregaard
2013-09-06
Large-scale quantitative analyses of biological systems are often performed with few replicate experiments, leading to multiple nonidentical data sets due to missing values. For example, mass spectrometry driven proteomics experiments are frequently performed with few biological or technical replicates due to sample-scarcity or due to duty-cycle or sensitivity constraints, or limited capacity of the available instrumentation, leading to incomplete results where detection of significant feature changes becomes a challenge. This problem is further exacerbated for the detection of significant changes on the peptide level, for example, in phospho-proteomics experiments. In order to assess the extent of this problem and the implications for large-scale proteome analysis, we investigated and optimized the performance of three statistical approaches by using simulated and experimental data sets with varying numbers of missing values. We applied three tools, including standard t test, moderated t test, also known as limma, and rank products for the detection of significantly changing features in simulated and experimental proteomics data sets with missing values. The rank product method was improved to work with data sets containing missing values. Extensive analysis of simulated and experimental data sets revealed that the performance of the statistical analysis tools depended on simple properties of the data sets. High-confidence results were obtained by using the limma and rank products methods for analyses of triplicate data sets that exhibited more than 1000 features and more than 50% missing values. The maximum number of differentially represented features was identified by using limma and rank products methods in a complementary manner. We therefore recommend combined usage of these methods as a novel and optimal way to detect significantly changing features in these data sets. This approach is suitable for large quantitative data sets from stable isotope labeling and mass spectrometry experiments and should be applicable to large data sets of any type. An R script that implements the improved rank products algorithm and the combined analysis is available.
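A sketch of the basic rank-product statistic with a permutation null on simulated, complete data; the paper's extension to data sets with missing values is not reproduced here.

```python
# Basic rank-product statistic on simulated complete data (the paper's extension
# to data sets with missing values is not reproduced here).
import numpy as np

rng = np.random.default_rng(10)
n_features, n_reps = 1000, 3
data = rng.normal(0, 1, (n_features, n_reps))     # hypothetical log fold-changes
data[:20] += 2.0                                  # simulate 20 up-regulated features

# Rank features within each replicate (rank 1 = strongest up-regulation),
# then combine ranks by their geometric mean.
ranks = np.argsort(np.argsort(-data, axis=0), axis=0) + 1
rank_product = np.exp(np.log(ranks).mean(axis=1))

# Permutation null: rank products obtained from randomly shuffled ranks.
n_perm = 200
null_rp = np.empty((n_perm, n_features))
for i in range(n_perm):
    perm = np.stack([rng.permutation(n_features) + 1 for _ in range(n_reps)], axis=1)
    null_rp[i] = np.exp(np.log(perm).mean(axis=1))

# p-value: fraction of null rank products at least as small as the observed one.
null_sorted = np.sort(null_rp.ravel())
p_values = np.searchsorted(null_sorted, rank_product, side="right") / null_sorted.size
print("features with smallest rank products:", np.argsort(rank_product)[:5])
```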
NASA Technical Reports Server (NTRS)
Tripp, John S.; Tcheng, Ping
1999-01-01
Statistical tools, previously developed for nonlinear least-squares estimation of multivariate sensor calibration parameters and the associated calibration uncertainty analysis, have been applied to single- and multiple-axis inertial model attitude sensors used in wind tunnel testing to measure angle of attack and roll angle. The analysis provides confidence and prediction intervals of calibrated sensor measurement uncertainty as functions of applied input pitch and roll angles. A comparative performance study of various experimental designs for inertial sensor calibration is presented along with corroborating experimental data. The importance of replicated calibrations over extended time periods has been emphasized; replication provides independent estimates of calibration precision and bias uncertainties, statistical tests for calibration or modeling bias uncertainty, and statistical tests for sensor parameter drift over time. A set of recommendations for a new standardized model attitude sensor calibration method and usage procedures is included. The statistical information provided by these procedures is necessary for the uncertainty analysis of aerospace test results now required by users of industrial wind tunnel test facilities.
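A generic nonlinear least-squares calibration sketch in the spirit of the abstract (not the NASA procedure): fit a hypothetical sensor response model and report approximate 1-sigma parameter uncertainties from the covariance of the fit.

```python
# Generic nonlinear least-squares calibration sketch (not the NASA procedure):
# fit a sensor response model and report approximate parameter uncertainties.
import numpy as np
from scipy.optimize import curve_fit

def response(angle_deg, gain, offset, quad):
    """Hypothetical sensor output model as a function of applied pitch angle."""
    return gain * angle_deg + quad * angle_deg ** 2 + offset

rng = np.random.default_rng(11)
applied = np.linspace(-10, 10, 41)                       # applied pitch angles (deg)
measured = response(applied, 1.02, 0.15, 0.003) + rng.normal(0, 0.02, applied.size)

popt, pcov = curve_fit(response, applied, measured)
perr = np.sqrt(np.diag(pcov))                            # 1-sigma parameter uncertainties
for name, value, err in zip(("gain", "offset", "quad"), popt, perr):
    print(f"{name} = {value:.4f} +/- {err:.4f}")

# Replicated calibrations over time would be compared against these intervals
# to test for parameter drift, as the abstract recommends.
```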
MDTraj: A Modern Open Library for the Analysis of Molecular Dynamics Trajectories.
McGibbon, Robert T; Beauchamp, Kyle A; Harrigan, Matthew P; Klein, Christoph; Swails, Jason M; Hernández, Carlos X; Schwantes, Christian R; Wang, Lee-Ping; Lane, Thomas J; Pande, Vijay S
2015-10-20
As molecular dynamics (MD) simulations continue to evolve into powerful computational tools for studying complex biomolecular systems, the necessity of flexible and easy-to-use software tools for the analysis of these simulations is growing. We have developed MDTraj, a modern, lightweight, and fast software package for analyzing MD simulations. MDTraj reads and writes trajectory data in a wide variety of commonly used formats. It provides a large number of trajectory analysis capabilities including minimal root-mean-square-deviation calculations, secondary structure assignment, and the extraction of common order parameters. The package has a strong focus on interoperability with the wider scientific Python ecosystem, bridging the gap between MD data and the rapidly growing collection of industry-standard statistical analysis and visualization tools in Python. MDTraj is a powerful and user-friendly software package that simplifies the analysis of MD data and connects these datasets with the modern interactive data science software ecosystem in Python. Copyright © 2015 Biophysical Society. Published by Elsevier Inc. All rights reserved.
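A short usage sketch of the MDTraj API described above; the trajectory and topology file names are hypothetical placeholders.

```python
# Short MDTraj usage sketch; 'traj.xtc' and 'system.pdb' are hypothetical file names.
import mdtraj as md

traj = md.load("traj.xtc", top="system.pdb")       # read trajectory + topology
print(traj)                                        # frames, atoms, time step

# Minimal RMSD to the first frame, using only alpha carbons.
ca = traj.topology.select("name CA")
rmsd = md.rmsd(traj, traj, frame=0, atom_indices=ca)

# Secondary structure assignment per residue and frame (DSSP codes).
dssp = md.compute_dssp(traj)

# Results are plain NumPy arrays, ready for the wider Python data science stack.
print(rmsd.shape, dssp.shape)
```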
MDTraj: A Modern Open Library for the Analysis of Molecular Dynamics Trajectories
McGibbon, Robert T.; Beauchamp, Kyle A.; Harrigan, Matthew P.; Klein, Christoph; Swails, Jason M.; Hernández, Carlos X.; Schwantes, Christian R.; Wang, Lee-Ping; Lane, Thomas J.; Pande, Vijay S.
2015-01-01
As molecular dynamics (MD) simulations continue to evolve into powerful computational tools for studying complex biomolecular systems, the necessity of flexible and easy-to-use software tools for the analysis of these simulations is growing. We have developed MDTraj, a modern, lightweight, and fast software package for analyzing MD simulations. MDTraj reads and writes trajectory data in a wide variety of commonly used formats. It provides a large number of trajectory analysis capabilities including minimal root-mean-square-deviation calculations, secondary structure assignment, and the extraction of common order parameters. The package has a strong focus on interoperability with the wider scientific Python ecosystem, bridging the gap between MD data and the rapidly growing collection of industry-standard statistical analysis and visualization tools in Python. MDTraj is a powerful and user-friendly software package that simplifies the analysis of MD data and connects these datasets with the modern interactive data science software ecosystem in Python. PMID:26488642
SECIMTools: a suite of metabolomics data analysis tools.
Kirpich, Alexander S; Ibarra, Miguel; Moskalenko, Oleksandr; Fear, Justin M; Gerken, Joseph; Mi, Xinlei; Ashrafi, Ali; Morse, Alison M; McIntyre, Lauren M
2018-04-20
Metabolomics has the promise to transform the area of personalized medicine with the rapid development of high throughput technology for untargeted analysis of metabolites. Open access, easy to use, analytic tools that are broadly accessible to the biological community need to be developed. While technology used in metabolomics varies, most metabolomics studies have a set of features identified. Galaxy is an open access platform that enables scientists at all levels to interact with big data. Galaxy promotes reproducibility by saving histories and enabling the sharing workflows among scientists. SECIMTools (SouthEast Center for Integrated Metabolomics) is a set of Python applications that are available both as standalone tools and wrapped for use in Galaxy. The suite includes a comprehensive set of quality control metrics (retention time window evaluation and various peak evaluation tools), visualization techniques (hierarchical cluster heatmap, principal component analysis, modular modularity clustering), basic statistical analysis methods (partial least squares - discriminant analysis, analysis of variance, t-test, Kruskal-Wallis non-parametric test), advanced classification methods (random forest, support vector machines), and advanced variable selection tools (least absolute shrinkage and selection operator LASSO and Elastic Net). SECIMTools leverages the Galaxy platform and enables integrated workflows for metabolomics data analysis made from building blocks designed for easy use and interpretability. Standard data formats and a set of utilities allow arbitrary linkages between tools to encourage novel workflow designs. The Galaxy framework enables future data integration for metabolomics studies with other omics data.
Power-law statistics of neurophysiological processes analyzed using short signals
NASA Astrophysics Data System (ADS)
Pavlova, Olga N.; Runnova, Anastasiya E.; Pavlov, Alexey N.
2018-04-01
We discuss the problem of quantifying power-law statistics of complex processes from short signals. Based on the analysis of electroencephalograms (EEG) we compare three interrelated approaches which enable characterization of the power spectral density (PSD) and show that an application of the detrended fluctuation analysis (DFA) or the wavelet-transform modulus maxima (WTMM) method represents a useful way of indirect characterization of the PSD features from short data sets. We conclude that despite DFA- and WTMM-based measures can be obtained from the estimated PSD, these tools outperform the standard spectral analysis when characterization of the analyzed regime should be provided based on a very limited amount of data.
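A compact first-order DFA sketch that estimates the scaling exponent from a short synthetic signal, illustrating the indirect PSD characterization discussed above; it is not the authors' implementation.

```python
# Compact first-order DFA sketch: estimate the scaling exponent of a short signal
# (synthetic data, not the EEG records analyzed in the study).
import numpy as np

def dfa_exponent(signal, scales):
    profile = np.cumsum(signal - np.mean(signal))        # integrated (profile) series
    fluct = []
    for n in scales:
        n_seg = len(profile) // n
        segments = profile[: n_seg * n].reshape(n_seg, n)
        x = np.arange(n)
        rms = []
        for seg in segments:
            coef = np.polyfit(x, seg, 1)                 # local linear trend
            rms.append(np.mean((seg - np.polyval(coef, x)) ** 2))
        fluct.append(np.sqrt(np.mean(rms)))
    # Slope of log F(n) vs log n is the DFA exponent alpha.
    alpha = np.polyfit(np.log(scales), np.log(fluct), 1)[0]
    return alpha

rng = np.random.default_rng(12)
white_noise = rng.normal(size=2048)
scales = np.array([16, 32, 64, 128, 256])
print(f"DFA exponent (white noise, expect ~0.5): {dfa_exponent(white_noise, scales):.2f}")
```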
DOE Office of Scientific and Technical Information (OSTI.GOV)
McCord, R.A.; Olson, R.J.
1988-01-01
Environmental research and assessment activities at Oak Ridge National Laboratory (ORNL) include the analysis of spatial and temporal patterns of ecosystem response at a landscape scale. Analysis through use of geographic information system (GIS) involves an interaction between the user and thematic data sets frequently expressed as maps. A portion of GIS analysis has a mathematical or statistical aspect, especially for the analysis of temporal patterns. ARC/INFO is an excellent tool for manipulating GIS data and producing the appropriate map graphics. INFO also has some limited ability to produce statistical tabulation. At ORNL we have extended our capabilities by graphically interfacing ARC/INFO and SAS/GRAPH to provide a combined mapping and statistical graphics environment. With the data management, statistical, and graphics capabilities of SAS added to ARC/INFO, we have expanded the analytical and graphical dimensions of the GIS environment. Pie or bar charts, frequency curves, hydrographs, or scatter plots as produced by SAS can be added to maps from attribute data associated with ARC/INFO coverages. Numerous, small, simplified graphs can also become a source of complex map "symbols." These additions extend the dimensions of GIS graphics to include time, details of the thematic composition, distribution, and interrelationships. 7 refs., 3 figs.
Automated clustering-based workload characterization
NASA Technical Reports Server (NTRS)
Pentakalos, Odysseas I.; Menasce, Daniel A.; Yesha, Yelena
1996-01-01
The demands placed on the mass storage systems at various federal agencies and national laboratories are continuously increasing in intensity. This forces system managers to constantly monitor the system, evaluate the demand placed on it, and tune it appropriately using either heuristics based on experience or analytic models. Performance models require an accurate workload characterization. This can be a laborious and time-consuming process. It became evident from our experience that a tool is necessary to automate the workload characterization process. This paper presents the design and discusses the implementation of a tool for workload characterization of mass storage systems. The main features of the tool discussed here are: (1) Automatic support for peak-period determination. Histograms of system activity are generated and presented to the user for peak-period determination; (2) Automatic clustering analysis. The data collected from the mass storage system logs is clustered using clustering algorithms and tightness measures to limit the number of generated clusters; (3) Reporting of varied file statistics. The tool computes several statistics on file sizes such as average, standard deviation, minimum, maximum, frequency, as well as average transfer time. These statistics are given on a per-cluster basis; (4) Portability. The tool can easily be used to characterize the workload in mass storage systems of different vendors. The user needs to specify through a simple log description language how a specific log should be interpreted. The rest of this paper is organized as follows. Section two presents basic concepts in workload characterization as they apply to mass storage systems. Section three describes clustering algorithms and tightness measures. The following section presents the architecture of the tool. Section five presents some results of workload characterization using the tool. Finally, section six presents some concluding remarks.
fMRI paradigm designing and post-processing tools
James, Jija S; Rajesh, PG; Chandran, Anuvitha VS; Kesavadas, Chandrasekharan
2014-01-01
In this article, we first review some aspects of functional magnetic resonance imaging (fMRI) paradigm designing for major cognitive functions by using stimulus delivery systems like Cogent, E-Prime, Presentation, etc., along with their technical aspects. We also review the stimulus presentation possibilities (block, event-related) for visual or auditory paradigms and their advantages in both clinical and research settings. The second part mainly focuses on various fMRI data post-processing tools such as Statistical Parametric Mapping (SPM) and Brain Voyager, and discusses the particulars of the various preprocessing steps involved (realignment, co-registration, normalization, smoothing) in these software packages, as well as the statistical analysis principles of General Linear Modeling for final interpretation of a functional activation result. PMID:24851001
Chopra, Vikram; Bairagi, Mukesh; Trivedi, P; Nagar, Mona
2012-01-01
Statistical process control is the application of statistical methods to the measurement and analysis of variation in a process. Various regulatory authorities such as the Validation Guidance for Industry (2011), International Conference on Harmonisation ICH Q10 (2009), the Health Canada guidelines (2009), Health Science Authority, Singapore: Guidance for Product Quality Review (2008), and International Organization for Standardization ISO-9000:2005 provide regulatory support for the application of statistical process control for better process control and understanding. In this study risk assessments, normal probability distributions, control charts, and capability charts are employed for selection of critical quality attributes, determination of normal probability distribution, statistical stability, and capability of production processes, respectively. The objective of this study is to determine tablet production process quality in the form of sigma process capability. By interpreting data and graph trends, forecasting of critical quality attributes, sigma process capability, and stability of the process were studied. The overall study contributes to an assessment of the process at the sigma level with respect to out-of-specification attributes produced. Finally, the study points to an area where the application of quality improvement and quality risk assessment principles for achievement of six sigma-capable processes is possible. Statistical process control is the most advantageous tool for determination of the quality of any production process. This tool is new for the pharmaceutical tablet production process. In the case of pharmaceutical tablet production processes, the quality control parameters act as quality assessment parameters. Application of risk assessment provides selection of critical quality attributes among quality control parameters. Sequential application of normality distributions, control charts, and capability analyses provides a valid statistical process control study of the process. Interpretation of such a study provides information about stability, process variability, changing of trends, and quantification of process ability against defective production. Comparative evaluation of critical quality attributes by Pareto charts identifies the least capable and most variable process, which is a candidate for improvement. Statistical process control thus proves to be an important tool for six sigma-capable process development and continuous quality improvement.
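To make the sigma-capability idea concrete, here is a minimal, non-authoritative sketch of the capability indices typically computed in such a study. The specification limits, the example hardness data, and the 3·Cpk sigma-level approximation are illustrative assumptions, not values from the paper.

```python
import numpy as np

def capability_summary(measurements, lsl, usl):
    """Process capability indices from in-control data (illustrative sketch).

    lsl/usl are the lower/upper specification limits for one critical
    quality attribute (e.g. tablet hardness); values here are assumptions.
    """
    x = np.asarray(measurements, dtype=float)
    mu, sigma = x.mean(), x.std(ddof=1)
    cp = (usl - lsl) / (6 * sigma)                 # potential capability
    cpk = min(usl - mu, mu - lsl) / (3 * sigma)    # actual capability
    sigma_level = 3 * cpk                          # common short-term approximation
    return {"mean": mu, "std": sigma, "Cp": cp, "Cpk": cpk, "sigma_level": sigma_level}

# Example with hypothetical hardness data and specification limits of 4-9 kp:
print(capability_summary(np.random.normal(6.5, 0.6, 200), lsl=4.0, usl=9.0))
```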
Machine learning patterns for neuroimaging-genetic studies in the cloud.
Da Mota, Benoit; Tudoran, Radu; Costan, Alexandru; Varoquaux, Gaël; Brasche, Goetz; Conrod, Patricia; Lemaitre, Herve; Paus, Tomas; Rietschel, Marcella; Frouin, Vincent; Poline, Jean-Baptiste; Antoniu, Gabriel; Thirion, Bertrand
2014-01-01
Brain imaging is a natural intermediate phenotype for understanding the link between genetic information and behavior or brain pathology risk factors. Massive efforts have been made in the last few years to acquire high-dimensional neuroimaging and genetic data on large cohorts of subjects. The statistical analysis of such data is carried out with increasingly sophisticated techniques and represents a great computational challenge. Fortunately, increasing computational power in distributed architectures can be harnessed, if new neuroinformatics infrastructures are designed and training to use these new tools is provided. Combining a MapReduce framework (TomusBLOB) with machine learning algorithms (Scikit-learn library), we design a scalable analysis tool that can deal with non-parametric statistics on high-dimensional data. End-users describe the statistical procedure to perform and can then test the model on their own computers before running the very same code in the cloud at a larger scale. We illustrate the potential of our approach on real data with an experiment showing how the functional signal in subcortical brain regions can be significantly fit with genome-wide genotypes. This experiment demonstrates the scalability and the reliability of our framework in the cloud with a two-week deployment on hundreds of virtual machines.
ExAtlas: An interactive online tool for meta-analysis of gene expression data.
Sharov, Alexei A; Schlessinger, David; Ko, Minoru S H
2015-12-01
We have developed ExAtlas, an on-line software tool for meta-analysis and visualization of gene expression data. In contrast to existing software tools, ExAtlas compares multi-component data sets and generates results for all combinations (e.g. all gene expression profiles versus all Gene Ontology annotations). ExAtlas handles both users' own data and data extracted semi-automatically from the public repository (GEO/NCBI database). ExAtlas provides a variety of tools for meta-analyses: (1) standard meta-analysis (fixed effects, random effects, z-score, and Fisher's methods); (2) analyses of global correlations between gene expression data sets; (3) gene set enrichment; (4) gene set overlap; (5) gene association by expression profile; (6) gene specificity; and (7) statistical analysis (ANOVA, pairwise comparison, and PCA). ExAtlas produces graphical outputs, including heatmaps, scatter-plots, bar-charts, and three-dimensional images. Some of the most widely used public data sets (e.g. GNF/BioGPS, Gene Ontology, KEGG, GAD phenotypes, BrainScan, ENCODE ChIP-seq, and protein-protein interaction) are pre-loaded and can be used for functional annotations.
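As one concrete example of the standard meta-analysis methods listed above, a minimal sketch of Fisher's method for combining independent p-values is shown below; this is illustrative background, not ExAtlas code.

```python
import numpy as np
from scipy.stats import chi2

def fisher_combined_p(p_values):
    """Fisher's method: combine independent p-values for one gene across studies."""
    p = np.asarray(p_values, dtype=float)
    statistic = -2.0 * np.sum(np.log(p))      # ~ chi-square with 2k degrees of freedom
    return chi2.sf(statistic, df=2 * p.size)

print(fisher_combined_p([0.04, 0.11, 0.005]))
```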
2012-01-01
Background It is known from recent studies that more than 90% of human multi-exon genes are subject to Alternative Splicing (AS), a key molecular mechanism in which multiple transcripts may be generated from a single gene. It is widely recognized that a breakdown in AS mechanisms plays an important role in cellular differentiation and pathologies. Polymerase Chain Reactions, microarrays and sequencing technologies have been applied to the study of transcript diversity arising from alternative expression. Last generation Affymetrix GeneChip Human Exon 1.0 ST Arrays offer a more detailed view of the gene expression profile providing information on the AS patterns. The exon array technology, with more than five million data points, can detect approximately one million exons, and it allows performing analyses at both gene and exon level. In this paper we describe BEAT, an integrated user-friendly bioinformatics framework to store, analyze and visualize exon arrays datasets. It combines a data warehouse approach with some rigorous statistical methods for assessing the AS of genes involved in diseases. Meta statistics are proposed as a novel approach to explore the analysis results. BEAT is available at http://beat.ba.itb.cnr.it. Results BEAT is a web tool which allows uploading and analyzing exon array datasets using standard statistical methods and an easy-to-use graphical web front-end. BEAT has been tested on a dataset with 173 samples and tuned using new datasets of exon array experiments from 28 colorectal cancer and 26 renal cell cancer samples produced at the Medical Genetics Unit of IRCCS Casa Sollievo della Sofferenza. To highlight all possible AS events, alternative names, accession Ids, Gene Ontology terms and biochemical pathways annotations are integrated with exon and gene level expression plots. The user can customize the results choosing custom thresholds for the statistical parameters and exploiting the available clinical data of the samples for a multivariate AS analysis. Conclusions Despite exon array chips being widely used for transcriptomics studies, there is a lack of analysis tools offering advanced statistical features and requiring no programming knowledge. BEAT provides a user-friendly platform for a comprehensive study of AS events in human diseases, displaying the analysis results with easily interpretable and interactive tables and graphics. PMID:22536968
NASA Technical Reports Server (NTRS)
Lambert, Winfred; Wheeler, Mark; Roeder, William
2005-01-01
The 45th Weather Squadron (45 WS) at Cape Canaveral Air Force Station (CCAFS) in Florida issues a probability of lightning occurrence in their daily 24-hour and weekly planning forecasts. This information is used for general planning of operations at CCAFS and Kennedy Space Center (KSC). These facilities are located in east-central Florida at the east end of a corridor known as 'Lightning Alley', an indication that lightning has a large impact on space-lift operations. Much of the current lightning probability forecast is based on a subjective analysis of model and observational data and an objective forecast tool developed over 30 years ago. The 45 WS requested that a new lightning probability forecast tool based on statistical analysis of more recent historical warm season (May-September) data be developed in order to increase the objectivity of the daily thunderstorm probability forecast. The resulting tool is a set of statistical lightning forecast equations, one for each month of the warm season, that provide a lightning occurrence probability for the day by 1100 UTC (0700 EDT) during the warm season.
THE CAUSAL ANALYSIS / DIAGNOSIS DECISION ...
CADDIS is an on-line decision support system that helps investigators in the regions, states and tribes find, access, organize, use and share information to produce causal evaluations in aquatic systems. It is based on the US EPA's Stressor Identification process which is a formal method for identifying causes of impairments in aquatic systems. CADDIS 2007 increases access to relevant information useful for causal analysis and provides methods and tools that practitioners can use to analyze their own data. The new Candidate Cause section provides overviews of commonly encountered causes of impairments to aquatic systems: metals, sediments, nutrients, flow alteration, temperature, ionic strength, and low dissolved oxygen. CADDIS includes new Conceptual Models that illustrate the relationships from sources to stressors to biological effects. An Interactive Conceptual Model for phosphorus links the diagram with supporting literature citations. The new Analyzing Data section helps practitioners analyze their data sets and interpret and use those results as evidence within the USEPA causal assessment process. Downloadable tools include a graphical user interface statistical package (CADStat), and programs for use with the freeware R statistical package, and a Microsoft Excel template. These tools can be used to quantify associations between causes and biological impairments using innovative methods such as species-sensitivity distributions, biological inferenc
Evidence-based pathology in its second decade: toward probabilistic cognitive computing.
Marchevsky, Alberto M; Walts, Ann E; Wick, Mark R
2017-03-01
Evidence-based pathology advocates using a combination of best available data ("evidence") from the literature and personal experience for the diagnosis, estimation of prognosis, and assessment of other variables that impact individual patient care. Evidence-based pathology relies on systematic reviews of the literature, evaluation of the quality of evidence as categorized by evidence levels and statistical tools such as meta-analyses, estimates of probabilities and odds, and others. However, it is well known that previously "statistically significant" information usually does not accurately forecast the future for individual patients. There is great interest in "cognitive computing" in which "data mining" is combined with "predictive analytics" designed to forecast future events and estimate the strength of those predictions. This study demonstrates the use of IBM Watson Analytics software to evaluate and predict the prognosis of 101 patients with typical and atypical pulmonary carcinoid tumors in which Ki-67 indices have been determined. The results obtained with this system are compared with those previously reported using "routine" statistical software and the help of a professional statistician. IBM Watson Analytics interactively provides statistical results that are comparable to those obtained with routine statistical tools but much more rapidly, with considerably less effort and with interactive graphics that are intuitively easy to apply. It also enables analysis of natural language variables and yields detailed survival predictions for patient subgroups selected by the user. Potential applications of this tool and basic concepts of cognitive computing are discussed. Copyright © 2016 Elsevier Inc. All rights reserved.
MaxEnt, second variation, and generalized statistics
NASA Astrophysics Data System (ADS)
Plastino, A.; Rocca, M. C.
2015-10-01
There are two kinds of Tsallis probability distributions: heavy-tailed ones and compact-support distributions. We show here, by appeal to tools of functional analysis, that for Hamiltonians bounded from below, analysis of the second variation of the entropic functional guarantees that the heavy-tailed q-distribution constitutes a maximum of Tsallis' entropy. On the other hand, in the compact-support instance, a case-by-case analysis is necessary in order to tackle the issue.
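For context, a standard textbook form of Tsallis' entropy and the associated q-exponential maximizer is recalled below; the notation is assumed here and this is background rather than a result of the paper.

```latex
S_q[p] \;=\; \frac{1-\sum_i p_i^{\,q}}{q-1},
\qquad
\lim_{q\to 1} S_q[p] \;=\; -\sum_i p_i \ln p_i ,
```

and extremizing $S_q$ under an energy constraint yields

```latex
p_i \;\propto\; \bigl[\,1-(1-q)\,\beta E_i\,\bigr]_{+}^{\frac{1}{1-q}},
```

which is heavy-tailed for $q>1$ and has compact support for $q<1$, matching the two families discussed in the abstract.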
Joyce, Brendan; Lee, Danny; Rubio, Alex; Ogurtsov, Aleksey; Alves, Gelio; Yu, Yi-Kuo
2018-03-15
RAId is a software package that has been actively developed for the past 10 years for computationally and visually analyzing MS/MS data. Founded on rigorous statistical methods, RAId's core program computes accurate E-values for peptides and proteins identified during database searches. Making this robust tool readily accessible to the proteomics community by developing a graphical user interface (GUI) is our main goal here. We have constructed a graphical user interface to facilitate the use of RAId on users' local machines. Written in Java, RAId_GUI not only makes it easy to execute RAId but also provides tools for data/spectra visualization, MS-product analysis, molecular isotopic distribution analysis, and graphing the retrieval versus the proportion of false discoveries. The results viewer displays the analysis results and allows users to download them. Both the knowledge-integrated organismal databases and the code package (containing source code, the graphical user interface, and a user manual) are available for download at https://www.ncbi.nlm.nih.gov/CBBresearch/Yu/downloads/raid.html .
SOME STATISTICAL TOOLS FOR EVALUATING COMPUTER SIMULATIONS: A DATA ANALYSIS. (R825381)
The perspectives, information and conclusions conveyed in research project abstracts, progress reports, final reports, journal abstracts and journal publications convey the viewpoints of the principal investigator and may not represent the views and policies of ORD and EPA. Concl...
Quality assurance software inspections at NASA Ames: Metrics for feedback and modification
NASA Technical Reports Server (NTRS)
Wenneson, G.
1985-01-01
Software inspections--a set of formal technical review procedures held at selected key points during software development in order to find defects in software documents--are described in terms of history, participants, tools, procedures, statistics, and database analysis.
Mi, Huaiyu; Huang, Xiaosong; Muruganujan, Anushya; Tang, Haiming; Mills, Caitlin; Kang, Diane; Thomas, Paul D
2017-01-04
The PANTHER database (Protein ANalysis THrough Evolutionary Relationships, http://pantherdb.org) contains comprehensive information on the evolution and function of protein-coding genes from 104 completely sequenced genomes. PANTHER software tools allow users to classify new protein sequences, and to analyze gene lists obtained from large-scale genomics experiments. In the past year, major improvements include a large expansion of classification information available in PANTHER, as well as significant enhancements to the analysis tools. Protein subfamily functional classifications have more than doubled due to progress of the Gene Ontology Phylogenetic Annotation Project. For human genes (as well as a few other organisms), PANTHER now also supports enrichment analysis using pathway classifications from the Reactome resource. The gene list enrichment tools include a new 'hierarchical view' of results, enabling users to leverage the structure of the classifications/ontologies; the tools also allow users to upload genetic variant data directly, rather than requiring prior conversion to a gene list. The updated coding single-nucleotide polymorphisms (SNP) scoring tool uses an improved algorithm. The hidden Markov model (HMM) search tools now use HMMER3, dramatically reducing search times and improving accuracy of E-value statistics. Finally, the PANTHER Tree-Attribute Viewer has been implemented in JavaScript, with new views for exploring protein sequence evolution. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Chen, Shi-Yi; Deng, Feilong; Huang, Ying; Li, Cao; Liu, Linhai; Jia, Xianbo; Lai, Song-Jia
2016-01-01
Although various computer tools have been elaborately developed to calculate a series of statistics in molecular population genetics for both small- and large-scale DNA data, there is no efficient and easy-to-use toolkit available yet that focuses exclusively on the steps of mathematical calculation. Here, we present PopSc, a bioinformatic toolkit for calculating 45 basic statistics in molecular population genetics, which can be categorized into three classes: (i) genetic diversity of DNA sequences, (ii) statistical tests for neutral evolution, and (iii) measures of genetic differentiation among populations. In contrast to existing computer tools, PopSc was designed to directly accept intermediate metadata, such as allele frequencies, rather than raw DNA sequences or genotyping results. PopSc is first implemented as a web-based calculator with a user-friendly interface, which greatly facilitates the teaching of population genetics in class and also promotes convenient and straightforward calculation of statistics in research. Additionally, we provide the Python library and R package of PopSc, which can be flexibly integrated into other advanced bioinformatic packages for population genetics analysis. PMID:27792763
DOE Office of Scientific and Technical Information (OSTI.GOV)
Apte, A; Veeraraghavan, H; Oh, J
Purpose: To present an open source and free platform to facilitate radiomics research: the "Radiomics toolbox" in CERR. Method: There is a scarcity of open source tools that support end-to-end modeling of image features to predict patient outcomes. The "Radiomics toolbox" strives to fill the need for such a software platform. The platform supports (1) import of various kinds of image modalities such as CT, PET, MR, SPECT, and US; (2) contouring tools to delineate structures of interest; (3) extraction and storage of image-based features such as 1st order statistics, gray-scale co-occurrence and zone-size matrix based texture features, and shape features; and (4) statistical analysis. Statistical analysis of the extracted features is supported with basic functionality that includes univariate correlations and Kaplan-Meier curves, and advanced functionality that includes feature reduction and multivariate modeling. The graphical user interface and the data management are performed with Matlab for ease of development and readability of code and features for a wide audience. Open-source software developed with other programming languages is integrated to enhance various components of this toolbox. For example: Java-based DCM4CHE for import of DICOM, R for statistical analysis. Results: The Radiomics toolbox will be distributed as an open source, GNU copyrighted software. The toolbox was prototyped for modeling an oropharyngeal PET dataset at MSKCC. The analysis will be presented in a separate paper. Conclusion: The Radiomics Toolbox provides an extensible platform for extracting and modeling image features. To emphasize new uses of CERR for radiomics and image-based research, we have changed the name from the "Computational Environment for Radiotherapy Research" to the "Computational Environment for Radiological Research".
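As an illustrative companion to the feature-extraction step described above, the Python sketch below computes a few first-order statistics for a region of interest. It is a simplified stand-in, not the Matlab implementation in the CERR Radiomics toolbox, and the histogram-based entropy and bin count are assumptions.

```python
import numpy as np

def first_order_features(image, mask, n_bins=64):
    """Illustrative first-order ROI statistics (not the CERR implementation)."""
    roi = np.asarray(image, dtype=float)[np.asarray(mask, dtype=bool)]
    hist, _ = np.histogram(roi, bins=n_bins)
    p = hist / hist.sum()
    p = p[p > 0]                                  # drop empty bins before the log
    return {
        "mean": roi.mean(),
        "std": roi.std(),
        "skewness": ((roi - roi.mean()) ** 3).mean() / roi.std() ** 3,
        "entropy": -np.sum(p * np.log2(p)),
    }
```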
Cer, Regina Z; Herrera-Galeano, J Enrique; Anderson, Joseph J; Bishop-Lilly, Kimberly A; Mokashi, Vishwesh P
2014-01-01
Understanding the biological roles of microRNAs (miRNAs) is an active area of research that has produced a surge of publications in PubMed, particularly in cancer research. Along with this increasing interest, many open-source bioinformatics tools to identify existing and/or discover novel miRNAs in next-generation sequencing (NGS) reads have become available. While miRNA identification and discovery tools have improved significantly, the development of miRNA differential expression analysis tools, especially for temporal studies, remains substantially challenging. Further, the installation of currently available software is non-trivial, and the steps of testing with example datasets, trying with one's own dataset, and interpreting the results require notable expertise and time. Subsequently, there is a strong need for a tool that allows scientists to normalize raw data, perform statistical analyses, and provide intuitive results without having to invest significant effort. We have developed miRNA Temporal Analyzer (mirnaTA), a bioinformatics package to identify differentially expressed miRNAs in temporal studies. mirnaTA is written in Perl and R (Version 2.13.0 or later) and can be run across multiple platforms, such as Linux, Mac and Windows. In the current version, mirnaTA requires users to provide a simple, tab-delimited, matrix file containing miRNA name and count data from a minimum of two to a maximum of 20 time points and three replicates. To recalibrate data and remove technical variability, raw data are normalized using Normal Quantile Transformation (NQT), and a linear regression model is used to locate any miRNAs which are differentially expressed in a linear pattern. Subsequently, remaining miRNAs which do not fit a linear model are further analyzed using two different non-linear methods: 1) cumulative distribution function (CDF) or 2) analysis of variances (ANOVA). After both linear and non-linear analyses are completed, statistically significant miRNAs (P < 0.05) are plotted as heat maps using hierarchical cluster analysis and Euclidean distance matrix computation methods. mirnaTA is an open-source bioinformatics tool to aid scientists in identifying differentially expressed miRNAs which could be further mined for biological significance. It is expected to provide researchers with a means of interpreting raw data to statistical summaries in a fast and intuitive manner.
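A minimal sketch of the rank-based Normal Quantile Transformation step described above could look like the following in Python; this is not the actual Perl/R code shipped with mirnaTA, and the 0.5 rank offset is an assumption.

```python
import numpy as np
from scipy.stats import norm, rankdata

def normal_quantile_transform(counts):
    """Map raw miRNA counts to normal quantiles (rank-based NQT sketch)."""
    ranks = rankdata(counts)                   # average ranks handle ties
    quantiles = (ranks - 0.5) / len(counts)    # push ranks into the open interval (0, 1)
    return norm.ppf(quantiles)

print(normal_quantile_transform([12, 5, 200, 33, 33]))
```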
Statistics, Structures & Satisfied Customers: Using Web Log Data to Improve Site Performance.
ERIC Educational Resources Information Center
Peacock, Darren
This paper explores some of the ways in which the National Museum of Australia is using Web analysis tools to shape its future directions in the delivery of online services. In particular, it explores the potential of quantitative analysis, based on Web server log data, to convert these ephemeral traces of user experience into a strategic…
Virtual Tool Mark Generation for Efficient Striation Analysis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ekstrand, Laura; Zhang, Song; Grieve, Taylor
2014-02-16
This study introduces a tool mark analysis approach based upon 3D scans of screwdriver tip and marked plate surfaces at the micrometer scale from an optical microscope. An open-source 3D graphics software package is utilized to simulate the marking process as the projection of the tip's geometry in the direction of tool travel. The edge of this projection becomes a virtual tool mark that is compared to cross-sections of the marked plate geometry using the statistical likelihood algorithm introduced by Chumbley et al. In a study with both sides of six screwdriver tips and 34 corresponding marks, the method distinguished known matches from known nonmatches with zero false-positive matches and two false-negative matches. For matches, it could predict the correct marking angle within ±5–10°. Individual comparisons could be made in seconds on a desktop computer, suggesting that the method could save time for examiners.
Public data and open source tools for multi-assay genomic investigation of disease.
Kannan, Lavanya; Ramos, Marcel; Re, Angela; El-Hachem, Nehme; Safikhani, Zhaleh; Gendoo, Deena M A; Davis, Sean; Gomez-Cabrero, David; Castelo, Robert; Hansen, Kasper D; Carey, Vincent J; Morgan, Martin; Culhane, Aedín C; Haibe-Kains, Benjamin; Waldron, Levi
2016-07-01
Molecular interrogation of a biological sample through DNA sequencing, RNA and microRNA profiling, proteomics and other assays, has the potential to provide a systems level approach to predicting treatment response and disease progression, and to developing precision therapies. Large publicly funded projects have generated extensive and freely available multi-assay data resources; however, bioinformatic and statistical methods for the analysis of such experiments are still nascent. We review multi-assay genomic data resources in the areas of clinical oncology, pharmacogenomics and other perturbation experiments, population genomics and regulatory genomics and other areas, and tools for data acquisition. Finally, we review bioinformatic tools that are explicitly geared toward integrative genomic data visualization and analysis. This review provides starting points for accessing publicly available data and tools to support development of needed integrative methods. © The Author 2015. Published by Oxford University Press.
Kourgialas, Nektarios N; Dokou, Zoi; Karatzas, George P
2015-05-01
The purpose of this study was to create a modeling management tool for the simulation of extreme flow events under current and future climatic conditions. This tool is a combination of different components and can be applied in complex hydrogeological river basins, where frequent flood and drought phenomena occur. The first component is the statistical analysis of the available hydro-meteorological data. Specifically, principal components analysis was performed in order to quantify the importance of the hydro-meteorological parameters that affect the generation of extreme events. The second component is a prediction-forecasting artificial neural network (ANN) model that simulates, accurately and efficiently, river flow on an hourly basis. This model is based on a methodology that attempts to resolve a very difficult problem related to the accurate estimation of extreme flows. For this purpose, the available measurements (5 years of hourly data) were divided in two subsets: one for the dry and one for the wet periods of the hydrological year. This way, two ANNs were created, trained, tested and validated for a complex Mediterranean river basin in Crete, Greece. As part of the second management component a statistical downscaling tool was used for the creation of meteorological data according to the higher and lower emission climate change scenarios A2 and B1. These data are used as input in the ANN for the forecasting of river flow for the next two decades. The final component is the application of a meteorological index on the measured and forecasted precipitation and flow data, in order to assess the severity and duration of extreme events. Copyright © 2015 Elsevier Ltd. All rights reserved.
A spatial scan statistic for survival data based on Weibull distribution.
Bhatt, Vijaya; Tiwari, Neeraj
2014-05-20
The spatial scan statistic has been developed as a geographical cluster detection analysis tool for different types of data sets such as Bernoulli, Poisson, ordinal, normal and exponential. We propose a scan statistic for survival data based on Weibull distribution. It may also be used for other survival distributions, such as exponential, gamma, and log normal. The proposed method is applied on the survival data of tuberculosis patients for the years 2004-2005 in Nainital district of Uttarakhand, India. Simulation studies reveal that the proposed method performs well for different survival distribution functions. Copyright © 2013 John Wiley & Sons, Ltd.
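For orientation, the scan-statistic construction generally takes the likelihood-ratio form sketched below; the Weibull parametrization is a standard textbook one shown as background, not the paper's exact notation. With survival times $t_i$, censoring indicators $\delta_i$, scale $\lambda$ and shape $k$, the censored Weibull log-likelihood is

```latex
\ell(\lambda,k) \;=\; \sum_i \Bigl[\, \delta_i\bigl(\ln k + (k-1)\ln t_i - k\ln\lambda\bigr) \;-\; (t_i/\lambda)^{k} \,\Bigr],
```

and the scan statistic maximizes, over candidate zones $Z$, the gain from fitting the inside and outside of the zone separately:

```latex
\Lambda \;=\; \max_{Z}\;\Bigl[\; \sup_{\theta_Z,\theta_{\bar Z}} \bigl(\ell_Z(\theta_Z)+\ell_{\bar Z}(\theta_{\bar Z})\bigr) \;-\; \sup_{\theta_0}\,\ell(\theta_0) \;\Bigr],
```

with significance typically assessed by Monte Carlo replication.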
Statistical characterization of handwriting characteristics using automated tools
NASA Astrophysics Data System (ADS)
Ball, Gregory R.; Srihari, Sargur N.
2011-01-01
We provide a statistical basis for reporting the results of handwriting examination by questioned document (QD) examiners. As a facet of QD examination, the analysis and reporting of handwriting examination suffers from the lack of statistical data concerning the frequency of occurrence of combinations of particular handwriting characteristics. QD examiners tend to assign probative values to specific handwriting characteristics and their combinations based entirely on the examiner's experience and power of recall. The research uses databases of handwriting samples that are representative of the US population. Feature lists of characteristics provided by QD examiners are used to determine which frequencies need to be evaluated. Algorithms are used to automatically extract those characteristics; for example, a software tool for extracting most of the characteristics from the most common letter pair "th" is functional. For each letter combination the marginal and conditional frequencies of their characteristics are evaluated. Based on statistical dependencies of the characteristics, the probability of any given letter formation is computed. The resulting algorithms are incorporated into a system for writer verification known as CEDAR-FOX.
Validation tools for image segmentation
NASA Astrophysics Data System (ADS)
Padfield, Dirk; Ross, James
2009-02-01
A large variety of image analysis tasks require the segmentation of various regions in an image. For example, segmentation is required to generate accurate models of brain pathology that are important components of modern diagnosis and therapy. While the manual delineation of such structures gives accurate information, the automatic segmentation of regions such as the brain and tumors from such images greatly enhances the speed and repeatability of quantifying such structures. The ubiquitous need for such algorithms has led to a wide range of image segmentation algorithms with various assumptions, parameters, and robustness. The evaluation of such algorithms is an important step in determining their effectiveness. Therefore, rather than developing new segmentation algorithms, we here describe validation methods for segmentation algorithms. Using similarity metrics comparing the automatic to manual segmentations, we demonstrate methods for optimizing the parameter settings for individual cases and across a collection of datasets using the Design of Experiment framework. We then employ statistical analysis methods to compare the effectiveness of various algorithms. We investigate several region-growing algorithms from the Insight Toolkit and compare their accuracy to that of a separate statistical segmentation algorithm. The segmentation algorithms are used with their optimized parameters to automatically segment the brain and tumor regions in MRI images of 10 patients. The validation tools indicate that none of the ITK algorithms studied is able to outperform the statistical segmentation algorithm with statistical significance, although they perform reasonably well considering their simplicity.
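One widely used similarity metric for comparing automatic and manual segmentations is the Dice coefficient; the short sketch below is illustrative and is not claimed to be the exact metric or code used in this study.

```python
import numpy as np

def dice_coefficient(auto_mask, manual_mask):
    """Dice overlap between binary automatic and manual segmentation masks."""
    a = np.asarray(auto_mask, dtype=bool)
    m = np.asarray(manual_mask, dtype=bool)
    intersection = np.logical_and(a, m).sum()
    denom = a.sum() + m.sum()
    return 2.0 * intersection / denom if denom else 1.0
```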
Evaluation of the sustainability of contrasted pig farming systems: economy.
Ilari-Antoine, E; Bonneau, M; Klauke, T N; Gonzàlez, J; Dourmad, J Y; De Greef, K; Houwers, H W J; Fabrega, E; Zimmer, C; Hviid, M; Van der Oever, B; Edwards, S A
2014-12-01
The aim of this paper is to present an efficient tool for evaluating the economy part of the sustainability of pig farming systems. The selected tool IDEA was tested on a sample of farms from 15 contrasted systems in Europe. A statistical analysis was carried out to check the capacity of the indicators to illustrate the variability of the population and to analyze which of these indicators contributed the most towards it. The scores obtained for the farms were consistent with the reality of pig production; the variable distribution showed an important variability of the sample. The principal component analysis and cluster analysis separated the sample into five subgroups, in which the six main indicators significantly differed, which underlines the robustness of the tool. The IDEA method was proven to be easily comprehensible, requiring few initial variables and with an efficient benchmarking system; all six indicators contributed to fully describe a varied and contrasted population.
Identifying and Investigating Unexpected Response to Treatment: A Diabetes Case Study.
Ozery-Flato, Michal; Ein-Dor, Liat; Parush-Shear-Yashuv, Naama; Aharonov, Ranit; Neuvirth, Hani; Kohn, Martin S; Hu, Jianying
2016-09-01
The availability of electronic health records creates fertile ground for developing computational models of various medical conditions. We present a new approach for detecting and analyzing patients with unexpected responses to treatment, building on machine learning and statistical methodology. Given a specific patient, we compute a statistical score for the deviation of the patient's response from responses observed in other patients having similar characteristics and medication regimens. These scores are used to define cohorts of patients showing deviant responses. Statistical tests are then applied to identify clinical features that correlate with these cohorts. We implement this methodology in a tool that is designed to assist researchers in the pharmaceutical field to uncover new features associated with reduced response to a treatment. It can also aid physicians by flagging patients who are not responding to treatment as expected and hence deserve more attention. The tool provides comprehensive visualizations of the analysis results and the supporting data, both at the cohort level and at the level of individual patients. We demonstrate the utility of our methodology and tool in a population of type II diabetic patients, treated with antidiabetic drugs, and monitored by the HbA1C test.
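A bare-bones sketch of the kind of deviation score described above, computed as a z-score of a patient's response against a matched cohort, is shown below. The HbA1c example values and the normal-theory p-value are assumptions for illustration, not the paper's exact statistic.

```python
import numpy as np
from scipy.stats import norm

def deviation_score(patient_response, matched_responses):
    """Score how far a patient's response deviates from similar patients' responses
    (illustrative z-score plus two-sided tail probability)."""
    r = np.asarray(matched_responses, dtype=float)
    z = (patient_response - r.mean()) / r.std(ddof=1)
    return z, 2 * norm.sf(abs(z))

# e.g. change in HbA1c after treatment vs. a matched cohort (hypothetical numbers)
print(deviation_score(-0.1, [-1.2, -0.8, -1.5, -0.9, -1.1, -0.7]))
```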
Codifference as a practical tool to measure interdependence
NASA Astrophysics Data System (ADS)
Wyłomańska, Agnieszka; Chechkin, Aleksei; Gajda, Janusz; Sokolov, Igor M.
2015-03-01
Correlation and spectral analysis represent the standard tools to study interdependence in statistical data. However, for stochastic processes with heavy-tailed distributions such that the variance diverges, these tools are inadequate. Heavy-tailed processes are ubiquitous in nature and finance. We here discuss codifference as a convenient measure to study statistical interdependence, and we aim to give a short introductory review of its properties. By taking different known stochastic processes as generic examples, we present explicit formulas for their codifferences. We show that for Gaussian processes the codifference is equivalent to covariance. For processes with finite variance these two measures behave similarly with time. For processes with infinite variance the covariance does not exist; however, the codifference is still well defined. We demonstrate the practical importance of the codifference by extracting this function from simulated as well as real data taken from turbulent plasma of a fusion device and from a financial market. We conclude that the codifference serves as a convenient practical tool to study interdependence for stochastic processes with both infinite and finite variances.
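For reference, the codifference of a process $X$ at times $t_1, t_2$ is commonly defined through characteristic functions as follows; the notation here is assumed, and the paper's sign and normalization conventions may differ.

```latex
\tau_X(t_1,t_2) \;=\; \ln \mathbb{E}\,e^{\,i\,(X(t_1)-X(t_2))}
\;-\; \ln \mathbb{E}\,e^{\,i\,X(t_1)}
\;-\; \ln \mathbb{E}\,e^{-i\,X(t_2)} .
```

A short calculation with Gaussian characteristic functions gives $\tau_X(t_1,t_2)=\operatorname{Cov}\bigl(X(t_1),X(t_2)\bigr)$, consistent with the equivalence stated in the abstract.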
Network meta-analysis: a technique to gather evidence from direct and indirect comparisons
2017-01-01
Systematic reviews and pairwise meta-analyses of randomized controlled trials, at the intersection of clinical medicine, epidemiology and statistics, are positioned at the top of the evidence-based practice hierarchy. These are important tools on which to base drug approvals, clinical protocols and guideline formulation, and decision-making. However, this traditional technique only partially yields the information that clinicians, patients and policy-makers need to make informed decisions, since it usually compares only two interventions at a time. In the market, regardless of the clinical condition under evaluation, many interventions are usually available and few of them have been studied in head-to-head trials. This scenario precludes conclusions from being drawn about the comparative profiles (e.g. efficacy and safety) of all interventions. The recent development and introduction of a new technique, usually referred to as network meta-analysis, indirect meta-analysis, or multiple or mixed treatment comparisons, has allowed the estimation of metrics for all possible comparisons in the same model, simultaneously gathering direct and indirect evidence. Over the last years this statistical tool has matured as a technique, with models available for all types of raw data, producing different pooled effect measures, using both Frequentist and Bayesian frameworks, with different software packages. However, the conduct, reporting and interpretation of network meta-analysis still pose multiple challenges that should be carefully considered, especially because this technique inherits all assumptions from pairwise meta-analysis but with increased complexity. Thus, we aim to provide a basic explanation of how network meta-analysis is conducted, highlighting its risks and benefits for evidence-based practice, including information on statistical methods evolution, assumptions and steps for performing the analysis. PMID:28503228
medplot: a web application for dynamic summary and analysis of longitudinal medical data based on R.
Ahlin, Črt; Stupica, Daša; Strle, Franc; Lusa, Lara
2015-01-01
In biomedical studies the patients are often evaluated numerous times and a large number of variables are recorded at each time-point. Data entry and manipulation of longitudinal data can be performed using spreadsheet programs, which usually include some data plotting and analysis capabilities and are straightforward to use, but are not designed for the analyses of complex longitudinal data. Specialized statistical software offers more flexibility and capabilities, but first time users with biomedical background often find its use difficult. We developed medplot, an interactive web application that simplifies the exploration and analysis of longitudinal data. The application can be used to summarize, visualize and analyze data by researchers that are not familiar with statistical programs and whose knowledge of statistics is limited. The summary tools produce publication-ready tables and graphs. The analysis tools include features that are seldom available in spreadsheet software, such as correction for multiple testing, repeated measurement analyses and flexible non-linear modeling of the association of the numerical variables with the outcome. medplot is freely available and open source, it has an intuitive graphical user interface (GUI), it is accessible via the Internet and can be used within a web browser, without the need for installing and maintaining programs locally on the user's computer. This paper describes the application and gives detailed examples describing how to use the application on real data from a clinical study including patients with early Lyme borreliosis.
GEOquery: a bridge between the Gene Expression Omnibus (GEO) and BioConductor.
Davis, Sean; Meltzer, Paul S
2007-07-15
Microarray technology has become a standard molecular biology tool. Experimental data have been generated on a huge number of organisms, tissue types, treatment conditions and disease states. The Gene Expression Omnibus (Barrett et al., 2005), developed by the National Center for Biotechnology Information (NCBI) at the National Institutes of Health, is a repository of nearly 140,000 gene expression experiments. The BioConductor project (Gentleman et al., 2004) is an open-source and open-development software project built in the R statistical programming environment (R Development Core Team, 2005) for the analysis and comprehension of genomic data. The tools contained in the BioConductor project represent many state-of-the-art methods for the analysis of microarray and genomics data. We have developed a software tool that allows access to the wealth of information within GEO directly from BioConductor, eliminating many of the formatting and parsing problems that have made such analyses labor-intensive in the past. The software, called GEOquery, effectively establishes a bridge between GEO and BioConductor. Easy access to GEO data from BioConductor will likely lead to new analyses of GEO data using novel and rigorous statistical and bioinformatic tools. Facilitating analyses and meta-analyses of microarray data will increase the efficiency with which biologically important conclusions can be drawn from published genomic data. GEOquery is available as part of the BioConductor project.
NASA Astrophysics Data System (ADS)
Soltani, E.; Shahali, H.; Zarepour, H.
2011-01-01
In this paper, the effect of machining parameters, namely lubricant emulsion percentage and tool material, on surface roughness has been studied in the machining process of EN-AC 48000 aluminum alloy. EN-AC 48000 aluminum alloy is an important alloy in industry. Machining of this alloy is of vital importance due to built-up edge and tool wear. An L9 Taguchi standard orthogonal array has been applied as the experimental design to investigate the effect of the factors and their interaction. Nine machining tests have been carried out with three random replications, resulting in 27 experiments. Three types of cutting tools, including coated carbide (CD1810), uncoated carbide (H10), and polycrystalline diamond (CD10), have been used in this research. The emulsion percentage of the lubricant was selected at three levels: 3%, 5%, and 10%. Statistical analysis has been employed to study the effect of factors and their interactions using the ANOVA method. Moreover, the optimal factor levels have been achieved through signal-to-noise ratio (S/N) analysis. Also, a regression model has been provided to predict the surface roughness. Finally, the results of the confirmation tests have been presented to verify the adequacy of the predictive model. In this research, surface quality was improved by 9% using lubricant and the statistical optimization method.
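Since surface roughness is a smaller-the-better response, the signal-to-noise ratio used in a Taguchi analysis of this kind is typically of the form below; this is the standard textbook expression, shown for context rather than quoted from the paper.

```latex
S/N \;=\; -10\,\log_{10}\!\Bigl(\tfrac{1}{n}\sum_{i=1}^{n} y_i^{2}\Bigr),
```

where $y_i$ are the measured roughness values over the $n$ replications of a run, and the factor level with the largest S/N ratio is taken as optimal.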
History of water quality parameters - a study on the Sinos River/Brazil.
Konzen, G B; Figueiredo, J A S; Quevedo, D M
2015-05-01
Water is increasingly becoming a valuable resource, constituting one of the central themes of environmental, economic and social discussions. The Sinos River, located in southern Brazil, is the main river of the Sinos River Basin, representing a source of drinking water supply for a highly populated region. Considering its size and importance, it becomes necessary to conduct a study to follow up the water quality of this river, which is considered by some experts as one of the most polluted rivers in Brazil. The great importance of this study lies in the historical analysis of indicators. In this sense, we sought to develop aspects related to the management of water resources by performing a historical analysis of the Water Quality Index (WQI) of the Sinos River, using statistical methods. With regard to the methodological procedures, it should be pointed out that this study performs a time analysis of monitoring data on parameters related to a punctual measurement that is variable in time, using statistical tools. The data used refer to analyses of the water quality of the Sinos River (WQI) from the State Environmental Protection Agency Henrique Luiz Roessler (Fundação Estadual de Proteção Ambiental Henrique Luiz Roessler, FEPAM) covering the period between 2000 and 2008, as well as to a theoretical analysis focusing on the management of water resources. The study of the WQI and its parameters by statistical analysis has shown to be effective, ensuring its usefulness as a tool for the management of water resources. The descriptive analysis of the WQI and its parameters showed that the water quality of the Sinos River is concerningly low, which reaffirms that it is one of the most polluted rivers in Brazil. It should be highlighted that there was an overall difficulty in obtaining data with the appropriate periodicity, as well as a long complete series, which limited the conduct of statistical studies such as the present one.
Valid Statistical Analysis for Logistic Regression with Multiple Sources
NASA Astrophysics Data System (ADS)
Fienberg, Stephen E.; Nardi, Yuval; Slavković, Aleksandra B.
Considerable effort has gone into understanding issues of privacy protection of individual information in single databases, and various solutions have been proposed depending on the nature of the data, the ways in which the database will be used and the precise nature of the privacy protection being offered. Once data are merged across sources, however, the nature of the problem becomes far more complex and a number of privacy issues arise for the linked individual files that go well beyond those that are considered with regard to the data within individual sources. In the paper, we propose an approach that gives full statistical analysis on the combined database without actually combining it. We focus mainly on logistic regression, but the method and tools described may be applied essentially to other statistical models as well.
Dynamic Hurricane Data Analysis Tool
NASA Technical Reports Server (NTRS)
Knosp, Brian W.; Li, Peggy; Vu, Quoc A.
2009-01-01
A dynamic hurricane data analysis tool allows users of the JPL Tropical Cyclone Information System (TCIS) to analyze data over a Web medium. The TCIS software is described in the previous article, Tropical Cyclone Information System (TCIS) (NPO-45748). This tool interfaces with the TCIS database to pull in data from several different atmospheric and oceanic data sets observed by instruments. Users can use this information to generate histograms, maps, and profile plots for specific storms. The tool also displays statistical values for the user-selected parameter, including the mean, standard deviation, median, minimum, and maximum values. There is little wait time, allowing for fast data plots over date and spatial ranges. Users may also zoom in for a closer look at a particular spatial range. This is version 1 of the software. Researchers will use the data and tools on the TCIS to understand hurricane processes, improve hurricane forecast models and identify what types of measurements the next generation of instruments will need to collect.
TU-FG-201-05: Varian MPC as a Statistical Process Control Tool
DOE Office of Scientific and Technical Information (OSTI.GOV)
Carver, A; Rowbottom, C
Purpose: Quality assurance in radiotherapy requires the measurement of various machine parameters to ensure they remain within permitted values over time. In Truebeam release 2.0 the Machine Performance Check (MPC) was released, allowing beam output and machine axis movements to be assessed in a single test. We aim to evaluate the Varian Machine Performance Check (MPC) as a tool for Statistical Process Control (SPC). Methods: Varian's MPC tool was used on three Truebeam and one EDGE linac for a period of approximately one year. MPC was commissioned against independent systems. After this period the data were reviewed to determine whether or not the MPC was useful as a process control tool. Individual tests were analysed using Shewhart control charts, using Matlab for analysis. Principal component analysis was used to determine if a multivariate model was of any benefit in analysing the data. Results: Control charts were found to be useful to detect beam output changes, worn T-nuts and jaw calibration issues. Upper and lower control limits were defined at the 95% level. Multivariate SPC was performed using Principal Component Analysis. We found little evidence of clustering beyond that which might be naively expected, such as beam uniformity and beam output. Whilst this makes multivariate analysis of little use, it suggests that each test is giving independent information. Conclusion: The variety of independent parameters tested in MPC makes it a sensitive tool for routine machine QA. We have determined that using control charts in our QA programme would rapidly detect changes in machine performance. The use of control charts allows large quantities of tests to be performed on all linacs without visual inspection of all results. The use of control limits alerts users when data are inconsistent with previous measurements before they become out of specification. A. Carver has received a speaker's honorarium from Varian.
Estimating population diversity with CatchAll
USDA-ARS?s Scientific Manuscript database
The massive quantity of data produced by next-generation sequencing has created a pressing need for advanced statistical tools, in particular for analysis of bacterial and phage communities. Here we address estimating the total diversity in a population – the species richness. This is an important s...
Increasing Army Supply Chain Performance: Using an Integrated End to End Metrics System
2017-01-01
Key supply chain metrics (scheduled deliveries, delinquent contracts, PQDR/SDRs, forecasting accuracy, reliability, demand management, asset management strategies, pipeline) are identified and characterized by statistical analysis. The study proposed a framework and tool for inventory management based on factors such as
Development of a Relay Performance Web Tool for the Mars Network
NASA Technical Reports Server (NTRS)
Allard, Daniel A.; Edwards, Charles D.
2009-01-01
Modern Mars surface missions rely upon orbiting spacecraft to relay communications to and from Earth systems. An important component of this multi-mission relay process is the collection of relay performance statistics supporting strategic trend analysis and tactical anomaly identification and tracking.
SOCR Analyses – an Instructional Java Web-based Statistical Analysis Toolkit
Chu, Annie; Cui, Jenny; Dinov, Ivo D.
2011-01-01
The Statistical Online Computational Resource (SOCR) designs web-based tools for educational use in a variety of undergraduate courses (Dinov 2006). Several studies have demonstrated that these resources significantly improve students' motivation and learning experiences (Dinov et al. 2008). SOCR Analyses is a new component that concentrates on data modeling and analysis using parametric and non-parametric techniques supported with graphical model diagnostics. Currently implemented analyses include commonly used models in undergraduate statistics courses like linear models (Simple Linear Regression, Multiple Linear Regression, One-Way and Two-Way ANOVA). In addition, we implemented tests for sample comparisons, such as t-test in the parametric category; and Wilcoxon rank sum test, Kruskal-Wallis test, Friedman's test, in the non-parametric category. SOCR Analyses also include several hypothesis test models, such as Contingency tables, Friedman's test and Fisher's exact test. The code itself is open source (http://socr.googlecode.com/), hoping to contribute to the efforts of the statistical computing community. The code includes functionality for each specific analysis model and it has general utilities that can be applied in various statistical computing tasks. For example, concrete methods with API (Application Programming Interface) have been implemented in statistical summary, least square solutions of general linear models, rank calculations, etc. HTML interfaces, tutorials, source code, activities, and data are freely available via the web (www.SOCR.ucla.edu). Code examples for developers and demos for educators are provided on the SOCR Wiki website. In this article, the pedagogical utilization of the SOCR Analyses is discussed, as well as the underlying design framework. As the SOCR project is on-going and more functions and tools are being added to it, these resources are constantly improved. The reader is strongly encouraged to check the SOCR site for most updated information and newly added models. PMID:21546994
NASA Astrophysics Data System (ADS)
Hilliard, Antony
Energy Monitoring and Targeting (M&T) is a well-established business process that develops information about utility energy consumption in a business or institution. While M&T has persisted as a worthwhile energy conservation support activity, it has not been widely adopted. This dissertation explains M&T challenges in terms of diagnosing and controlling energy consumption, informed by a naturalistic field study of M&T work. A Cognitive Work Analysis of M&T identifies structures that diagnosis can search, information flows not supported in canonical support tools, and opportunities to extend the most popular tool for M&T: Cumulative Sum of Residuals (CUSUM) charts. A design application outlines how CUSUM charts were augmented with a more contemporary statistical change detection strategy, Recursive Parameter Estimates, modified to better suit the M&T task using Representation Aiding principles. The design was experimentally evaluated in a controlled M&T synthetic task, and was shown to significantly improve diagnosis performance.
Egorova, K.S.; Kondakova, A.N.; Toukach, Ph.V.
2015-01-01
Carbohydrates are biological building blocks participating in diverse and crucial processes at both the cellular and organism levels. They protect individual cells, establish intracellular interactions, take part in the immune reaction and participate in many other processes. Glycosylation is considered one of the most important modifications of proteins and other biologically active molecules. Still, the data on the enzymatic machinery involved in carbohydrate synthesis and processing are scattered, and progress in its study is hindered by the vast bulk of accumulated genetic information not supported by any experimental evidence for the functions of the proteins encoded by these genes. In this article, we present novel instruments for statistical analysis of glycomes in taxa. These tools may be helpful for investigating carbohydrate-related enzymatic activities in various groups of organisms and for comparison of their carbohydrate content. The instruments are developed on the Carbohydrate Structure Database (CSDB) platform and are available freely on the CSDB web-site at http://csdb.glycoscience.ru. Database URL: http://csdb.glycoscience.ru PMID:26337239
The application of data mining techniques to oral cancer prognosis.
Tseng, Wan-Ting; Chiang, Wei-Fan; Liu, Shyun-Yeu; Roan, Jinsheng; Lin, Chun-Nan
2015-05-01
This study adopted an integrated procedure that combines the clustering and classification features of data mining technology to determine the differences between the symptoms shown in past cases where patients died from or survived oral cancer. Two data mining tools, namely the decision tree and the artificial neural network, were used to analyze historical cases of oral cancer, and their performance was compared with that of logistic regression, the popular statistical analysis tool. Both the decision tree and artificial neural network models showed superiority to the traditional statistical model. However, for clinicians, the trees created by the decision tree models are relatively easier to interpret than the artificial neural network models. Cluster analysis also revealed that stage 4 patients who also possess the following four characteristics have an extremely low survival rate: pN is N2b, level of RLNM is level I-III, AJCC-T is T4, and the cell mutation situation (G) is moderate.
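A minimal sketch of this kind of model comparison, using scikit-learn on synthetic data (the real study used historical oral-cancer records; the feature set, sample size and model settings here are assumptions):

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.neural_network import MLPClassifier
    from sklearn.tree import DecisionTreeClassifier

    # Synthetic stand-in for clinical cases (features = staging variables, target = outcome).
    X, y = make_classification(n_samples=600, n_features=10, n_informative=5, random_state=1)

    models = {
        "decision tree": DecisionTreeClassifier(max_depth=4, random_state=1),
        "neural network": MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=1),
        "logistic regression": LogisticRegression(max_iter=1000),
    }

    # Cross-validated accuracy gives a simple basis for the kind of comparison in the study.
    for name, model in models.items():
        scores = cross_val_score(model, X, y, cv=5)
        print(f"{name:20s} mean accuracy = {scores.mean():.3f}")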
Quantitative analysis of diffusion tensor orientation: theoretical framework.
Wu, Yu-Chien; Field, Aaron S; Chung, Moo K; Badie, Benham; Alexander, Andrew L
2004-11-01
Diffusion-tensor MRI (DT-MRI) yields information about the magnitude, anisotropy, and orientation of water diffusion of brain tissues. Although white matter tractography and eigenvector color maps provide visually appealing displays of white matter tract organization, they do not easily lend themselves to quantitative and statistical analysis. In this study, a set of visual and quantitative tools for the investigation of tensor orientations in the human brain was developed. Visual tools included rose diagrams, which are spherical coordinate histograms of the major eigenvector directions, and 3D scatterplots of the major eigenvector angles. A scatter matrix of major eigenvector directions was used to describe the distribution of major eigenvectors in a defined anatomic region. A measure of eigenvector dispersion was developed to describe the degree of eigenvector coherence in the selected region. These tools were used to evaluate directional organization and the interhemispheric symmetry of DT-MRI data in five healthy human brains and two patients with infiltrative diseases of the white matter tracts. In normal anatomical white matter tracts, a high degree of directional coherence and interhemispheric symmetry was observed. The infiltrative diseases appeared to alter the eigenvector properties of affected white matter tracts, showing decreased eigenvector coherence and interhemispheric symmetry. This novel approach distills the rich, 3D information available from the diffusion tensor into a form that lends itself to quantitative analysis and statistical hypothesis testing. (c) 2004 Wiley-Liss, Inc.
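One way to compute a scatter matrix of major eigenvectors and a dispersion index of the kind described above is sketched below (a minimal illustration on simulated unit vectors; the exact dispersion definition used by the authors may differ):

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical major eigenvectors (unit vectors) from a region of interest:
    # mostly aligned with z, plus noise, to mimic a coherent white-matter tract.
    v = rng.normal(0, 0.15, size=(500, 3)) + np.array([0.0, 0.0, 1.0])
    v /= np.linalg.norm(v, axis=1, keepdims=True)

    # Scatter (mean dyadic) matrix of the eigenvector directions.
    # Sign is irrelevant for diffusion eigenvectors, and v v^T removes it automatically.
    scatter = (v[:, :, None] * v[:, None, :]).mean(axis=0)

    # A single dominant eigenvalue (close to 1) indicates coherent orientations;
    # three similar eigenvalues indicate dispersion.
    eigvals = np.sort(np.linalg.eigvalsh(scatter))[::-1]
    dispersion = 1.0 - eigvals[0]          # one simple dispersion index (assumed form)

    print("scatter-matrix eigenvalues:", np.round(eigvals, 3))
    print("dispersion index:", round(dispersion, 3))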
Genetic Epidemiology of Glucose-6-Phosphate Dehydrogenase Deficiency in the Arab World.
Doss, C George Priya; Alasmar, Dima R; Bux, Reem I; Sneha, P; Bakhsh, Fadheela Dad; Al-Azwani, Iman; Bekay, Rajaa El; Zayed, Hatem
2016-11-17
A systematic search was implemented using four literature databases (PubMed, Embase, Science Direct and Web of Science) to capture all the causative mutations of Glucose-6-phosphate dehydrogenase (G6PD) deficiency (G6PDD) in the 22 Arab countries. Our search yielded 43 studies that captured 33 mutations (23 missense, one silent, two deletions, and seven intronic mutations) in 3,430 Arab patients with G6PDD. The 23 missense mutations were then subjected to phenotypic classification using in silico prediction tools, which were compared to the WHO pathogenicity scale as a reference. These in silico tools were tested for their prediction efficiency using rigorous statistical analyses. Of the 23 missense mutations, p.S188F, p.I48T, p.N126D, and p.V68M were identified as the most common mutations among Arab populations, but were not unique to the Arab world. Interestingly, our search strategy found four other mutations (p.N135T, p.S179N, p.R246L, and p.Q307P) that are unique to Arabs. These mutations were subjected to structural analysis and molecular dynamics simulation analysis (MDSA), which predicted these mutant forms to potentially affect enzyme function. The combination of MDSA, structural analysis, in silico predictions and statistical tools used here will provide a platform for future assessment of the pathogenicity of genetic mutations.
Krystkowiak, Izabella; Manguy, Jean; Davey, Norman E
2018-06-05
There is a pressing need for in silico tools that can aid in the identification of the complete repertoire of protein binding (SLiMs, MoRFs, miniMotifs) and modification (moiety attachment/removal, isomerization, cleavage) motifs. We have created PSSMSearch, an interactive web-based tool for rapid statistical modeling, visualization, discovery and annotation of protein motif specificity determinants to discover novel motifs in a proteome-wide manner. PSSMSearch analyses proteomes for regions with significant similarity to a motif specificity determinant model built from a set of aligned motif-containing peptides. Multiple scoring methods are available to build a position-specific scoring matrix (PSSM) describing the motif specificity determinant model. This model can then be modified by a user to add prior knowledge of specificity determinants through an interactive PSSM heatmap. PSSMSearch includes a statistical framework to calculate the significance of specificity determinant model matches against a proteome of interest. PSSMSearch also includes the SLiMSearch framework's annotation, motif functional analysis and filtering tools to highlight relevant discriminatory information. Additional tools to annotate statistically significant shared keywords and GO terms, or experimental evidence of interaction with a motif-recognizing protein have been added. Finally, PSSM-based conservation metrics have been created for taxonomic range analyses. The PSSMSearch web server is available at http://slim.ucd.ie/pssmsearch/.
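The core PSSM idea behind such tools can be sketched as follows: build a log-odds matrix from aligned motif-containing peptides and score every window of a query sequence. This is a simplified illustration with a uniform background model and hypothetical peptides, not PSSMSearch's own scoring methods:

    import numpy as np

    ALPHABET = "ACDEFGHIKLMNPQRSTVWY"

    def build_pssm(peptides, pseudocount=1.0):
        """Log-odds position-specific scoring matrix from aligned, equal-length peptides."""
        length = len(peptides[0])
        counts = np.full((length, len(ALPHABET)), pseudocount)
        for pep in peptides:
            for pos, aa in enumerate(pep):
                counts[pos, ALPHABET.index(aa)] += 1
        freqs = counts / counts.sum(axis=1, keepdims=True)
        background = 1.0 / len(ALPHABET)          # uniform background (simplification)
        return np.log2(freqs / background)

    def scan(sequence, pssm):
        """Score every window of the sequence against the PSSM; higher = closer to the motif."""
        width = pssm.shape[0]
        scores = []
        for start in range(len(sequence) - width + 1):
            window = sequence[start:start + width]
            scores.append((start, sum(pssm[i, ALPHABET.index(aa)] for i, aa in enumerate(window))))
        return scores

    # Hypothetical aligned motif instances and a target protein (not PSSMSearch's own data).
    instances = ["RRASLP", "RKASVP", "RRGSLP", "KRASIP"]
    pssm = build_pssm(instances)
    for position, score in sorted(scan("MAAKRRASLPDEQRKASVPLLS", pssm), key=lambda t: -t[1])[:3]:
        print(f"position {position:2d}  score {score:6.2f}")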
DOE Office of Scientific and Technical Information (OSTI.GOV)
Teuton, Jeremy R.; Griswold, Richard L.; Mehdi, Beata L.
Precise analysis of both (S)TEM images and video is a time- and labor-intensive process. As an example, determining when crystal growth and shrinkage occurs during the dynamic process of Li dendrite deposition and stripping involves manually scanning through each frame in the video to extract a specific set of frames/images. For large numbers of images, this process can be very time consuming, so a fast and accurate automated method is desirable. Given this need, we developed software that uses analysis of video compression statistics for detecting and characterizing events in large data sets. This software works by converting the data into a series of images which it compresses into an MPEG-2 video using the open source "avconv" utility [1]. The software does not use the video itself, but rather analyzes the video statistics from the first pass of the video encoding that avconv records in the log file. This file contains statistics for each frame of the video including the frame quality, intra-texture and predicted texture bits, and forward and backward motion vector resolution, among others. In all, avconv records 15 statistics for each frame. By combining different statistics, we have been able to detect events in various types of data. We have developed an interactive tool for exploring the data and the statistics that aids the analyst in selecting useful statistics for each analysis. Going forward, an algorithm for detecting and possibly describing events automatically can be written based on statistic(s) for each data type.
Statistical methods for the forensic analysis of striated tool marks
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hoeksema, Amy Beth
In forensics, fingerprints can be used to uniquely identify suspects in a crime. Similarly, a tool mark left at a crime scene can be used to identify the tool that was used. However, the current practice of identifying matching tool marks involves visual inspection of marks by forensic experts which can be a very subjective process. As a result, declared matches are often successfully challenged in court, so law enforcement agencies are particularly interested in encouraging research in more objective approaches. Our analysis is based on comparisons of profilometry data, essentially depth contours of a tool mark surface taken along a linear path. In current practice, for stronger support of a match or non-match, multiple marks are made in the lab under the same conditions by the suspect tool. We propose the use of a likelihood ratio test to analyze the difference between a sample of comparisons of lab tool marks to a field tool mark, against a sample of comparisons of two lab tool marks. Chumbley et al. (2010) point out that the angle of incidence between the tool and the marked surface can have a substantial impact on the tool mark and on the effectiveness of both manual and algorithmic matching procedures. To better address this problem, we describe how the analysis can be enhanced to model the effect of tool angle and allow for angle estimation for a tool mark left at a crime scene. With sufficient development, such methods may lead to more defensible forensic analyses.
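A score-based likelihood ratio of the general kind discussed above can be sketched as follows (the similarity scores and normal score models are hypothetical; the paper's actual test operates directly on profilometry comparisons and additionally models tool angle):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)

    # Hypothetical similarity scores between pairs of marks (e.g. maximum cross-correlation).
    # "Same tool" comparisons tend to score higher than "different tool" comparisons.
    same_tool_scores = rng.normal(0.80, 0.06, size=60)     # lab mark vs lab mark, same tool
    diff_tool_scores = rng.normal(0.55, 0.08, size=60)     # lab mark vs mark from another tool

    # Fit simple normal models to each population of comparison scores.
    mu_s, sd_s = same_tool_scores.mean(), same_tool_scores.std(ddof=1)
    mu_d, sd_d = diff_tool_scores.mean(), diff_tool_scores.std(ddof=1)

    # Likelihood ratio for a questioned field-mark-to-suspect-tool comparison score:
    # LR > 1 supports "same source", LR < 1 supports "different source".
    questioned_score = 0.74
    lr = stats.norm.pdf(questioned_score, mu_s, sd_s) / stats.norm.pdf(questioned_score, mu_d, sd_d)
    print(f"likelihood ratio for questioned comparison: {lr:.1f}")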
Uterine cancer is the most commonly diagnosed gynecologic cancer; the U.S. Cancer Statistics Data Visualizations Tool makes these statistics available.
NASA Astrophysics Data System (ADS)
Kanniyappan, Udayakumar; Gnanatheepam, Einstein; Prakasarao, Aruna; Dornadula, Koteeswaran; Singaravelu, Ganesan
2017-02-01
Cancer is one of the most common threats to human health around the world, and diagnosis based on optical spectroscopy, especially the fluorescence technique, has been established as a standard approach among scientists to explore the biochemical and morphological changes in tissues. In this regard, the present work aims to extract the spectral signatures of the various fluorophores present in oral tissues using parallel factor analysis (PARAFAC). Statistical analysis is subsequently performed to show its diagnostic potential in distinguishing malignant and premalignant from normal oral tissues. Hence, the present study may lead to a possible and/or alternative tool for oral cancer diagnosis.
Data Analysis with Graphical Models: Software Tools
NASA Technical Reports Server (NTRS)
Buntine, Wray L.
1994-01-01
Probabilistic graphical models (directed and undirected Markov fields, and combined in chain graphs) are used widely in expert systems, image processing and other areas as a framework for representing and reasoning with probabilities. They come with corresponding algorithms for performing probabilistic inference. This paper discusses an extension to these models by Spiegelhalter and Gilks, plates, used to graphically model the notion of a sample. This offers a graphical specification language for representing data analysis problems. When combined with general methods for statistical inference, this also offers a unifying framework for prototyping and/or generating data analysis algorithms from graphical specifications. This paper outlines the framework and then presents some basic tools for the task: a graphical version of the Pitman-Koopman Theorem for the exponential family, problem decomposition, and the calculation of exact Bayes factors. Other tools already developed, such as automatic differentiation, Gibbs sampling, and use of the EM algorithm, make this a broad basis for the generation of data analysis software.
Visual Data Analysis for Satellites
NASA Technical Reports Server (NTRS)
Lau, Yee; Bhate, Sachin; Fitzpatrick, Patrick
2008-01-01
The Visual Data Analysis Package is a collection of programs and scripts that facilitate visual analysis of data available from NASA and NOAA satellites, as well as dropsonde, buoy, and conventional in-situ observations. The package features utilities for data extraction, data quality control, statistical analysis, and data visualization. The Hierarchical Data Format (HDF) satellite data extraction routines from NASA's Jet Propulsion Laboratory were customized for specific spatial coverage and file input/output. Statistical analysis includes the calculation of the relative error, the absolute error, and the root mean square error. Other capabilities include curve fitting through the data points to fill in missing data points between satellite passes or where clouds obscure satellite data. For data visualization, the software provides customizable Generic Mapping Tool (GMT) scripts to generate difference maps, scatter plots, line plots, vector plots, histograms, timeseries, and color fill images.
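The error statistics mentioned above are straightforward to compute; a minimal sketch with hypothetical satellite and buoy values (not the package's own routines):

    import numpy as np

    def error_statistics(satellite, reference):
        """Absolute error, relative error and RMSE between satellite retrievals and in-situ data."""
        satellite = np.asarray(satellite, dtype=float)
        reference = np.asarray(reference, dtype=float)
        diff = satellite - reference
        return {
            "mean absolute error": np.mean(np.abs(diff)),
            "mean relative error": np.mean(np.abs(diff) / np.abs(reference)),
            "root mean square error": np.sqrt(np.mean(diff ** 2)),
        }

    # Hypothetical sea-surface temperatures: satellite retrievals vs buoy observations (deg C).
    results = error_statistics([26.1, 25.4, 27.0, 24.8], [25.8, 25.9, 26.7, 25.1])
    for name, value in results.items():
        print(f"{name}: {value:.3f}")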
NASA Astrophysics Data System (ADS)
Di Lorenzo, R.; Ingarao, G.; Fonti, V.
2007-05-01
The crucial task in the prevention of ductile fracture is the availability of a tool for predicting the occurrence of such defects. The technical literature presents a wide investigation of this topic and many contributions have been given by many authors following different approaches. The main class of approaches regards the development of fracture criteria: generally, such criteria are expressed by determining a critical value of a damage function which depends on stress and strain paths; ductile fracture is assumed to occur when such a critical value is reached during the analysed process. There is a relevant drawback related to the utilization of ductile fracture criteria; in fact, each criterion usually performs well in predicting fracture for particular stress-strain paths, i.e. it works very well for certain processes but may provide poor results for other processes. On the other hand, the approaches based on damage mechanics formulations are very effective from a theoretical point of view, but they are very complex and their proper calibration is quite difficult. In this paper, two different approaches are investigated to predict fracture occurrence in cold forming operations. The final aim of the proposed method is a tool with general reliability, i.e. one able to predict fracture for different forming processes. The proposed approach represents a step forward within a research project focused on the utilization of innovative predictive tools for ductile fracture. The paper presents a comparison between an artificial neural network design procedure and an approach based on statistical tools; both approaches were aimed at predicting fracture occurrence/absence based on a set of stress and strain path data. The proposed approach is based on the utilization of available experimental data, for a given material, on fracture occurrence in different processes. In more detail, the approach consists of the analysis of experimental tests in which fracture occurs, followed by numerical simulations of such processes in order to track the stress-strain paths in the workpiece region where fracture is expected. Such data are utilized to build up a proper data set, which was used both to train an artificial neural network and to perform a statistical analysis aimed at predicting fracture occurrence. The developed statistical tool is properly designed and optimized and is able to recognize fracture occurrence. The reliability and predictive capability of the statistical method were compared with those obtained from an artificial neural network developed to predict fracture occurrence. Moreover, the approach is also validated in forming processes characterized by complex fracture mechanics.
Comparative analysis on the selection of number of clusters in community detection
NASA Astrophysics Data System (ADS)
Kawamoto, Tatsuro; Kabashima, Yoshiyuki
2018-02-01
We conduct a comparative analysis of various estimates of the number of clusters in community detection. An exhaustive comparison requires testing of all possible combinations of frameworks, algorithms, and assessment criteria. In this paper we focus on the framework based on a stochastic block model, and investigate the performance of greedy algorithms, statistical inference, and spectral methods. For the assessment criteria, we consider modularity, the map equation, the Bethe free energy, prediction errors, and isolated eigenvalues. From the analysis, the tendency of the assessment criteria and algorithms to overfit or underfit becomes apparent. In addition, we propose that the alluvial diagram is a suitable tool to visualize statistical inference results and can be useful for determining the number of clusters.
Statistical analysis and interpolation of compositional data in materials science.
Pesenson, Misha Z; Suram, Santosh K; Gregoire, John M
2015-02-09
Compositional data are ubiquitous in chemistry and materials science: analysis of elements in multicomponent systems, combinatorial problems, etc., lead to data that are non-negative and sum to a constant (for example, atomic concentrations). The constant sum constraint restricts the sampling space to a simplex instead of the usual Euclidean space. Since statistical measures such as mean and standard deviation are defined for the Euclidean space, traditional correlation studies, multivariate analysis, and hypothesis testing may lead to erroneous dependencies and incorrect inferences when applied to compositional data. Furthermore, composition measurements that are used for data analytics may not include all of the elements contained in the material; that is, the measurements may be subcompositions of a higher-dimensional parent composition. Physically meaningful statistical analysis must yield results that are invariant under the number of composition elements, requiring the application of specialized statistical tools. We present specifics and subtleties of compositional data processing through discussion of illustrative examples. We introduce basic concepts, terminology, and methods required for the analysis of compositional data and utilize them for the spatial interpolation of composition in a sputtered thin film. The results demonstrate the importance of this mathematical framework for compositional data analysis (CDA) in the fields of materials science and chemistry.
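A standard entry point to compositional data analysis is the centred log-ratio (clr) transform, which maps compositions from the simplex into ordinary Euclidean space where conventional statistics apply. A minimal sketch, following standard CDA practice rather than the authors' exact pipeline (the element concentrations are hypothetical):

    import numpy as np

    def closure(x):
        """Rescale parts so each composition sums to 1."""
        x = np.asarray(x, dtype=float)
        return x / x.sum(axis=1, keepdims=True)

    def clr(x):
        """Centred log-ratio transform: simplex -> Euclidean space."""
        log_x = np.log(closure(x))
        return log_x - log_x.mean(axis=1, keepdims=True)

    def clr_inverse(z):
        """Map clr coordinates back to compositions."""
        return closure(np.exp(z))

    # Hypothetical atomic concentrations (3 elements) at 4 points of a composition-spread film.
    comps = np.array([[0.60, 0.25, 0.15],
                      [0.55, 0.30, 0.15],
                      [0.50, 0.28, 0.22],
                      [0.45, 0.35, 0.20]])

    # Averaging in clr space respects the constant-sum constraint; a plain arithmetic
    # mean of the raw parts generally does not.
    mean_composition = clr_inverse(clr(comps).mean(axis=0, keepdims=True))
    print("compositional mean:", np.round(mean_composition, 3))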
Use of a Self-Reflection Tool to Enhance Resident Learning on an Adolescent Medicine Rotation.
Greenberg, Katherine Blumoff; Baldwin, Constance
2016-08-01
Adolescent Medicine (AM) educators in pediatric residency programs are seeking new ways to engage learners in adolescent health. This mixed-methods study presents a novel self-reflection tool and addresses whether self-reflection enhanced residents' perception of the value of an adolescent rotation, in particular, its relevance to their future practice. The self-reflection tool included 17 Likert scale items on residents' comfort with the essential tasks of adolescent care and open-ended questions that promoted self-reflection and goal setting. Semi-structured, postrotation interviews encouraged residents to discuss their experiences. Likert scale data were analyzed using descriptive statistics, and interview notes and written comments on the self-reflection tool were combined for qualitative data analysis. Residents' pre- to post-self-evaluations showed statistically significant increases in comfort with most of the adolescent health care tasks. Four major themes emerged from our qualitative analysis: (1) the value of observing skilled attendings as role models; (2) the comfort gained through broad and frequent adolescent care experiences; (3) the career relevance of AM; and (4) the ability to set personally meaningful goals for the rotation. Residents used the self-reflection tool to mindfully set goals and found their AM education valuable and relevant to their future careers. Our tool helped make explicit to residents the norms, values, and beliefs of the hidden curriculum applied to the care of adolescents and helped them to improve the self-assessed quality of their rapport and communications with adolescents. We conclude that a structured self-reflection exercise can enhance residents' experiences on an AM rotation. Copyright © 2016 Society for Adolescent Health and Medicine. Published by Elsevier Inc. All rights reserved.
Dugas, Martin; Dugas-Breit, Susanne
2014-01-01
Design, execution and analysis of clinical studies involves several stakeholders with different professional backgrounds. Typically, principal investigators are familiar with standard office tools, data managers apply electronic data capture (EDC) systems and statisticians work with statistics software. Case report forms (CRFs) specify the data model of study subjects, evolve over time and consist of hundreds to thousands of data items per study. To avoid erroneous manual transformation work, a conversion tool for different representations of study data models was designed. It can convert between office format, EDC and statistics format. In addition, it supports semantic annotations, which enable precise definitions for data items. A reference implementation is available as the open source package ODMconverter at http://cran.r-project.org.
Analysis of the 1996 Wisconsin forest statistics by habitat type.
John Kotar; Joseph A. Kovach; Gary Brand
1999-01-01
The fifth inventory of Wisconsin's forests is presented from the perspective of habitat type as a classification tool. Habitat type classifies forests based on the species composition of the understory plant community. Various forest attributes are summarized by habitat type and management implications are discussed.
The Stratification Analysis of Sediment Data for Lake Michigan
This research paper describes the development of spatial statistical tools that are applied to investigate the spatial trends of sediment data sets for nutrients and carbon in Lake Michigan. All of the sediment data utilized in the present study was collected over a two year per...
Statistical Performances of Resistive Active Power Splitter
NASA Astrophysics Data System (ADS)
Lalléchère, Sébastien; Ravelo, Blaise; Thakur, Atul
2016-03-01
In this paper, the synthesis and sensitivity analysis of an active power splitter (PWS) is proposed. It is based on an active cell composed of a Field Effect Transistor in cascade with a shunted resistor at the input and the output (resistive amplifier topology). The PWS uncertainty versus resistance tolerances is assessed by using a stochastic method. Furthermore, with the proposed topology, we can easily control the device gain by varying a resistance. This provides a useful tool to analyse the statistical sensitivity of the system in an uncertain environment.
Choi, Edmond P H; Chin, Weng Yee; Wan, Eric Y F; Lam, Cindy L K
2016-05-01
To examine the internal and external responsiveness of the Pressure Ulcer Scale for Healing (PUSH) tool for assessing the healing progress in acute and chronic wounds. It is important to establish the responsiveness of instruments used in conducting wound care assessments to ensure that they are able to capture changes in wound healing accurately over time. Prospective longitudinal observational study. The key study instrument was the PUSH tool. Internal responsiveness was assessed using paired t-testing and effect size statistics. External responsiveness was assessed using multiple linear regression. All new patients with at least one eligible acute or chronic wound, enrolled in the Nurse and Allied Health Clinic-Wound Care programme between 1 December 2012 and 31 March 2013, were included for analysis (N = 541). Overall, the PUSH tool was able to detect statistically significant changes in wound healing between baseline and discharge. The effect size statistics were large. The internal responsiveness of the PUSH tool was confirmed in patients with a variety of different wound types including venous ulcers, pressure ulcers, neuropathic ulcers, burns and scalds, skin tears, surgical wounds and traumatic wounds. After controlling for age, gender and wound type, subjects in the 'wound improved but not healed' group had a smaller change in PUSH scores than those in the 'wound healed' group. Subjects in the 'wound static or worsened' group had the smallest change in PUSH scores. The external responsiveness was confirmed. The internal and external responsiveness of the PUSH tool confirmed that it can be used to track the healing progress of both acute and chronic wounds. © 2016 The Authors. Journal of Advanced Nursing Published by John Wiley & Sons Ltd.
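Internal responsiveness of this kind is typically quantified with a paired t-test and a standardized effect size; a minimal sketch on simulated PUSH scores (the score distributions are assumptions, not the study's data):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)

    # Hypothetical PUSH scores at baseline and discharge for 40 wounds (lower = closer to healed).
    baseline = rng.normal(12.0, 2.5, size=40)
    discharge = baseline - rng.normal(4.0, 2.0, size=40)

    # Internal responsiveness: paired t-test plus a standardized effect size.
    t_stat, p_value = stats.ttest_rel(baseline, discharge)
    change = baseline - discharge
    effect_size = change.mean() / change.std(ddof=1)      # Cohen's d for paired data

    print(f"paired t = {t_stat:.2f}, p = {p_value:.2e}, effect size d = {effect_size:.2f}")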
Estimating the Regional Economic Significance of Airports
1992-09-01
following three options for estimating induced impacts: the economic base model, an econometric model, and a regional input-output model. One approach to...limitations, however, the economic base model has been widely used for regional economic analysis. A second approach is to develop an econometric model of...analysis is the principal statistical tool used to estimate the economic relationships. Regional econometric models are capable of estimating a single
Evaluation of Lightning Incidence to Elements of a Complex Structure: A Monte Carlo Approach
NASA Technical Reports Server (NTRS)
Mata, Carlos T.; Rakov, V. A.
2008-01-01
There are complex structures for which the installation and positioning of the lightning protection system (LPS) cannot be done using the lightning protection standard guidelines. As a result, there are some "unprotected" or "exposed" areas. In an effort to quantify the lightning threat to these areas, a Monte Carlo statistical tool has been developed. This statistical tool uses two random number generators: a uniform distribution to generate origins of downward propagating leaders and a lognormal distribution to generate return-stroke peak currents. Downward leaders propagate vertically downward and their striking distances are defined by the polarity and peak current. Following the electrogeometrical concept, we assume that the leader attaches to the closest object within its striking distance. The statistical analysis is run for 10,000 years with an assumed ground flash density and peak current distributions, and the output of the program is the probability of direct attachment to objects of interest with its corresponding peak current distribution.
Evaluation of Lightning Incidence to Elements of a Complex Structure: A Monte Carlo Approach
NASA Technical Reports Server (NTRS)
Mata, Carlos T.; Rakov, V. A.
2008-01-01
There are complex structures for which the installation and positioning of the lightning protection system (LPS) cannot be done using the lightning protection standard guidelines. As a result, there are some "unprotected" or "exposed" areas. In an effort to quantify the lightning threat to these areas, a Monte Carlo statistical tool has been developed. This statistical tool uses two random number generators: a uniform distribution to generate the origin of downward propagating leaders and a lognormal distribution to generate the corresponding return-stroke peak currents. Downward leaders propagate vertically downward and their striking distances are defined by the polarity and peak current. Following the electrogeometrical concept, we assume that the leader attaches to the closest object within its striking distance. The statistical analysis is run for N years with an assumed ground flash density and the output of the program is the probability of direct attachment to objects of interest with its corresponding peak current distribution.
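A minimal sketch of the Monte Carlo electrogeometric approach described in the two records above (the object layout, simulation area, striking-distance formula r = 10·I^0.65 and lognormal peak-current parameters are commonly used assumptions, not necessarily the authors' values):

    import numpy as np

    rng = np.random.default_rng(4)

    # Objects of interest: (x, y, height) in metres -- a hypothetical layout, not a real facility.
    objects = {"mast": (0.0, 0.0, 60.0), "antenna": (30.0, 10.0, 15.0), "camera": (-20.0, 25.0, 5.0)}

    n_flashes = 100_000
    # Lognormal first-stroke peak currents; median 31 kA and log-sd 0.66 are commonly assumed values.
    current = rng.lognormal(mean=np.log(31.0), sigma=0.66, size=n_flashes)
    # Uniform downward-leader origins over a 400 m x 400 m area.
    x0 = rng.uniform(-200.0, 200.0, n_flashes)
    y0 = rng.uniform(-200.0, 200.0, n_flashes)

    # Electrogeometric striking distance; r = 10 * I**0.65 (I in kA, r in m) is one common form.
    r = 10.0 * current ** 0.65

    # For each flash, the vertically descending leader attaches to whichever surface it can
    # reach first, i.e. the one giving the greatest attachment height.
    heights = [r.copy()]                      # attachment height to flat ground = r
    names = ["ground"]
    for name, (xo, yo, zo) in objects.items():
        d = np.hypot(x0 - xo, y0 - yo)
        h = np.where(d <= r, zo + np.sqrt(np.clip(r**2 - d**2, 0.0, None)), -np.inf)
        heights.append(h)
        names.append(name)

    winner = np.argmax(np.vstack(heights), axis=0)
    for idx, name in enumerate(names):
        print(f"{name:8s} {np.mean(winner == idx) * 100:6.2f}% of simulated flashes")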
mcaGUI: microbial community analysis R-Graphical User Interface (GUI).
Copeland, Wade K; Krishnan, Vandhana; Beck, Daniel; Settles, Matt; Foster, James A; Cho, Kyu-Chul; Day, Mitch; Hickey, Roxana; Schütte, Ursel M E; Zhou, Xia; Williams, Christopher J; Forney, Larry J; Abdo, Zaid
2012-08-15
Microbial communities have an important role in natural ecosystems and have an impact on animal and human health. Intuitive graphic and analytical tools that can facilitate the study of these communities are in short supply. This article introduces Microbial Community Analysis GUI, a graphical user interface (GUI) for the R-programming language (R Development Core Team, 2010). With this application, researchers can input aligned and clustered sequence data to create custom abundance tables and perform analyses specific to their needs. This GUI provides a flexible modular platform, expandable to include other statistical tools for microbial community analysis in the future. The mcaGUI package and source are freely available as part of Bioconductor at http://www.bioconductor.org/packages/release/bioc/html/mcaGUI.html
A quality improvement management model for renal care.
Vlchek, D L; Day, L M
1991-04-01
The purpose of this article is to explore the potential for applying the theory and tools of quality improvement (total quality management) in the renal care setting. We believe that the coupling of the statistical techniques used in the Deming method of quality improvement, with modern approaches to outcome and process analysis, will provide the renal care community with powerful tools, not only for improved quality (i.e., reduced morbidity and mortality), but also for technology evaluation and resource allocation.
Modeling and replicating statistical topology and evidence for CMB nonhomogeneity
Agami, Sarit
2017-01-01
Under the banner of “big data,” the detection and classification of structure in extremely large, high-dimensional, data sets are two of the central statistical challenges of our times. Among the most intriguing new approaches to this challenge is “TDA,” or “topological data analysis,” one of the primary aims of which is providing nonmetric, but topologically informative, preanalyses of data which make later, more quantitative, analyses feasible. While TDA rests on strong mathematical foundations from topology, in applications, it has faced challenges due to difficulties in handling issues of statistical reliability and robustness, often leading to an inability to make scientific claims with verifiable levels of statistical confidence. We propose a methodology for the parametric representation, estimation, and replication of persistence diagrams, the main diagnostic tool of TDA. The power of the methodology lies in the fact that even if only one persistence diagram is available for analysis—the typical case for big data applications—the replications permit conventional statistical hypothesis testing. The methodology is conceptually simple and computationally practical, and provides a broadly effective statistical framework for persistence diagram TDA analysis. We demonstrate the basic ideas on a toy example, and the power of the parametric approach to TDA modeling in an analysis of cosmic microwave background (CMB) nonhomogeneity. PMID:29078301
The Objective Identification and Quantification of Interstitial Lung Abnormalities in Smokers.
Ash, Samuel Y; Harmouche, Rola; Ross, James C; Diaz, Alejandro A; Hunninghake, Gary M; Putman, Rachel K; Onieva, Jorge; Martinez, Fernando J; Choi, Augustine M; Lynch, David A; Hatabu, Hiroto; Rosas, Ivan O; Estepar, Raul San Jose; Washko, George R
2017-08-01
Previous investigation suggests that visually detected interstitial changes in the lung parenchyma of smokers are highly clinically relevant and predict outcomes, including death. Visual subjective analysis to detect these changes is time-consuming, insensitive to subtle changes, and requires training to enhance reproducibility. Objective detection of such changes could provide a method of disease identification without these limitations. The goal of this study was to develop and test a fully automated image processing tool to objectively identify radiographic features associated with interstitial abnormalities in the computed tomography scans of a large cohort of smokers. An automated tool that uses local histogram analysis combined with distance from the pleural surface was used to detect radiographic features consistent with interstitial lung abnormalities in computed tomography scans from 2257 individuals from the Genetic Epidemiology of COPD study, a longitudinal observational study of smokers. The sensitivity and specificity of this tool was determined based on its ability to detect the visually identified presence of these abnormalities. The tool had a sensitivity of 87.8% and a specificity of 57.5% for the detection of interstitial lung abnormalities, with a c-statistic of 0.82, and was 100% sensitive and 56.7% specific for the detection of the visual subtype of interstitial abnormalities called fibrotic parenchymal abnormalities, with a c-statistic of 0.89. In smokers, a fully automated image processing tool is able to identify those individuals who have interstitial lung abnormalities with moderate sensitivity and specificity. Copyright © 2017 The Association of University Radiologists. Published by Elsevier Inc. All rights reserved.
ISS Mini AERCam Radio Frequency (RF) Coverage Analysis Using iCAT Development Tool
NASA Technical Reports Server (NTRS)
Bolen, Steve; Vazquez, Luis; Sham, Catherine; Fredrickson, Steven; Fink, Patrick; Cox, Jan; Phan, Chau; Panneton, Robert
2003-01-01
The long-term goals of the National Aeronautics and Space Administration's (NASA's) Human Exploration and Development of Space (HEDS) enterprise may require the development of autonomous free-flier (FF) robotic devices to operate within the vicinity of low-Earth orbiting spacecraft to supplement human extravehicular activities (EVAs) in space. Future missions could require external visual inspection of the spacecraft that would be difficult, or dangerous, for humans to perform. Under some circumstances, it may be necessary to employ an un-tethered communications link between the FF and the users. The interactive coverage analysis tool (iCAT) is a software tool that has been developed to perform critical analysis of the communications link performance for an FF operating in the vicinity of the International Space Station (ISS) external environment. The tool allows users to interactively change multiple parameters of the communications link to efficiently perform systems engineering trades on network performance. These trades can be directly translated into design and requirements specifications. This tool significantly reduces the development time in determining a communications network topology by allowing multiple parameters to be changed, and the results of link coverage to be statistically characterized and plotted interactively.
De Sio, S; Cedrone, F; Greco, E; Di Traglia, M; Sanità, D; Mandolesi, D; Stansfeld, S A
2016-01-01
Psychosocial hazards and work-related stress have reached epidemic proportions in Europe. Italian law introduced in 2008 the obligation for Italian companies to assess work-related stress risk in order to protect their workers' safety and health. The purpose of our study was to propose an accurate measurement tool, using the HSE indicator tool, for more appropriate and significant work-related stress prevention measures. The study was conducted on 204 visual display unit (VDU) operators: 106 male and 98 female. All subjects were administered the HSE questionnaire. The sample was studied through a 4-step process, using the HSE analysis tool and a statistical analysis based on the odds ratio calculation. The assessment model used demonstrated the presence of work-related stress in VDU operators and additional "critical" aspects which had failed to emerge with the classical use of the HSE analysis tool. The approach we propose makes it possible to obtain a complete picture of the perception of work-related stress and can point out the most appropriate corrective actions.
MutAIT: an online genetic toxicology data portal and analysis tools.
Avancini, Daniele; Menzies, Georgina E; Morgan, Claire; Wills, John; Johnson, George E; White, Paul A; Lewis, Paul D
2016-05-01
Assessment of genetic toxicity and/or carcinogenic activity is an essential element of chemical screening programs employed to protect human health. Dose-response and gene mutation data are frequently analysed by industry, academia and governmental agencies for regulatory evaluations and decision making. Over the years, a number of efforts at different institutions have led to the creation and curation of databases to house genetic toxicology data, largely, with the aim of providing public access to facilitate research and regulatory assessments. This article provides a brief introduction to a new genetic toxicology portal called Mutation Analysis Informatics Tools (MutAIT) (www.mutait.org) that provides easy access to two of the largest genetic toxicology databases, the Mammalian Gene Mutation Database (MGMD) and TransgenicDB. TransgenicDB is a comprehensive collection of transgenic rodent mutation data initially compiled and collated by Health Canada. The updated MGMD contains approximately 50 000 individual mutation spectral records from the published literature. The portal not only gives access to an enormous quantity of genetic toxicology data, but also provides statistical tools for dose-response analysis and calculation of benchmark dose. Two important R packages for dose-response analysis are provided as web-distributed applications with user-friendly graphical interfaces. The 'drsmooth' package performs dose-response shape analysis and determines various points of departure (PoD) metrics and the 'PROAST' package provides algorithms for dose-response modelling. The MutAIT statistical tools, which are currently being enhanced, provide users with an efficient and comprehensive platform to conduct quantitative dose-response analyses and determine PoD values that can then be used to calculate human exposure limits or margins of exposure. © The Author 2015. Published by Oxford University Press on behalf of the UK Environmental Mutagen Society. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Learning investment indicators through data extension
NASA Astrophysics Data System (ADS)
Dvořák, Marek
2017-07-01
Stock prices in the form of time series were analysed using univariate and multivariate statistical methods. After simple data preprocessing in the form of logarithmic differences, we augmented this univariate time series to a multivariate representation. This method makes use of sliding windows to calculate several dozen new variables using simple statistical tools, such as first and second moments, as well as more complicated statistics, such as autoregression coefficients and residual analysis, followed by an optional quadratic transformation that was further used for data extension. These were used as explanatory variables in a regularized logistic LASSO regression which tried to estimate the Buy-Sell Index (BSI) from real stock market data.
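A minimal sketch of this augmentation-plus-LASSO-logistic idea on simulated prices (the window statistics, label definition and regularization strength are assumptions; the study used real market data and a richer feature set):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(5)

    # Hypothetical daily closing prices; the study used real stock market data.
    prices = 100 * np.exp(np.cumsum(rng.normal(0.0005, 0.01, size=600)))
    log_returns = np.diff(np.log(prices))

    window = 20
    features, labels = [], []
    for t in range(window, len(log_returns) - 1):
        past = log_returns[t - window:t]
        # Simple window statistics (moments, extremes and a lag-1 autoregression coefficient).
        ar1 = np.corrcoef(past[:-1], past[1:])[0, 1]
        features.append([past.mean(), past.std(), ar1, past.min(), past.max()])
        labels.append(int(log_returns[t] > 0))          # crude stand-in for a Buy-Sell Index

    X, y = np.array(features), np.array(labels)
    split = int(0.8 * len(X))

    # L1-regularised (LASSO-type) logistic regression, as in the abstract.
    model = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
    model.fit(X[:split], y[:split])
    print("held-out accuracy:", round(model.score(X[split:], y[split:]), 3))
    print("nonzero coefficients:", np.count_nonzero(model.coef_))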
Automatic identification of bacterial types using statistical imaging methods
NASA Astrophysics Data System (ADS)
Trattner, Sigal; Greenspan, Hayit; Tepper, Gapi; Abboud, Shimon
2003-05-01
The objective of the current study is to develop an automatic tool to identify bacterial types using computer-vision and statistical modeling techniques. Bacteriophage (phage)-typing methods are used to identify and extract representative profiles of bacterial types, such as Staphylococcus aureus. Current systems rely on the subjective reading of plaque profiles by a human expert. This process is time-consuming and prone to errors, especially as technology is enabling an increase in the number of phages used for typing. The statistical methodology presented in this work provides for an automated, objective and robust analysis of visual data, along with the ability to cope with increasing data volumes.
Temporal geospatial analysis of secondary school students’ examination performance
NASA Astrophysics Data System (ADS)
Nik Abd Kadir, ND; Adnan, NA
2016-06-01
Malaysia's Ministry of Education has improved the organization of its data by establishing a geographical information system (GIS) school database. However, no further analysis has been done using geospatial analysis tools. Mapping has emerged as a communication tool and an effective way to publish digital and statistical data such as school performance results. The objective of this study is to analyse secondary school student performance in the science and mathematics scores of the Sijil Pelajaran Malaysia Examination from 2010 to 2014 for Kelantan's state schools with the aid of GIS software and geospatial analysis. School performance according to grade point average (GPA), from Grade A to Grade G, was interpolated and mapped, and query analysis was carried out using geospatial tools. This study will be beneficial to the education sector for analysing student performance not only in Kelantan but across the whole of Malaysia, and publishing such results as maps will support better planning and decision making to prepare young Malaysians for the challenges of the education system and performance.
This document describes three general approaches to the design of a sampling plan for biological monitoring of coral reefs. Status assessment, trend detection and targeted monitoring each require a different approach to site selection and statistical analysis. For status assessm...
Averaging Models: Parameters Estimation with the R-Average Procedure
ERIC Educational Resources Information Center
Vidotto, G.; Massidda, D.; Noventa, S.
2010-01-01
The Functional Measurement approach, proposed within the theoretical framework of Information Integration Theory (Anderson, 1981, 1982), can be a useful multi-attribute analysis tool. Compared to the majority of statistical models, the averaging model can account for interaction effects without adding complexity. The R-Average method (Vidotto &…
Identifying the Factors That Influence Change in SEBD Using Logistic Regression Analysis
ERIC Educational Resources Information Center
Camilleri, Liberato; Cefai, Carmel
2013-01-01
Multiple linear regression and ANOVA models are widely used in applications since they provide effective statistical tools for assessing the relationship between a continuous dependent variable and several predictors. However these models rely heavily on linearity and normality assumptions and they do not accommodate categorical dependent…
We used STARS (Spatial Tools for the Analysis of River Systems), an ArcGIS geoprocessing toolbox, to create spatial stream networks. We then developed and assessed spatial statistical models for each of these metrics, incorporating spatial autocorrelation based on both distance...
How to Engage Medical Students in Chronobiology: An Example on Autorhythmometry
ERIC Educational Resources Information Center
Rol de Lama, M. A.; Lozano, J. P.; Ortiz, V.; Sanchez-Vazquez, F. J.; Madrid, J. A.
2005-01-01
This contribution describes a new laboratory experience that improves medical students' learning of chronobiology by introducing them to basic chronobiology concepts as well as to methods and statistical analysis tools specific for circadian rhythms. We designed an autorhythmometry laboratory session where students simultaneously played the role…
Large-scale gene function analysis with the PANTHER classification system.
Mi, Huaiyu; Muruganujan, Anushya; Casagrande, John T; Thomas, Paul D
2013-08-01
The PANTHER (protein annotation through evolutionary relationship) classification system (http://www.pantherdb.org/) is a comprehensive system that combines gene function, ontology, pathways and statistical analysis tools that enable biologists to analyze large-scale, genome-wide data from sequencing, proteomics or gene expression experiments. The system is built with 82 complete genomes organized into gene families and subfamilies, and their evolutionary relationships are captured in phylogenetic trees, multiple sequence alignments and statistical models (hidden Markov models or HMMs). Genes are classified according to their function in several different ways: families and subfamilies are annotated with ontology terms (Gene Ontology (GO) and PANTHER protein class), and sequences are assigned to PANTHER pathways. The PANTHER website includes a suite of tools that enable users to browse and query gene functions, and to analyze large-scale experimental data with a number of statistical tests. It is widely used by bench scientists, bioinformaticians, computer scientists and systems biologists. In the 2013 release of PANTHER (v.8.0), in addition to an update of the data content, we redesigned the website interface to improve both user experience and the system's analytical capability. This protocol provides a detailed description of how to analyze genome-wide experimental data with the PANTHER classification system.
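One of the statistical tests commonly offered for such gene-list analyses is an over-representation test based on Fisher's exact test; a minimal sketch with hypothetical counts (not PANTHER's own implementation or data):

    from scipy import stats

    # Hypothetical counts: genes annotated to a functional category vs the rest,
    # in an experimental gene list and in the reference genome background.
    in_list_in_category = 40
    in_list_not_category = 460
    background_in_category = 600
    background_not_category = 19400

    table = [[in_list_in_category, in_list_not_category],
             [background_in_category, background_not_category]]

    # One-sided Fisher's exact test for over-representation of the category in the list.
    odds_ratio, p_value = stats.fisher_exact(table, alternative="greater")
    print(f"odds ratio = {odds_ratio:.2f}, p = {p_value:.2e}")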
NASA Astrophysics Data System (ADS)
Lee, K. David; Wiesenfeld, Eric; Gelfand, Andrew
2007-04-01
One of the greatest challenges in modern combat is maintaining a high level of timely Situational Awareness (SA). In many situations, computational complexity and accuracy considerations make the development and deployment of real-time, high-level inference tools very difficult. An innovative hybrid framework that combines Bayesian inference, in the form of Bayesian Networks, and Possibility Theory, in the form of Fuzzy Logic systems, has recently been introduced to provide a rigorous framework for high-level inference. In previous research, the theoretical basis and benefits of the hybrid approach have been developed. However, lacking is a concrete experimental comparison of the hybrid framework with traditional fusion methods, to demonstrate and quantify this benefit. The goal of this research, therefore, is to provide a statistical analysis on the comparison of the accuracy and performance of hybrid network theory, with pure Bayesian and Fuzzy systems and an inexact Bayesian system approximated using Particle Filtering. To accomplish this task, domain specific models will be developed under these different theoretical approaches and then evaluated, via Monte Carlo Simulation, in comparison to situational ground truth to measure accuracy and fidelity. Following this, a rigorous statistical analysis of the performance results will be performed, to quantify the benefit of hybrid inference to other fusion tools.
Statistical imprints of CMB B-type polarization leakage in an incomplete sky survey analysis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Santos, Larissa; Wang, Kai; Hu, Yangrui
2017-01-01
One of the main goals of modern cosmology is to search for primordial gravitational waves by looking for their imprints in the B-type polarization of the cosmic microwave background radiation. However, this signal is contaminated by various sources, including cosmic weak lensing, foreground radiations, instrumental noises, as well as the E-to-B leakage caused by partial sky surveys, which should be well understood to avoid misinterpretation of the observed data. In this paper, we adopt the E/B decomposition method suggested by Smith in 2006, and study the imprints of E-to-B leakage residuals in the constructed B-type polarization maps, B(n̂), by employing various statistical tools. We find that the effects of E-to-B leakage are negligible for the B-mode power spectrum, as well as for the skewness and kurtosis analyses of B-maps. However, if employing the morphological statistical tools, including Minkowski functionals and/or Betti numbers, we find the effect of leakage can be detected at a very high confidence level, which shows that in the morphological analysis, the leakage can play a significant role as a contaminant for measuring the primordial B-mode signal and must be taken into account for a correct interpretation of the data.
The influence of control group reproduction on the statistical ...
Because of various Congressional mandates to protect the environment from endocrine disrupting chemicals (EDCs), the United States Environmental Protection Agency (USEPA) initiated the Endocrine Disruptor Screening Program. In the context of this framework, the Office of Research and Development within the USEPA developed the Medaka Extended One Generation Reproduction Test (MEOGRT) to characterize the endocrine action of a suspected EDC. One important endpoint of the MEOGRT is the fecundity of breeding pairs of medaka. Power analyses were conducted to determine the number of replicates needed in proposed test designs and to determine the effects that varying reproductive parameters (e.g. mean fecundity, variance, and days with no egg production) will have on the statistical power of the test. A software tool, the MEOGRT Reproduction Power Analysis Tool, was developed to expedite these power analyses by both calculating estimates of the needed reproductive parameters (e.g. population mean and variance) and performing the power analysis under user-specified scenarios. The manuscript illustrates how the reproductive performance of the control medaka used in a MEOGRT influences statistical power, and therefore the successful implementation of the protocol. Example scenarios, based upon medaka reproduction data collected at MED, are discussed that bolster the recommendation that facilities planning to implement the MEOGRT should have a culture of medaka with hi
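Power analyses of this kind are often carried out by simulation; the sketch below estimates power for detecting a multiplicative drop in fecundity with a nonparametric two-sample test (the negative-binomial fecundity model, zero-egg-day rate, effect size and test choice are assumptions for illustration, not the MEOGRT tool's settings):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(6)

    def simulate_power(n_pairs, mean_eggs=25.0, dispersion=5.0, zero_days=0.1,
                       effect=0.8, n_sims=2000, alpha=0.05):
        """Power of a Mann-Whitney test to detect a multiplicative drop in mean fecundity."""
        rejections = 0
        p_ctrl = dispersion / (dispersion + mean_eggs)             # negative-binomial parameter
        p_trt = dispersion / (dispersion + mean_eggs * effect)
        for _ in range(n_sims):
            control = rng.negative_binomial(dispersion, p_ctrl, n_pairs)
            treated = rng.negative_binomial(dispersion, p_trt, n_pairs)
            # Random days with no egg production, applied to both groups.
            control = control * (rng.random(n_pairs) > zero_days)
            treated = treated * (rng.random(n_pairs) > zero_days)
            if stats.mannwhitneyu(control, treated, alternative="greater").pvalue < alpha:
                rejections += 1
        return rejections / n_sims

    for n in (6, 12, 24):
        print(f"{n:2d} breeding pairs per group -> power ~ {simulate_power(n):.2f}")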
Interpretation of statistical results.
García Garmendia, J L; Maroto Monserrat, F
2018-02-21
The appropriate interpretation of statistical results is crucial to understanding advances in medical science. Statistical tools allow us to transform the uncertainty and apparent chaos in nature into measurable parameters which are applicable to our clinical practice. Understanding the meaning and actual scope of these instruments is essential for researchers, for the funders of research and for professionals who require continual updating based on good evidence and support for decision making. Various aspects of study designs, results and statistical analysis are reviewed, with the aim of facilitating comprehension from the basics to what is most common but least well understood, and offering a constructive, non-exhaustive but realistic view. Copyright © 2018 Elsevier España, S.L.U. y SEMICYUC. All rights reserved.
gHRV: Heart rate variability analysis made easy.
Rodríguez-Liñares, L; Lado, M J; Vila, X A; Méndez, A J; Cuesta, P
2014-08-01
In this paper, the gHRV software tool is presented. It is a simple, free and portable tool developed in python for analysing heart rate variability. It includes a graphical user interface and it can import files in multiple formats, analyse time intervals in the signal, test statistical significance and export the results. This paper also contains, as an example of use, a clinical analysis performed with the gHRV tool, namely to determine whether the heart rate variability indexes change across different stages of sleep. Results from tests completed by researchers who have tried gHRV are also explained: in general the application was positively valued and results reflect a high level of satisfaction. gHRV is in continuous development and new versions will include suggestions made by testers. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
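Typical time-domain HRV indexes of the kind such a tool reports can be computed directly from RR intervals; a minimal sketch with a hypothetical RR series (not gHRV's own code):

    import numpy as np

    def time_domain_hrv(rr_ms):
        """Common time-domain heart rate variability indexes from RR intervals in milliseconds."""
        rr = np.asarray(rr_ms, dtype=float)
        diff = np.diff(rr)
        return {
            "mean HR (bpm)": 60000.0 / rr.mean(),
            "SDNN (ms)": rr.std(ddof=1),                      # overall variability
            "RMSSD (ms)": np.sqrt(np.mean(diff ** 2)),        # short-term variability
            "pNN50 (%)": 100.0 * np.mean(np.abs(diff) > 50),  # large beat-to-beat changes
        }

    # Hypothetical RR-interval series (ms); gHRV works on full recordings imported from files.
    rr_example = [812, 830, 845, 790, 805, 840, 870, 860, 795, 810, 825, 850]
    for name, value in time_domain_hrv(rr_example).items():
        print(f"{name}: {value:.1f}")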
Online and offline tools for head movement compensation in MEG.
Stolk, Arjen; Todorovic, Ana; Schoffelen, Jan-Mathijs; Oostenveld, Robert
2013-03-01
Magnetoencephalography (MEG) is measured above the head, which makes it sensitive to variations of the head position with respect to the sensors. Head movements blur the topography of the neuronal sources of the MEG signal, increase localization errors, and reduce statistical sensitivity. Here we describe two novel and readily applicable methods that compensate for the detrimental effects of head motion on the statistical sensitivity of MEG experiments. First, we introduce an online procedure that continuously monitors head position. Second, we describe an offline analysis method that takes into account the head position time-series. We quantify the performance of these methods in the context of three different experimental settings, involving somatosensory, visual and auditory stimuli, assessing both individual and group-level statistics. The online head localization procedure allowed for optimal repositioning of the subjects over multiple sessions, resulting in a 28% reduction of the variance in dipole position and an improvement of up to 15% in statistical sensitivity. Offline incorporation of the head position time-series into the general linear model resulted in improvements of group-level statistical sensitivity between 15% and 29%. These tools can substantially reduce the influence of head movement within and between sessions, increasing the sensitivity of many cognitive neuroscience experiments. Copyright © 2012 Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Vitali, Lina; Righini, Gaia; Piersanti, Antonio; Cremona, Giuseppe; Pace, Giandomenico; Ciancarella, Luisella
2017-12-01
Air backward trajectory calculations are commonly used in a variety of atmospheric analyses, in particular for source attribution evaluation. The accuracy of backward trajectory analysis is mainly determined by the quality and the spatial and temporal resolution of the underlying meteorological data set, especially in the cases of complex terrain. This work describes a new tool for the calculation and the statistical elaboration of backward trajectories. To take advantage of the high-resolution meteorological database of the Italian national air quality model MINNI, a dedicated set of procedures was implemented under the name of M-TraCE (MINNI module for Trajectories Calculation and statistical Elaboration) to calculate and process the backward trajectories of air masses reaching a site of interest. Some outcomes from the application of the developed methodology to the Italian Network of Special Purpose Monitoring Stations are shown to assess its strengths for the meteorological characterization of air quality monitoring stations. M-TraCE has demonstrated its capabilities to provide a detailed statistical assessment of transport patterns and region of influence of the site under investigation, which is fundamental for correctly interpreting pollutant measurements and ascertaining the official classification of the monitoring site based on meta-data information. Moreover, M-TraCE has shown its usefulness in supporting other assessments, i.e., spatial representativeness of a monitoring site, focussing specifically on the analysis of the effects due to meteorological variables.
NASA Astrophysics Data System (ADS)
Mugnes, J.-M.; Robert, C.
2015-11-01
Spectral analysis is a powerful tool to investigate stellar properties and it has been widely used for decades now. However, the methods considered to perform this kind of analysis are mostly based on iteration among a few diagnostic lines to determine the stellar parameters. While these methods are often simple and fast, they can lead to errors and large uncertainties due to the required assumptions. Here, we present a method based on Bayesian statistics to find simultaneously the best combination of effective temperature, surface gravity, projected rotational velocity, and microturbulence velocity, using all the available spectral lines. Different tests are discussed to demonstrate the strength of our method, which we apply to 54 mid-resolution spectra of field and cluster B stars obtained at the Observatoire du Mont-Mégantic. We compare our results with those found in the literature. Differences are seen which are well explained by the different methods used. We conclude that the B-star microturbulence velocities are often underestimated. We also confirm the trend that B stars in clusters are on average faster rotators than field B stars.
Estimating the Diets of Animals Using Stable Isotopes and a Comprehensive Bayesian Mixing Model
Hopkins, John B.; Ferguson, Jake M.
2012-01-01
Using stable isotope mixing models (SIMMs) as a tool to investigate the foraging ecology of animals is gaining popularity among researchers. As a result, statistical methods are rapidly evolving and numerous models have been produced to estimate the diets of animals—each with their benefits and their limitations. Deciding which SIMM to use is contingent on factors such as the consumer of interest, its food sources, sample size, the familiarity a user has with a particular framework for statistical analysis, or the level of inference the researcher desires to make (e.g., population- or individual-level). In this paper, we provide a review of commonly used SIMM models and describe a comprehensive SIMM that includes all features commonly used in SIMM analysis and two new features. We used data collected in Yosemite National Park to demonstrate IsotopeR's ability to estimate dietary parameters. We then examined the importance of each feature in the model and compared our results to inferences from commonly used SIMMs. IsotopeR's user interface (in R) will provide researchers a user-friendly tool for SIMM analysis. The model is also applicable for use in paleontology, archaeology, and forensic studies as well as estimating pollution inputs. PMID:22235246
Geostatistical applications in environmental remediation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Stewart, R.N.; Purucker, S.T.; Lyon, B.F.
1995-02-01
Geostatistical analysis refers to a collection of statistical methods for addressing data that vary in space. By incorporating spatial information into the analysis, geostatistics has advantages over traditional statistical analysis for problems with a spatial context. Geostatistics has a history of success in earth science applications, and its popularity is increasing in other areas, including environmental remediation. Due to recent advances in computer technology, geostatistical algorithms can be executed at a speed comparable to many standard statistical software packages. When used responsibly, geostatistics is a systematic and defensible tool that can be used in various decision frameworks, such as the Data Quality Objectives (DQO) process. At every point in the site, geostatistics can estimate both the concentration level and the probability or risk of exceeding a given value. Using these probability maps can assist in identifying clean-up zones. Given any decision threshold and an acceptable level of risk, the probability maps identify those areas that are estimated to be above or below the acceptable risk. Those areas that are above the threshold are of the most concern with regard to remediation. In addition to estimating clean-up zones, geostatistics can assist in designing cost-effective secondary sampling schemes. Those areas of the probability map with high levels of estimated uncertainty are areas where more secondary sampling should occur. In addition, geostatistics has the ability to incorporate soft data directly into the analysis. These data include historical records, a highly correlated secondary contaminant, or expert judgment. In environmental remediation, geostatistics is a tool that, in conjunction with other methods, can provide a common forum for building consensus.
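A probability-of-exceedance map of the kind described above can be sketched with Gaussian-process regression as a simple stand-in for kriging. The example below uses scikit-learn with entirely synthetic sample locations, concentrations, and an assumed clean-up threshold of 70; none of these values come from the report.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)

# Hypothetical sample locations (x, y in metres) and contaminant concentrations.
X = rng.uniform(0, 100, size=(40, 2))
conc = 50 + 30 * np.exp(-((X[:, 0] - 30) ** 2 + (X[:, 1] - 60) ** 2) / 500) + rng.normal(0, 3, 40)

# Gaussian-process regression as a simple stand-in for kriging.
gp = GaussianProcessRegressor(kernel=RBF(20.0) + WhiteKernel(1.0), normalize_y=True)
gp.fit(X, conc)

# Predict on a grid and convert to a probability-of-exceedance map.
gx, gy = np.meshgrid(np.linspace(0, 100, 50), np.linspace(0, 100, 50))
grid = np.column_stack([gx.ravel(), gy.ravel()])
mean, std = gp.predict(grid, return_std=True)

threshold = 70.0                      # hypothetical clean-up action level
p_exceed = norm.sf(threshold, loc=mean, scale=std)
print("cells above 50% exceedance risk:", int((p_exceed > 0.5).sum()))
```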
Irvine, Kathryn M.; Manlove, Kezia; Hollimon, Cynthia
2012-01-01
An important consideration for long term monitoring programs is determining the required sampling effort to detect trends in specific ecological indicators of interest. To enhance the Greater Yellowstone Inventory and Monitoring Network’s water resources protocol(s) (O’Ney 2006 and O’Ney et al. 2009 [under review]), we developed a set of tools to: (1) determine the statistical power for detecting trends of varying magnitude in a specified water quality parameter over different lengths of sampling (years) and different within-year collection frequencies (monthly or seasonal sampling) at particular locations using historical data, and (2) perform periodic trend analyses for water quality parameters while addressing seasonality and flow weighting. A power analysis for trend detection is a statistical procedure used to estimate the probability of rejecting the hypothesis of no trend when in fact there is a trend, within a specific modeling framework. In this report, we base our power estimates on using the seasonal Kendall test (Helsel and Hirsch 2002) for detecting trend in water quality parameters measured at fixed locations over multiple years. We also present procedures (R-scripts) for conducting a periodic trend analysis using the seasonal Kendall test with and without flow adjustment. This report provides the R-scripts developed for power and trend analysis, tutorials, and the associated tables and graphs. The purpose of this report is to provide practical information for monitoring network staff on how to use these statistical tools for water quality monitoring data sets.
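The report's procedures are R-scripts built around the seasonal Kendall test. As a rough illustration of the same power-by-simulation idea in Python, the sketch below implements the plain (non-seasonal) Mann-Kendall test under a no-ties normal approximation and estimates power for an assumed linear trend; the trend magnitude, error standard deviation, and record length are hypothetical.

```python
import numpy as np
from scipy.stats import norm

def mann_kendall_p(y):
    """Two-sided p-value for the Mann-Kendall trend test (normal approximation, no ties)."""
    n = len(y)
    s = sum(np.sign(y[j] - y[i]) for i in range(n - 1) for j in range(i + 1, n))
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    z = (s - np.sign(s)) / np.sqrt(var_s) if s != 0 else 0.0
    return 2 * norm.sf(abs(z))

def power(trend_per_year, years=10, sigma=1.0, alpha=0.05, n_sim=2000, seed=1):
    """Monte Carlo power of detecting a linear trend in annual observations."""
    rng = np.random.default_rng(seed)
    t = np.arange(years)
    hits = 0
    for _ in range(n_sim):
        y = trend_per_year * t + rng.normal(0, sigma, years)
        hits += mann_kendall_p(y) < alpha
    return hits / n_sim

print(power(trend_per_year=0.2))   # e.g. a 0.2 unit/year trend against sd = 1
```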
IPRStats: visualization of the functional potential of an InterProScan run.
Kelly, Ryan J; Vincent, David E; Friedberg, Iddo
2010-12-21
InterPro is a collection of protein signatures for the classification and automated annotation of proteins. InterProScan is a software tool that scans protein sequences against InterPro member databases using a variety of profile-based, hidden Markov model, and position-specific scoring matrix methods. It not only combines a set of analysis tools, but also performs data look-up from various sources, as well as some redundancy removal. InterProScan is robust and scalable, able to perform on any machine from a netbook to a large cluster. However, when performing whole-genome or metagenome analysis, there is a need for a fast statistical visualization of the results to gain a good initial grasp of the functional potential of the sequences in the analyzed data set. This is especially important when analyzing and comparing metagenomic or metaproteomic data sets. IPRStats is a tool for the visualization of InterProScan results. InterProScan results are parsed from the InterProScan XML or EBIXML file into an SQLite or MySQL database. The results for each signature database scan are read and displayed as pie charts or bar charts as summary statistics. A table is also provided, where each entry is a signature (e.g. a Pfam entry) accompanied by one or more Gene Ontology terms, if InterProScan was run using the Gene Ontology option. We present a platform-independent, open-source licensed tool that is useful for InterProScan users who wish to view the summary of their results in a rapid and concise fashion.
Chavarrías, Cristina; García-Vázquez, Verónica; Alemán-Gómez, Yasser; Montesinos, Paula; Pascau, Javier; Desco, Manuel
2016-05-01
The purpose of this study was to develop a multi-platform automatic software tool for full processing of fMRI rodent studies. Existing tools require the use of several different plug-ins, significant user interaction and/or programming skills. Based on a user-friendly interface, the tool provides statistical parametric brain maps (t and Z) and percentage of signal change for user-provided regions of interest. The tool is coded in MATLAB (MathWorks(®)) and implemented as a plug-in for SPM (Statistical Parametric Mapping, the Wellcome Trust Centre for Neuroimaging). The automatic pipeline loads default parameters that are appropriate for preclinical studies and processes multiple subjects in batch mode (from images in either Nifti or raw Bruker format). In advanced mode, all processing steps can be selected or deselected and executed independently. Processing parameters and workflow were optimized for rat studies and assessed using 460 male-rat fMRI series on which we tested five smoothing kernel sizes and three different hemodynamic models. A smoothing kernel of FWHM = 1.2 mm (four times the voxel size) yielded the highest t values at the primary somatosensory cortex, and a boxcar response function provided the lowest residual variance after fitting. fMRat offers the features of a thorough SPM-based analysis combined with the functionality of several SPM extensions in a single automatic pipeline with a user-friendly interface. The code and sample images can be downloaded from https://github.com/HGGM-LIM/fmrat.
An ANOVA approach for statistical comparisons of brain networks.
Fraiman, Daniel; Fraiman, Ricardo
2018-03-16
The study of brain networks has developed extensively over the last couple of decades. By contrast, techniques for the statistical analysis of these networks are less developed. In this paper, we focus on the statistical comparison of brain networks in a nonparametric framework and discuss the associated detection and identification problems. We tested network differences between groups with an analysis of variance (ANOVA) test we developed specifically for networks. We also propose and analyse the behaviour of a new statistical procedure designed to identify different subnetworks. As an example, we show the application of this tool in resting-state fMRI data obtained from the Human Connectome Project. We identify, among other variables, that the amount of sleep the days before the scan is a relevant variable that must be controlled. Finally, we discuss the potential bias in neuroimaging findings that is generated by some behavioural and brain structure variables. Our method can also be applied to other kind of networks such as protein interaction networks, gene networks or social networks.
Statistical Inference for Porous Materials using Persistent Homology.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Moon, Chul; Heath, Jason E.; Mitchell, Scott A.
2017-12-01
We propose a porous materials analysis pipeline using persistent homology. We first compute persistent homology of binarized 3D images of sampled material subvolumes. For each image we compute sets of homology intervals, which are represented as summary graphics called persistence diagrams. We convert persistence diagrams into image vectors in order to analyze the similarity of the homology of the material images using the mature tools for image analysis. Each image is treated as a vector and we compute its principal components to extract features. We fit a statistical model using the loadings of principal components to estimate material porosity, permeability, anisotropy, and tortuosity. We also propose an adaptive version of the structural similarity index (SSIM), a similarity metric for images, as a measure to determine the statistical representative elementary volumes (sREV) for persistent homology. Thus we provide a capability for making a statistical inference of the fluid flow and transport properties of porous materials based on their geometry and connectivity.
Conducting Simulation Studies in the R Programming Environment.
Hallgren, Kevin A
2013-10-12
Simulation studies allow researchers to answer specific questions about data analysis, statistical power, and best-practices for obtaining accurate results in empirical research. Despite the benefits that simulation research can provide, many researchers are unfamiliar with available tools for conducting their own simulation studies. The use of simulation studies need not be restricted to researchers with advanced skills in statistics and computer programming, and such methods can be implemented by researchers with a variety of abilities and interests. The present paper provides an introduction to methods used for running simulation studies using the R statistical programming environment and is written for individuals with minimal experience running simulation studies or using R. The paper describes the rationale and benefits of using simulations and introduces R functions relevant for many simulation studies. Three examples illustrate different applications for simulation studies, including (a) the use of simulations to answer a novel question about statistical analysis, (b) the use of simulations to estimate statistical power, and (c) the use of simulations to obtain confidence intervals of parameter estimates through bootstrapping. Results and fully annotated syntax from these examples are provided.
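The paper's examples are written in R, but the logic of estimating statistical power by repeated simulation carries over directly to any language. A minimal Python sketch, assuming a two-sample design with a standardized effect size of 0.5, might look like this:

```python
import numpy as np
from scipy.stats import ttest_ind

def simulated_power(n_per_group, effect_size, alpha=0.05, n_sim=5000, seed=42):
    """Estimate power of a two-sample t-test by repeated simulation."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sim):
        a = rng.normal(0.0, 1.0, n_per_group)
        b = rng.normal(effect_size, 1.0, n_per_group)
        rejections += ttest_ind(a, b).pvalue < alpha
    return rejections / n_sim

for n in (20, 50, 100):
    print(n, round(simulated_power(n, effect_size=0.5), 3))
```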
Lefebvre, Alexandre; Rochefort, Gael Y.; Santos, Frédéric; Le Denmat, Dominique; Salmon, Benjamin; Pétillon, Jean-Marc
2016-01-01
Over the last decade, biomedical 3D-imaging tools have gained widespread use in the analysis of prehistoric bone artefacts. While initial attempts to characterise the major categories used in osseous industry (i.e. bone, antler, and dentine/ivory) have been successful, the taxonomic determination of prehistoric artefacts remains to be investigated. The distinction between reindeer and red deer antler can be challenging, particularly in cases of anthropic and/or taphonomic modifications. In addition to the range of destructive physicochemical identification methods available (mass spectrometry, isotopic ratio, and DNA analysis), X-ray micro-tomography (micro-CT) provides convincing non-destructive 3D images and analyses. This paper presents the experimental protocol (sample scans, image processing, and statistical analysis) we have developed in order to identify modern and archaeological antler collections (from Isturitz, France). This original method is based on bone microstructure analysis combined with advanced statistical support vector machine (SVM) classifiers. A combination of six microarchitecture biomarkers (bone volume fraction, trabecular number, trabecular separation, trabecular thickness, trabecular bone pattern factor, and structure model index) were screened using micro-CT in order to characterise internal alveolar structure. Overall, reindeer alveoli presented a tighter mesh than red deer alveoli, and statistical analysis allowed us to distinguish archaeological antler by species with an accuracy of 96%, regardless of anatomical location on the antler. In conclusion, micro-CT combined with SVM classifiers proves to be a promising additional non-destructive method for antler identification, suitable for archaeological artefacts whose degree of human modification and cultural heritage or scientific value has previously made it impossible (tools, ornaments, etc.). PMID:26901355
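A minimal sketch of the SVM classification step, using scikit-learn on invented values for the six micro-CT biomarkers (the means, spreads, and sample sizes below are hypothetical, not the study's measurements):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)

# Hypothetical values of the six biomarkers (BV/TV, Tb.N, Tb.Sp, Tb.Th, TBPf, SMI)
# for reindeer (class 0) and red deer (class 1); means and spreads are invented.
n = 60
reindeer = rng.normal([0.45, 2.2, 0.35, 0.20, -1.0, 1.2], 0.08, size=(n, 6))
red_deer = rng.normal([0.35, 1.6, 0.55, 0.25,  0.5, 1.8], 0.08, size=(n, 6))
X = np.vstack([reindeer, red_deer])
y = np.array([0] * n + [1] * n)

# Standardize the biomarkers, then fit an RBF-kernel SVM with 5-fold cross-validation.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
scores = cross_val_score(clf, X, y, cv=5)
print("cross-validated accuracy:", scores.mean().round(3))
```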
Albrecht, Jessica; Kopietz, Rainer; Frasnelli, Johannes; Wiesmann, Martin; Hummel, Thomas; Lundström, Johan N.
2009-01-01
Almost every odor we encounter in daily life has the capacity to produce a trigeminal sensation. Surprisingly, few functional imaging studies exploring human neuronal correlates of intranasal trigeminal function exist, and results are to some degree inconsistent. We utilized activation likelihood estimation (ALE), a quantitative voxel-based meta-analysis tool, to analyze functional imaging data (fMRI/PET) following intranasal trigeminal stimulation with carbon dioxide (CO2), a stimulus known to exclusively activate the trigeminal system. Meta-analysis tools are able to identify activations common across studies, thereby enabling activation mapping with higher certainty. Activation foci of nine studies utilizing trigeminal stimulation were included in the meta-analysis. We found significant ALE scores, thus indicating consistent activation across studies, in the brainstem, ventrolateral posterior thalamic nucleus, anterior cingulate cortex, insula, precentral gyrus, as well as in primary and secondary somatosensory cortices – a network known for the processing of intranasal nociceptive stimuli. Significant ALE values were also observed in the piriform cortex, insula, and the orbitofrontal cortex, areas known to process chemosensory stimuli, and in association cortices. Additionally, the trigeminal ALE statistics were directly compared with ALE statistics originating from olfactory stimulation, demonstrating considerable overlap in activation. In conclusion, the results of this meta-analysis map the human neuronal correlates of intranasal trigeminal stimulation with high statistical certainty and demonstrate that the cortical areas recruited during the processing of intranasal CO2 stimuli include those outside traditional trigeminal areas. Moreover, through illustrations of the considerable overlap between brain areas that process trigeminal and olfactory information; these results demonstrate the interconnectivity of flavor processing. PMID:19913573
Harkness, Mark; Fisher, Angela; Lee, Michael D; Mack, E Erin; Payne, Jo Ann; Dworatzek, Sandra; Roberts, Jeff; Acheson, Carolyn; Herrmann, Ronald; Possolo, Antonio
2012-04-01
A large, multi-laboratory microcosm study was performed to select amendments for supporting reductive dechlorination of high levels of trichloroethylene (TCE) found at an industrial site in the United Kingdom (UK) containing dense non-aqueous phase liquid (DNAPL) TCE. The study was designed as a fractional factorial experiment involving 177 bottles distributed between four industrial laboratories and was used to assess the impact of six electron donors, bioaugmentation, addition of supplemental nutrients, and two TCE levels (0.57 and 1.90 mM or 75 and 250 mg/L in the aqueous phase) on TCE dechlorination. Performance was assessed based on the concentration changes of TCE and reductive dechlorination degradation products. The chemical data was evaluated using analysis of variance (ANOVA) and survival analysis techniques to determine both main effects and important interactions for all the experimental variables during the 203-day study. The statistically based design and analysis provided powerful tools that aided decision-making for field application of this technology. The analysis showed that emulsified vegetable oil (EVO), lactate, and methanol were the most effective electron donors, promoting rapid and complete dechlorination of TCE to ethene. Bioaugmentation and nutrient addition also had a statistically significant positive impact on TCE dechlorination. In addition, the microbial community was measured using phospholipid fatty acid analysis (PLFA) for quantification of total biomass and characterization of the community structure and quantitative polymerase chain reaction (qPCR) for enumeration of Dehalococcoides organisms (Dhc) and the vinyl chloride reductase (vcrA) gene. The highest increase in levels of total biomass and Dhc was observed in the EVO microcosms, which correlated well with the dechlorination results. Copyright © 2012 Elsevier B.V. All rights reserved.
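A main-effects ANOVA of this kind can be run with statsmodels. The sketch below uses a fabricated microcosm dataset (the fourth donor name, the effect sizes, and the replicate counts are all hypothetical) purely to illustrate the factorial analysis, and it omits the survival-analysis component of the study.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(7)

# Hypothetical microcosm outcomes: fraction of TCE converted to ethene by day 203.
donors = ["EVO", "lactate", "methanol", "whey"]
rows = []
for donor in donors:
    for bioaug in (0, 1):
        for nutrients in (0, 1):
            for _ in range(5):
                base = {"EVO": 0.8, "lactate": 0.7, "methanol": 0.65, "whey": 0.4}[donor]
                y = base + 0.1 * bioaug + 0.05 * nutrients + rng.normal(0, 0.08)
                rows.append({"donor": donor, "bioaug": bioaug, "nutrients": nutrients, "ethene": y})
df = pd.DataFrame(rows)

# Main-effects ANOVA, analogous in spirit to the study's factorial analysis.
model = smf.ols("ethene ~ C(donor) + C(bioaug) + C(nutrients)", data=df).fit()
print(anova_lm(model, typ=2))
```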
Design of a Web-tool for diagnostic clinical trials handling medical imaging research.
Baltasar Sánchez, Alicia; González-Sistal, Angel
2011-04-01
New clinical studies in medicine are based on patients and controls using different imaging diagnostic modalities. Medical information systems are not designed for clinical trials employing clinical imaging. Although commercial software and communication systems focus on storage of image data, they are not suitable for storage and mining of new types of quantitative data. We sought to design a Web-tool to support diagnostic clinical trials involving different experts and hospitals or research centres. The image analysis of this project is based on skeletal X-ray imaging. It involves a computerised image method using quantitative analysis of regions of interest in healthy bone and skeletal metastases. The database is implemented with ASP.NET 3.5 and C# technologies for our Web-based application. For data storage, we chose MySQL v.5.0, one of the most popular open source databases. User logins were necessary, and access to patient data was logged for auditing. For security, all data transmissions were carried over encrypted connections. This Web-tool is available to users scattered at different locations; it allows an efficient organisation and storage of data (case report form) and images and allows each user to know precisely what his task is. The advantages of our Web-tool are as follows: (1) sustainability is guaranteed; (2) network locations for collection of data are secured; (3) all clinical information is stored together with the original images and the results derived from processed images and statistical analysis that enable us to perform retrospective studies; (4) changes are easily incorporated because of the modular architecture; and (5) assessment of trial data collected at different sites is centralised to reduce statistical variance.
Characterizing pigments with hyperspectral imaging variable false-color composites
NASA Astrophysics Data System (ADS)
Hayem-Ghez, Anita; Ravaud, Elisabeth; Boust, Clotilde; Bastian, Gilles; Menu, Michel; Brodie-Linder, Nancy
2015-11-01
Hyperspectral imaging has been used for pigment characterization on paintings for the last 10 years. It is a noninvasive technique, which combines the power of spectrophotometry and that of imaging technologies. We have access to a visible and near-infrared hyperspectral camera, ranging from 400 to 1000 nm in 80-160 spectral bands. In order to treat the large amount of data that this imaging technique generates, one can use statistical tools such as principal component analysis (PCA). To conduct the characterization of pigments, researchers mostly use PCA, convex geometry algorithms and the comparison of resulting clusters to database spectra with a specific tolerance (like the Spectral Angle Mapper tool in the dedicated software ENVI). Our approach originates from false-color photography and aims at providing a simple tool to identify pigments thanks to imaging spectroscopy. It can be considered as a quick first analysis to see the principal pigments of a painting, before using a more complete multivariate statistical tool. We study pigment spectra, for each kind of hue (blue, green, red and yellow), to identify the wavelengths that maximize spectral differences. The case of red pigments is the most interesting because our methodology can discriminate the red pigments very well—even red lakes, which are always difficult to identify. For the yellow and blue categories, the method represents a clear improvement over infrared false-color (IRFC) photography for pigment discrimination. We apply our methodology to study the pigments on a painting by Eustache Le Sueur, a French painter of the seventeenth century. We compare the results to other noninvasive analyses such as X-ray fluorescence and optical microscopy. Finally, we draw conclusions about the advantages and limits of the variable false-color image method using hyperspectral imaging.
Language-Agnostic Reproducible Data Analysis Using Literate Programming
Vassilev, Boris; Louhimo, Riku; Ikonen, Elina; Hautaniemi, Sampsa
2016-01-01
A modern biomedical research project can easily contain hundreds of analysis steps and lack of reproducibility of the analyses has been recognized as a severe issue. While thorough documentation enables reproducibility, the number of analysis programs used can be so large that in reality reproducibility cannot be easily achieved. Literate programming is an approach to present computer programs to human readers. The code is rearranged to follow the logic of the program, and to explain that logic in a natural language. The code executed by the computer is extracted from the literate source code. As such, literate programming is an ideal formalism for systematizing analysis steps in biomedical research. We have developed the reproducible computing tool Lir (literate, reproducible computing) that allows a tool-agnostic approach to biomedical data analysis. We demonstrate the utility of Lir by applying it to a case study. Our aim was to investigate the role of endosomal trafficking regulators to the progression of breast cancer. In this analysis, a variety of tools were combined to interpret the available data: a relational database, standard command-line tools, and a statistical computing environment. The analysis revealed that the lipid transport related genes LAPTM4B and NDRG1 are coamplified in breast cancer patients, and identified genes potentially cooperating with LAPTM4B in breast cancer progression. Our case study demonstrates that with Lir, an array of tools can be combined in the same data analysis to improve efficiency, reproducibility, and ease of understanding. Lir is an open-source software available at github.com/borisvassilev/lir. PMID:27711123
Pereira, Tiago Veiga; Rudnicki, Martina; Pereira, Alexandre Costa; Pombo-de-Oliveira, Maria S; Franco, Rendrik França
2006-01-01
Meta-analysis has become an important statistical tool in genetic association studies, since it may provide more powerful and precise estimates. However, meta-analytic studies are prone to several potential biases, not only because of the preferential publication of "positive" studies but also due to difficulties in obtaining all relevant information during the study selection process. In this letter, we point out major problems in meta-analysis that may lead to biased conclusions, illustrating with an empirical example of two recent meta-analyses on the relation between MTHFR polymorphisms and risk of acute lymphoblastic leukemia that, despite the similarity in statistical methods and period of study selection, provided partially conflicting results.
TableViewer for Herschel Data Processing
NASA Astrophysics Data System (ADS)
Zhang, L.; Schulz, B.
2006-07-01
The TableViewer utility is a GUI tool written in Java to support interactive data processing and analysis for the Herschel Space Observatory (Pilbratt et al. 2001). The idea was inherited from a prototype written in IDL (Schulz et al. 2005). It allows the user to graphically view and analyze tabular data organized in columns with equal numbers of rows. It can be run either as a standalone application, where data access is restricted to FITS (FITS 1999) files only, or from the Quick Look Analysis (QLA) or Interactive Analysis (IA) command line, from where objects are also accessible. The graphic display is very versatile, allowing plots on either linear or log scales. Zooming, panning, and changing data columns are performed rapidly using a group of navigation buttons. Selecting and de-selecting fields of data points controls the input to simple analysis tasks such as building a statistics table or generating power spectra. The binary data stored in a TableDataset, a Product, or in FITS files can also be displayed as tabular data, where values in individual cells can be modified. TableViewer provides several processing utilities which, besides calculating statistics for all or selected channels and computing power spectra, allow the user to convert or repair datasets by changing the unit name of data columns and by modifying data values in columns with a simple calculator tool. Interactively selected data can be separated out, and modified data sets can be saved to FITS files. The tool will be especially helpful in the early phases of Herschel data analysis, when quick access to the contents of data products is important. TableDataset and Product are Java classes defined in herschel.ia.dataset.
NASA Astrophysics Data System (ADS)
Koymans, Mathijs; Langereis, Cor; Pastor-Galán, Daniel; van Hinsbergen, Douwe
2017-04-01
This contribution gives an overview of Paleomagnetism.org (Koymans et al., 2016), an online environment for paleomagnetic analysis. The application is developed in JavaScript and is fully open-sourced. It presents an interactive website in which paleomagnetic data can be interpreted, evaluated, visualized, and shared with others. The application has been available from late 2015 and since then has evolved with the addition of a magnetostratigraphic tool, additional input formats, and features that emphasize the link between geomagnetism and tectonics. In the interpretation portal, principal component analysis (Kirschvink et al., 1981) can be applied to visualized demagnetization data (Zijderveld, 1967). Interpreted directions and great circles are combined using the iterative procedure described by McFadden and McElhinny (1988). The resulting directions can be further used in the statistics portal or exported as raw tabulated data and high-quality figures. The available tools in the statistics portal cover standard Fisher statistics for directional data and virtual geomagnetic poles (Fisher, 1953; Butler, 1992; Deenen et al., 2011). Other tools include the eigenvector approach foldtest (Tauxe and Watson, 1994), a bootstrapped reversal test (Tauxe et al., 2009), and the classical reversal test (McFadden and McElhinny, 1990). An implementation exists for the detection and correction of inclination shallowing in sediments (Tauxe and Kent, 2004; Tauxe et al., 2008), and a module to visualize apparent polar wander paths (Torsvik et al., 2012; Kent and Irving, 2010; Besse and Courtillot, 2002) for large continent-bearing plates. A miscellaneous portal exists for a set of tools that include a bootstrapped oroclinal test (Pastor-Galán et al., 2016) for assessing possible linear relationships between strike and declination. Another available tool performs a net tectonic rotation analysis (after Morris et al., 1999) that restores a dyke to its paleo-vertical and can be used in determining paleo-spreading directions fundamental to plate reconstructions. Paleomagnetism.org provides an integrated approach for researchers to export and share paleomagnetic data through a common interface. The portals create a custom exportable file that can be distributed and included in public databases. With a publication, this file can be appended and would contain all paleomagnetic data discussed in the publication. The appended file can then be imported to the application by other researchers for reviewing. The accessibility and simplicity through which paleomagnetic data can be interpreted, analyzed, visualized, and shared should make Paleomagnetism.org of interest to the paleomagnetic and tectonic communities.
Schmitt, J Eric; Scanlon, Mary H; Servaes, Sabah; Levin, Dayna; Cook, Tessa S
2015-10-01
The advent of the ACGME's Next Accreditation System represents a significant new challenge for residencies and fellowships, owing to its requirements for more complex and detailed information. We developed a system of online assessment tools to provide comprehensive coverage of the twelve ACGME Milestones and digitized them using freely available cloud-based productivity tools. These tools include a combination of point-of-care procedural assessments, electronic quizzes, online modules, and other data entry forms. Using free statistical analytic tools, we also developed an automated system for management, processing, and data reporting. After one year of use, our Milestones project has resulted in the submission of over 20,000 individual data points. The use of automated statistical methods to generate resident-specific profiles has allowed for dynamic reports of individual residents' progress. These profiles both summarize data and also allow program directors access to more granular information as needed. Informatics-driven strategies for data assessment and processing represent feasible solutions to Milestones assessment and analysis, reducing the potential administrative burden for program directors, residents, and staff. Copyright © 2015 AUR. Published by Elsevier Inc. All rights reserved.
Villodre, Celia; Rebasa, Pere; Estrada, José Luís; Zaragoza, Carmen; Zapater, Pedro; Mena, Luís; Lluís, Félix
2016-11-01
In a previous study, we found that Physiological and Operative Severity Score for the enUmeration of Mortality and Morbidity (POSSUM) overpredicts morbidity risk in emergency gastrointestinal surgery. Our aim was to find a POSSUM equation adjustment. A prospective observational study was performed on 2,361 patients presenting with a community-acquired gastrointestinal surgical emergency. The first 1,000 surgeries constituted the development cohort, the second 1,000 events were the first validation intramural cohort, and the remaining 361 cases belonged to a second validation extramural cohort. (1) A modified POSSUM equation was obtained. (2) Logistic regression was used to yield a statistically significant equation that included age, hemoglobin, white cell count, sodium and operative severity. (3) A chi-square automatic interaction detector decision tree analysis yielded a statistically significant equation with 4 variables, namely cardiac failure, sodium, operative severity, and peritoneal soiling. A modified POSSUM equation and a simplified scoring system (aLicante sUrgical Community Emergencies New Tool for the enUmeration of Morbidities [LUCENTUM]) are described. Both tools significantly improve prediction of surgical morbidity in community-acquired gastrointestinal surgical emergencies. Copyright © 2016 Elsevier Inc. All rights reserved.
LaBudde, Robert A; Harnly, James M
2012-01-01
A qualitative botanical identification method (BIM) is an analytical procedure that returns a binary result (1 = Identified, 0 = Not Identified). A BIM may be used by a buyer, manufacturer, or regulator to determine whether a botanical material being tested is the same as the target (desired) material, or whether it contains excessive nontarget (undesirable) material. The report describes development and validation studies for a BIM that use the proportion of replicates identified, or probability of identification (POI), as the basic observed statistic. The statistical procedures proposed for data analysis follow closely those of the probability of detection, and harmonize the statistical concepts and parameters between quantitative and qualitative method validation. Use of POI statistics also harmonizes statistical concepts for botanical, microbiological, toxin, and other analyte identification methods that produce binary results. The POI statistical model provides a tool for graphical representation of response curves for qualitative methods, reporting of descriptive statistics, and application of performance requirements. Single collaborator and multicollaborative study examples are given.
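Because the observed statistic is simply the proportion of replicates identified, a basic POI summary with binomial confidence intervals is straightforward. The sketch below uses statsmodels with invented single-laboratory counts; the adulteration levels and replicate numbers are hypothetical.

```python
from statsmodels.stats.proportion import proportion_confint

# Hypothetical single-laboratory results: replicates identified as the target
# botanical out of n tested, at several levels of nontarget adulteration.
results = {          # % nontarget material -> (identified, replicates)
    0:  (12, 12),
    5:  (11, 12),
    10: (9, 12),
    25: (4, 12),
    50: (1, 12),
}

for level, (k, n) in results.items():
    poi = k / n
    lo, hi = proportion_confint(k, n, alpha=0.05, method="wilson")
    print(f"{level:>3}% adulteration: POI = {poi:.2f}  (95% CI {lo:.2f}-{hi:.2f})")
```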
Kathman, Steven J; Potts, Ryan J; Ayres, Paul H; Harp, Paul R; Wilson, Cody L; Garner, Charles D
2010-10-01
The mouse dermal assay has long been used to assess the dermal tumorigenicity of cigarette smoke condensate (CSC). This mouse skin model has been developed for use in carcinogenicity testing utilizing the SENCAR mouse as the standard strain. Though the model has limitations, it remains as the most relevant method available to study the dermal tumor promoting potential of mainstream cigarette smoke. In the typical SENCAR mouse CSC bioassay, CSC is applied for 29 weeks following the application of a tumor initiator such as 7,12-dimethylbenz[a]anthracene (DMBA). Several endpoints are considered for analysis including: the percentage of animals with at least one mass, latency, and number of masses per animal. In this paper, a relatively straightforward analytic model and procedure is presented for analyzing the time course of the incidence of masses. The procedure considered here takes advantage of Bayesian statistical techniques, which provide powerful methods for model fitting and simulation. Two datasets are analyzed to illustrate how the model fits the data, how well the model may perform in predicting data from such trials, and how the model may be used as a decision tool when comparing the dermal tumorigenicity of cigarette smoke condensate from multiple cigarette types. The analysis presented here was developed as a statistical decision tool for differentiating between two or more prototype products based on the dermal tumorigenicity. Copyright (c) 2010 Elsevier Inc. All rights reserved.
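The paper models the full time course of mass incidence; as a much simpler illustration of the Bayesian decision-tool idea, the sketch below compares end-of-study incidence between two hypothetical condensates using conjugate Beta-Binomial posteriors. The animal counts and priors are invented.

```python
import numpy as np

rng = np.random.default_rng(11)

# Hypothetical end-of-study incidence: animals with at least one mass out of n dosed.
a_masses, a_n = 18, 40   # condensate from cigarette type A
b_masses, b_n = 27, 40   # condensate from cigarette type B

# Beta(1, 1) priors give Beta posteriors for each incidence proportion.
post_a = rng.beta(1 + a_masses, 1 + a_n - a_masses, size=100_000)
post_b = rng.beta(1 + b_masses, 1 + b_n - b_masses, size=100_000)

# Posterior probability that type B is more tumorigenic than type A.
print("P(incidence_B > incidence_A) =", float((post_b > post_a).mean()))
```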
Inverse Statistics and Asset Allocation Efficiency
NASA Astrophysics Data System (ADS)
Bolgorian, Meysam
In this paper using inverse statistics analysis, the effect of investment horizon on the efficiency of portfolio selection is examined. Inverse statistics analysis is a general tool, also known as the probability distribution of exit time, that is used for detecting the distribution of the time at which a stochastic process exits from a zone. This analysis was used in Refs. 1 and 2 for studying financial returns time series. This distribution provides an optimal investment horizon which determines the most likely horizon for gaining a specific return. Using samples of stocks from the Tehran Stock Exchange (TSE) as an emerging market and the S&P 500 as a developed market, the effect of the optimal investment horizon on asset allocation is assessed. It is found that taking the optimal investment horizon into account leads to greater efficiency for large portfolios in the TSE, whereas for stocks selected from the S&P 500 the strategy does not produce more efficient portfolios regardless of portfolio size; instead, longer investment horizons provide greater efficiency.
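The exit-time (inverse statistics) distribution itself is easy to estimate empirically: for each starting day, count the days until the cumulative log-return first reaches a target gain rho, and take the mode of that distribution as the optimal investment horizon. The sketch below does this on synthetic daily log-returns; the drift, volatility, and rho = 5% target are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(5)
log_returns = rng.normal(0.0004, 0.012, size=5_000)   # synthetic daily log-returns

def exit_times(returns, rho=0.05):
    """Days until the cumulative log-return first gains rho, for each starting day."""
    times = []
    for start in range(len(returns) - 1):
        cum = np.cumsum(returns[start:])
        hit = np.argmax(cum >= rho)            # first index reaching the target...
        if cum[hit] >= rho:                    # ...only counted if the target was reached
            times.append(hit + 1)
    return np.array(times)

tau = exit_times(log_returns, rho=0.05)
counts = np.bincount(tau)
print("most likely (optimal) investment horizon:", int(np.argmax(counts)), "days")
```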
Yue Xu, Selene; Nelson, Sandahl; Kerr, Jacqueline; Godbole, Suneeta; Patterson, Ruth; Merchant, Gina; Abramson, Ian; Staudenmayer, John; Natarajan, Loki
2018-04-01
Physical inactivity is a recognized risk factor for many chronic diseases. Accelerometers are increasingly used as an objective means to measure daily physical activity. One challenge in using these devices is missing data due to device nonwear. We used a well-characterized cohort of 333 overweight postmenopausal breast cancer survivors to examine missing data patterns of accelerometer outputs over the day. Based on these observed missingness patterns, we created pseudo-simulated datasets with realistic missing data patterns. We developed statistical methods to design imputation and variance weighting algorithms to account for missing data effects when fitting regression models. Bias and precision of each method were evaluated and compared. Our results indicated that not accounting for missing data in the analysis yielded unstable estimates in the regression analysis. Incorporating variance weights and/or subject-level imputation improved precision by >50%, compared to ignoring missing data. We recommend that these simple, easy-to-implement statistical tools be used to improve analysis of accelerometer data.
Machine learning for neuroimaging with scikit-learn
Abraham, Alexandre; Pedregosa, Fabian; Eickenberg, Michael; Gervais, Philippe; Mueller, Andreas; Kossaifi, Jean; Gramfort, Alexandre; Thirion, Bertrand; Varoquaux, Gaël
2014-01-01
Statistical machine learning methods are increasingly used for neuroimaging data analysis. Their main virtue is their ability to model high-dimensional datasets, e.g., multivariate analysis of activation images or resting-state time series. Supervised learning is typically used in decoding or encoding settings to relate brain images to behavioral or clinical observations, while unsupervised learning can uncover hidden structures in sets of images (e.g., resting state functional MRI) or find sub-populations in large cohorts. By considering different functional neuroimaging applications, we illustrate how scikit-learn, a Python machine learning library, can be used to perform some key analysis steps. Scikit-learn contains a very large set of statistical learning algorithms, both supervised and unsupervised, and its application to neuroimaging data provides a versatile tool to study the brain. PMID:24600388
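A minimal decoding example in the spirit of the paper, assuming the fMRI data have already been masked and reshaped into a trials-by-voxels matrix; the data here are synthetic, with a weak signal injected into a few voxels.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Stand-in for masked fMRI data: 120 "trials" x 500 voxels, two stimulus conditions.
X = rng.normal(size=(120, 500))
y = np.repeat([0, 1], 60)
X[y == 1, :25] += 0.4          # weak condition-specific signal in 25 voxels

# Linear SVM decoder with standardized features and 5-fold cross-validation.
decoder = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0))
print("decoding accuracy:", cross_val_score(decoder, X, y, cv=5).mean().round(3))
```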
DOE Office of Scientific and Technical Information (OSTI.GOV)
Steed, Chad Allen
EDENx is a multivariate data visualization tool that allows interactive user driven analysis of large-scale data sets with high dimensionality. EDENx builds on our earlier system, called EDEN, to enable analysis of more dimensions and larger scale data sets. EDENx provides an initial overview of summary statistics for each variable in the data set under investigation. EDENx allows the user to interact with graphical summary plots of the data to investigate subsets and their statistical associations. These plots include histograms, binned scatterplots, binned parallel coordinate plots, timeline plots, and graphical correlation indicators. From the EDENx interface, a user can select a subsample of interest and launch a more detailed data visualization via the EDEN system. EDENx is best suited for high-level, aggregate analysis tasks while EDEN is more appropriate for detail data investigations.
GAC: Gene Associations with Clinical, a web based application.
Zhang, Xinyan; Rupji, Manali; Kowalski, Jeanne
2017-01-01
We present GAC, a Shiny R-based tool for interactive visualization of clinical associations based on high-dimensional data. The tool provides a web-based suite to perform supervised principal component analysis (SuperPC), an approach that combines high-dimensional data, such as gene expression, with clinical data to infer clinical associations. We extended the approach to address binary outcomes, in addition to continuous and time-to-event data in our package, thereby increasing the use and flexibility of SuperPC. Additionally, the tool provides an interactive visualization for summarizing results based on a forest plot for both binary and time-to-event data. In summary, the GAC suite of tools provides a one-stop shop for conducting statistical analysis to identify and visualize the association between a clinical outcome of interest and high-dimensional data types, such as genomic data. Our GAC package has been implemented in R and is available via http://shinygispa.winship.emory.edu/GAC/. The developmental repository is available at https://github.com/manalirupji/GAC.
INFORMATION: THEORY, BRAIN, AND BEHAVIOR
Jensen, Greg; Ward, Ryan D.; Balsam, Peter D.
2016-01-01
In the 65 years since its formal specification, information theory has become an established statistical paradigm, providing powerful tools for quantifying probabilistic relationships. Behavior analysis has begun to adopt these tools as a novel means of measuring the interrelations between behavior, stimuli, and contingent outcomes. This approach holds great promise for making more precise determinations about the causes of behavior and the forms in which conditioning may be encoded by organisms. In addition to providing an introduction to the basics of information theory, we review some of the ways that information theory has informed the studies of Pavlovian conditioning, operant conditioning, and behavioral neuroscience. In addition to enriching each of these empirical domains, information theory has the potential to act as a common statistical framework by which results from different domains may be integrated, compared, and ultimately unified. PMID:24122456
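A worked example of the kind of quantity information theory contributes here is the mutual information between stimulus and response, computed directly from a contingency table of trial counts; the counts below are hypothetical.

```python
import numpy as np

# Hypothetical joint counts of stimulus (rows) and response (columns) in a
# conditioning session.
counts = np.array([
    [40, 10],    # stimulus A: response r1, r2
    [12, 38],    # stimulus B: response r1, r2
], dtype=float)

p_xy = counts / counts.sum()
p_x = p_xy.sum(axis=1, keepdims=True)
p_y = p_xy.sum(axis=0, keepdims=True)

# Mutual information I(X;Y) = sum p(x,y) log2( p(x,y) / (p(x) p(y)) ), in bits.
nonzero = p_xy > 0
mi = float((p_xy[nonzero] * np.log2(p_xy[nonzero] / (p_x @ p_y)[nonzero])).sum())
print(f"I(stimulus; response) = {mi:.3f} bits")
```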
NASA Astrophysics Data System (ADS)
Wang, Ximing; Kim, Bokkyu; Park, Ji Hoon; Wang, Erik; Forsyth, Sydney; Lim, Cody; Ravi, Ragini; Karibyan, Sarkis; Sanchez, Alexander; Liu, Brent
2017-03-01
Quantitative imaging biomarkers are used widely in clinical trials for tracking and evaluation of medical interventions. Previously, we have presented a web based informatics system utilizing quantitative imaging features for predicting outcomes in stroke rehabilitation clinical trials. The system integrates imaging features extraction tools and a web-based statistical analysis tool. The tools include a generalized linear mixed model(GLMM) that can investigate potential significance and correlation based on features extracted from clinical data and quantitative biomarkers. The imaging features extraction tools allow the user to collect imaging features and the GLMM module allows the user to select clinical data and imaging features such as stroke lesion characteristics from the database as regressors and regressands. This paper discusses the application scenario and evaluation results of the system in a stroke rehabilitation clinical trial. The system was utilized to manage clinical data and extract imaging biomarkers including stroke lesion volume, location and ventricle/brain ratio. The GLMM module was validated and the efficiency of data analysis was also evaluated.
Lack, N
2001-08-01
The introduction of the modified data set for quality assurance in obstetrics (formerly the perinatal survey) in Lower Saxony and Bavaria as early as 1999 created an urgent need for a corresponding new statistical analysis of the revised data. The general outline of a new data reporting concept was originally presented by the Bavarian Commission for Perinatology and Neonatology at the Munich Perinatal Conference in November 1997. These ideas form the basis for the content and layout of the new quality report for obstetrics, currently in its nationwide harmonisation phase coordinated by the federal office for quality assurance in hospital care. A flexible, modular, database-oriented analysis tool developed in Bavaria is now in its second year of successful operation. The functionalities of this system are described in detail.
Stevens, John R; Jones, Todd R; Lefevre, Michael; Ganesan, Balasubramanian; Weimer, Bart C
2017-01-01
Microbial community analysis experiments to assess the effect of a treatment intervention (or environmental change) on the relative abundance levels of multiple related microbial species (or operational taxonomic units) simultaneously using high throughput genomics are becoming increasingly common. Within the framework of the evolutionary phylogeny of all species considered in the experiment, this translates to a statistical need to identify the phylogenetic branches that exhibit a significant consensus response (in terms of operational taxonomic unit abundance) to the intervention. We present the R software package SigTree, a collection of flexible tools that make use of meta-analysis methods and regular expressions to identify and visualize significantly responsive branches in a phylogenetic tree, while appropriately adjusting for multiple comparisons.
Trivedi, Prinal; Edwards, Jode W; Wang, Jelai; Gadbury, Gary L; Srinivasasainagendra, Vinodh; Zakharkin, Stanislav O; Kim, Kyoungmi; Mehta, Tapan; Brand, Jacob P L; Patki, Amit; Page, Grier P; Allison, David B
2005-04-06
Many efforts in microarray data analysis are focused on providing tools and methods for the qualitative analysis of microarray data. HDBStat! (High-Dimensional Biology-Statistics) is a software package designed for analysis of high dimensional biology data such as microarray data. It was initially developed for the analysis of microarray gene expression data, but it can also be used for some applications in proteomics and other aspects of genomics. HDBStat! provides statisticians and biologists a flexible and easy-to-use interface to analyze complex microarray data using a variety of methods for data preprocessing, quality control analysis and hypothesis testing. Results generated from data preprocessing methods, quality control analysis and hypothesis testing methods are output in the form of Excel CSV tables, graphs and an Html report summarizing data analysis. HDBStat! is a platform-independent software that is freely available to academic institutions and non-profit organizations. It can be downloaded from our website http://www.soph.uab.edu/ssg_content.asp?id=1164.
Evaluation of the quality of the teaching-learning process in undergraduate courses in Nursing.
González-Chordá, Víctor Manuel; Maciá-Soler, María Loreto
2015-01-01
to identify aspects of improvement of the quality of the teaching-learning process through the analysis of tools that evaluated the acquisition of skills by undergraduate students of Nursing. Prospective longitudinal study conducted in a population of 60 second-year Nursing students based on registration data, from which quality indicators that evaluate the acquisition of skills were obtained, with descriptive and inferential analysis. Nine items were identified and nine learning activities included in the assessment tools that did not reach the established quality indicators (p<0.05). There are statistically significant differences depending on the hospital and clinical practices unit (p<0.05). The analysis of the evaluation tools used in the subject "Nursing Care in Welfare Processes" of the analyzed university undergraduate course enabled the detection of areas for improvement in the teaching-learning process. The challenge of education in nursing is to reach the best clinical research and educational results, in order to provide improvements to the quality of education and health care.
Kuhn-Tucker optimization based reliability analysis for probabilistic finite elements
NASA Technical Reports Server (NTRS)
Liu, W. K.; Besterfield, G.; Lawrence, M.; Belytschko, T.
1988-01-01
The fusion of the probabilistic finite element method (PFEM) and reliability analysis for fracture mechanics is considered. Reliability analysis with specific application to fracture mechanics is presented, and computational procedures are discussed. Explicit expressions for the optimization procedure with regard to fracture mechanics are given. The results show that the PFEM is a very powerful tool for determining the second-moment statistics. The method can determine the probability of failure or fracture subject to randomness in load, material properties, and crack length, orientation, and location.
Fisher statistics for analysis of diffusion tensor directional information.
Hutchinson, Elizabeth B; Rutecki, Paul A; Alexander, Andrew L; Sutula, Thomas P
2012-04-30
A statistical approach is presented for the quantitative analysis of diffusion tensor imaging (DTI) directional information using Fisher statistics, which were originally developed for the analysis of vectors in the field of paleomagnetism. In this framework, descriptive and inferential statistics have been formulated based on the Fisher probability density function, a spherical analogue of the normal distribution. The Fisher approach was evaluated for investigation of rat brain DTI maps to characterize tissue orientation in the corpus callosum, fornix, and hilus of the dorsal hippocampal dentate gyrus, and to compare directional properties in these regions following status epilepticus (SE) or traumatic brain injury (TBI) with values in healthy brains. Direction vectors were determined for each region of interest (ROI) for each brain sample and Fisher statistics were applied to calculate the mean direction vector and variance parameters in the corpus callosum, fornix, and dentate gyrus of normal rats and rats that experienced TBI or SE. Hypothesis testing was performed by calculation of Watson's F-statistic and associated p-value giving the likelihood that grouped observations were from the same directional distribution. In the fornix and midline corpus callosum, no directional differences were detected between groups, however in the hilus, significant (p<0.0005) differences were found that robustly confirmed observations that were suggested by visual inspection of directionally encoded color DTI maps. The Fisher approach is a potentially useful analysis tool that may extend the current capabilities of DTI investigation by providing a means of statistical comparison of tissue structural orientation. Copyright © 2012 Elsevier B.V. All rights reserved.
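A minimal sketch of the descriptive Fisher statistics named above (mean direction, concentration parameter kappa, and the 95% confidence cone), applied to hypothetical per-voxel principal diffusion directions from one ROI; Watson's inferential F-test is omitted.

```python
import numpy as np

def fisher_stats(vectors):
    """Fisher mean direction, concentration kappa, and 95% confidence angle
    for a set of direction vectors (rows)."""
    v = np.asarray(vectors, dtype=float)
    v = v / np.linalg.norm(v, axis=1, keepdims=True)
    n = len(v)
    resultant = v.sum(axis=0)
    R = np.linalg.norm(resultant)
    mean_dir = resultant / R
    kappa = (n - 1) / (n - R)                      # standard estimator of the concentration
    alpha95 = np.degrees(np.arccos(1 - (n - R) / R * ((1 / 0.05) ** (1 / (n - 1)) - 1)))
    return mean_dir, kappa, alpha95

rng = np.random.default_rng(2)
# Hypothetical principal diffusion directions from one ROI, clustered around +x.
dirs = rng.normal([1.0, 0.05, 0.05], 0.1, size=(20, 3))
mean_dir, kappa, a95 = fisher_stats(dirs)
print("mean direction:", np.round(mean_dir, 3), " kappa:", round(kappa, 1), " alpha95:", round(a95, 1))
```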
Dubovenko, Alexey; Nikolsky, Yuri; Rakhmatulin, Eugene; Nikolskaya, Tatiana
2017-01-01
Analysis of NGS and other sequencing data, gene variants, gene expression, proteomics, and other high-throughput (OMICs) data is challenging because of its biological complexity and high level of technical and biological noise. One way to deal with both problems is to perform analysis with a high fidelity annotated knowledgebase of protein interactions, pathways, and functional ontologies. This knowledgebase has to be structured in a computer-readable format and must include software tools for managing experimental data, analysis, and reporting. Here, we present MetaCore™ and Key Pathway Advisor (KPA), an integrated platform for functional data analysis. On the content side, MetaCore and KPA encompass a comprehensive database of molecular interactions of different types, pathways, network models, and ten functional ontologies covering human, mouse, and rat genes. The analytical toolkit includes tools for gene/protein list enrichment analysis, statistical "interactome" tool for the identification of over- and under-connected proteins in the dataset, and a biological network analysis module made up of network generation algorithms and filters. The suite also features Advanced Search, an application for combinatorial search of the database content, as well as a Java-based tool called Pathway Map Creator for drawing and editing custom pathway maps. Applications of MetaCore and KPA include molecular mode of action of disease research, identification of potential biomarkers and drug targets, pathway hypothesis generation, analysis of biological effects for novel small molecule compounds and clinical applications (analysis of large cohorts of patients, and translational and personalized medicine).
Fu, Wenjiang J.; Stromberg, Arnold J.; Viele, Kert; Carroll, Raymond J.; Wu, Guoyao
2009-01-01
Over the past two decades, there have been revolutionary developments in life science technologies characterized by high throughput, high efficiency, and rapid computation. Nutritionists now have the advanced methodologies for the analysis of DNA, RNA, protein, low-molecular-weight metabolites, as well as access to bioinformatics databases. Statistics, which can be defined as the process of making scientific inferences from data that contain variability, has historically played an integral role in advancing nutritional sciences. Currently, in the era of systems biology, statistics has become an increasingly important tool to quantitatively analyze information about biological macromolecules. This article describes general terms used in statistical analysis of large, complex experimental data. These terms include experimental design, power analysis, sample size calculation, and experimental errors (type I and II errors) for nutritional studies at population, tissue, cellular, and molecular levels. In addition, we highlighted various sources of experimental variations in studies involving microarray gene expression, real-time polymerase chain reaction, proteomics, and other bioinformatics technologies. Moreover, we provided guidelines for nutritionists and other biomedical scientists to plan and conduct studies and to analyze the complex data. Appropriate statistical analyses are expected to make an important contribution to solving major nutrition-associated problems in humans and animals (including obesity, diabetes, cardiovascular disease, cancer, ageing, and intrauterine fetal retardation). PMID:20233650
ERIC Educational Resources Information Center
Ciftci, S. Koza; Karadag, Engin; Akdal, Pinar
2014-01-01
The purpose of this study was to determine the effect of statistics instruction using computer-based tools, on statistics anxiety, attitude, and achievement. This study was designed as quasi-experimental research and the pattern used was a matched pre-test/post-test with control group design. Data was collected using three scales: a Statistics…
Selvarasu, Suresh; Kim, Do Yun; Karimi, Iftekhar A; Lee, Dong-Yup
2010-10-01
We present an integrated framework for characterizing fed-batch cultures of mouse hybridoma cells producing monoclonal antibody (mAb). This framework systematically combines data preprocessing, elemental balancing and statistical analysis technique. Initially, specific rates of cell growth, glucose/amino acid consumptions and mAb/metabolite productions were calculated via curve fitting using logistic equations, with subsequent elemental balancing of the preprocessed data indicating the presence of experimental measurement errors. Multivariate statistical analysis was then employed to understand physiological characteristics of the cellular system. The results from principal component analysis (PCA) revealed three major clusters of amino acids with similar trends in their consumption profiles: (i) arginine, threonine and serine, (ii) glycine, tyrosine, phenylalanine, methionine, histidine and asparagine, and (iii) lysine, valine and isoleucine. Further analysis using partial least square (PLS) regression identified key amino acids which were positively or negatively correlated with the cell growth, mAb production and the generation of lactate and ammonia. Based on these results, the optimal concentrations of key amino acids in the feed medium can be inferred, potentially leading to an increase in cell viability and productivity, as well as a decrease in toxic waste production. The study demonstrated how the current methodological framework using multivariate statistical analysis techniques can serve as a potential tool for deriving rational medium design strategies. Copyright © 2010 Elsevier B.V. All rights reserved.
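The PCA/PLS workflow described here can be sketched with scikit-learn; in the fragment below, the amino acid consumption matrix, the productivity response, and the underlying correlations are all synthetic stand-ins for the preprocessed specific rates that the study derives from curve fitting.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(4)

# Hypothetical specific consumption rates for 12 amino acids across 30 culture samples.
amino_acids = ["Arg", "Thr", "Ser", "Gly", "Tyr", "Phe", "Met", "His", "Asn", "Lys", "Val", "Ile"]
X = rng.normal(size=(30, 12))
# Hypothetical specific mAb productivity driven mainly by the first three amino acids.
y = X[:, :3] @ np.array([0.8, 0.5, 0.3]) + rng.normal(0, 0.2, 30)

# PCA to summarize the main modes of variation in the consumption profiles.
pca = PCA(n_components=3).fit(X)
print("variance explained:", pca.explained_variance_ratio_.round(2))

# PLS regression to relate the consumption profiles to productivity.
pls = PLSRegression(n_components=2).fit(X, y)
weights = dict(zip(amino_acids, pls.coef_.ravel().round(2)))
print("PLS coefficients:", weights)
```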
2013-01-01
Background Routine electrocardiograms (ECGs) are not recommended for asymptomatic patients because the potential harms are thought to outweigh any benefits. Assessment tools to identify high risk individuals may improve the harm versus benefit profile of screening ECGs. In particular, people with unrecognized myocardial infarction (UMI) have elevated risk for cardiovascular events and death. Methods Using logistic regression, we developed a basic assessment tool among 16,653 participants in the REasons for Geographic and Racial Differences in Stroke (REGARDS) study using demographics, self-reported medical history, blood pressure, and body mass index and an expanded assessment tool using information on 51 potential variables. UMI was defined as electrocardiogram evidence of myocardial infarction without a self-reported history (n = 740). Results The basic assessment tool had a c-statistic of 0.638 (95% confidence interval 0.617 - 0.659) and included age, race, smoking status, body mass index, systolic blood pressure, and self-reported history of transient ischemic attack, deep vein thrombosis, falls, diabetes, and hypertension. A predicted probability of UMI > 3% provided a sensitivity of 80% and a specificity of 30%. The expanded assessment tool had a c-statistic of 0.654 (95% confidence interval 0.634-0.674). Because of the poor performance of these assessment tools, external validation was not pursued. Conclusions Despite examining a large number of potential correlates of UMI, the assessment tools did not provide a high level of discrimination. These data suggest defining groups with high prevalence of UMI for targeted screening will be difficult. PMID:23530553
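The c-statistic reported above is the area under the ROC curve of the fitted logistic model, i.e. the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen non-case. A minimal base-R sketch, using invented variables as stand-ins for the REGARDS data:

```r
# Illustrative only: 'umi' (0/1 outcome) and the predictors are hypothetical
# stand-ins for the variables described in the abstract
set.seed(2)
d <- data.frame(umi = rbinom(500, 1, 0.05),
                age = rnorm(500, 65, 8),
                sbp = rnorm(500, 130, 15),
                diabetes = rbinom(500, 1, 0.2))

fit <- glm(umi ~ age + sbp + diabetes, data = d, family = binomial)
p   <- predict(fit, type = "response")

# c-statistic = probability that a random case gets a higher predicted risk
# than a random non-case (equivalent to the area under the ROC curve)
cstat <- mean(outer(p[d$umi == 1], p[d$umi == 0], ">") +
              0.5 * outer(p[d$umi == 1], p[d$umi == 0], "=="))
cstat
```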
Stuckey, Marla H.; Kiesler, James L.
2008-01-01
A water-analysis screening tool (WAST) was developed by the U.S. Geological Survey, in partnership with the Pennsylvania Department of Environmental Protection, to provide an initial screening of areas in the state where potential problems may exist related to the availability of water resources to meet current and future water-use demands. The tool compares water-use information to an initial screening criterion, the 7-day, 10-year low-flow statistic (7Q10), resulting in a screening indicator for influences of net withdrawals (withdrawals minus discharges) on aquatic-resource uses. This report is intended to serve as a guide for using the screening tool. The WAST can display general basin characteristics, water-use information, and screening-indicator information for over 10,000 watersheds in the state. The tool includes 12 primary functions that allow the user to display watershed information, edit water-use and water-supply information, observe effects downstream from edited water-use information, reset edited values to baseline, load new water-use information, save and retrieve scenarios, and save output as a Microsoft Excel spreadsheet.
eShadow: A tool for comparing closely related sequences
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ovcharenko, Ivan; Boffelli, Dario; Loots, Gabriela G.
2004-01-15
Primate sequence comparisons are difficult to interpret due to the high degree of sequence similarity shared between such closely related species. Recently, a novel method, phylogenetic shadowing, has been pioneered for predicting functional elements in the human genome through the analysis of multiple primate sequence alignments. We have expanded this theoretical approach to create a computational tool, eShadow, for the identification of elements under selective pressure in multiple sequence alignments of closely related genomes, such as in comparisons of human to primate or mouse to rat DNA. This tool integrates two different statistical methods and allows for the dynamic visualization of the resulting conservation profile. eShadow also includes a versatile optimization module capable of training the underlying Hidden Markov Model to differentially predict functional sequences. This module grants the tool high flexibility in the analysis of multiple sequence alignments and in comparing sequences with different divergence rates. Here, we describe the eShadow comparative tool and its potential uses for analyzing both multiple nucleotide and protein alignments to predict putative functional elements. The eShadow tool is publicly available at http://eshadow.dcode.org/
Business Intelligence: Magnum B.I.
ERIC Educational Resources Information Center
Briggs, Linda L.
2007-01-01
Business intelligence (BI) tools offer schools the ability to look beyond a routine statistic, such as what percentage of students have passed a given test. Through data analysis, schools can view specific scores for a select group of students, for example, and compare that data to other groups, classes, or teachers. That is the kind of…
New Tools for "New" History: Computers and the Teaching of Quantitative Historical Methods.
ERIC Educational Resources Information Center
Burton, Orville Vernon; Finnegan, Terence
1989-01-01
Explains the development of an instructional software package and accompanying workbook which teaches students to apply computerized statistical analysis to historical data, improving the study of social history. Concludes that the use of microcomputers and supercomputers to manipulate historical data enhances critical thinking skills and the use…
Economics: A Discriminant Analysis of Students' Perceptions of Web-Based Learning.
ERIC Educational Resources Information Center
Usip, Ebenge E.; Bee, Richard H.
1998-01-01
Users and nonusers of Web-based instruction (WBI) in undergraduate statistics classes at Youngstown State University were surveyed. Users concluded that distance learning via the Web was a good method of obtaining general information and a useful tool for improving their academic performance. Nonusers thought the university should provide…
Monitoring Urban Quality of Life: The Porto Experience
ERIC Educational Resources Information Center
Santos, Luis Delfim; Martins, Isabel
2007-01-01
This paper describes the monitoring system of the urban quality of life developed by the Porto City Council, a new tool being used to support urban planning and management. The two components of this system--a quantitative approach based on statistical indicators and a qualitative analysis based on the citizens' perceptions of the conditions of…
Overview Of Recent Enhancements To The Bumper-II Meteoroid and Orbital Debris Risk Assessment Tool
NASA Technical Reports Server (NTRS)
Hyde, James L.; Christiansen, Eric L.; Lear, Dana M.; Prior, Thomas G.
2006-01-01
Discussion includes recent enhancements to the BUMPER-II program and input files in support of Shuttle Return to Flight. Improvements to the mesh definitions of the finite element input model will be presented. A BUMPER-II analysis process that was used to estimate statistical uncertainty is introduced.
Teaching meta-analysis using MetaLight.
Thomas, James; Graziosi, Sergio; Higgins, Steve; Coe, Robert; Torgerson, Carole; Newman, Mark
2012-10-18
Meta-analysis is a statistical method for combining the results of primary studies. It is often used in systematic reviews and is increasingly a method and topic that appears in student dissertations. MetaLight is a freely available software application that runs simple meta-analyses and contains specific functionality to facilitate the teaching and learning of meta-analysis. While there are many courses and resources for meta-analysis available and numerous software applications to run meta-analyses, there are few pieces of software which are aimed specifically at helping those teaching and learning meta-analysis. Valuable teaching time can be spent learning the mechanics of a new software application, rather than on the principles and practices of meta-analysis. We discuss ways in which the MetaLight tool can be used to present some of the main issues involved in undertaking and interpreting a meta-analysis. While there are many software tools available for conducting meta-analysis, in the context of a teaching programme such software can require expenditure both in terms of money and in terms of the time it takes to learn how to use it. MetaLight was developed specifically as a tool to facilitate the teaching and learning of meta-analysis and we have presented here some of the ways it might be used in a training situation.
Gupta, Vishal; Pandey, Pulak M
2016-11-01
Thermal necrosis is one of the major problems associated with the bone drilling process in orthopedic/trauma surgical operations. To overcome this problem, a new bone drilling method has been introduced recently. Studies have been carried out with rotary ultrasonic drilling (RUD) on pig bones using diamond-coated abrasive hollow tools. In the present work, the influence of process parameters (rotational speed, feed rate, drill diameter and vibrational amplitude) on the change in temperature was studied using a design-of-experiments technique, i.e., response surface methodology (RSM), and data analysis was carried out using analysis of variance (ANOVA). Temperature was recorded and measured using the embedded thermocouple technique at distances of 0.5 mm, 1.0 mm, 1.5 mm and 2.0 mm from the drill site. A statistical model was developed to predict the maximum temperature at the drill tool and bone interface. It was observed that temperature increased with increases in rotational speed, feed rate and drill diameter, and decreased with an increase in vibrational amplitude. Copyright © 2016 IPEM. Published by Elsevier Ltd. All rights reserved.
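A hedged sketch of fitting the kind of second-order response-surface model and ANOVA described above in base R; the design, factor levels and response values are all invented for illustration and do not reproduce the study's experiments.

```r
# Hypothetical design data: rotational speed (rpm), feed rate (mm/min),
# drill diameter (mm), vibrational amplitude (um) and measured temperature (C)
set.seed(3)
doe <- expand.grid(speed = c(500, 1000, 1500),
                   feed  = c(10, 30, 50),
                   dia   = c(2, 3, 4),
                   amp   = c(10, 15, 20))
doe$temp <- with(doe, 30 + 0.01 * speed + 0.2 * feed + 2 * dia - 0.5 * amp +
                 rnorm(nrow(doe), sd = 1))

# Second-order (quadratic) response-surface model fitted by least squares
fit <- lm(temp ~ (speed + feed + dia + amp)^2 +
            I(speed^2) + I(feed^2) + I(dia^2) + I(amp^2), data = doe)

anova(fit)   # ANOVA table: which terms explain significant variation
predict(fit, newdata = data.frame(speed = 1200, feed = 40, dia = 3.5, amp = 12))
```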
Methodology to assess clinical liver safety data.
Merz, Michael; Lee, Kwan R; Kullak-Ublick, Gerd A; Brueckner, Andreas; Watkins, Paul B
2014-11-01
Analysis of liver safety data has to be multivariate by nature and needs to take into account time dependency of observations. Current standard tools for liver safety assessment such as summary tables, individual data listings, and narratives address these requirements to a limited extent only. Using graphics in the context of a systematic workflow including predefined graph templates is a valuable addition to standard instruments, helping to ensure completeness of evaluation, and supporting both hypothesis generation and testing. Employing graphical workflows interactively allows analysis in a team-based setting and facilitates identification of the most suitable graphics for publishing and regulatory reporting. Another important tool is statistical outlier detection, accounting for the fact that for assessment of Drug-Induced Liver Injury, identification and thorough evaluation of extreme values has much more relevance than measures of central tendency in the data. Taken together, systematical graphical data exploration and statistical outlier detection may have the potential to significantly improve assessment and interpretation of clinical liver safety data. A workshop was convened to discuss best practices for the assessment of drug-induced liver injury (DILI) in clinical trials.
Development of guidance for states transitioning to new safety analysis tools
NASA Astrophysics Data System (ADS)
Alluri, Priyanka
With about 125 people dying on US roads each day, the US Department of Transportation heightened the awareness of critical safety issues with the passage of SAFETEA-LU (Safe Accountable Flexible Efficient Transportation Equity Act---a Legacy for Users) legislation in 2005. The legislation required each of the states to develop a Strategic Highway Safety Plan (SHSP) and incorporate data-driven approaches to prioritize and evaluate program outcomes; failure to do so resulted in funding sanctions. In conjunction with the legislation, research efforts have also been progressing toward the development of new safety analysis tools such as IHSDM (Interactive Highway Safety Design Model), SafetyAnalyst, and HSM (Highway Safety Manual). These software and analysis tools are comparatively more advanced in statistical theory and level of accuracy, and tend to be more data intensive. A review of the 2009 five-percent reports and excerpts from the nationwide survey revealed astonishing facts about the continuing use of traditional methods, including crash frequencies and rates, for site selection and prioritization. The intense data requirements and statistical complexity of advanced safety tools are considered a hindrance to their adoption. In this context, this research aims at identifying the data requirements and data availability for SafetyAnalyst and HSM by working with both tools. This research sets the stage for working with the Empirical Bayes approach by highlighting some of the biases and issues associated with traditional methods of selecting projects, such as greater emphasis on traffic volume and the regression-to-the-mean phenomenon. Further, the not-so-obvious issue with shorter segment lengths, which affects the results independently of the methods used, is also discussed. The more reliable and statistically acceptable Empirical Bayes methodology requires safety performance functions (SPFs), regression equations predicting the relation between crashes and exposure for a subset of the roadway network. These SPFs, specific to a region and analysis period, are often unavailable. Calibration of existing default national SPFs to the state's data could be a feasible solution, but how well the state's data are represented is a legitimate question. With this background, SPFs were generated for various classifications of segments in Georgia and compared against the national default SPFs used in SafetyAnalyst calibrated to Georgia data. Delving deeper into the development of SPFs, the influence of actual and estimated traffic data on the fit of the equations is also studied, questioning the accuracy and reliability of traffic estimations. In addition to SafetyAnalyst, HSM aims at performing quantitative safety analysis. Applying the HSM methodology to two-way two-lane rural roads, the effect of using multiple CMFs (Crash Modification Factors) is studied. Lastly, data requirements, methodology, constraints, and results are compared between SafetyAnalyst and HSM.
Statistical analysis of effective singular values in matrix rank determination
NASA Technical Reports Server (NTRS)
Konstantinides, Konstantinos; Yao, Kung
1988-01-01
A major problem in using SVD (singular-value decomposition) as a tool in determining the effective rank of a perturbed matrix is that of distinguishing between significantly small and significantly large singular values. To this end, confidence regions are derived for the perturbed singular values of matrices with noisy observation data. The analysis is based on the theories of perturbations of singular values and statistical significance testing. Threshold bounds for perturbation due to finite-precision and i.i.d. random models are evaluated. In random models, the threshold bounds depend on the dimension of the matrix, the noise variance, and a predefined statistical level of significance. Results applied to the problem of determining the effective order of a linear autoregressive system from the approximate rank of a sample autocorrelation matrix are considered. Various numerical examples illustrating the usefulness of these bounds and comparisons to other previously known approaches are given.
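A minimal base-R illustration of the effective-rank idea: compute the singular values of a noisy low-rank matrix and count those above a threshold. The ad hoc threshold used here is an assumption for illustration only; the paper instead derives bounds from the noise variance, matrix dimensions and a chosen significance level.

```r
set.seed(4)
# Rank-2 matrix observed with additive noise
A <- matrix(rnorm(40), 20, 2) %*% matrix(rnorm(10), 2, 5)
X <- A + matrix(rnorm(100, sd = 0.05), 20, 5)

s <- svd(X)$d   # singular values in decreasing order
s

# Illustrative threshold; the paper derives bounds from the noise variance,
# matrix dimensions and a chosen significance level instead of this ad hoc rule
threshold <- 0.1 * s[1]
effective_rank <- sum(s > threshold)
effective_rank   # expected to recover the underlying rank of 2
```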
CalFitter: a web server for analysis of protein thermal denaturation data.
Mazurenko, Stanislav; Stourac, Jan; Kunka, Antonin; Nedeljkovic, Sava; Bednar, David; Prokop, Zbynek; Damborsky, Jiri
2018-05-14
Despite significant advances in the understanding of protein structure-function relationships, revealing protein folding pathways still poses a challenge due to a limited number of relevant experimental tools. Widely-used experimental techniques, such as calorimetry or spectroscopy, critically depend on a proper data analysis. Currently, there are only separate data analysis tools available for each type of experiment with a limited model selection. To address this problem, we have developed the CalFitter web server to be a unified platform for comprehensive data fitting and analysis of protein thermal denaturation data. The server allows simultaneous global data fitting using any combination of input data types and offers 12 protein unfolding pathway models for selection, including irreversible transitions often missing from other tools. The data fitting produces optimal parameter values, their confidence intervals, and statistical information to define unfolding pathways. The server provides an interactive and easy-to-use interface that allows users to directly analyse input datasets and simulate modelled output based on the model parameters. CalFitter web server is available free at https://loschmidt.chemi.muni.cz/calfitter/.
Karasik, Avshalom; Rahimi, Oshrit; David, Michal; Weiss, Ehud; Drori, Elyashiv
2018-04-25
Grapevine (Vitis vinifera L.) is one of the classical fruits of the Old World. Among the thousands of domesticated grapevine varieties and variable wild sylvestris populations, the range of variation in pip morphology is very wide. In this study we scanned representative samples of grape pip populations, in an attempt to probe the possibility of using the 3D tool for grape variety identification. The scanning was followed by mathematical and statistical analysis using innovative algorithms from the field of computer sciences. Using selected Fourier coefficients, a very clear separation was obtained between most of the varieties, with only very few overlaps. These results show that this method enables the separation between different Vitis vinifera varieties. Interestingly, when using the 3D approach to analyze couples of varieties, considered synonyms by the standard 22 SSR analysis approach, we found that the varieties in two of the considered synonym couples were clearly separated by the morphological analysis. This work, therefore, suggests a new systematic tool for high resolution variety discrimination.
Hourdel, Véronique; Volant, Stevenn; O'Brien, Darragh P; Chenal, Alexandre; Chamot-Rooke, Julia; Dillies, Marie-Agnès; Brier, Sébastien
2016-11-15
With the continued improvement of requisite mass spectrometers and UHPLC systems, Hydrogen/Deuterium eXchange Mass Spectrometry (HDX-MS) workflows are rapidly evolving towards the investigation of more challenging biological systems, including large protein complexes and membrane proteins. The analysis of such extensive systems results in very large HDX-MS datasets for which specific analysis tools are required to speed up data validation and interpretation. We introduce a web application and a new R-package named 'MEMHDX' to help users analyze, validate and visualize large HDX-MS datasets. MEMHDX is composed of two elements. A statistical tool aids in the validation of the results by applying a mixed-effects model for each peptide, in each experimental condition, and at each time point, taking into account the time dependency of the HDX reaction and number of independent replicates. Two adjusted P-values are generated per peptide, one for the 'Change in dynamics' and one for the 'Magnitude of ΔD', and are used to classify the data by means of a 'Logit' representation. A user-friendly interface developed with Shiny by RStudio facilitates the use of the package. This interactive tool allows the user to easily and rapidly validate, visualize and compare the relative deuterium incorporation on the amino acid sequence and 3D structure, providing both spatial and temporal information. MEMHDX is freely available as a web tool at the project home page http://memhdx.c3bi.pasteur.fr. Contact: marie-agnes.dillies@pasteur.fr or sebastien.brier@pasteur.fr. Supplementary information: Supplementary data is available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
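The multiple-testing adjustment behind per-peptide adjusted P-values can be sketched with base R's p.adjust; the Benjamini-Hochberg correction and the raw P-values below are assumptions for illustration, not necessarily the exact adjustment MEMHDX applies internally.

```r
# Hypothetical raw P-values, one per peptide, e.g. from per-peptide model tests
set.seed(5)
p_raw <- runif(20)^2

# Multiple-testing correction; BH (false discovery rate) is shown as an example,
# not necessarily the adjustment used by MEMHDX
p_adj <- p.adjust(p_raw, method = "BH")

data.frame(peptide = paste0("pep", 1:20),
           p_raw = round(p_raw, 4),
           p_adj = round(p_adj, 4))
```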
2017-08-01
This large repository of climate model results for North America (Wang and Kotamarthi 2013, 2014, 2015) is stored in Network Common Data Form (NetCDF...Network Common Data Form (NetCDF). UCAR/Unidata Program Center, Boulder, CO. Available at: http://www.unidata.ucar.edu/software/netcdf. Accessed on 6/20...emissions diverge from each other regarding fossil fuel use, technology, and other socioeconomic factors. As a result, the estimated emissions for each of
NASA Astrophysics Data System (ADS)
Jain, A.
2017-08-01
Computer-based methods can help in the discovery of leads and can potentially eliminate the chemical synthesis and screening of many irrelevant compounds, saving both time and cost. Molecular modeling systems are powerful tools for building, visualizing, analyzing and storing models of complex molecular structures that can help to interpret structure-activity relationships. The use of various molecular mechanics and dynamics techniques and software in computer-aided drug design, along with statistical analysis, is a powerful tool for medicinal chemists to synthesize effective therapeutic drugs with minimal side effects.
An R package for analyzing and modeling ranking data
2013-01-01
Background In medical informatics, psychology, market research and many other fields, researchers often need to analyze and model ranking data. However, there is no statistical software that provides tools for the comprehensive analysis of ranking data. Here, we present pmr, an R package for analyzing and modeling ranking data with a bundle of tools. The pmr package enables descriptive statistics (mean rank, pairwise frequencies, and marginal matrix), Analytic Hierarchy Process models (with Saaty’s and Koczkodaj’s inconsistencies), probability models (Luce model, distance-based model, and rank-ordered logit model), and the visualization of ranking data with multidimensional preference analysis. Results Examples of the use of package pmr are given using a real ranking dataset from medical informatics, in which 566 Hong Kong physicians ranked the top five incentives (1: competitive pressures; 2: increased savings; 3: government regulation; 4: improved efficiency; 5: improved quality care; 6: patient demand; 7: financial incentives) to the computerization of clinical practice. The mean rank showed that item 4 is the most preferred item and item 3 is the least preferred item, and a significant difference was found between physicians’ preferences with respect to their monthly income. A multidimensional preference analysis identified two dimensions that explain 42% of the total variance. The first can be interpreted as the overall preference of the seven items (labeled as “internal/external”), and the second dimension can be interpreted as the overall variance of the items (labeled as “push/pull factors”). Various statistical models were fitted, and the best were found to be weighted distance-based models with Spearman’s footrule distance. Conclusions In this paper, we presented the R package pmr, the first package for analyzing and modeling ranking data. The package provides insight to users through descriptive statistics of ranking data. Users can also visualize ranking data by applying multidimensional preference analysis. Various probability models for ranking data are also included, allowing users to choose that which is most suitable to their specific situations. PMID:23672645
An R package for analyzing and modeling ranking data.
Lee, Paul H; Yu, Philip L H
2013-05-14
In medical informatics, psychology, market research and many other fields, researchers often need to analyze and model ranking data. However, there is no statistical software that provides tools for the comprehensive analysis of ranking data. Here, we present pmr, an R package for analyzing and modeling ranking data with a bundle of tools. The pmr package enables descriptive statistics (mean rank, pairwise frequencies, and marginal matrix), Analytic Hierarchy Process models (with Saaty's and Koczkodaj's inconsistencies), probability models (Luce model, distance-based model, and rank-ordered logit model), and the visualization of ranking data with multidimensional preference analysis. Examples of the use of package pmr are given using a real ranking dataset from medical informatics, in which 566 Hong Kong physicians ranked the top five incentives (1: competitive pressures; 2: increased savings; 3: government regulation; 4: improved efficiency; 5: improved quality care; 6: patient demand; 7: financial incentives) to the computerization of clinical practice. The mean rank showed that item 4 is the most preferred item and item 3 is the least preferred item, and a significant difference was found between physicians' preferences with respect to their monthly income. A multidimensional preference analysis identified two dimensions that explain 42% of the total variance. The first can be interpreted as the overall preference of the seven items (labeled as "internal/external"), and the second dimension can be interpreted as the overall variance of the items (labeled as "push/pull factors"). Various statistical models were fitted, and the best were found to be weighted distance-based models with Spearman's footrule distance. In this paper, we presented the R package pmr, the first package for analyzing and modeling ranking data. The package provides insight to users through descriptive statistics of ranking data. Users can also visualize ranking data by applying multidimensional preference analysis. Various probability models for ranking data are also included, allowing users to choose that which is most suitable to their specific situations.
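The descriptive statistics mentioned above (mean rank and pairwise frequencies) can be reproduced in a few lines of base R on a rankings matrix; the data below are invented, and no pmr function names are assumed.

```r
# Hypothetical rankings: each row is one respondent's ranking of 7 items
# (1 = most preferred). Invented data, not the Hong Kong physician dataset.
set.seed(6)
rankings <- t(replicate(50, sample(1:7)))
colnames(rankings) <- paste0("item", 1:7)

# Mean rank per item: lower values indicate more preferred items
sort(colMeans(rankings))

# Pairwise frequency that item i is ranked ahead of item j
pairwise <- outer(1:7, 1:7,
                  Vectorize(function(i, j) mean(rankings[, i] < rankings[, j])))
dimnames(pairwise) <- list(colnames(rankings), colnames(rankings))
round(pairwise, 2)
```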
The Impact of Measurement Noise in GPA Diagnostic Analysis of a Gas Turbine Engine
NASA Astrophysics Data System (ADS)
Ntantis, Efstratios L.; Li, Y. G.
2013-12-01
The performance diagnostic analysis of a gas turbine is accomplished by estimating a set of internal engine health parameters from available sensor measurements. No physical measuring instrument, however, can ever completely eliminate measurement uncertainty. Sensor measurements are often distorted by noise and bias, leading to inaccurate estimation results. This paper explores the impact of measurement noise on gas turbine GPA analysis. The analysis is demonstrated with a test case in which the gas turbine performance simulation and diagnostics code TURBOMATCH is used to build a performance model of an engine similar to the Rolls-Royce Trent 500 turbofan and to carry out the diagnostic analysis in the presence of different levels of measurement noise. Finally, to improve the reliability of the diagnostic results, a statistical analysis of the data scatter caused by sensor uncertainties is made. The diagnostic tool used for the statistical analysis of the measurement noise impact is a model-based method utilizing a non-linear GPA.
Blatti, Charles; Sinha, Saurabh
2014-07-01
The Motif Enrichment Tool (MET) provides an online interface that enables users to find major transcriptional regulators of their gene sets of interest. MET searches the appropriate regulatory region around each gene and identifies which transcription factor DNA-binding specificities (motifs) are statistically overrepresented. Motif enrichment analysis is currently available for many metazoan species including human, mouse, fruit fly, planaria and flowering plants. MET also leverages high-throughput experimental data such as ChIP-seq and DNase-seq from ENCODE and ModENCODE to identify the regulatory targets of a transcription factor with greater precision. The results from MET are produced in real time and are linked to a genome browser for easy follow-up analysis. Use of the web tool is free and open to all, and there is no login requirement. ADDRESS: http://veda.cs.uiuc.edu/MET/. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
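The statistical overrepresentation step at the core of motif enrichment tools such as MET is typically a hypergeometric (Fisher) enrichment test; the counts below are invented, and this is a generic sketch of the test rather than MET's documented internal computation.

```r
# Hypothetical counts: of 20000 genes, 500 carry a given motif in their
# regulatory region; the user's gene set has 100 genes, 12 of which carry it
overlap <- 12; set_size <- 100; motif_genes <- 500; universe <- 20000

# One-sided hypergeometric enrichment P-value, P(X >= overlap)
phyper(overlap - 1, motif_genes, universe - motif_genes, set_size,
       lower.tail = FALSE)

# Equivalent one-sided Fisher's exact test on the 2x2 table
fisher.test(matrix(c(overlap, set_size - overlap,
                     motif_genes - overlap,
                     universe - motif_genes - set_size + overlap), 2),
            alternative = "greater")$p.value
```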
Sensitivity analysis of navy aviation readiness based sparing model
2017-09-01
variability (see Figure 4, research design flowchart). Figure 4 lays out the four steps of the methodology, starting in the upper left-hand...as a function of changes in key inputs. We develop NAVARM Experimental Designs (NED), a computational tool created by applying a state-of-the-art...experimental design to the NAVARM model. Statistical analysis of the resulting data identifies the most influential cost factors. Those are, in order of
A brief introduction to probability.
Di Paola, Gioacchino; Bertani, Alessandro; De Monte, Lavinia; Tuzzolino, Fabio
2018-02-01
The theory of probability has been debated for centuries: back in the 1600s, French mathematicians used the rules of probability to place and win bets. Subsequently, the knowledge of probability has significantly evolved and is now an essential tool for statistics. In this paper, the basic theoretical principles of probability will be reviewed, with the aim of facilitating the comprehension of statistical inference. After a brief general introduction on probability, we will review the concept of the "probability distribution", which is a function providing the probabilities of occurrence of different possible outcomes of a categorical or continuous variable. Specific attention will be focused on the normal distribution, which is the most relevant distribution applied to statistical analysis.
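A small base-R illustration of working with the normal distribution discussed above; the mean and standard deviation are arbitrary example values.

```r
# Probability that a standard normal variable falls within +/- 1.96
pnorm(1.96) - pnorm(-1.96)   # ~0.95

# Density and cumulative probability for an observation of 120
# from a hypothetical distribution N(mean = 110, sd = 10)
dnorm(120, mean = 110, sd = 10)
pnorm(120, mean = 110, sd = 10)

# Quantile: value below which 97.5% of the distribution lies
qnorm(0.975, mean = 110, sd = 10)
```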
NASA Astrophysics Data System (ADS)
Medyńska-Gulij, Beata; Cybulski, Paweł
2016-06-01
This paper analyses the use of visual variables in tables of statistical data on hospital beds as an important tool for revealing spatio-temporal dependencies. It is argued that some of the conclusions drawn from data about public health and public expenditure on health have a spatio-temporal reference. Differing from previous studies, this article adopts a combination of cartographic pragmatics and spatial visualization together with previous conclusions made in the public health literature. While significant conclusions about health care and economic factors have been highlighted in research papers, this article is the first to apply visual analysis to a statistical table together with maps, an approach called previsualisation.
NASA Astrophysics Data System (ADS)
Reuter, Matthew; Tschudi, Stephen
When investigating the electrical response properties of molecules, experiments often measure conductance whereas computation predicts transmission probabilities. Although the Landauer-Büttiker theory relates the two in the limit of coherent scattering through the molecule, a direct comparison between experiment and computation can still be difficult. Experimental data (specifically that from break junctions) is statistical and computational results are deterministic. Many studies compare the most probable experimental conductance with computation, but such an analysis discards almost all of the experimental statistics. In this work we develop tools to decipher the Landauer-Büttiker transmission function directly from experimental statistics and then apply them to enable a fairer comparison between experimental and computational results.
Statistical learning and selective inference.
Taylor, Jonathan; Tibshirani, Robert J
2015-06-23
We describe the problem of "selective inference." This addresses the following challenge: Having mined a set of data to find potential associations, how do we properly assess the strength of these associations? The fact that we have "cherry-picked"--searched for the strongest associations--means that we must set a higher bar for declaring significant the associations that we see. This challenge becomes more important in the era of big data and complex statistical modeling. The cherry tree (dataset) can be very large and the tools for cherry picking (statistical learning methods) are now very sophisticated. We describe some recent new developments in selective inference and illustrate their use in forward stepwise regression, the lasso, and principal components analysis.
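A toy base-R simulation of the cherry-picking problem described above: selecting the strongest of many null associations and reporting its unadjusted P-value makes noise look like signal, and a multiplicity correction raises the bar accordingly. Selective-inference methods are more refined than the simple Bonferroni adjustment shown here; this sketch only illustrates why a higher bar is needed.

```r
set.seed(7)
# 1000 candidate predictors, none truly associated with the outcome
n <- 100; m <- 1000
X <- matrix(rnorm(n * m), n, m)
y <- rnorm(n)

# Naive P-value of the single strongest correlation ("cherry-picked" association)
pvals <- apply(X, 2, function(x) cor.test(x, y)$p.value)
min(pvals)   # typically "significant" despite there being no real signal

# A simple remedy: Bonferroni adjustment over the m candidates considered
min(p.adjust(pvals, method = "bonferroni"))
```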
Pedagogical monitoring as a tool to reduce dropout in distance learning in family health.
de Castro E Lima Baesse, Deborah; Grisolia, Alexandra Monteiro; de Oliveira, Ana Emilia Figueiredo
2016-08-22
This paper presents the results of a study of the Monsys monitoring system, an educational support tool designed to prevent and control the dropout rate in a distance learning course in family health. Developed by UNA-SUS/UFMA, Monsys was created to enable data mining in the virtual learning environment known as Moodle. This is an exploratory study using documentary and bibliographic research and analysis of the Monsys database. Two classes (2010 and 2011) were selected as research subjects, one with Monsys intervention and the other without. The samples were matched (using a ratio of 1:1) by gender, age, marital status, graduation year, previous graduation status, location and profession. Statistical analysis was performed using the chi-square test and a multivariate logistic regression model with a 5% significance level. The findings show that the dropout rate in the class in which Monsys was not employed (2010) was 43.2%. However, the dropout rate in the class of 2011, in which the tool was employed as a pedagogical team aid, was 30.6%. After statistical adjustment, the Monsys monitoring system remained in correlation with the course completion variable (adjusted OR = 1.74, 95% CI = 1.17-2.59; p = 0.005), suggesting that the use of the Monsys tool, independently of the adjusted variables, can enhance the likelihood that students will complete the course. Using the chi-square test, a profile analysis of students revealed a higher completion rate among women (67.7%) than men (52.2%). Analysis of age demonstrated that students between 40 and 49 years dropped out the least (32.1%) and, with regard to professional training, nurses had the lowest dropout rate (36.3%). The use of Monsys significantly reduced dropout, with results showing a greater association between course completion and both the presence of the monitoring system and female gender.
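The chi-square test and adjusted odds ratio reported above correspond to standard base-R calls; the data frame, variable names and sample below are hypothetical stand-ins for the study's data.

```r
# Hypothetical student-level data: completion (0/1), monitoring (Monsys yes/no),
# and gender, standing in for the variables described in the abstract
set.seed(8)
d <- data.frame(completed = rbinom(400, 1, 0.6),
                monsys = rep(c("yes", "no"), each = 200),
                gender = sample(c("F", "M"), 400, replace = TRUE))

# Chi-square test of association between monitoring and completion
chisq.test(table(d$monsys, d$completed))

# Multivariate logistic regression; exponentiated coefficients are adjusted ORs
fit <- glm(completed ~ monsys + gender, data = d, family = binomial)
exp(cbind(OR = coef(fit), confint.default(fit)))   # Wald confidence intervals
```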
Weniger, Markus; Engelmann, Julia C; Schultz, Jörg
2007-01-01
Background Regulation of gene expression is relevant to many areas of biology and medicine, in the study of treatments, diseases, and developmental stages. Microarrays can be used to measure the expression level of thousands of mRNAs at the same time, allowing insight into or comparison of different cellular conditions. The data derived from microarray experiments are high-dimensional and often noisy, and interpretation of the results can become intricate. Although programs for the statistical analysis of microarray data exist, most of them lack an integration of analysis results and biological interpretation. Results We have developed GEPAT, Genome Expression Pathway Analysis Tool, offering analysis of gene expression data in a genomic, proteomic and metabolic context. We provide an integration of statistical methods for data import and data analysis together with a biological interpretation for subsets of probes or single probes on the chip. GEPAT imports various types of oligonucleotide and cDNA array data formats. Different normalization methods can be applied to the data, after which data annotation is performed. After import, GEPAT offers various statistical data analysis methods, such as hierarchical, k-means and PCA clustering, a linear model based t-test or chromosomal profile comparison. The results of the analysis can be interpreted by enrichment of biological terms, pathway analysis or interaction networks. Different biological databases are included, to give various information for each probe on the chip. GEPAT does not impose a linear workflow, but allows the usage of any subset of probes and samples as a start for a new data analysis. GEPAT relies on established data analysis packages, offers a modular approach for easy extension, and can be run on a computer grid to allow a large number of users. It is freely available under the LGPL open source license for academic and commercial users at . Conclusion GEPAT is a modular, scalable and professional-grade software integrating analysis and interpretation of microarray gene expression data. An installation available for academic users can be found at . PMID:17543125
Hollunder, Jens; Friedel, Maik; Kuiper, Martin; Wilhelm, Thomas
2010-04-01
Many large 'omics' datasets have been published and many more are expected in the near future. New analysis methods are needed for best exploitation. We have developed a graphical user interface (GUI) for easy data analysis. Our discovery of all significant substructures (DASS) approach elucidates the underlying modularity, a typical feature of complex biological data. It is related to biclustering and other data mining approaches. Importantly, DASS-GUI also allows handling of multi-sets and calculation of statistical significances. DASS-GUI contains tools for further analysis of the identified patterns: analysis of the pattern hierarchy, enrichment analysis, module validation, analysis of additional numerical data, easy handling of synonymous names, clustering, filtering and merging. Different export options allow easy usage of additional tools such as Cytoscape. Source code, pre-compiled binaries for different systems, a comprehensive tutorial, case studies and many additional datasets are freely available at http://www.ifr.ac.uk/dass/gui/. DASS-GUI is implemented in Qt.
SLIVISU, an Interactive Visualisation Framework for Analysis of Geological Sea-Level Indicators
NASA Astrophysics Data System (ADS)
Klemann, V.; Schulte, S.; Unger, A.; Dransch, D.
2011-12-01
Supporting data analysis in the earth system sciences with advanced visualisation tools is increasingly important due to the rising complexity, amount and variety of available data. With respect to sea-level indicators (SLIs), their analysis in earth-system applications, such as modelling and simulation on regional or global scales, demands the consideration of large amounts of data - thousands of SLIs - and thus requires going beyond the analysis of single sea-level curves. On the other hand, a gross analysis by means of statistical methods is hindered by the often heterogeneous and individual character of the single SLIs, i.e., the spatio-temporal context and often heterogeneous information are difficult to handle or to represent in an objective way. Therefore, a concept integrating automated analysis and visualisation is mandatory. This is provided by visual analytics. As an implementation of this concept, we present the visualisation framework SLIVISU, developed at GFZ, which is based on multiple linked views and provides a synoptic analysis of observational data, model configurations, model outputs and results of automated analysis in glacial isostatic adjustment. Starting as a visualisation tool for an existing database of SLIs, it now serves as an analysis tool for the evaluation of model simulations in studies of glacial-isostatic adjustment.
Schmidt, Paul; Schmid, Volker J; Gaser, Christian; Buck, Dorothea; Bührlen, Susanne; Förschler, Annette; Mühlau, Mark
2013-01-01
Aiming at iron-related T2-hypointensity, which is related to normal aging and neurodegenerative processes, we here present two practicable approaches, based on Bayesian inference, for preprocessing and statistical analysis of a complex set of structural MRI data. In particular, Markov Chain Monte Carlo methods were used to simulate posterior distributions. First, we rendered a segmentation algorithm that uses outlier detection based on model checking techniques within a Bayesian mixture model. Second, we rendered an analytical tool comprising a Bayesian regression model with smoothness priors (in the form of Gaussian Markov random fields) mitigating the necessity to smooth data prior to statistical analysis. For validation, we used simulated data and MRI data of 27 healthy controls (age: [Formula: see text]; range, [Formula: see text]). We first observed robust segmentation of both simulated T2-hypointensities and gray-matter regions known to be T2-hypointense. Second, simulated data and images of segmented T2-hypointensity were analyzed. We found not only robust identification of simulated effects but also a biologically plausible age-related increase of T2-hypointensity primarily within the dentate nucleus but also within the globus pallidus, substantia nigra, and red nucleus. Our results indicate that fully Bayesian inference can successfully be applied for preprocessing and statistical analysis of structural MRI data.
Haystack, a web-based tool for metabolomics research
2014-01-01
Background Liquid chromatography coupled to mass spectrometry (LCMS) has become a widely used technique in metabolomics research for differential profiling, the broad screening of biomolecular constituents across multiple samples to diagnose phenotypic differences and elucidate relevant features. However, a significant limitation in LCMS-based metabolomics is the high-throughput data processing required for robust statistical analysis and data modeling for large numbers of samples with hundreds of unique chemical species. Results To address this problem, we developed Haystack, a web-based tool designed to visualize, parse, filter, and extract significant features from LCMS datasets rapidly and efficiently. Haystack runs in a browser environment with an intuitive graphical user interface that provides both display and data processing options. Total ion chromatograms (TICs) and base peak chromatograms (BPCs) are automatically displayed, along with time-resolved mass spectra and extracted ion chromatograms (EICs) over any mass range. Output files in the common .csv format can be saved for further statistical analysis or customized graphing. Haystack's core function is a flexible binning procedure that converts the mass dimension of the chromatogram into a set of interval variables that can uniquely identify a sample. Binned mass data can be analyzed by exploratory methods such as principal component analysis (PCA) to model class assignment and identify discriminatory features. The validity of this approach is demonstrated by comparison of a dataset from plants grown at two light conditions with manual and automated peak detection methods. Haystack successfully predicted class assignment based on PCA and cluster analysis, and identified discriminatory features based on analysis of EICs of significant bins. Conclusion Haystack, a new online tool for rapid processing and analysis of LCMS-based metabolomics data is described. It offers users a range of data visualization options and supports non-biased differential profiling studies through a unique and flexible binning function that provides an alternative to conventional peak deconvolution analysis methods. PMID:25350247
Haystack, a web-based tool for metabolomics research.
Grace, Stephen C; Embry, Stephen; Luo, Heng
2014-01-01
Liquid chromatography coupled to mass spectrometry (LCMS) has become a widely used technique in metabolomics research for differential profiling, the broad screening of biomolecular constituents across multiple samples to diagnose phenotypic differences and elucidate relevant features. However, a significant limitation in LCMS-based metabolomics is the high-throughput data processing required for robust statistical analysis and data modeling for large numbers of samples with hundreds of unique chemical species. To address this problem, we developed Haystack, a web-based tool designed to visualize, parse, filter, and extract significant features from LCMS datasets rapidly and efficiently. Haystack runs in a browser environment with an intuitive graphical user interface that provides both display and data processing options. Total ion chromatograms (TICs) and base peak chromatograms (BPCs) are automatically displayed, along with time-resolved mass spectra and extracted ion chromatograms (EICs) over any mass range. Output files in the common .csv format can be saved for further statistical analysis or customized graphing. Haystack's core function is a flexible binning procedure that converts the mass dimension of the chromatogram into a set of interval variables that can uniquely identify a sample. Binned mass data can be analyzed by exploratory methods such as principal component analysis (PCA) to model class assignment and identify discriminatory features. The validity of this approach is demonstrated by comparison of a dataset from plants grown at two light conditions with manual and automated peak detection methods. Haystack successfully predicted class assignment based on PCA and cluster analysis, and identified discriminatory features based on analysis of EICs of significant bins. Haystack, a new online tool for rapid processing and analysis of LCMS-based metabolomics data is described. It offers users a range of data visualization options and supports non-biased differential profiling studies through a unique and flexible binning function that provides an alternative to conventional peak deconvolution analysis methods.
Heterogeneous Structure of Stem Cells Dynamics: Statistical Models and Quantitative Predictions
Bogdan, Paul; Deasy, Bridget M.; Gharaibeh, Burhan; Roehrs, Timo; Marculescu, Radu
2014-01-01
Understanding stem cell (SC) population dynamics is essential for developing models that can be used in basic science and medicine, to aid in predicting cell fate. These models can be used as tools, e.g., in studying patho-physiological events at the cellular and tissue level, predicting (mal)functions along the developmental course, and personalized regenerative medicine. Using time-lapse imaging and statistical tools, we show that the dynamics of SC populations involve a heterogeneous structure consisting of multiple sub-population behaviors. Using non-Gaussian statistical approaches, we identify the co-existence of fast and slow dividing subpopulations, and quiescent cells, in stem cells from three species. The mathematical analysis also shows that, instead of developing independently, SCs exhibit a time-dependent fractal behavior as they interact with each other through molecular and tactile signals. These findings suggest that more sophisticated models of SC dynamics should view SC populations as a collective and avoid the simplifying homogeneity assumption by accounting for the presence of more than one dividing sub-population, and their multi-fractal characteristics. PMID:24769917
Systematic review of methods for quantifying teamwork in the operating theatre
Marshall, D.; Sykes, M.; McCulloch, P.; Shalhoub, J.; Maruthappu, M.
2018-01-01
Background Teamwork in the operating theatre is becoming increasingly recognized as a major factor in clinical outcomes. Many tools have been developed to measure teamwork. Most fall into two categories: self‐assessment by theatre staff and assessment by observers. A critical and comparative analysis of the validity and reliability of these tools is lacking. Methods MEDLINE and Embase databases were searched following PRISMA guidelines. Content validity was assessed using measurements of inter‐rater agreement, predictive validity and multisite reliability, and interobserver reliability using statistical measures of inter‐rater agreement and reliability. Quantitative meta‐analysis was deemed unsuitable. Results Forty‐eight articles were selected for final inclusion; self‐assessment tools were used in 18 and observational tools in 28, and there were two qualitative studies. Self‐assessment of teamwork by profession varied with the profession of the assessor. The most robust self‐assessment tool was the Safety Attitudes Questionnaire (SAQ), although this failed to demonstrate multisite reliability. The most robust observational tool was the Non‐Technical Skills (NOTECHS) system, which demonstrated both test–retest reliability (P > 0·09) and interobserver reliability (Rwg = 0·96). Conclusion Self‐assessment of teamwork by the theatre team was influenced by professional differences. Observational tools, when used by trained observers, circumvented this.
Joshi, Amit; Tandon, Nidhi; Patil, Vijay M; Noronha, Vanita; Gupta, Sudeep; Bhattacharjee, Atanu; Prabhash, Kumar
2017-01-01
Comprehensive geriatric assessment (CGA) in routine practice is not logistically feasible. Short geriatric screening tools are available for selecting patients for CGA. However, none of them is validated in India. In this analysis we aim to compare the level of agreement between three commonly used short screening tools: the Flemish version of TRST (fTRST), G8 and VES-13. Patients ≥65 years with a solid tumor malignancy undergoing cancer-directed treatment were interviewed between March 2013 and July 2014. Geriatric screening with the G8, fTRST and VES-13 tools was performed in these patients. A G8 score ≤14, fTRST score ≥1 and VES-13 score ≥3 were taken as indicators of a high-risk geriatric profile, respectively. R version 3.1.2 was used for analysis. Cohen's kappa agreement statistic was used to compare the agreement between the 3 tools. A p value of 0.05 was taken as significant. The kappa statistic for agreement between the G8 score and fTRST, between VES-13 and fTRST, and between VES-13 and G8 was 0.12 (P = 0.04), 0.16 (P = 0.07) and 0.05 (P = 0.45), respectively. The maximum agreement was observed for VES-13 and fTRST. The agreement value of VES-13 and fTRST was 59.44% (39.63% for the high-risk profile and 19.81% for the low-risk profile). The agreement value of G8 and fTRST was 39.62% (only 2.83% for the high-risk profile and 36.79% for the low-risk profile). The lowest agreement was between G8 and VES-13, 35.84% (7.54% for high-risk detection and 28.30% for low-risk detection). There was poor agreement (in view of kappa values being below 0.2) between the 3 short geriatric screening tools. Research needs to be directed to compare the agreement level between these 3 scales and CGA, so that the appropriate short screening tool can be selected for routine use.
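Cohen's kappa for agreement between two screening tools can be computed directly from their 2x2 cross-classification, as in this base-R sketch; the counts are invented for illustration, not the study's data.

```r
# Hypothetical cross-classification of 106 patients by two screening tools
# (rows = tool A high/low risk, cols = tool B high/low risk); invented counts
tab <- matrix(c(42, 22, 21, 21), nrow = 2,
              dimnames = list(toolA = c("high", "low"), toolB = c("high", "low")))

po <- sum(diag(tab)) / sum(tab)                       # observed agreement
pe <- sum(rowSums(tab) * colSums(tab)) / sum(tab)^2   # agreement expected by chance
kappa <- (po - pe) / (1 - pe)
kappa   # a value below 0.2 would indicate poor agreement, as in the abstract
```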
Haghani, Fariba; Hatef Khorami, Mohammad; Fakhari, Mohammad
2016-07-01
Feedback cards are recommended as a feasible tool for structured written feedback delivery in clinical education, while the effectiveness of this tool on medical students' performance is still questionable. The purpose of this study was to compare the effects of structured written feedback by cards as well as verbal feedback versus verbal feedback alone on the clinical performance of medical students at the Mini Clinical Evaluation Exercise (Mini-CEX) test in an outpatient clinic. This is a quasi-experimental study with pre- and post-tests comprising four groups in two terms of medical students' externship. The students' performance was assessed through the Mini-Clinical Evaluation Exercise (Mini-CEX) as a clinical performance evaluation tool. Structured written feedback was given to two experimental groups by designed feedback cards as well as verbal feedback, while in the two control groups feedback was delivered verbally as the routine approach in clinical education. Using a consecutive sampling method, 62 externship students were enrolled in this study, and seven students were excluded from the final analysis due to their absence for three days. According to the ANOVA analysis and post hoc Tukey test, no statistically significant difference was observed among the four groups at the pre-test, whereas a statistically significant difference was observed between the experimental and control groups at the post-test (F = 4.023, p = 0.012). The effect size of the structured written feedback on clinical performance was 0.19. Structured written feedback by cards could improve the performance of medical students in a statistical sense. Further studies must be conducted in other clinical courses with longer durations.
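The analysis described (a one-way ANOVA across the four groups followed by Tukey's post hoc test) maps onto base R directly; the group sizes and scores below are simulated stand-ins, not the study's data.

```r
# Simulated Mini-CEX post-test scores for four groups (two experimental, two control)
set.seed(9)
scores <- data.frame(
  group = factor(rep(c("exp1", "exp2", "ctrl1", "ctrl2"), each = 15)),
  minicex = c(rnorm(30, mean = 7.5, sd = 1),    # experimental groups
              rnorm(30, mean = 6.8, sd = 1)))   # control groups

fit <- aov(minicex ~ group, data = scores)
summary(fit)    # overall F test across the four groups
TukeyHSD(fit)   # pairwise post hoc comparisons with adjusted P-values
```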
An Interactive Tool For Semi-automated Statistical Prediction Using Earth Observations and Models
NASA Astrophysics Data System (ADS)
Zaitchik, B. F.; Berhane, F.; Tadesse, T.
2015-12-01
We developed a semi-automated statistical prediction tool applicable to concurrent analysis or seasonal prediction of any time series variable in any geographic location. The tool was developed using Shiny, JavaScript, HTML and CSS. A user can extract a predictand by drawing a polygon over a region of interest on the provided user interface (global map). The user can select the Climatic Research Unit (CRU) precipitation or Climate Hazards Group InfraRed Precipitation with Station data (CHIRPS) as predictand. They can also upload their own predictand time series. Predictors can be extracted from sea surface temperature, sea level pressure, winds at different pressure levels, air temperature at various pressure levels, and geopotential height at different pressure levels. By default, reanalysis fields are applied as predictors, but the user can also upload their own predictors, including a wide range of compatible satellite-derived datasets. The package generates correlations of the variables selected with the predictand. The user also has the option to generate composites of the variables based on the predictand. Next, the user can extract predictors by drawing polygons over the regions that show strong correlations (composites). Then, the user can select some or all of the statistical prediction models provided. Provided models include Linear Regression models (GLM, SGLM), Tree-based models (bagging, random forest, boosting), Artificial Neural Network, and other non-linear models such as Generalized Additive Model (GAM) and Multivariate Adaptive Regression Splines (MARS). Finally, the user can download the analysis steps they used, such as the region they selected, the time period they specified, the predictand and predictors they chose and preprocessing options they used, and the model results in PDF or HTML format. Key words: Semi-automated prediction, Shiny, R, GLM, ANN, RF, GAM, MARS
NASA Astrophysics Data System (ADS)
Kraljić, K.; Strüngmann, L.; Fimmel, E.; Gumbel, M.
2018-01-01
The genetic code is degenerate and it is assumed that this redundancy provides error detection and correction mechanisms in the translation process. However, the biological meaning of the code's structure is still under current research. This paper presents a Genetic Code Analysis Toolkit (GCAT) which provides workflows and algorithms for the analysis of the structure of nucleotide sequences. In particular, sets or sequences of codons can be transformed and tested for circularity, comma-freeness, dichotomic partitions and others. GCAT comes with a fertile editor custom-built to work with the genetic code and a batch mode for multi-sequence processing. With the ability to read FASTA files or load sequences from GenBank, the tool can be used for the mathematical and statistical analysis of existing sequence data. GCAT is Java-based and provides a plug-in concept for extensibility. Availability: open source. Homepage: http://www.gcat.bio/
Gemovic, Branislava; Perovic, Vladimir; Glisic, Sanja; Veljkovic, Nevena
2013-01-01
There are more than 500 amino acid substitutions in each human genome, and bioinformatics tools irreplaceably contribute to determination of their functional effects. We have developed a feature-based algorithm for the detection of mutations outside conserved functional domains (CFDs) and compared its classification efficacy with the most commonly used phylogeny-based tools, PolyPhen-2 and SIFT. The new algorithm is based on the informational spectrum method (ISM), a feature-based technique, and statistical analysis. Our dataset contained neutral polymorphisms and mutations associated with myeloid malignancies from the epigenetic regulators ASXL1, DNMT3A, EZH2, and TET2. PolyPhen-2 and SIFT had significantly lower accuracies in predicting the effects of amino acid substitutions outside CFDs than expected, with especially low sensitivity. On the other hand, only the ISM algorithm showed statistically significant classification of these sequences. It outperformed PolyPhen-2 and SIFT by 15% and 13%, respectively. These results suggest that feature-based methods, like ISM, are more suitable for the classification of amino acid substitutions outside CFDs than phylogeny-based tools.
Basic principles of stability.
Egan, William; Schofield, Timothy
2009-11-01
An understanding of the principles of degradation, as well as the statistical tools for measuring product stability, is essential to management of product quality. Key to this is management of vaccine potency. Vaccine shelf life is best managed through determination of a minimum potency release requirement, which helps assure adequate potency throughout expiry. Statistical tools such as least squares regression analysis should be employed to model potency decay. The use of such tools provides an incentive to properly design vaccine stability studies, while holding stability measurements to specification presents a disincentive for collecting valuable data. The laws of kinetics, such as Arrhenius behavior, help practitioners design effective accelerated stability programs, which can be utilized to manage stability after a process change. The design of stability studies should be carefully considered, with an eye to minimizing the variability of the stability parameter. In the case of measuring the degradation rate, testing at the beginning and the end of the study improves the precision of this estimate. Additional design considerations such as bracketing and matrixing improve the efficiency of stability evaluation of vaccines.
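A minimal sketch of the least squares approach to modelling potency decay mentioned above; the time points, potency values and minimum potency specification are all invented for illustration.

```r
# Hypothetical stability data: potency measured at selected time points (months)
stab <- data.frame(month = c(0, 3, 6, 12, 18, 24),
                   potency = c(100.2, 98.9, 97.6, 95.1, 92.8, 90.4))

# Linear potency-decay model fitted by least squares
fit <- lm(potency ~ month, data = stab)
coef(fit)                 # intercept and slope (loss per month)
confint(fit)["month", ]   # uncertainty in the degradation rate

# Given an invented minimum potency specification of 85, estimate shelf life
spec <- 85
(spec - coef(fit)[["(Intercept)"]]) / coef(fit)[["month"]]   # months to reach spec
```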
Generating community-built tools for data sharing and analysis in environmental networks
Read, Jordan S.; Gries, Corinna; Read, Emily K.; Klug, Jennifer; Hanson, Paul C.; Hipsey, Matthew R.; Jennings, Eleanor; O'Reilley, Catherine; Winslow, Luke A.; Pierson, Don; McBride, Christopher G.; Hamilton, David
2016-01-01
Rapid data growth in many environmental sectors has necessitated tools to manage and analyze these data. The development of tools often lags behind the proliferation of data, however, which may slow exploratory opportunities and scientific progress. The Global Lake Ecological Observatory Network (GLEON) collaborative model supports an efficient and comprehensive data–analysis–insight life cycle, including implementations of data quality control checks, statistical calculations/derivations, models, and data visualizations. These tools are community-built and openly shared. We discuss the network structure that enables tool development and a culture of sharing, leading to optimized output from limited resources. Specifically, data sharing and a flat collaborative structure encourage the development of tools that enable scientific insights from these data. Here we provide a cross-section of scientific advances derived from global-scale analyses in GLEON. We document enhancements to science capabilities made possible by the development of analytical tools and highlight opportunities to expand this framework to benefit other environmental networks.
Quantifying Traces of Tool Use: A Novel Morphometric Analysis of Damage Patterns on Percussive Tools
Caruana, Matthew V.; Carvalho, Susana; Braun, David R.; Presnyakova, Darya; Haslam, Michael; Archer, Will; Bobe, Rene; Harris, John W. K.
2014-01-01
Percussive technology continues to play an increasingly important role in understanding the evolution of tool use. Comparing the archaeological record with extractive foraging behaviors in nonhuman primates has focused on percussive implements as a key to investigating the origins of lithic technology. Despite this, archaeological approaches towards percussive tools have been obscured by a lack of standardized methodologies. Central to this issue have been the use of qualitative, non-diagnostic techniques to identify percussive tools from archaeological contexts. Here we describe a new morphometric method for distinguishing anthropogenically-generated damage patterns on percussive tools from naturally damaged river cobbles. We employ a geomatic approach through the use of three-dimensional scanning and geographical information systems software to statistically quantify the identification process in percussive technology research. This will strengthen current technological analyses of percussive tools in archaeological frameworks and open new avenues for translating behavioral inferences of early hominins from percussive damage patterns. PMID:25415303
Liau, Siow Yen; Mohamed Izham, M I; Hassali, M A; Shafie, A A
2010-01-01
Cardiovascular diseases, the main causes of hospitalisations and death globally, have put an enormous economic burden on the healthcare system. Several risk factors are associated with the occurrence of cardiovascular events. At the heart of efficient prevention of cardiovascular disease is the concept of risk assessment. This paper aims to review the available cardiovascular risk-assessment tools and their applicability in predicting cardiovascular risk among Asian populations. A systematic search was performed using keywords as MeSH and Boolean terms. A total of 25 risk-assessment tools were identified. Of these, only two risk-assessment tools (8%) were derived from an Asian population. These risk-assessment tools differ in various ways, including characteristics of the derivation sample, type of study, time frame of follow-up, end points, statistical analysis and risk factors included. Very few cardiovascular risk-assessment tools were developed in Asian populations. In order to accurately predict the cardiovascular risk of our population, there is a need to develop a risk-assessment tool based on local epidemiological data.
2013-01-01
Background Assessing the risk of bias of randomized controlled trials (RCTs) is crucial to understand how biases affect treatment effect estimates. A number of tools have been developed to evaluate risk of bias of RCTs; however, it is unknown how these tools compare to each other in the items included. The main objective of this study was to describe which individual items are included in RCT quality tools used in general health and physical therapy (PT) research, and how these items compare to those of the Cochrane Risk of Bias (RoB) tool. Methods We used comprehensive literature searches and a systematic approach to identify tools that evaluated the methodological quality or risk of bias of RCTs in general health and PT research. We extracted individual items from all quality tools. We calculated the frequency of quality items used across tools and compared them to those in the RoB tool. Comparisons were made between general health and PT quality tools using Chi-squared tests. Results In addition to the RoB tool, 26 quality tools were identified, with 19 being used in general health and seven in PT research. The total number of quality items included in general health research tools was 130, compared with 48 items across PT tools and seven items in the RoB tool. The most frequently included items in general health research tools (14/19, 74%) were inclusion and exclusion criteria, and appropriate statistical analysis. In contrast, the most frequent items included in PT tools (86%, 6/7) were: baseline comparability, blinding of investigator/assessor, and use of intention-to-treat analysis. Key items of the RoB tool (sequence generation and allocation concealment) were included in 71% (5/7) of PT tools, and 63% (12/19) and 37% (7/19) of general health research tools, respectively. Conclusions There is extensive item variation across tools that evaluate the risk of bias of RCTs in health research. Results call for an in-depth analysis of items that should be used to assess risk of bias of RCTs. Further empirical evidence on the use of individual items and the psychometric properties of risk of bias tools is needed. PMID:24044807
Armijo-Olivo, Susan; Fuentes, Jorge; Ospina, Maria; Saltaji, Humam; Hartling, Lisa
2013-09-17
Assessing the risk of bias of randomized controlled trials (RCTs) is crucial to understand how biases affect treatment effect estimates. A number of tools have been developed to evaluate risk of bias of RCTs; however, it is unknown how these tools compare to each other in the items included. The main objective of this study was to describe which individual items are included in RCT quality tools used in general health and physical therapy (PT) research, and how these items compare to those of the Cochrane Risk of Bias (RoB) tool. We used comprehensive literature searches and a systematic approach to identify tools that evaluated the methodological quality or risk of bias of RCTs in general health and PT research. We extracted individual items from all quality tools. We calculated the frequency of quality items used across tools and compared them to those in the RoB tool. Comparisons were made between general health and PT quality tools using Chi-squared tests. In addition to the RoB tool, 26 quality tools were identified, with 19 being used in general health and seven in PT research. The total number of quality items included in general health research tools was 130, compared with 48 items across PT tools and seven items in the RoB tool. The most frequently included items in general health research tools (14/19, 74%) were inclusion and exclusion criteria, and appropriate statistical analysis. In contrast, the most frequent items included in PT tools (86%, 6/7) were: baseline comparability, blinding of investigator/assessor, and use of intention-to-treat analysis. Key items of the RoB tool (sequence generation and allocation concealment) were included in 71% (5/7) of PT tools, and 63% (12/19) and 37% (7/19) of general health research tools, respectively. There is extensive item variation across tools that evaluate the risk of bias of RCTs in health research. Results call for an in-depth analysis of items that should be used to assess risk of bias of RCTs. Further empirical evidence on the use of individual items and the psychometric properties of risk of bias tools is needed.
De Spiegelaere, Ward; Malatinkova, Eva; Lynch, Lindsay; Van Nieuwerburgh, Filip; Messiaen, Peter; O'Doherty, Una; Vandekerckhove, Linos
2014-06-01
Quantification of integrated proviral HIV DNA by repetitive-sampling Alu-HIV PCR is a candidate virological tool to monitor the HIV reservoir in patients. However, the experimental procedures and data analysis of the assay are complex and hinder its widespread use. Here, we provide an improved and simplified data analysis method by adopting binomial and Poisson statistics. A modified analysis method on the basis of Poisson statistics was used to analyze the binomial data of positive and negative reactions from a 42-replicate Alu-HIV PCR by use of dilutions of an integration standard and on samples of 57 HIV-infected patients. Results were compared with the quantitative output of the previously described Alu-HIV PCR method. Poisson-based quantification of the Alu-HIV PCR was linearly correlated with the standard dilution series, indicating that absolute quantification with the Poisson method is a valid alternative for data analysis of repetitive-sampling Alu-HIV PCR data. Quantitative outputs of patient samples assessed by the Poisson method correlated with the previously described Alu-HIV PCR analysis, indicating that this method is a valid alternative for quantifying integrated HIV DNA. Poisson-based analysis of the Alu-HIV PCR data enables absolute quantification without the need of a standard dilution curve. Implementation of the CI estimation permits improved qualitative analysis of the data and provides a statistical basis for the required minimal number of technical replicates. © 2014 The American Association for Clinical Chemistry.
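The key point of the Poisson-based analysis above is that, for a replicate assay, the fraction of negative reactions alone determines the mean number of integration events per reaction, without a standard dilution curve. The sketch below illustrates only that principle, not the authors' code; the replicate counts are hypothetical and the confidence interval uses a generic Clopper-Pearson bound:

```python
# Illustrative sketch: Poisson-based quantification from the pattern of positive and
# negative reactions in a repetitive-sampling PCR, with a binomial CI propagated to
# the Poisson mean.
import numpy as np
from scipy import stats

n_reps = 42          # replicate reactions per sample (as described in the abstract)
n_negative = 17      # hypothetical count of negative reactions

# Under a Poisson model, P(negative) = exp(-lambda), so lambda = -ln(fraction negative).
frac_neg = n_negative / n_reps
lam = -np.log(frac_neg)
print(f"estimated mean integration events per reaction: {lam:.3f}")

# 95% Clopper-Pearson interval for the negative fraction, transformed to the Poisson mean.
lo, hi = stats.beta.ppf([0.025, 0.975],
                        [n_negative, n_negative + 1],
                        [n_reps - n_negative + 1, n_reps - n_negative])
print(f"95% CI for the mean: {-np.log(hi):.3f} to {-np.log(lo):.3f}")
```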
NASA Astrophysics Data System (ADS)
Mia, Mozammel; Bashir, Mahmood Al; Dhar, Nikhil Ranjan
2016-07-01
Hard turning is gradually replacing the time consuming conventional turning process, which is typically followed by grinding, by producing surface quality compatible to grinding. The hard turned surface roughness depends on the cutting parameters, machining environments and tool insert configurations. In this article the variation of the surface roughness of the produced surfaces with the changes in tool insert configuration, use of coolant and different cutting parameters (cutting speed, feed rate) has been investigated. This investigation was performed in machining AISI 1060 steel, hardened to 56 HRC by heat treatment, using coated carbide inserts under two different machining environments. The depth of cut, fluid pressure and material hardness were kept constant. The Design of Experiment (DOE) was performed to determine the number and combination sets of different cutting parameters. A full factorial analysis has been performed to examine the effect of main factors as well as interaction effect of factors on surface roughness. A statistical analysis of variance (ANOVA) was employed to determine the combined effect of cutting parameters, environment and tool configuration. The result of this analysis reveals that environment has the most significant impact on surface roughness followed by feed rate and tool configuration respectively.
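As an illustration of the kind of full factorial ANOVA described above (the data, factor levels and effect sizes below are entirely hypothetical, not the study's), the analysis could be set up in Python with statsmodels as follows:

```python
# Illustrative sketch: full factorial ANOVA of surface roughness against machining
# environment, feed rate and tool insert configuration, on simulated data.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

rng = np.random.default_rng(1)
env = np.repeat(["dry", "coolant"], 24)
feed = np.tile(np.repeat([0.10, 0.14, 0.18], 8), 2)
tool = np.tile(["insert_A", "insert_B"], 24)
# Simulated roughness (um) with an environment effect, a feed effect and noise.
ra = (1.2 + 0.4 * (env == "dry") + 3.0 * (feed - 0.10)
      + 0.1 * (tool == "insert_B") + rng.normal(0, 0.08, env.size))
df = pd.DataFrame({"Ra": ra, "env": env, "feed": feed, "tool": tool})

model = ols("Ra ~ C(env) * C(feed) * C(tool)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))   # main effects and interaction terms
```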
DMT-TAFM: a data mining tool for technical analysis of futures market
NASA Astrophysics Data System (ADS)
Stepanov, Vladimir; Sathaye, Archana
2002-03-01
Technical analysis of financial markets describes many patterns of market behavior. For practical use, all these descriptions need to be adjusted for each particular trading session. In this paper, we develop a data mining tool for technical analysis of the futures markets (DMT-TAFM), which dynamically generates rules based on the notion of price pattern similarity. The tool consists of three main components. The first component provides visualization of data series on a chart with different ranges, scales, and chart sizes and types. The second component constructs pattern descriptions using sets of polynomials. The third component specifies the training set for mining, defines the similarity notion, and searches for a set of similar patterns. DMT-TAFM is useful to prepare the data, and then reveal and systemize statistical information about similar patterns found in any type of historical price series. We performed experiments with our tool on three decades of trading data for a hundred types of futures. Our results for this data set show that we can prove or disprove many well-known patterns based on real data, as well as reveal new ones, and use the set of relatively consistent patterns found during data mining for developing better futures trading strategies.
OPATs: Omnibus P-value association tests.
Chen, Chia-Wei; Yang, Hsin-Chou
2017-07-10
Combining statistical significances (P-values) from a set of single-locus association tests in genome-wide association studies is a proof-of-principle method for identifying disease-associated genomic segments, functional genes and biological pathways. We review P-value combinations for genome-wide association studies and introduce an integrated analysis tool, Omnibus P-value Association Tests (OPATs), which provides popular analysis methods of P-value combinations. The software OPATs, programmed in R with an R-based graphical user interface, features a user-friendly interface. In addition to analysis modules for data quality control and single-locus association tests, OPATs provides three types of set-based association test: window-, gene- and biopathway-based association tests. P-value combinations with or without threshold and rank truncation are provided. The significance of a set-based association test is evaluated by using resampling procedures. Performance of the set-based association tests in OPATs has been evaluated by simulation studies and real data analyses. These set-based association tests help boost the statistical power, alleviate the multiple-testing problem, reduce the impact of genetic heterogeneity, increase the replication efficiency of association tests and facilitate the interpretation of association signals by streamlining the testing procedures and integrating the genetic effects of multiple variants in genomic regions of biological relevance. In summary, P-value combinations facilitate the identification of marker sets associated with disease susceptibility and uncover missing heritability in association studies, thereby establishing a foundation for the genetic dissection of complex diseases and traits. OPATs provides an easy-to-use and statistically powerful analysis tool for P-value combinations. OPATs, examples, and user guide can be downloaded from http://www.stat.sinica.edu.tw/hsinchou/genetics/association/OPATs.htm. © The Author 2017. Published by Oxford University Press.
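To make the P-value combination idea concrete, the sketch below shows Fisher's method and a rank-truncated variant assessed by resampling. It illustrates the general approach only, not the OPATs implementation, and the P-values are simulated:

```python
# Illustrative sketch: combining P-values from a set of single-locus tests.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
pvals = rng.uniform(size=20)
pvals[:3] = [1e-4, 5e-4, 0.002]   # a few truly associated SNPs in the set (simulated)

# Fisher's method: -2 * sum(log p) ~ chi-squared with 2k degrees of freedom under the null.
stat, p_fisher = stats.combine_pvalues(pvals, method="fisher")
print(f"Fisher combined P = {p_fisher:.2e}")

# Rank-truncated product: combine only the K smallest P-values; the null distribution
# is obtained by resampling uniform P-values.
K = 5
obs = np.sum(np.log(np.sort(pvals)[:K]))
null = np.array([np.sum(np.log(np.sort(rng.uniform(size=pvals.size))[:K]))
                 for _ in range(20000)])
p_trunc = np.mean(null <= obs)
print(f"rank-truncated (K={K}) resampling P = {p_trunc:.4f}")
```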
Using statistical process control to make data-based clinical decisions.
Pfadt, A; Wheeler, D J
1995-01-01
Applied behavior analysis is based on an investigation of variability due to interrelationships among antecedents, behavior, and consequences. This permits testable hypotheses about the causes of behavior as well as for the course of treatment to be evaluated empirically. Such information provides corrective feedback for making data-based clinical decisions. This paper considers how a different approach to the analysis of variability based on the writings of Walter Shewart and W. Edwards Deming in the area of industrial quality control helps to achieve similar objectives. Statistical process control (SPC) was developed to implement a process of continual product improvement while achieving compliance with production standards and other requirements for promoting customer satisfaction. SPC involves the use of simple statistical tools, such as histograms and control charts, as well as problem-solving techniques, such as flow charts, cause-and-effect diagrams, and Pareto charts, to implement Deming's management philosophy. These data-analytic procedures can be incorporated into a human service organization to help to achieve its stated objectives in a manner that leads to continuous improvement in the functioning of the clients who are its customers. Examples are provided to illustrate how SPC procedures can be used to analyze behavioral data. Issues related to the application of these tools for making data-based clinical decisions and for creating an organizational climate that promotes their routine use in applied settings are also considered.
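As a concrete illustration of the control-chart tools mentioned above (hypothetical session data and standard XmR-chart constants; not an example from the paper), control limits for an individuals chart can be computed as follows:

```python
# Illustrative sketch: Shewhart-style individuals (XmR) control chart limits for a
# behavioral measure recorded across sessions.
import numpy as np

# Hypothetical daily counts of a target behavior.
x = np.array([12, 15, 11, 14, 13, 16, 12, 10, 14, 13, 15, 11, 22, 12, 13], dtype=float)

moving_range = np.abs(np.diff(x))
mr_bar = moving_range.mean()
center = x.mean()
ucl = center + 2.66 * mr_bar   # 2.66 = 3 / d2, with d2 = 1.128 for subgroups of size 2
lcl = max(center - 2.66 * mr_bar, 0.0)

print(f"center line = {center:.2f}, control limits = ({lcl:.2f}, {ucl:.2f})")
for session, value in enumerate(x, start=1):
    if value > ucl or value < lcl:
        print(f"session {session}: {value} is outside the limits (special-cause signal)")
```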
McDonnell, J D; Schunck, N; Higdon, D; Sarich, J; Wild, S M; Nazarewicz, W
2015-03-27
Statistical tools of uncertainty quantification can be used to assess the information content of measured observables with respect to present-day theoretical models, to estimate model errors and thereby improve predictive capability, to extrapolate beyond the regions reached by experiment, and to provide meaningful input to applications and planned measurements. To showcase new opportunities offered by such tools, we make a rigorous analysis of theoretical statistical uncertainties in nuclear density functional theory using Bayesian inference methods. By considering the recent mass measurements from the Canadian Penning Trap at Argonne National Laboratory, we demonstrate how the Bayesian analysis and a direct least-squares optimization, combined with high-performance computing, can be used to assess the information content of the new data with respect to a model based on the Skyrme energy density functional approach. Employing the posterior probability distribution computed with a Gaussian process emulator, we apply the Bayesian framework to propagate theoretical statistical uncertainties in predictions of nuclear masses, two-neutron dripline, and fission barriers. Overall, we find that the new mass measurements do not impose a constraint that is strong enough to lead to significant changes in the model parameters. The example discussed in this study sets the stage for quantifying and maximizing the impact of new measurements with respect to current modeling and guiding future experimental efforts, thus enhancing the experiment-theory cycle in the scientific method.
Using Statistical Analysis Software to Advance Nitro Plasticizer Wettability
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shear, Trevor Allan
Statistical analysis in science is an extremely powerful tool that is often underutilized. Additionally, it is frequently the case that data is misinterpreted or not used to its fullest extent. Utilizing the advanced software JMP®, many aspects of experimental design and data analysis can be evaluated and improved. This overview will detail the features of JMP® and how they were used to advance a project, resulting in time and cost savings, as well as the collection of scientifically sound data. The project analyzed in this report addresses the inability of a nitro plasticizer to coat a gold coated quartz crystal sensor used in a quartz crystal microbalance. Through the use of the JMP® software, the wettability of the nitro plasticizer was increased by over 200% using an atmospheric plasma pen, ensuring good sample preparation and reliable results.
Integration of statistical and physiological analyses of adaptation of near-isogenic barley lines.
Romagosa, I; Fox, P N; García Del Moral, L F; Ramos, J M; García Del Moral, B; Roca de Togores, F; Molina-Cano, J L
1993-08-01
Seven near-isogenic barley lines, differing for three independent mutant genes, were grown in 15 environments in Spain. Genotype x environment interaction (G x E) for grain yield was examined with the Additive Main Effects and Multiplicative interaction (AMMI) model. The results of this statistical analysis of multilocation yield-data were compared with a morpho-physiological characterization of the lines at two sites (Molina-Cano et al. 1990). The first two principal component axes from the AMMI analysis were strongly associated with the morpho-physiological characters. The independent but parallel discrimination among genotypes reflects genetic differences and highlights the power of the AMMI analysis as a tool to investigate G x E. Characters which appear to be positively associated with yield in the germplasm under study could be identified for some environments.
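The AMMI analysis referred to above combines additive main effects with a singular value decomposition of the genotype x environment residuals. A minimal sketch with simulated yields (not the study's data) is shown below; the leading components correspond to the principal component axes discussed in the abstract:

```python
# Illustrative sketch of the AMMI decomposition: remove additive genotype and
# environment main effects from a two-way yield table, then take the SVD of the
# interaction residuals.
import numpy as np

rng = np.random.default_rng(3)
n_geno, n_env = 7, 15
yield_table = (3.0 + rng.normal(0, 0.3, (n_geno, 1))      # genotype effects
               + rng.normal(0, 0.5, (1, n_env))           # environment effects
               + rng.normal(0, 0.2, (n_geno, n_env)))      # hypothetical G x E + noise

grand = yield_table.mean()
geno_eff = yield_table.mean(axis=1, keepdims=True) - grand
env_eff = yield_table.mean(axis=0, keepdims=True) - grand
interaction = yield_table - grand - geno_eff - env_eff     # G x E residuals

u, s, vt = np.linalg.svd(interaction, full_matrices=False)
explained = s**2 / np.sum(s**2)
print("share of G x E captured by IPCA1, IPCA2:", explained[:2].round(3))
print("genotype scores on the first two interaction axes:")
print((u[:, :2] * np.sqrt(s[:2])).round(3))
```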
Medical cost analysis: application to colorectal cancer data from the SEER Medicare database.
Bang, Heejung
2005-10-01
Incompleteness is a key feature of most survival data. Numerous well established statistical methodologies and algorithms exist for analyzing life or failure time data. However, induced censorship invalidates the use of those standard analytic tools for some survival-type data such as medical costs. In this paper, some valid methods currently available for analyzing censored medical cost data are reviewed. Some cautionary findings under different assumptions are envisioned through application to medical costs from colorectal cancer patients. Cost analysis should be suitably planned and carefully interpreted under various meaningful scenarios even with judiciously selected statistical methods. This approach would be greatly helpful to policy makers who seek to prioritize health care expenditures and to assess the elements of resource use.
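One widely used way to handle the induced censoring described above is inverse-probability-of-censoring weighting, in which costs from subjects whose deaths are observed are up-weighted by the estimated probability of remaining uncensored at their follow-up time. The following sketch uses simulated data and shows only this one generic approach, not the paper's code or the full set of methods it reviews:

```python
# Illustrative sketch: a simple inverse-probability-of-censoring-weighted (IPCW)
# estimate of mean medical cost under right censoring, compared with the naive mean.
import numpy as np

rng = np.random.default_rng(7)
n = 200
death = rng.exponential(24, n)                # hypothetical survival times (months)
censor = rng.uniform(6, 60, n)                # hypothetical censoring times
time = np.minimum(death, censor)
event = (death <= censor).astype(int)         # 1 = death observed, 0 = censored
cost = 1000 * time + rng.normal(0, 500, n)    # accumulated cost grows with follow-up

def censoring_km(t, e, grid):
    """Kaplan-Meier estimate of P(still uncensored) at each grid time."""
    out = np.empty(grid.size)
    for i, g in enumerate(grid):
        surv = 1.0
        for tj in np.unique(t[(e == 0) & (t <= g)]):   # censoring 'events'
            at_risk = np.sum(t >= tj)
            surv *= 1.0 - np.sum((t == tj) & (e == 0)) / at_risk
        out[i] = surv
    return out

k_hat = censoring_km(time, event, time)
naive = cost.mean()                                     # biased by induced censoring
ipcw = np.sum(event * cost / k_hat) / n                 # simple weighted estimator
print(f"naive mean cost: {naive:,.0f}   IPCW mean cost: {ipcw:,.0f}")
```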
SPARTA: Simple Program for Automated reference-based bacterial RNA-seq Transcriptome Analysis.
Johnson, Benjamin K; Scholz, Matthew B; Teal, Tracy K; Abramovitch, Robert B
2016-02-04
Many tools exist in the analysis of bacterial RNA sequencing (RNA-seq) transcriptional profiling experiments to identify differentially expressed genes between experimental conditions. Generally, the workflow includes quality control of reads, mapping to a reference, counting transcript abundance, and statistical tests for differentially expressed genes. In spite of the numerous tools developed for each component of an RNA-seq analysis workflow, easy-to-use bacterially oriented workflow applications to combine multiple tools and automate the process are lacking. With many tools to choose from for each step, the task of identifying a specific tool, adapting the input/output options to the specific use-case, and integrating the tools into a coherent analysis pipeline is not a trivial endeavor, particularly for microbiologists with limited bioinformatics experience. To make bacterial RNA-seq data analysis more accessible, we developed a Simple Program for Automated reference-based bacterial RNA-seq Transcriptome Analysis (SPARTA). SPARTA is a reference-based bacterial RNA-seq analysis workflow application for single-end Illumina reads. SPARTA is turnkey software that simplifies the process of analyzing RNA-seq data sets, making bacterial RNA-seq analysis a routine process that can be undertaken on a personal computer or in the classroom. The easy-to-install, complete workflow processes whole transcriptome shotgun sequencing data files by trimming reads and removing adapters, mapping reads to a reference, counting gene features, calculating differential gene expression, and, importantly, checking for potential batch effects within the data set. SPARTA outputs quality analysis reports, gene feature counts and differential gene expression tables and scatterplots. SPARTA provides an easy-to-use bacterial RNA-seq transcriptional profiling workflow to identify differentially expressed genes between experimental conditions. This software will enable microbiologists with limited bioinformatics experience to analyze their data and integrate next generation sequencing (NGS) technologies into the classroom. The SPARTA software and tutorial are available at sparta.readthedocs.org.
Progress toward openness, transparency, and reproducibility in cognitive neuroscience.
Gilmore, Rick O; Diaz, Michele T; Wyble, Brad A; Yarkoni, Tal
2017-05-01
Accumulating evidence suggests that many findings in psychological science and cognitive neuroscience may prove difficult to reproduce; statistical power in brain imaging studies is low and has not improved recently; software errors in analysis tools are common and can go undetected for many years; and, a few large-scale studies notwithstanding, open sharing of data, code, and materials remain the rare exception. At the same time, there is a renewed focus on reproducibility, transparency, and openness as essential core values in cognitive neuroscience. The emergence and rapid growth of data archives, meta-analytic tools, software pipelines, and research groups devoted to improved methodology reflect this new sensibility. We review evidence that the field has begun to embrace new open research practices and illustrate how these can begin to address problems of reproducibility, statistical power, and transparency in ways that will ultimately accelerate discovery. © 2017 New York Academy of Sciences.
A network-based analysis of CMIP5 "historical" experiments
NASA Astrophysics Data System (ADS)
Bracco, A.; Foudalis, I.; Dovrolis, C.
2012-12-01
In computer science, "complex network analysis" refers to a set of metrics, modeling tools and algorithms commonly used in the study of complex nonlinear dynamical systems. Its main premise is that the underlying topology or network structure of a system has a strong impact on its dynamics and evolution. By allowing local and non-local statistical interactions to be investigated, network analysis provides a powerful, but only marginally explored, framework to validate climate models and investigate teleconnections, assessing their strength, range, and impacts on the climate system. In this work we propose a new, fast, robust and scalable methodology to examine, quantify, and visualize climate sensitivity, while constraining general circulation model (GCM) outputs with observations. The goal of our novel approach is to uncover relations in the climate system that are not (or not fully) captured by more traditional methodologies used in climate science and often adopted from nonlinear dynamical systems analysis, and to explain known climate phenomena in terms of the network structure or its metrics. Our methodology is based on a solid theoretical framework and employs mathematical and statistical tools, exploited only tentatively in climate research so far. Suitably adapted to the climate problem, these tools can assist in visualizing the trade-offs in representing global links and teleconnections among different data sets. Here we present the methodology, and compare network properties for different reanalysis data sets and a suite of CMIP5 coupled GCM outputs. With an extensive model intercomparison in terms of the climate network that each model leads to, we quantify how each model reproduces major teleconnections, rank model performances, and identify common or specific errors in comparing model outputs and observations.
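For readers unfamiliar with climate networks, the toy sketch below (synthetic anomaly series, an arbitrary correlation threshold, and basic degree statistics; not the authors' methodology or metrics) illustrates how a correlation-based network can be built from gridded time series:

```python
# Illustrative sketch: build a simple climate network by thresholding correlations
# between grid-point time series, then inspect node degree.
import numpy as np

rng = np.random.default_rng(42)
n_points, n_months = 60, 240
# Hypothetical anomaly time series; a shared mode induces teleconnection-like links.
shared = rng.normal(size=n_months)
weights = rng.uniform(-1, 1, n_points)
field = weights[:, None] * shared + rng.normal(0, 1, (n_points, n_months))

corr = np.corrcoef(field)                 # n_points x n_points correlation matrix
np.fill_diagonal(corr, 0.0)
adjacency = (np.abs(corr) > 0.5).astype(int)   # arbitrary threshold for illustration

degree = adjacency.sum(axis=1)
print("mean degree:", degree.mean())
print("most connected grid points:", np.argsort(degree)[-5:][::-1])
```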
Radiation shielding quality assurance
NASA Astrophysics Data System (ADS)
Um, Dallsun
For radiation shielding quality assurance, the validity and reliability of the neutron transport code MCNP, now one of the most widely used radiation shielding analysis codes, were checked against a large set of benchmark experiments. As a practical example, the following work was also performed in this thesis. An integral neutron transport experiment to measure the effect of neutron streaming in iron and void was performed with the Dog-Legged Void Assembly at Knolls Atomic Power Laboratory in 1991. Neutron flux was measured at six different locations with methane detectors and a BF-3 detector. The main purpose of the measurements was to provide a benchmark against which various neutron transport calculation tools could be compared. Those data were used to verify the Monte Carlo Neutron & Photon Transport Code, MCNP, with a model built for that purpose. Experimental and calculated results were compared in two ways: as the total integrated neutron flux over the energy range from 10 keV to 2 MeV, and as the neutron spectrum across that energy range. The two sets of results agree within the statistical error of +/-20%. MCNP results were also compared with those of TORT, a three-dimensional discrete ordinates code developed by Oak Ridge National Laboratory. MCNP results are superior to the TORT results at all detector locations except one. This shows that MCNP is a very powerful tool for the analysis of neutron transport through iron and air, and that it could further serve as a powerful tool for radiation shielding analysis. As one application of the analysis of variance (ANOVA) to neutron and gamma transport problems, uncertainties in the calculated values of critical K were evaluated with an ANOVA on the statistical data.
StatsDB: platform-agnostic storage and understanding of next generation sequencing run metrics
Ramirez-Gonzalez, Ricardo H.; Leggett, Richard M.; Waite, Darren; Thanki, Anil; Drou, Nizar; Caccamo, Mario; Davey, Robert
2014-01-01
Modern sequencing platforms generate enormous quantities of data in ever-decreasing amounts of time. Additionally, techniques such as multiplex sequencing allow one run to contain hundreds of different samples. With such data comes a significant challenge to understand its quality and to understand how the quality and yield are changing across instruments and over time. As well as the desire to understand historical data, sequencing centres often have a duty to provide clear summaries of individual run performance to collaborators or customers. We present StatsDB, an open-source software package for storage and analysis of next generation sequencing run metrics. The system has been designed for incorporation into a primary analysis pipeline, either at the programmatic level or via integration into existing user interfaces. Statistics are stored in an SQL database and APIs provide the ability to store and access the data while abstracting the underlying database design. This abstraction allows simpler, wider querying across multiple fields than is possible by the manual steps and calculation required to dissect individual reports, e.g. "provide metrics about nucleotide bias in libraries using adaptor barcode X, across all runs on sequencer A, within the last month". The software is supplied with modules for storage of statistics from FastQC, a commonly used tool for analysis of sequence reads, but the open nature of the database schema means it can be easily adapted to other tools. Currently at The Genome Analysis Centre (TGAC), reports are accessed through our LIMS system or through a standalone GUI tool, but the API and supplied examples make it easy to develop custom reports and to interface with other packages. PMID:24627795
2011-01-01
Background Clinical researchers have often preferred to use a fixed effects model for the primary interpretation of a meta-analysis. Heterogeneity is usually assessed via the well known Q and I2 statistics, along with the random effects estimate they imply. In recent years, alternative methods for quantifying heterogeneity have been proposed, that are based on a 'generalised' Q statistic. Methods We review 18 IPD meta-analyses of RCTs into treatments for cancer, in order to quantify the amount of heterogeneity present and also to discuss practical methods for explaining heterogeneity. Results Differing results were obtained when the standard Q and I2 statistics were used to test for the presence of heterogeneity. The two meta-analyses with the largest amount of heterogeneity were investigated further, and on inspection the straightforward application of a random effects model was not deemed appropriate. Compared to the standard Q statistic, the generalised Q statistic provided a more accurate platform for estimating the amount of heterogeneity in the 18 meta-analyses. Conclusions Explaining heterogeneity via the pre-specification of trial subgroups, graphical diagnostic tools and sensitivity analyses produced a more desirable outcome than an automatic application of the random effects model. Generalised Q statistic methods for quantifying and adjusting for heterogeneity should be incorporated as standard into statistical software. Software is provided to help achieve this aim. PMID:21473747
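For reference, the standard Q, I² and DerSimonian-Laird random-effects quantities discussed above can be computed directly from study-level estimates. The sketch below uses hypothetical trial effects and standard errors, and does not implement the generalised Q statistic itself:

```python
# Illustrative sketch: heterogeneity statistics and a DerSimonian-Laird random-effects
# estimate for a small set of trial-level effect estimates.
import numpy as np
from scipy import stats

y = np.array([-0.25, -0.10, -0.35, 0.05, -0.20, -0.40])   # hypothetical per-trial effects
se = np.array([0.12, 0.15, 0.10, 0.20, 0.14, 0.18])       # their standard errors

w = 1.0 / se**2
fixed = np.sum(w * y) / np.sum(w)
Q = np.sum(w * (y - fixed)**2)
df = y.size - 1
I2 = max(0.0, (Q - df) / Q) * 100
p_het = stats.chi2.sf(Q, df)

tau2 = max(0.0, (Q - df) / (np.sum(w) - np.sum(w**2) / np.sum(w)))   # DerSimonian-Laird
w_re = 1.0 / (se**2 + tau2)
random = np.sum(w_re * y) / np.sum(w_re)

print(f"Q = {Q:.2f} (P = {p_het:.3f}), I^2 = {I2:.1f}%, tau^2 = {tau2:.4f}")
print(f"fixed-effect estimate = {fixed:.3f}, random-effects estimate = {random:.3f}")
```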
"Dear Fresher …"--How Online Questionnaires Can Improve Learning and Teaching Statistics
ERIC Educational Resources Information Center
Bebermeier, Sarah; Nussbeck, Fridtjof W.; Ontrup, Greta
2015-01-01
Lecturers teaching statistics are faced with several challenges supporting students' learning in appropriate ways. A variety of methods and tools exist to facilitate students' learning on statistics courses. The online questionnaires presented in this report are a new, slightly different computer-based tool: the central aim was to support students…
PIVOT: platform for interactive analysis and visualization of transcriptomics data.
Zhu, Qin; Fisher, Stephen A; Dueck, Hannah; Middleton, Sarah; Khaladkar, Mugdha; Kim, Junhyong
2018-01-05
Many R packages have been developed for transcriptome analysis but their use often requires familiarity with R and integrating results of different packages requires scripts to wrangle the datatypes. Furthermore, exploratory data analyses often generate multiple derived datasets such as data subsets or data transformations, which can be difficult to track. Here we present PIVOT, an R-based platform that wraps open source transcriptome analysis packages with a uniform user interface and graphical data management that allows non-programmers to interactively explore transcriptomics data. PIVOT supports more than 40 popular open source packages for transcriptome analysis and provides an extensive set of tools for statistical data manipulations. A graph-based visual interface is used to represent the links between derived datasets, allowing easy tracking of data versions. PIVOT further supports automatic report generation, publication-quality plots, and program/data state saving, such that all analysis can be saved, shared and reproduced. PIVOT will allow researchers with broad background to easily access sophisticated transcriptome analysis tools and interactively explore transcriptome datasets.
Pineda-Peña, Andrea-Clemencia; Faria, Nuno Rodrigues; Imbrechts, Stijn; Libin, Pieter; Abecasis, Ana Barroso; Deforche, Koen; Gómez-López, Arley; Camacho, Ricardo J; de Oliveira, Tulio; Vandamme, Anne-Mieke
2013-10-01
To investigate differences in pathogenesis, diagnosis and resistance pathways between HIV-1 subtypes, an accurate subtyping tool for large datasets is needed. We aimed to evaluate the performance of automated subtyping tools to classify the different subtypes and circulating recombinant forms using pol, the most sequenced region in clinical practice. We also present the upgraded version 3 of the Rega HIV subtyping tool (REGAv3). HIV-1 pol sequences (PR+RT) for 4674 patients retrieved from the Portuguese HIV Drug Resistance Database, and 1872 pol sequences trimmed from full-length genomes retrieved from the Los Alamos database were classified with statistical-based tools such as COMET, jpHMM and STAR; similarity-based tools such as NCBI and Stanford; and phylogenetic-based tools such as REGA version 2 (REGAv2), REGAv3, and SCUEAL. The performance of these tools, for pol, and for PR and RT separately, was compared in terms of reproducibility, sensitivity and specificity with respect to the gold standard which was manual phylogenetic analysis of the pol region. The sensitivity and specificity for subtypes B and C was more than 96% for seven tools, but was variable for other subtypes such as A, D, F and G. With regard to the most common circulating recombinant forms (CRFs), the sensitivity and specificity for CRF01_AE was ~99% with statistical-based tools, with phylogenetic-based tools and with Stanford, one of the similarity based tools. CRF02_AG was correctly identified for more than 96% by COMET, REGAv3, Stanford and STAR. All the tools reached a specificity of more than 97% for most of the subtypes and the two main CRFs (CRF01_AE and CRF02_AG). Other CRFs were identified only by COMET, REGAv2, REGAv3, and SCUEAL and with variable sensitivity. When analyzing sequences for PR and RT separately, the performance for PR was generally lower and variable between the tools. Similarity and statistical-based tools were 100% reproducible, but this was lower for phylogenetic-based tools such as REGA (~99%) and SCUEAL (~96%). REGAv3 had an improved performance for subtype B and CRF02_AG compared to REGAv2 and is now able to also identify all epidemiologically relevant CRFs. In general the best performing tools, in alphabetical order, were COMET, jpHMM, REGAv3, and SCUEAL when analyzing pure subtypes in the pol region, and COMET and REGAv3 when analyzing most of the CRFs. Based on this study, we recommend to confirm subtyping with 2 well performing tools, and be cautious with the interpretation of short sequences. Copyright © 2013 The Authors. Published by Elsevier B.V. All rights reserved.
Experiences in using DISCUS for visualizing human communication
NASA Astrophysics Data System (ADS)
Groehn, Matti; Nieminen, Marko; Haho, Paeivi; Smeds, Riitta
2000-02-01
In this paper, we present further improvements to the DISCUS software, which can be used to record and analyze the flow and content of business process simulation session discussions. The tool was initially introduced at the 'Visual Data Exploration and Analysis IV' conference. The initial features of the tool enabled the visualization of discussion flow in business process simulation sessions and the creation of SOM analyses. The improvements to the tool consist of additional visualization possibilities that enable quick on-line analyses and improved graphical statistics. We have also created the very first interface to audio data and implemented two ways to visualize it. We also outline additional possibilities for using the tool in other application areas: these include usability testing and the possibility of using the tool to capture design rationale in a product development process. The data gathered with DISCUS may be used in other applications, and further work may be done with data mining techniques.
Mangado, Nerea; Piella, Gemma; Noailly, Jérôme; Pons-Prats, Jordi; Ballester, Miguel Ángel González
2016-01-01
Computational modeling has become a powerful tool in biomedical engineering thanks to its potential to simulate coupled systems. However, real parameters are usually not accurately known, and variability is inherent in living organisms. To cope with this, probabilistic tools, statistical analysis and stochastic approaches have been used. This article aims to review the analysis of uncertainty and variability in the context of finite element modeling in biomedical engineering. Characterization techniques and propagation methods are presented, as well as examples of their applications in biomedical finite element simulations. Uncertainty propagation methods, both non-intrusive and intrusive, are described. Finally, pros and cons of the different approaches and their use in the scientific community are presented. This leads us to identify future directions for research and methodological development of uncertainty modeling in biomedical engineering. PMID:27872840
Mangado, Nerea; Piella, Gemma; Noailly, Jérôme; Pons-Prats, Jordi; Ballester, Miguel Ángel González
2016-01-01
Computational modeling has become a powerful tool in biomedical engineering thanks to its potential to simulate coupled systems. However, real parameters are usually not accurately known, and variability is inherent in living organisms. To cope with this, probabilistic tools, statistical analysis and stochastic approaches have been used. This article aims to review the analysis of uncertainty and variability in the context of finite element modeling in biomedical engineering. Characterization techniques and propagation methods are presented, as well as examples of their applications in biomedical finite element simulations. Uncertainty propagation methods, both non-intrusive and intrusive, are described. Finally, pros and cons of the different approaches and their use in the scientific community are presented. This leads us to identify future directions for research and methodological development of uncertainty modeling in biomedical engineering.
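A simple example of the non-intrusive propagation methods reviewed above is plain Monte Carlo sampling through a model. The sketch below uses a closed-form cantilever-beam deflection as a stand-in for a finite element solver, with assumed input distributions; it is not taken from the review:

```python
# Illustrative sketch: non-intrusive (Monte Carlo) uncertainty propagation through a
# simple mechanical model, plus a crude correlation-based sensitivity screen.
import numpy as np

rng = np.random.default_rng(0)
n_samples = 10000

# Hypothetical uncertain inputs for a cantilever tip deflection, delta = F L^3 / (3 E I).
F = rng.normal(500, 50, n_samples)                   # load [N]
E = rng.lognormal(np.log(10e9), 0.10, n_samples)     # elastic modulus [Pa]
L = rng.normal(0.10, 0.002, n_samples)               # length [m]
I = rng.normal(2e-9, 1e-10, n_samples)               # second moment of area [m^4]

deflection = F * L**3 / (3 * E * I)

mean, std = deflection.mean(), deflection.std()
p5, p95 = np.percentile(deflection, [5, 95])
print(f"tip deflection: mean {1e3*mean:.2f} mm, sd {1e3*std:.2f} mm, "
      f"90% interval [{1e3*p5:.2f}, {1e3*p95:.2f}] mm")

# Rank inputs by |correlation| with the output as a first-pass sensitivity measure.
for name, x in [("F", F), ("E", E), ("L", L), ("I", I)]:
    print(name, round(abs(np.corrcoef(x, deflection)[0, 1]), 2))
```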
Kerfriden, P.; Schmidt, K.M.; Rabczuk, T.; Bordas, S.P.A.
2013-01-01
We propose to identify process zones in heterogeneous materials by tailored statistical tools. The process zone is redefined as the part of the structure where the random process cannot be correctly approximated in a low-dimensional deterministic space. Such a low-dimensional space is obtained by a spectral analysis performed on pre-computed solution samples. A greedy algorithm is proposed to identify both process zone and low-dimensional representative subspace for the solution in the complementary region. In addition to the novelty of the tools proposed in this paper for the analysis of localised phenomena, we show that the reduced space generated by the method is a valid basis for the construction of a reduced order model. PMID:27069423
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wilson, Eric J
The ResStock analysis tool is helping states, municipalities, utilities, and manufacturers identify which home upgrades save the most energy and money. Across the country there's a vast diversity in the age, size, construction practices, installed equipment, appliances, and resident behavior of the housing stock, not to mention the range of climates. These variations have hindered the accuracy of predicting savings for existing homes. Researchers at the National Renewable Energy Laboratory (NREL) developed ResStock. It's a versatile tool that takes a new approach to large-scale residential energy analysis by combining large public and private data sources, statistical sampling, detailed subhourly building simulations, and high-performance computing. This combination achieves unprecedented granularity and most importantly - accuracy - in modeling the diversity of the single-family housing stock.
NASA Astrophysics Data System (ADS)
Hardie, Russell C.; Power, Jonathan D.; LeMaster, Daniel A.; Droege, Douglas R.; Gladysz, Szymon; Bose-Pillai, Santasri
2017-07-01
We present a numerical wave propagation method for simulating imaging of an extended scene under anisoplanatic conditions. While isoplanatic simulation is relatively common, few tools are specifically designed for simulating the imaging of extended scenes under anisoplanatic conditions. We provide a complete description of the proposed simulation tool, including the wave propagation method used. Our approach computes an array of point spread functions (PSFs) for a two-dimensional grid on the object plane. The PSFs are then used in a spatially varying weighted sum operation, with an ideal image, to produce a simulated image with realistic optical turbulence degradation. The degradation includes spatially varying warping and blurring. To produce the PSF array, we generate a series of extended phase screens. Simulated point sources are numerically propagated from an array of positions on the object plane, through the phase screens, and ultimately to the focal plane of the simulated camera. Note that the optical path for each PSF will be different, and thus, pass through a different portion of the extended phase screens. These different paths give rise to a spatially varying PSF to produce anisoplanatic effects. We use a method for defining the individual phase screen statistics that we have not seen used in previous anisoplanatic simulations. We also present a validation analysis. In particular, we compare simulated outputs with the theoretical anisoplanatic tilt correlation and a derived differential tilt variance statistic. This is in addition to comparing the long- and short-exposure PSFs and isoplanatic angle. We believe this analysis represents the most thorough validation of an anisoplanatic simulation to date. The current work is also unique in that we simulate and validate both constant and varying Cn2(z) profiles. Furthermore, we simulate sequences with both temporally independent and temporally correlated turbulence effects. Temporal correlation is introduced by generating even larger extended phase screens and translating this block of screens in front of the propagation area. Our validation analysis shows an excellent match between the simulation statistics and the theoretical predictions. Thus, we think this tool can be used effectively to study optical anisoplanatic turbulence and to aid in the development of image restoration methods.
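Two of the path statistics such a simulation is validated against, the Fried parameter and the isoplanatic angle, follow directly from the Cn2(z) profile. The sketch below uses an assumed wavelength, path length and turbulence profile with the standard plane-wave formulas; it is an illustration only, not the authors' simulation code:

```python
# Illustrative sketch: Fried parameter r0 and isoplanatic angle theta0 from an
# assumed Cn^2(z) profile along the imaging path.
import numpy as np

wavelength = 0.5e-6                 # m (assumed)
k = 2 * np.pi / wavelength
path_length = 3000.0                # m (assumed near-horizontal path)
z = np.linspace(0.0, path_length, 2000)
dz = z[1] - z[0]

# Hypothetical turbulence profile: strongest near the camera, decaying along the path.
cn2 = 1e-15 * np.exp(-z / 2000.0)   # m^(-2/3)

r0 = (0.423 * k**2 * np.sum(cn2) * dz) ** (-3.0 / 5.0)
theta0 = (2.914 * k**2 * np.sum(cn2 * z ** (5.0 / 3.0)) * dz) ** (-3.0 / 5.0)

print(f"Fried parameter r0 = {100 * r0:.2f} cm")
print(f"isoplanatic angle theta0 = {1e6 * theta0:.1f} microradians")
```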
2012-06-01
generalized behavioral model characterized after the fictional Seldon equations (the one elaborated upon by Isaac Asimov in the 1951 novel, The...Foundation). Asimov described the Seldon equations as essentially statistical models with historical data of a sufficient size and variability that they
Quantification of Operational Risk Using A Data Mining
NASA Technical Reports Server (NTRS)
Perera, J. Sebastian
1999-01-01
What is Data Mining? - Data Mining is the process of finding actionable information hidden in raw data. - Data Mining helps find hidden patterns, trends, and important relationships often buried in a sea of data - Typically, automated software tools based on advanced statistical analysis and data modeling technology can be utilized to automate the data mining process
Onboard Acoustic Data-Processing for the Statistical Analysis of Array Beam-Noise,
1980-12-15
performance of the sonar system as a measurement tool and others that can assess the character of the ambient-noise field at the time of the measurement. In...the plot as would "dead" hydrophones. A reduction in sensitivity of a hydrophone, a faulty preamplifier, or any other fault in the acoustic channel
ERIC Educational Resources Information Center
Preacher, Kristopher J.; Kelley, Ken
2011-01-01
The statistical analysis of mediation effects has become an indispensable tool for helping scientists investigate processes thought to be causal. Yet, in spite of many recent advances in the estimation and testing of mediation effects, little attention has been given to methods for communicating effect size and the practical importance of those…
TANGO: a generic tool for high-throughput 3D image analysis for studying nuclear organization.
Ollion, Jean; Cochennec, Julien; Loll, François; Escudé, Christophe; Boudier, Thomas
2013-07-15
The cell nucleus is a highly organized cellular organelle that contains the genetic material. The study of nuclear architecture has become an important field of cellular biology. Extracting quantitative data from 3D fluorescence imaging helps understand the functions of different nuclear compartments. However, such approaches are limited by the requirement for processing and analyzing large sets of images. Here, we describe Tools for Analysis of Nuclear Genome Organization (TANGO), an image analysis tool dedicated to the study of nuclear architecture. TANGO is a coherent framework allowing biologists to perform the complete analysis process of 3D fluorescence images by combining two environments: ImageJ (http://imagej.nih.gov/ij/) for image processing and quantitative analysis and R (http://cran.r-project.org) for statistical processing of measurement results. It includes an intuitive user interface providing the means to precisely build a segmentation procedure and set-up analyses, without possessing programming skills. TANGO is a versatile tool able to process large sets of images, allowing quantitative study of nuclear organization. TANGO is composed of two programs: (i) an ImageJ plug-in and (ii) a package (rtango) for R. They are both free and open source, available (http://biophysique.mnhn.fr/tango) for Linux, Microsoft Windows and Macintosh OSX. Distribution is under the GPL v.2 licence. thomas.boudier@snv.jussieu.fr Supplementary data are available at Bioinformatics online.
FMAP: Functional Mapping and Analysis Pipeline for metagenomics and metatranscriptomics studies.
Kim, Jiwoong; Kim, Min Soo; Koh, Andrew Y; Xie, Yang; Zhan, Xiaowei
2016-10-10
Given the lack of a complete and comprehensive library of microbial reference genomes, determining the functional profile of diverse microbial communities is challenging. The available functional analysis pipelines lack several key features: (i) an integrated alignment tool, (ii) operon-level analysis, and (iii) the ability to process large datasets. Here we introduce our open-sourced, stand-alone functional analysis pipeline for analyzing whole metagenomic and metatranscriptomic sequencing data, FMAP (Functional Mapping and Analysis Pipeline). FMAP performs alignment, gene family abundance calculations, and statistical analysis (three levels of analyses are provided: differentially-abundant genes, operons and pathways). The resulting output can be easily visualized with heatmaps and functional pathway diagrams. FMAP functional predictions are consistent with currently available functional analysis pipelines. FMAP is a comprehensive tool for providing functional analysis of metagenomic/metatranscriptomic sequencing data. With the added features of integrated alignment, operon-level analysis, and the ability to process large datasets, FMAP will be a valuable addition to the currently available functional analysis toolbox. We believe that this software will be of great value to the wider biology and bioinformatics communities.
WEC Design Response Toolbox v. 1.0
DOE Office of Scientific and Technical Information (OSTI.GOV)
Coe, Ryan; Michelen, Carlos; Eckert-Gallup, Aubrey
2016-03-30
The WEC Design Response Toolbox (WDRT) is a numerical toolbox for design-response analysis of wave energy converters (WECs). The WDRT was developed during a series of efforts to better understand WEC survival design. The WDRT has been designed as a tool for researchers and developers, enabling the straightforward application of statistical and engineering methods. The toolbox includes methods for short-term extreme response, environmental characterization, long-term extreme response and risk analysis, fatigue, and design wave composition.
Fan, Yannan; Siklenka, Keith; Arora, Simran K.; Ribeiro, Paula; Kimmins, Sarah; Xia, Jianguo
2016-01-01
MicroRNAs (miRNAs) can regulate nearly all biological processes and their dysregulation is implicated in various complex diseases and pathological conditions. Recent years have seen a growing number of functional studies of miRNAs using high-throughput experimental technologies, which have produced a large amount of high-quality data regarding miRNA target genes and their interactions with small molecules, long non-coding RNAs, epigenetic modifiers, disease associations, etc. These rich sets of information have enabled the creation of comprehensive networks linking miRNAs with various biologically important entities to shed light on their collective functions and regulatory mechanisms. Here, we introduce miRNet, an easy-to-use web-based tool that offers statistical, visual and network-based approaches to help researchers understand miRNAs functions and regulatory mechanisms. The key features of miRNet include: (i) a comprehensive knowledge base integrating high-quality miRNA-target interaction data from 11 databases; (ii) support for differential expression analysis of data from microarray, RNA-seq and quantitative PCR; (iii) implementation of a flexible interface for data filtering, refinement and customization during network creation; (iv) a powerful fully featured network visualization system coupled with enrichment analysis. miRNet offers a comprehensive tool suite to enable statistical analysis and functional interpretation of various data generated from current miRNA studies. miRNet is freely available at http://www.mirnet.ca. PMID:27105848
Phenolic Analysis and Theoretic Design for Chinese Commercial Wines' Authentication.
Li, Si-Yu; Zhu, Bao-Qing; Reeves, Malcolm J; Duan, Chang-Qing
2018-01-01
To develop a robust tool for the varietal, regional, and vintage authentication of Chinese commercial wines, phenolic compounds in 121 Chinese commercial dry red wines were detected and quantified using high-performance liquid chromatography triple-quadrupole mass spectrometry (HPLC-QqQ-MS/MS), and the differentiation abilities of principal component analysis (PCA), partial least squares discriminant analysis (PLS-DA), and orthogonal partial least squares discriminant analysis (OPLS-DA) were compared. Outperforming PCA and PLS-DA, OPLS-DA models were successfully established to differentiate wines according to their varieties (Cabernet Sauvignon or other varieties), regions (east or west Cabernet Sauvignon wines), and vintages (young or old Cabernet Sauvignon wines). The S-plot provided by the OPLS-DA models showed the key phenolic compounds that were both statistically and biochemically significant for sample differentiation. In addition, the potential of the OPLS-DA models for finer differentiation of wines by more detailed regional and vintage information appears promising. On the basis of our results, a promising theoretical design for wine authentication was further proposed for the first time, which might be helpful in the practical authentication of more commercial wines. The phenolic data of 121 Chinese commercial dry red wines were processed with different statistical tools for varietal, regional, and vintage differentiation. A promising theoretical design was summarized, which might be helpful for wine authentication in practical situations. © 2017 Institute of Food Technologists®.
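As a schematic illustration of the chemometric workflow described above (simulated phenolic profiles and class labels, and ordinary PLS-DA rather than OPLS-DA, which scikit-learn does not provide), a PCA plus PLS-DA analysis might be set up as follows:

```python
# Illustrative sketch: PCA followed by a PLS-based discriminant analysis of phenolic
# profiles to separate two wine classes, on simulated data.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(5)
n_wines, n_phenolics = 121, 40
X = rng.normal(size=(n_wines, n_phenolics))
y = (rng.uniform(size=n_wines) < 0.5).astype(int)   # 1 = target variety (hypothetical)
X[y == 1, :5] += 0.8                                 # a few discriminating compounds

X = StandardScaler().fit_transform(X)
print("variance explained by first two PCs:",
      PCA(n_components=2).fit(X).explained_variance_ratio_.round(3))

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
pls = PLSRegression(n_components=2).fit(X_tr, y_tr)
pred = (pls.predict(X_te).ravel() > 0.5).astype(int)
print("hold-out accuracy of PLS-DA:", (pred == y_te).mean().round(3))

# Variable-importance proxy: weights of the first PLS component, analogous in spirit
# to inspecting an S-plot for compounds driving the separation.
top = np.argsort(np.abs(pls.x_weights_[:, 0]))[-5:][::-1]
print("most discriminating phenolic indices:", top)
```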
Replica analysis of overfitting in regression models for time-to-event data
NASA Astrophysics Data System (ADS)
Coolen, A. C. C.; Barrett, J. E.; Paga, P.; Perez-Vicente, C. J.
2017-09-01
Overfitting, which happens when the number of parameters in a model is too large compared to the number of data points available for determining these parameters, is a serious and growing problem in survival analysis. While modern medicine presents us with data of unprecedented dimensionality, these data cannot yet be used effectively for clinical outcome prediction. Standard error measures in maximum likelihood regression, such as p-values and z-scores, are blind to overfitting, and even for Cox’s proportional hazards model (the main tool of medical statisticians), one finds in literature only rules of thumb on the number of samples required to avoid overfitting. In this paper we present a mathematical theory of overfitting in regression models for time-to-event data, which aims to increase our quantitative understanding of the problem and provide practical tools with which to correct regression outcomes for the impact of overfitting. It is based on the replica method, a statistical mechanical technique for the analysis of heterogeneous many-variable systems that has been used successfully for several decades in physics, biology, and computer science, but not yet in medical statistics. We develop the theory initially for arbitrary regression models for time-to-event data, and verify its predictions in detail for the popular Cox model.
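The overfitting phenomenon described above is easy to reproduce empirically. The sketch below (simulated survival data and the third-party lifelines package; a plain train/test comparison, not the paper's replica-method analysis) shows apparent concordance rising with the number of noise covariates while hold-out concordance does not:

```python
# Illustrative sketch: overfitting in Cox proportional hazards regression as noise
# covariates are added, measured by training vs hold-out concordance.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter
from lifelines.utils import concordance_index

rng = np.random.default_rng(1)
n = 200
x_true = rng.normal(size=n)
hazard = np.exp(0.7 * x_true)               # one covariate with a real effect
T = rng.exponential(1.0 / hazard)
C = rng.exponential(1.5, n)
time, event = np.minimum(T, C), (T <= C).astype(int)

for p_noise in (5, 50, 120):
    X = np.column_stack([x_true, rng.normal(size=(n, p_noise))])
    df = pd.DataFrame(X, columns=[f"x{i}" for i in range(X.shape[1])])
    df["time"], df["event"] = time, event
    train, test = df.iloc[:140], df.iloc[140:]

    cph = CoxPHFitter(penalizer=0.1)        # small ridge penalty keeps the fit stable
    cph.fit(train, duration_col="time", event_col="event")
    c_train = concordance_index(train["time"], -cph.predict_partial_hazard(train), train["event"])
    c_test = concordance_index(test["time"], -cph.predict_partial_hazard(test), test["event"])
    print(f"{p_noise:3d} noise covariates: train C = {c_train:.3f}, test C = {c_test:.3f}")
```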
Goodness-of-fit tests and model diagnostics for negative binomial regression of RNA sequencing data.
Mi, Gu; Di, Yanming; Schafer, Daniel W
2015-01-01
This work is about assessing model adequacy for negative binomial (NB) regression, particularly (1) assessing the adequacy of the NB assumption, and (2) assessing the appropriateness of models for NB dispersion parameters. Tools for the first are appropriate for NB regression generally; those for the second are primarily intended for RNA sequencing (RNA-Seq) data analysis. The typically small number of biological samples and large number of genes in RNA-Seq analysis motivate us to address the trade-offs between robustness and statistical power using NB regression models. One widely-used power-saving strategy, for example, is to assume some commonalities of NB dispersion parameters across genes via simple models relating them to mean expression rates, and many such models have been proposed. As RNA-Seq analysis is becoming ever more popular, it is appropriate to make more thorough investigations into power and robustness of the resulting methods, and into practical tools for model assessment. In this article, we propose simulation-based statistical tests and diagnostic graphics to address model adequacy. We provide simulated and real data examples to illustrate that our proposed methods are effective for detecting the misspecification of the NB mean-variance relationship as well as judging the adequacy of fit of several NB dispersion models.
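In the spirit of the simulation-based tests proposed above (though not the authors' implementation; the data and model below are simulated), a parametric-bootstrap goodness-of-fit check for an NB2 regression can be sketched as follows:

```python
# Illustrative sketch: fit a negative binomial (NB2) regression, then compare the
# observed Pearson statistic to its distribution under data simulated from the fit.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 300
x = rng.normal(size=n)
X = sm.add_constant(x)
mu_true = np.exp(1.0 + 0.5 * x)
alpha_true = 0.4                                  # NB2 dispersion: Var = mu + alpha*mu^2
y = rng.negative_binomial(1 / alpha_true, 1 / (1 + alpha_true * mu_true))

def pearson_stat(y, mu, alpha):
    return np.sum((y - mu) ** 2 / (mu + alpha * mu ** 2))

fit = sm.NegativeBinomial(y, X).fit(disp=0)
beta, alpha_hat = fit.params[:-1], fit.params[-1]
mu_hat = np.exp(X @ beta)
obs = pearson_stat(y, mu_hat, alpha_hat)

boot = []
for _ in range(200):                              # parametric bootstrap replicates
    y_sim = rng.negative_binomial(1 / alpha_hat, 1 / (1 + alpha_hat * mu_hat))
    f = sm.NegativeBinomial(y_sim, X).fit(disp=0)
    b, a = f.params[:-1], f.params[-1]
    boot.append(pearson_stat(y_sim, np.exp(X @ b), a))

p_value = np.mean(np.array(boot) >= obs)
print(f"observed Pearson statistic {obs:.1f}, bootstrap P = {p_value:.3f}")
```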
3Drefine: an interactive web server for efficient protein structure refinement
Bhattacharya, Debswapna; Nowotny, Jackson; Cao, Renzhi; Cheng, Jianlin
2016-01-01
3Drefine is an interactive web server for consistent and computationally efficient protein structure refinement with the capability to perform web-based statistical and visual analysis. The 3Drefine refinement protocol uses iterative optimization of the hydrogen-bonding network combined with atomic-level energy minimization of the optimized model, employing composite physics- and knowledge-based force fields for efficient protein structure refinement. The method has been extensively evaluated in blind CASP experiments as well as on large-scale and diverse benchmark datasets, and it exhibits consistent improvement over the initial structure in both global and local structural quality measures. The 3Drefine web server allows convenient protein structure refinement through text or file submission, offers email notification and a provided example submission, and is freely available without any registration requirement. The server also provides comprehensive analysis of submissions through various energy and statistical feedback, and interactive visualization of multiple refined models through the JSmol applet, which is equipped with numerous protein model analysis tools. The web server has been extensively tested and used by many users and thus provides a useful tool that is easily accessible to the community. The 3Drefine web server is publicly available at the URL: http://sysbio.rnet.missouri.edu/3Drefine/. PMID:27131371
Boulesteix, Anne-Laure; Wilson, Rory; Hapfelmeier, Alexander
2017-09-09
The goal of medical research is to develop interventions that are in some sense superior, with respect to patient outcome, to interventions currently in use. Similarly, the goal of research in methodological computational statistics is to develop data analysis tools that are themselves superior to the existing tools. The methodology of the evaluation of medical interventions continues to be discussed extensively in the literature and it is now well accepted that medicine should be at least partly "evidence-based". Although we statisticians are convinced of the importance of unbiased, well-thought-out study designs and evidence-based approaches in the context of clinical research, we tend to ignore these principles when designing our own studies for evaluating statistical methods in the context of our methodological research. In this paper, we draw an analogy between clinical trials and real-data-based benchmarking experiments in methodological statistical science, with datasets playing the role of patients and methods playing the role of medical interventions. Through this analogy, we suggest directions for improvement in the design and interpretation of studies which use real data to evaluate statistical methods, in particular with respect to dataset inclusion criteria and the reduction of various forms of bias. More generally, we discuss the concept of "evidence-based" statistical research, its limitations and its impact on the design and interpretation of real-data-based benchmark experiments. We suggest that benchmark studies (a method of assessment of statistical methods using real-world datasets) might benefit from adopting (some) concepts from evidence-based medicine towards the goal of more evidence-based statistical research.
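To make the dataset-as-patient analogy concrete, a minimal sketch of a paired benchmark analysis is given below: two methods are compared across a collection of datasets with a Wilcoxon signed-rank test. The error values and dataset count are invented, and the choice of test is an assumption made only for illustration.

```python
# Sketch: paired comparison of two analysis methods across many datasets,
# treating each dataset like a "patient" in a paired design.
# Error values are hypothetical; assumes numpy and scipy.
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(3)
n_datasets = 30
error_method_a = rng.normal(0.20, 0.05, n_datasets)                    # e.g. cross-validated error of method A
error_method_b = error_method_a - rng.normal(0.01, 0.02, n_datasets)   # method B slightly better on average

stat, p = wilcoxon(error_method_a, error_method_b)
print(f"median difference: {np.median(error_method_a - error_method_b):+.3f}")
print(f"Wilcoxon signed-rank p-value: {p:.3f}")
```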
Finding Statistically Significant Communities in Networks
Lancichinetti, Andrea; Radicchi, Filippo; Ramasco, José J.; Fortunato, Santo
2011-01-01
Community structure is one of the main structural features of networks, revealing both their internal organization and the similarity of their elementary units. Despite the large variety of methods proposed to detect communities in graphs, there is a great need for multi-purpose techniques able to handle different types of datasets and the subtleties of community structure. In this paper we present OSLOM (Order Statistics Local Optimization Method), the first method capable of detecting clusters in networks while accounting for edge directions, edge weights, overlapping communities, hierarchies and community dynamics. It is based on the local optimization of a fitness function expressing the statistical significance of clusters with respect to random fluctuations, which is estimated with tools of Extreme and Order Statistics. OSLOM can be used alone or as a refinement procedure for partitions/covers delivered by other techniques. We have also implemented sequential algorithms combining OSLOM with other fast techniques, so that the community structure of very large networks can be uncovered. Our method has performance comparable to that of the best existing algorithms on artificial benchmark graphs. Several applications to real networks are shown as well. OSLOM is implemented in freely available software (http://www.oslom.org), and we believe it will be a valuable tool in the analysis of networks. PMID:21559480
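The sketch below illustrates, in toy form, the idea of judging a cluster against random fluctuations: the internal edge count of a candidate community is compared with that of random node sets of the same size. This is not the OSLOM fitness function or algorithm; the graph, candidate community, and permutation test are assumptions made purely for illustration.

```python
# Toy sketch: is a candidate community unusually dense compared with random
# node sets of the same size? NOT the OSLOM method, only an illustration of
# significance-against-randomness. Assumes networkx and numpy.
import networkx as nx
import numpy as np

rng = np.random.default_rng(4)
G = nx.planted_partition_graph(4, 25, 0.3, 0.02, seed=4)   # graph with built-in communities
community = list(range(25))                                # candidate community (first block)

def internal_edges(graph, nodes):
    """Number of edges with both endpoints inside the node set."""
    return graph.subgraph(nodes).number_of_edges()

observed = internal_edges(G, community)
null = [internal_edges(G, rng.choice(list(G.nodes()), size=len(community), replace=False))
        for _ in range(500)]
p_value = (1 + sum(n >= observed for n in null)) / (1 + len(null))
print(f"internal edges: {observed}, permutation p-value: {p_value:.3f}")
```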
Statistical significance estimation of a signal within the GooFit framework on GPUs
NASA Astrophysics Data System (ADS)
Cristella, Leonardo; Di Florio, Adriano; Pompili, Alexis
2017-03-01
In order to test the computing capabilities of GPUs with respect to traditional CPU cores, a high-statistics toy Monte Carlo technique has been implemented both in the ROOT/RooFit and GooFit frameworks with the purpose of estimating the statistical significance of the structure observed by CMS close to the kinematical boundary of the J/ψϕ invariant mass in the three-body decay B+ → J/ψϕK+. GooFit is an open data analysis tool under development that interfaces ROOT/RooFit to the CUDA platform on nVidia GPUs. The optimized GooFit application running on GPUs hosted by servers in the Bari Tier2 provides a striking speed-up with respect to the RooFit application parallelised on multiple CPUs by means of the PROOF-Lite tool. The considerable resulting speed-up, evident when comparing concurrent GooFit processes allowed by the CUDA Multi Process Service with a RooFit/PROOF-Lite process using multiple CPU workers, is presented and discussed in detail. By means of GooFit it has also been possible to explore the behaviour of a likelihood ratio test statistic in different situations in which the Wilks theorem may or may not apply because its regularity conditions are not satisfied.
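A stripped-down version of a toy Monte Carlo significance estimate can be written in plain numpy/scipy, without GooFit or RooFit: generate background-only pseudo-experiments, compute the tail probability of the observed excess, and convert it to a Gaussian significance. The counting-experiment setup and all yields below are hypothetical.

```python
# Toy Monte Carlo sketch of a significance estimate for a counting excess,
# done in plain numpy/scipy rather than GooFit/RooFit. Numbers are hypothetical.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(5)
background = 50.0          # expected background yield under the null hypothesis
n_observed = 75            # hypothetical observed count
n_toys = 200_000

# Generate background-only pseudo-experiments and count how often the
# fluctuation is at least as large as the observed excess.
toys = rng.poisson(background, size=n_toys)
p_value = (1 + np.sum(toys >= n_observed)) / (1 + n_toys)
z = norm.isf(p_value)      # one-sided significance in Gaussian sigmas
print(f"p-value ~ {p_value:.2e}, significance ~ {z:.2f} sigma")
```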
Mercer, Theresa G; Frostick, Lynne E; Walmsley, Anthony D
2011-10-15
This paper presents a statistical technique that can be applied to environmental chemistry data where missing values and limit-of-detection levels prevent the application of standard statistics. A working example is taken from an environmental leaching study that was set up to determine whether there were significant differences in levels of leached arsenic (As), chromium (Cr) and copper (Cu) between lysimeters containing preservative-treated wood waste and those containing untreated wood. Fourteen lysimeters were set up and left in natural conditions for 21 weeks. The resultant leachate was analysed by ICP-OES to determine the As, Cr and Cu concentrations. However, due to the variation inherent in each lysimeter combined with the limits of detection offered by ICP-OES, the collected quantitative data were somewhat incomplete. Initial data analysis was hampered by the number of 'missing values' in the data. To recover the dataset, the statistical tool of Statistical Multiple Imputation (SMI) was applied, and the data were re-analysed successfully. It was demonstrated that using SMI did not affect the variance in the data, but facilitated analysis of the complete dataset. Copyright © 2011 Elsevier B.V. All rights reserved.
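As a generic illustration of the multiple-imputation idea (not the authors' SMI procedure), the sketch below imputes missing leachate concentrations several times with scikit-learn's IterativeImputer and pools the resulting means; the element columns and values are invented.

```python
# Sketch: multiple imputation of missing leachate concentrations, using
# scikit-learn's IterativeImputer as a generic stand-in for the SMI procedure
# described above. Column names and values are hypothetical.
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

data = pd.DataFrame({
    "As": [1.2, np.nan, 0.9, 1.5, np.nan, 1.1],
    "Cr": [3.4, 2.8, np.nan, 3.1, 2.9, np.nan],
    "Cu": [0.8, 0.7, 0.9, np.nan, 0.6, 0.8],
})

# Draw several imputed datasets and pool the estimates (Rubin-style pooling
# of the mean; variance pooling is omitted for brevity).
pooled = []
for seed in range(5):
    imp = IterativeImputer(sample_posterior=True, random_state=seed)
    completed = pd.DataFrame(imp.fit_transform(data), columns=data.columns)
    pooled.append(completed.mean())

print(pd.concat(pooled, axis=1).mean(axis=1))   # pooled mean concentration per element
```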
ROOT: A C++ framework for petabyte data storage, statistical analysis and visualization
DOE Office of Scientific and Technical Information (OSTI.GOV)
Antcheva, I.; /CERN; Ballintijn, M.
2009-01-01
ROOT is an object-oriented C++ framework conceived in the high-energy physics (HEP) community, designed for storing and analyzing petabytes of data in an efficient way. Any instance of a C++ class can be stored into a ROOT file in a machine-independent compressed binary format. In ROOT the TTree object container is optimized for statistical data analysis over very large data sets by using vertical data storage techniques. These containers can span a large number of files on local disks, the web or a number of different shared file systems. In order to analyze this data, the user can choose from a wide set of mathematical and statistical functions, including linear algebra classes, numerical algorithms such as integration and minimization, and various methods for performing regression analysis (fitting). In particular, the RooFit package allows the user to perform complex data modeling and fitting while the RooStats library provides abstractions and implementations for advanced statistical tools. Multivariate classification methods based on machine learning techniques are available via the TMVA package. A central piece in these analysis tools is the set of histogram classes which provide binning of one- and multi-dimensional data. Results can be saved in high-quality graphical formats like PostScript and PDF or in bitmap formats like JPG or GIF. The result can also be stored into ROOT macros that allow a full recreation and rework of the graphics. Users typically create their analysis macros step by step, making use of the interactive C++ interpreter CINT, while running over small data samples. Once the development is finished, they can run these macros at full compiled speed over large data sets, using on-the-fly compilation, or by creating a stand-alone batch program. Finally, if processing farms are available, the user can reduce the execution time of intrinsically parallel tasks (e.g. data mining in HEP) by using PROOF, which will take care of optimally distributing the work over the available resources in a transparent way.
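A minimal PyROOT sketch of the histogram-and-fit workflow described above is shown below; it requires a local ROOT installation with Python bindings, and the object names and output file are hypothetical.

```python
# Minimal PyROOT sketch: fill a histogram, fit it, and write it to a ROOT file.
# Requires a ROOT installation with Python bindings; names are hypothetical.
import ROOT

h = ROOT.TH1F("h_toy", "Toy distribution;x;entries", 100, -5.0, 5.0)
h.FillRandom("gaus", 10000)          # fill with 10k samples from a unit Gaussian
h.Fit("gaus", "Q")                   # fit a Gaussian ("Q" = quiet)
gaus = h.GetFunction("gaus")
print("fitted mean :", gaus.GetParameter(1))
print("fitted sigma:", gaus.GetParameter(2))

out = ROOT.TFile("toy_histogram.root", "RECREATE")
h.Write()                            # persist the histogram in ROOT's binary format
out.Close()
```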
Nonequilibrium Statistical Operator Method and Generalized Kinetic Equations
NASA Astrophysics Data System (ADS)
Kuzemsky, A. L.
2018-01-01
We consider some principal problems of nonequilibrium statistical thermodynamics in the framework of the Zubarev nonequilibrium statistical operator approach. We present a brief comparative analysis of some approaches to describing irreversible processes based on the concept of nonequilibrium Gibbs ensembles and their applicability to describing nonequilibrium processes. We discuss the derivation of generalized kinetic equations for a system in a heat bath. We obtain and analyze a damped Schrödinger-type equation for a dynamical system in a heat bath. We study the dynamical behavior of a particle in a medium taking the dissipation effects into account. We consider the scattering problem for neutrons in a nonequilibrium medium and derive a generalized Van Hove formula. We show that the nonequilibrium statistical operator method is an effective, convenient tool for describing irreversible processes in condensed matter.
Houts, Carrie R; Edwards, Michael C; Wirth, R J; Deal, Linda S
2016-11-01
There has been a notable increase in the advocacy of using small-sample designs as an initial quantitative assessment of item and scale performance during the scale development process. This is particularly true in the development of clinical outcome assessments (COAs), where Rasch analysis has been advanced as an appropriate statistical tool for evaluating the developing COAs using a small sample. We review the benefits such methods are purported to offer from both a practical and statistical standpoint and detail several problematic areas, including both practical and statistical theory concerns, with respect to the use of quantitative methods, including Rasch-consistent methods, with small samples. The feasibility of obtaining accurate information and the potential negative impacts of misusing large-sample statistical methods with small samples during COA development are discussed.
NASA Technical Reports Server (NTRS)
Burns, K. Lee; Altino, Karen
2008-01-01
The Marshall Space Flight Center Natural Environments Branch has a long history of expertise in the modeling and computation of statistical launch availabilities with respect to weather conditions. Their existing data analysis product, the Atmospheric Parametric Risk Assessment (APRA) tool, computes launch availability given an input set of vehicle hardware and/or operational weather constraints by calculating the climatological probability of exceeding the specified constraint limits. APRA has been used extensively to provide the Space Shuttle program with the ability to estimate the impacts that various proposed design modifications would have on overall launch availability. The model accounts for both seasonal and diurnal variability at a single geographic location and provides output probabilities for a single arbitrary launch attempt. Recently, the Shuttle program has shown interest in having additional capabilities added to the APRA model, including analysis of humidity parameters, inclusion of landing site weather to produce landing availability, and concurrent analysis of multiple sites, to assist in operational landing site selection. In addition, the Constellation program has also expressed interest in the APRA tool, and has requested several additional capabilities to address some Constellation-specific issues, both in the specification and verification of design requirements and in the development of operations concepts. The combined scope of the requested capability enhancements suggests an evolution of the model beyond a simple revision process. Development has begun for a new data analysis tool that will satisfy the requests of both programs. This new tool, Probabilities of Atmospheric Conditions and Environmental Risk (PACER), will provide greater flexibility and significantly enhanced functionality compared to the currently existing tool.
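A toy version of the climatological exceedance calculation is sketched below: the probability of violating a weather-constraint limit is estimated by month from a historical record. The synthetic wind series, the 30-knot threshold, and the monthly grouping are illustrative assumptions, not the APRA or PACER implementation.

```python
# Sketch: climatological probability of exceeding a weather-constraint limit,
# estimated by month from a historical record. Threshold and data are
# hypothetical; this is not the APRA/PACER implementation. Assumes pandas, numpy.
import numpy as np
import pandas as pd

rng = np.random.default_rng(6)
dates = pd.date_range("2000-01-01", "2009-12-31", freq="D")
peak_wind = rng.gamma(shape=4.0, scale=5.0, size=len(dates))   # synthetic daily peak wind (knots)
constraint = 30.0                                              # hypothetical vehicle constraint limit

df = pd.DataFrame({"peak_wind": peak_wind}, index=dates)
exceed = (df["peak_wind"] > constraint).groupby(df.index.month).mean()
print("monthly probability of violating the constraint:")
print(exceed.round(3))
```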
ERIC Educational Resources Information Center
Cassel, Russell N.
This paper relates educational and psychological statistics to certain "Research Statistical Tools" (RSTs) necessary to accomplish and understand general research in the behavioral sciences. Emphasis is placed on acquiring an effective understanding of the RSTs, and to this end they are ordered on a continuum scale in terms of individual…
Syndromic surveillance of influenza activity in Sweden: an evaluation of three tools.
Ma, T; Englund, H; Bjelkmar, P; Wallensten, A; Hulth, A
2015-08-01
An evaluation was conducted to determine which syndromic surveillance tools complement traditional surveillance by serving as earlier indicators of influenza activity in Sweden. Web queries, medical hotline statistics, and school absenteeism data were evaluated against two traditional surveillance tools. Cross-correlation calculations utilized aggregated weekly data for all-age, nationwide activity for four influenza seasons, from 2009/2010 to 2012/2013. The surveillance tool indicative of earlier influenza activity, by way of statistical and visual evidence, was identified. The web query algorithm and the medical hotline statistics performed equally well as each other and as the traditional surveillance tools. School absenteeism data were not reliable resources for influenza surveillance. Overall, the syndromic surveillance tools did not perform with enough consistency in season lead or in earlier timing of the peak week to be considered early indicators. They do, however, capture incident cases before they have formally entered the primary healthcare system.
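The lead-time comparison described above can be illustrated with a lagged cross-correlation between two weekly series, as in the hedged sketch below; the simulated series, the built-in two-week lead, and the lag range are assumptions made only for demonstration.

```python
# Sketch: lagged cross-correlation between a syndromic indicator (e.g. web
# queries) and a traditional weekly surveillance series, to see whether the
# syndromic series leads. Data are simulated; assumes pandas and numpy.
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
weeks = 4 * 33                                          # roughly four seasons of weekly data
traditional = pd.Series(np.sin(np.linspace(0, 8 * np.pi, weeks)) + rng.normal(0, 0.2, weeks))
syndromic = traditional.shift(-2) + rng.normal(0, 0.2, weeks)   # constructed to lead by ~2 weeks

for lag in range(-4, 5):
    r = traditional.corr(syndromic.shift(lag))
    print(f"lag {lag:+d} weeks: r = {r:.2f}")
# The correlation should peak near lag +2: shifting the syndromic series
# forward by two weeks aligns it with the traditional series, consistent
# with a two-week lead of the syndromic indicator.
```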
Brusniak, Mi-Youn; Bodenmiller, Bernd; Campbell, David; Cooke, Kelly; Eddes, James; Garbutt, Andrew; Lau, Hollis; Letarte, Simon; Mueller, Lukas N; Sharma, Vagisha; Vitek, Olga; Zhang, Ning; Aebersold, Ruedi; Watts, Julian D
2008-01-01
Background Quantitative proteomics holds great promise for identifying proteins that are differentially abundant between populations representing different physiological or disease states. A range of computational tools is now available for both isotopically labeled and label-free liquid chromatography mass spectrometry (LC-MS) based quantitative proteomics. However, these tools are generally not comparable to each other in terms of functionality, user interfaces, or information input/output, and they do not readily facilitate appropriate statistical data analysis. These limitations, along with the array of choices, present a daunting prospect for biologists, and other researchers not trained in bioinformatics, who wish to use LC-MS-based quantitative proteomics. Results We have developed Corra, a computational framework and tools for discovery-based LC-MS proteomics. Corra extends and adapts existing algorithms used for LC-MS-based proteomics, as well as statistical algorithms originally developed for microarray data analysis, making them appropriate for LC-MS data analysis. Corra also adapts software engineering technologies (e.g. Google Web Toolkit, distributed processing) so that computationally intense data processing and statistical analyses can run on a remote server, while the user controls and manages the process from their own computer via a simple web interface. Corra also allows the user to output significantly differentially abundant LC-MS-detected peptide features in a form compatible with subsequent sequence identification via tandem mass spectrometry (MS/MS). We present two case studies to illustrate the application of Corra to commonly performed LC-MS-based biological workflows: a pilot biomarker discovery study of glycoproteins isolated from human plasma samples relevant to type 2 diabetes, and a study in yeast to identify in vivo targets of the protein kinase Ark1 via phosphopeptide profiling. Conclusion The Corra computational framework leverages computational innovation to enable biologists and other researchers to process, analyze and visualize LC-MS data using what would otherwise be a complex and not user-friendly suite of tools. Corra enables appropriate statistical analyses, with controlled false-discovery rates, ultimately to inform subsequent targeted identification of differentially abundant peptides by MS/MS. For the user not trained in bioinformatics, Corra represents a complete, customizable, free and open source computational platform enabling LC-MS-based proteomic workflows, and as such, addresses an unmet need in the LC-MS proteomics field. PMID:19087345
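As a minimal illustration of the kind of statistical analysis with false-discovery-rate control mentioned in the conclusion, the sketch below applies per-feature t-tests with Benjamini-Hochberg correction to simulated feature intensities; it is not Corra's actual pipeline, and all dimensions and effect sizes are invented.

```python
# Sketch: per-feature differential-abundance testing with Benjamini-Hochberg
# FDR control, as one might apply to LC-MS feature intensities. Generic
# illustration only, not Corra's statistical pipeline.
# Assumes numpy, scipy and statsmodels.
import numpy as np
from scipy.stats import ttest_ind
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(8)
n_features, n_per_group = 500, 6
group_a = rng.normal(0.0, 1.0, size=(n_features, n_per_group))
group_b = rng.normal(0.0, 1.0, size=(n_features, n_per_group))
group_b[:25] += 2.0                                     # 25 truly differential features

pvals = ttest_ind(group_a, group_b, axis=1).pvalue
reject, qvals, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")
print(f"features called significant at 5% FDR: {reject.sum()}")
```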
NASA Technical Reports Server (NTRS)
Chambers, L. H.; Chaudhury, S.; Page, M. T.; Lankey, A. J.; Doughty, J.; Kern, Steven; Rogerson, Tina M.
2008-01-01
During the summer of 2007, as part of the second year of a NASA-funded project in partnership with Christopher Newport University called SPHERE (Students as Professionals Helping Educators Research the Earth), a group of undergraduate students spent 8 weeks in a research internship at or near NASA Langley Research Center. Three students from this group formed the Clouds group along with a NASA mentor (Chambers), and the brief addition of a local high school student fulfilling a mentorship requirement. The Clouds group was given the task of exploring and analyzing ground-based cloud observations obtained by K-12 students as part of the Students' Cloud Observations On-Line (S'COOL) Project, and the corresponding satellite data. This project began in 1997. The primary analysis tools developed for it were in FORTRAN, a computer language none of the students were familiar with. While they persevered through computer challenges and picky syntax, it eventually became obvious that this was not the most fruitful approach for a project aimed at motivating K-12 students to do their own data analysis. Thus, about halfway through the summer the group shifted its focus to more modern data analysis and visualization tools, namely spreadsheets and Google(tm) Earth. The result of their efforts, so far, is two different Excel spreadsheets and a Google(tm) Earth file. The spreadsheets are set up to allow participating classrooms to paste in a particular dataset of interest, using the standard S'COOL format, and easily perform a variety of analyses and comparisons of the ground cloud observation reports and their correspondence with the satellite data. This includes summarizing cloud occurrence and cloud cover statistics, and comparing cloud cover measurements from the two points of view. A visual classification tool is also provided to compare the cloud levels reported from the two viewpoints. This provides a statistical counterpart to the existing S'COOL data visualization tool, which is used for individual ground-to-satellite correspondences. The Google(tm) Earth file contains a set of placemarks and ground overlays to show participating students the area around their school that the satellite is measuring. This approach will be automated and made interactive by the S'COOL database expert and will also be used to help refine the latitude/longitude location of the participating schools. Once complete, these new data analysis tools will be posted on the S'COOL website for use by the project participants in schools around the US and the world.
48 CFR 1852.223-76 - Federal Automotive Statistical Tool Reporting.
Code of Federal Regulations, 2011 CFR
2011-10-01
... data describing vehicle usage required by the Federal Automotive Statistical Tool (FAST) by October 15 of each year. FAST is accessed through http://fastweb.inel.gov/. (End of clause) [68 FR 43334, July...
48 CFR 1852.223-76 - Federal Automotive Statistical Tool Reporting.
Code of Federal Regulations, 2012 CFR
2012-10-01
... data describing vehicle usage required by the Federal Automotive Statistical Tool (FAST) by October 15 of each year. FAST is accessed through http://fastweb.inel.gov/. (End of clause) [68 FR 43334, July...
48 CFR 1852.223-76 - Federal Automotive Statistical Tool Reporting.
Code of Federal Regulations, 2013 CFR
2013-10-01
... data describing vehicle usage required by the Federal Automotive Statistical Tool (FAST) by October 15 of each year. FAST is accessed through http://fastweb.inel.gov/. (End of clause) [68 FR 43334, July...
48 CFR 1852.223-76 - Federal Automotive Statistical Tool Reporting.
Code of Federal Regulations, 2014 CFR
2014-10-01
... data describing vehicle usage required by the Federal Automotive Statistical Tool (FAST) by October 15 of each year. FAST is accessed through http://fastweb.inel.gov/. (End of clause) [68 FR 43334, July...
Palmcrantz, Susanne; Borg, Jörgen; Sommerfeld, Disa; Plantin, Jeanette; Wall, Anneli; Ehn, Maria; Sjölinder, Marie; Boman, Inga-Lill
2017-09-01
In this study an interactive distance solution (called the DISKO tool) was developed to enable home-based motor training after stroke. The overall aim was to explore the feasibility and safety of using the DISKO-tool, customized for interactive stroke rehabilitation in the home setting, in different rehabilitation phases after stroke. Fifteen patients in three different stages in the continuum of rehabilitation after stroke participated in a home-based training program using the DISKO-tool. The program included 15 training sessions with recurrent follow-ups by the integrated application for video communication with a physiotherapist. Safety and feasibility were assessed from patients, physiotherapists, and a technician using logbooks, interviews, and a questionnaire. Qualitative content analysis and descriptive statistics were used in the analysis. Fourteen out of 15 patients finalized the training period with a mean of 19.5 minutes spent on training at each session. The DISKO-tool was found to be useful and safe by patients and physiotherapists. This study demonstrates the feasibility and safety of the DISKO-tool and provides guidance in further development and testing of interactive distance technology for home rehabilitation, to be used by health care professionals and patients in different phases of rehabilitation after stroke.
CsSNP: A Web-Based Tool for the Detecting of Comparative Segments SNPs.
Wang, Yi; Wang, Shuangshuang; Zhou, Dongjie; Yang, Shuai; Xu, Yongchao; Yang, Chao; Yang, Long
2016-07-01
SNPs (single nucleotide polymorphisms) are a popular tool for the study of genetic diversity, evolution, and other areas. It is therefore desirable to have a convenient, robust, rapid, and open-source SNP-detection tool available to all researchers. Because detecting SNPs requires specialised software and a series of steps including alignment, detection, analysis, and presentation, the study of SNPs can be difficult for nonprofessional users. CsSNP (Comparative segments SNP, http://biodb.sdau.edu.cn/cssnp/) is a freely available web tool based on the Blat, Blast, and Perl programs that detects comparative segment SNPs and shows detailed information about them. The results are filtered and presented in statistics figures and a Gbrowse map. The platform contains the reference genomic sequences and coding sequences of 60 plant species, and provides new opportunities for users to detect SNPs easily. CsSNP offers nonprofessional users a convenient way to find comparative segment SNPs in their own sequences, gives them information about and analysis of the SNPs, and displays these data in a dynamic map. It provides a new method to detect SNPs and may accelerate related studies.
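To illustrate the final mismatch-scanning idea in the simplest possible form, the toy sketch below reports single-base differences between two already-aligned comparative segments; the sequences are made up, and the real CsSNP pipeline additionally relies on Blat/Blast alignment and filtering.

```python
# Toy sketch: report single-base mismatches between two aligned comparative
# segments. The real CsSNP pipeline builds on Blat/Blast alignments; this
# only illustrates the final mismatch-scanning idea. Sequences are made up.
reference = "ATGGCGTACGTTAGCCTA"
query     = "ATGGCGTACATTAGCCTG"

def call_snps(ref, qry):
    """Return (position, ref_base, qry_base) for every aligned mismatch."""
    assert len(ref) == len(qry), "sequences must be aligned to equal length"
    return [(i + 1, r, q)                     # 1-based positions
            for i, (r, q) in enumerate(zip(ref, qry))
            if r != q and r != "-" and q != "-"]

for pos, r, q in call_snps(reference, query):
    print(f"SNP at position {pos}: {r} -> {q}")
```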
NASA Astrophysics Data System (ADS)
Noel, Jean; Prieto, Juan C.; Styner, Martin
2017-03-01
Functional Analysis of Diffusion Tensor Tract Statistics (FADTTS) is a toolbox for analysis of white matter (WM) fiber tracts. It allows associating diffusion properties along major WM bundles with a set of covariates of interest, such as age, diagnostic status and gender, and the structure of the variability of these WM tract properties. However, to use this toolbox, a user must have intermediate knowledge of a scripting language (MATLAB). FADTTSter was created to overcome this issue and make the statistical analysis accessible to any non-technical researcher. FADTTSter is actively being used by researchers at the University of North Carolina. FADTTSter guides non-technical users through a series of steps, including quality control of subjects and fibers, in order to set up the necessary parameters to run FADTTS. Additionally, FADTTSter implements interactive charts for FADTTS' outputs. These interactive charts enhance the researcher's experience and facilitate the analysis of the results. FADTTSter's motivation is to improve usability and provide a new analysis tool to the community that complements FADTTS. Ultimately, by opening FADTTS to a broader audience, FADTTSter seeks to accelerate hypothesis testing in neuroimaging studies involving heterogeneous clinical data and diffusion tensor imaging. This work is submitted to the Biomedical Applications in Molecular, Structural, and Functional Imaging conference. The source code of this application is available in NITRC.