Data, Analysis, and Visualization | Computational Science | NREL
At NREL, our data management, data analysis, and scientific visualization capabilities help move the … approaches to image analysis and computer vision. Data Management and Big Data: systems, software, and tools.
Qualitative data analysis: conceptual and practical considerations.
Liamputtong, Pranee
2009-08-01
Qualitative inquiry requires that collected data is organised in a meaningful way, and this is referred to as data analysis. Through analytic processes, researchers turn what can be voluminous data into understandable and insightful analysis. This paper sets out the different approaches that qualitative researchers can use to make sense of their data including thematic analysis, narrative analysis, discourse analysis and semiotic analysis and discusses the ways that qualitative researchers can analyse their data. I first discuss salient issues in performing qualitative data analysis, and then proceed to provide some suggestions on different methods of data analysis in qualitative research. Finally, I provide some discussion on the use of computer-assisted data analysis.
Primary, Secondary, and Meta-Analysis of Research
ERIC Educational Resources Information Center
Glass, Gene V.
1976-01-01
Examines data analysis at three levels: primary analysis is the original analysis of data in a research study; secondary analysis is the re-analysis of data for the purpose of answering the original research question with better statistical techniques, or answering new questions with old data; and meta-analysis refers to the statistical analysis of many analysis results from individual studies for…
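The abstract does not name a specific statistic. As a minimal sketch, assuming two-group studies summarized by Glass's Δ (the treatment-minus-control mean difference divided by the control group's standard deviation), a meta-analysis in its simplest form averages the per-study effect sizes. Modern practice usually weights studies (for example by inverse variance); the unweighted mean here is a simplification, and the scores are made up for illustration.

```python
from statistics import mean, stdev


def glass_delta(treatment, control):
    """Glass's delta: mean difference scaled by the control group's sample SD."""
    return (mean(treatment) - mean(control)) / stdev(control)


def mean_effect(studies):
    """Unweighted mean of per-study effect sizes (the simplest meta-analytic summary)."""
    return mean(glass_delta(t, c) for t, c in studies)


# Hypothetical (treatment, control) scores from two small studies.
studies = [
    ([12, 14, 15, 13], [10, 11, 12, 11]),
    ([20, 22, 21, 23], [19, 20, 21, 20]),
]
print(round(mean_effect(studies), 3))
```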
MDAS: an integrated system for metabonomic data analysis.
Liu, Juan; Li, Bo; Xiong, Jiang-Hui
2009-03-01
Metabonomics, the latest 'omics' research field, shows great promise as a tool in biomarker discovery, drug efficacy and toxicity analysis, and disease diagnosis and prognosis. One of the major challenges now facing researchers is how to process these data to yield useful information about a biological system, e.g., the mechanism of diseases. Traditional methods employed in metabonomic data analysis use multivariate analysis methods developed independently in chemometrics research. Additionally, with the development of machine learning approaches, some methods such as support vector machines (SVMs) also show promise for metabonomic data analysis. Beyond the application of general multivariate analysis and machine learning methods to this problem, there is also a need for an integrated tool customized for metabonomic data analysis that biologists can easily use to reveal interesting patterns in metabonomic data. In this paper, we present a novel software tool, MDAS (Metabonomic Data Analysis System), for metabonomic data analysis, which integrates traditional chemometrics methods and newly introduced machine learning approaches. MDAS contains a suite of functional modules for metabonomic data analysis and optimizes the data analysis workflow. Several file formats are accepted as input. The input data can optionally be preprocessed and then processed with operations such as feature analysis and dimensionality reduction. The reduced-dimensionality data can be used to train or test machine learning models. The system provides visualization for data preprocessing, feature analysis, and classification, giving users a powerful means of extracting knowledge from the data. MDAS is an integrated platform for metabonomic data analysis that transforms a complex analysis procedure into a more formalized and simplified one. The software package can be obtained from the authors.
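MDAS's own algorithms are not detailed in the abstract. The sketch below only illustrates the general flow it describes (preprocess the data, then train and test a classifier) under stated assumptions: autoscaling (mean-centering and unit-variance scaling, common in chemometrics) stands in for preprocessing, dimensionality reduction is omitted for brevity, and a nearest-centroid rule stands in for the SVMs mentioned above. All function names and data are hypothetical and are not part of MDAS.

```python
from statistics import mean, pstdev


def fit_scaler(rows):
    """Learn per-feature means and SDs from training samples (autoscaling)."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # fall back to 1.0 for constant features
    return mus, sds


def apply_scaler(rows, mus, sds):
    """Center and scale each feature with the training statistics."""
    return [[(x - m) / s for x, m, s in zip(r, mus, sds)] for r in rows]


def fit_centroids(X, y):
    """Mean feature vector per class label."""
    groups = {}
    for row, label in zip(X, y):
        groups.setdefault(label, []).append(row)
    return {label: [mean(c) for c in zip(*rows)] for label, rows in groups.items()}


def predict(centroids, row):
    """Assign the label of the nearest class centroid (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return min(centroids, key=lambda label: sq_dist(centroids[label], row))


# Hypothetical metabolite intensities, two features per sample.
X_train = [[1.0, 10.0], [1.2, 11.0], [3.0, 30.0], [3.2, 31.0]]
y_train = ["control", "control", "case", "case"]
mus, sds = fit_scaler(X_train)
centroids = fit_centroids(apply_scaler(X_train, mus, sds), y_train)
sample = apply_scaler([[3.1, 29.0]], mus, sds)[0]
print(predict(centroids, sample))
```

Scaling the test sample with the training means and SDs, rather than its own, keeps the test data from leaking into preprocessing.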
An Array of Qualitative Data Analysis Tools: A Call for Data Analysis Triangulation
ERIC Educational Resources Information Center
Leech, Nancy L.; Onwuegbuzie, Anthony J.
2007-01-01
One of the most important steps in the qualitative research process is analysis of data. The purpose of this article is to provide elements for understanding multiple types of qualitative data analysis techniques available and the importance of utilizing more than one type of analysis, thus utilizing data analysis triangulation, in order to…
Data and Tools | Energy Analysis | NREL
NREL develops energy analysis data and tools to assess … collections. Data Products; Technology and Performance Analysis Tools; Energy Systems Analysis Tools; Economic and Financial Analysis Tools.
Online Analysis Enhances Use of NASA Earth Science Data
NASA Technical Reports Server (NTRS)
Acker, James G.; Leptoukh, Gregory
2007-01-01
Giovanni, the Goddard Earth Sciences Data and Information Services Center (GES DISC) Interactive Online Visualization and Analysis Infrastructure, has provided researchers with advanced capabilities to perform data exploration and analysis with observational data from NASA Earth observation satellites. In the past 5-10 years, examining geophysical events and processes with remote-sensing data required a multistep process of data discovery, data acquisition, data management, and ultimately data analysis. Giovanni accelerates this process by enabling basic visualization and analysis directly on the World Wide Web. In the last two years, Giovanni has added new data acquisition functions and expanded analysis options to increase its usefulness to the Earth science research community.
Advantages of Integrative Data Analysis for Developmental Research
ERIC Educational Resources Information Center
Bainter, Sierra A.; Curran, Patrick J.
2015-01-01
Amid recent progress in cognitive development research, high-quality data resources are accumulating, and data sharing and secondary data analysis are becoming increasingly valuable tools. Integrative data analysis (IDA) is an exciting analytical framework that can enhance secondary data analysis in powerful ways. IDA pools item-level data across…
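The abstract is truncated before it explains how IDA pools data. Formal IDA typically relies on psychometric measurement models to put items from different studies on a common scale. As a deliberately simplified stand-in, the sketch below linearly rescales each study's item to a 0-1 range and stacks the item-level records with a study tag. The function names, scales, and responses are hypothetical.

```python
from statistics import mean


def rescale(value, lo, hi):
    """Linearly map a raw item response from [lo, hi] onto a common 0-1 metric."""
    return (value - lo) / (hi - lo)


def pool_studies(studies):
    """Stack item-level records from several studies, tagging each with its source."""
    pooled = []
    for name, (lo, hi), responses in studies:
        for subject_id, raw in responses:
            pooled.append({"study": name, "subject": subject_id,
                           "score": rescale(raw, lo, hi)})
    return pooled


# Hypothetical: study A used a 1-5 version of an item, study B a 0-10 version.
studies = [
    ("A", (1, 5), [("a1", 5), ("a2", 3)]),
    ("B", (0, 10), [("b1", 5)]),
]
pooled = pool_studies(studies)
print(mean(r["score"] for r in pooled))
```

Keeping the `study` tag on every record lets a later model include study membership as a covariate instead of assuming the samples are interchangeable.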
First On-Site Data Analysis System for Subaru/Suprime-Cam
NASA Astrophysics Data System (ADS)
Furusawa, Hisanori; Okura, Yuki; Mineo, Sogo; Takata, Tadafumi; Nakata, Fumiaki; Tanaka, Manobu; Katayama, Nobuhiko; Itoh, Ryosuke; Yasuda, Naoki; Miyazaki, Satoshi; Komiyama, Yutaka; Utsumi, Yousuke; Uchida, Tomohisa; Aihara, Hiroaki
2011-03-01
We developed an automated on-site quick analysis system for mosaic CCD data of Suprime-Cam, which is a wide-field camera mounted at the prime focus of the Subaru Telescope, Mauna Kea, Hawaii. The first version of the data-analysis system was constructed, and started to operate in general observations. This system is a new function of observing support at the Subaru Telescope to provide the Subaru user community with an automated on-site data evaluation, aiming at improvements of observers' productivity, especially in large imaging surveys. The new system assists the data evaluation tasks in observations by the continuous monitoring of the characteristics of every data frame during observations. The evaluation results and data frames processed by this system are also useful for reducing the data-processing time in a full analysis after an observation. The primary analysis functions implemented in the data-analysis system are composed of automated realtime analysis for data evaluation and on-demand analysis, which is executed upon request, including mosaicing analysis and flat making analysis. In data evaluation, which is controlled by the organizing software, the database keeps track of the analysis histories, as well as the evaluated values of data frames, including seeing and sky background levels; it also helps in the selection of frames for mosaicing and flat making analysis. We examined the system performance and confirmed an improvement in the data-processing time by a factor of 9 with the aid of distributed parallel data processing and on-memory data processing, which makes the automated data evaluation effective.
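The abstract describes per-frame evaluation (seeing and sky background) stored in a database and used to select frames for mosaicing. A minimal sketch of that bookkeeping follows, assuming sky background is estimated as the pixel median and seeing as the median stellar FWHM times a pixel scale. The 0.20 arcsec/pixel value, the table layout, and all function names are assumptions for illustration, not the Subaru system's actual design.

```python
import sqlite3
from statistics import median

PIXEL_SCALE = 0.20  # arcsec per pixel; assumed value for illustration


def evaluate_frame(pixels, star_fwhm_pix):
    """Return (sky background, seeing in arcsec) for one frame."""
    sky = median(pixels)  # the median is robust to stars and cosmic-ray hits
    seeing = median(star_fwhm_pix) * PIXEL_SCALE
    return sky, seeing


def record(db, frame_id, pixels, star_fwhm_pix):
    """Evaluate a frame and store the result in the tracking table."""
    sky, seeing = evaluate_frame(pixels, star_fwhm_pix)
    db.execute("INSERT INTO frames VALUES (?, ?, ?)", (frame_id, sky, seeing))


def select_for_mosaic(db, max_seeing):
    """IDs of frames whose seeing is good enough for mosaicing."""
    rows = db.execute("SELECT id FROM frames WHERE seeing <= ? ORDER BY id", (max_seeing,))
    return [r[0] for r in rows]


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE frames (id TEXT PRIMARY KEY, sky REAL, seeing REAL)")
record(db, "F001", [100, 102, 98, 5000, 101], [3.5, 3.6, 3.4])
record(db, "F002", [110, 111, 109, 108, 4000], [6.0, 6.2, 5.9])
print(select_for_mosaic(db, max_seeing=1.0))
```

Real pipelines measure FWHM by fitting stellar profiles on the full mosaic; the short lists here only stand in for those measurements.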
NeoAnalysis: a Python-based toolbox for quick electrophysiological data processing and analysis.
Zhang, Bo; Dai, Ji; Zhang, Tao
2017-11-13
In a typical electrophysiological experiment, especially one that includes studying animal behavior, the collected data normally contain spikes, local field potentials, behavioral responses, and other associated data. To obtain informative results, the data must be analyzed together with the experimental settings. However, most open-source toolboxes currently available for data analysis were developed to handle only a portion of the data and do not take into account the sorting of experimental conditions. Additionally, these toolboxes require the input data to be in a specific format, which can be inconvenient for users. Therefore, a highly integrated toolbox that can process multiple types of data regardless of input format and perform basic analysis for general electrophysiological experiments would be very useful. Here, we report the development of a Python-based open-source toolbox, referred to as NeoAnalysis, for quick electrophysiological data processing and analysis. The toolbox can import data from different data acquisition systems regardless of their formats and automatically combine different types of data into a single file with a standardized format. Where additional spike sorting is needed, NeoAnalysis provides a module for efficient offline sorting with a user-friendly interface. NeoAnalysis can then perform regular analog signal processing, spike train and local field potential analysis, and behavioral response (e.g., saccade) detection and extraction, with several options available for data plotting and statistics. In particular, it can automatically generate sorted results without requiring users to manually sort data beforehand. In addition, NeoAnalysis can organize all of the relevant data into an informative table on a trial-by-trial basis for data visualization. Finally, NeoAnalysis supports analysis at the population level.
With the multitude of general-purpose functions provided by NeoAnalysis, users can easily obtain publication-quality figures without writing complex code. NeoAnalysis is a powerful and valuable toolbox for users doing electrophysiological experiments.
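To make the "trial-by-trial table sorted by experimental condition" idea concrete, here is a minimal sketch in plain Python. It is not NeoAnalysis's API; the function names, the fixed 0-500 ms post-onset window, and the spike times are hypothetical choices for illustration.

```python
from collections import defaultdict
from statistics import mean


def trial_table(spike_times, trial_onsets, conditions, window=(0.0, 0.5)):
    """One row per trial: spikes counted in a window (seconds) after trial onset."""
    start, stop = window
    rows = []
    for trial, (onset, cond) in enumerate(zip(trial_onsets, conditions)):
        n = sum(1 for t in spike_times if onset + start <= t < onset + stop)
        rows.append({"trial": trial, "condition": cond, "spikes": n,
                     "rate_hz": n / (stop - start)})
    return rows


def rate_by_condition(rows):
    """Mean firing rate per experimental condition."""
    groups = defaultdict(list)
    for r in rows:
        groups[r["condition"]].append(r["rate_hz"])
    return {cond: mean(rates) for cond, rates in groups.items()}


# Hypothetical spike times (s), trial onsets (s), and condition labels.
spikes = [0.1, 0.2, 0.3, 1.1, 2.05, 2.1, 2.2, 2.3]
rows = trial_table(spikes, [0.0, 1.0, 2.0], ["left", "right", "left"])
print(rate_by_condition(rows))
```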
A Hierarchical Visualization Analysis Model of Power Big Data
NASA Astrophysics Data System (ADS)
Li, Yongjie; Wang, Zheng; Hao, Yang
2018-01-01
Based on the concept of integrating VR scenes with power big data analysis, a hierarchical visualization analysis model of power big data is proposed, in which levels are designed to target different abstract modules such as transaction, engine, computation, control, and storage. This model integrates the conventionally separate modules of power data storage, data mining and analysis, and data visualization into one platform. It provides a visual analysis solution for power big data.
Secondary data analysis of large data sets in urology: successes and errors to avoid.
Schlomer, Bruce J; Copp, Hillary L
2014-03-01
Secondary data analysis is the use of data collected for research by someone other than the investigator. In the last several years there has been a dramatic increase in the number of these studies being published in urological journals and presented at urological meetings, especially involving secondary data analysis of large administrative data sets. Along with this expansion, skepticism toward secondary data analysis studies has increased among many urologists. In this narrative review we discuss the types of large data sets that are commonly used for secondary data analysis in urology, and discuss the advantages and disadvantages of secondary data analysis. A literature search was performed to identify urological secondary data analysis studies published since 2008 using commonly used large data sets, and examples of high quality studies published in high impact journals are given. We outline an approach for performing a successful hypothesis- or goal-driven secondary data analysis study and highlight common errors to avoid. More than 350 secondary data analysis studies using large data sets have been published on urological topics since 2008, with likely many more studies presented at meetings but never published. Some of these studies were likely not hypothesis- or goal-driven and have probably contributed to the increased skepticism toward this type of research. However, many high quality, hypothesis-driven studies addressing research questions that would have been difficult to conduct with other methods have been performed in the last few years. Secondary data analysis is a powerful tool that can address questions that could not be adequately studied by another method. Knowledge of the limitations of secondary data analysis and of the data sets used is critical for a successful study. There are also important errors to avoid when planning and performing a secondary data analysis study.
Investigators and the urological community need to strive to use secondary data analysis of large data sets appropriately to produce high quality studies that hopefully lead to improved patient outcomes. Copyright © 2014 American Urological Association Education and Research, Inc. Published by Elsevier Inc. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Peters, Valerie A.; Ogilvie, Alistair B.
2012-01-01
This report addresses the general data requirements for reliability analysis of fielded wind turbines and other wind plant equipment. The report provides a rationale for why this data should be collected, a list of the data needed to support reliability and availability analysis, and specific data recommendations for a Computerized Maintenance Management System (CMMS) to support automated analysis. This data collection recommendations report, written by Sandia National Laboratories, is intended to help develop a basic understanding of the data needed for reliability analysis of operating wind turbines from a CMMS and other data systems. Though written for reliability analysis of wind turbines, much of the information is applicable to a wider variety of equipment and analysis and reporting needs. The 'Motivation' section of this report provides a rationale for collecting and analyzing field data for reliability analysis. The benefits of this type of effort can include increased energy delivered, decreased operating costs, enhanced preventive maintenance schedules, solutions to issues with the largest payback, and identification of early failure indicators.
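The report's specific CMMS field list is not reproduced here. As a minimal sketch, assuming the CMMS yields the downtime of each failure event over an observation period, the usual summary reliability and availability figures can be computed as follows. MTBF is taken here as operating time divided by failure count, which is one common convention; the function name and numbers are hypothetical.

```python
def reliability_metrics(repair_hours, period_hours):
    """Summarize failures over an observation period.

    repair_hours: downtime (hours) for each failure event in the period.
    Returns the failure count, MTBF and MTTR in hours, and time-based availability.
    """
    failures = len(repair_hours)
    downtime = sum(repair_hours)
    uptime = period_hours - downtime
    return {
        "failures": failures,
        "mtbf_h": uptime / failures if failures else float("inf"),
        "mttr_h": downtime / failures if failures else 0.0,
        "availability": uptime / period_hours,
    }


# Hypothetical one-year record (8,760 h) for one turbine with three repairs.
m = reliability_metrics([12, 30, 6], period_hours=8760)
print(m)
```

With these definitions, availability equals MTBF / (MTBF + MTTR), so the two common ways of stating it agree.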
Trivedi, Prinal; Edwards, Jode W; Wang, Jelai; Gadbury, Gary L; Srinivasasainagendra, Vinodh; Zakharkin, Stanislav O; Kim, Kyoungmi; Mehta, Tapan; Brand, Jacob P L; Patki, Amit; Page, Grier P; Allison, David B
2005-04-06
Many efforts in microarray data analysis are focused on providing tools and methods for the qualitative analysis of microarray data. HDBStat! (High-Dimensional Biology-Statistics) is a software package designed for analysis of high-dimensional biology data such as microarray data. It was initially developed for the analysis of microarray gene expression data, but it can also be used for some applications in proteomics and other areas of genomics. HDBStat! provides statisticians and biologists with a flexible and easy-to-use interface to analyze complex microarray data using a variety of methods for data preprocessing, quality control analysis and hypothesis testing. Results generated from data preprocessing methods, quality control analysis and hypothesis testing methods are output as Excel CSV tables, graphs, and an HTML report summarizing the data analysis. HDBStat! is platform-independent software that is freely available to academic institutions and non-profit organizations. It can be downloaded from our website http://www.soph.uab.edu/ssg_content.asp?id=1164.
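The abstract lists preprocessing, hypothesis testing, and CSV output without naming specific methods. The sketch below assumes one common combination, a log2 transform followed by a per-gene Welch two-sample t statistic, written to CSV text. It is not HDBStat!'s implementation, it applies no multiple-testing correction, and the gene names and intensities are hypothetical.

```python
import csv
import io
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's two-sample t statistic (unequal variances)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def analyze(expression, group_a, group_b):
    """expression: {gene: raw intensities}; group_a/group_b: column indices per group."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["gene", "log2_fc", "welch_t"])
    for gene, values in expression.items():
        logged = [math.log2(v) for v in values]  # preprocessing: log2 transform
        a = [logged[i] for i in group_a]
        b = [logged[i] for i in group_b]
        writer.writerow([gene, round(mean(a) - mean(b), 3), round(welch_t(a, b), 3)])
    return out.getvalue()


# Hypothetical intensities: columns 0-2 are controls, 3-5 are treated samples.
expression = {
    "g1": [100, 110, 90, 400, 420, 380],
    "g2": [200, 210, 190, 205, 195, 200],
}
report = analyze(expression, group_a=[0, 1, 2], group_b=[3, 4, 5])
print(report)
```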
Implementing EVM Data Analysis Adding Value from a NASA Project Manager's Perspective
NASA Technical Reports Server (NTRS)
Counts, Stacy; Kerby, Jerald
2006-01-01
Data analysis is one of the keys to an effective Earned Value Management (EVM) process. Project managers (PMs) must continually evaluate data in assessing the health of their projects. Good analysis of data can assist PMs in making better decisions in managing projects. To better support our PMs, the National Aeronautics and Space Administration (NASA) Marshall Space Flight Center (MSFC) recently renewed its emphasis on sound EVM data analysis practices and processes. During this presentation we will discuss the approach that MSFC followed in implementing better data analysis across the Center. We will address our approach to effectively equipping and supporting our projects in applying a sound data analysis process. In addition, the PM for the Space Station Biological Research Project will share her experiences of how effective data analysis can benefit a PM in the decision-making process. The PM will discuss how the emphasis on data analysis has helped create a solid method for assessing the project's performance. Used successfully, data analysis can be an effective and efficient tool in today's environment of increasing workloads and downsizing workforces.
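The presentation abstract does not list the indicators MSFC used. For readers new to EVM, the sketch below computes the standard earned value indicators (cost and schedule variance, CPI, SPI) plus one common estimate at completion, EAC = BAC / CPI, which assumes current cost efficiency persists. The input figures are hypothetical.

```python
def evm_metrics(pv, ev, ac, bac):
    """Standard earned value indicators.

    pv: planned value, ev: earned value, ac: actual cost, bac: budget at completion.
    """
    cpi = ev / ac  # cost performance index (< 1 means over budget)
    eac = bac / cpi  # one common estimate at completion; other formulas exist
    return {
        "cost_variance": ev - ac,
        "schedule_variance": ev - pv,
        "cpi": cpi,
        "spi": ev / pv,  # schedule performance index (< 1 means behind schedule)
        "eac": eac,
        "vac": bac - eac,  # variance at completion
    }


# Hypothetical status: 90 units of work earned against 100 planned, at a cost of 120.
m = evm_metrics(pv=100.0, ev=90.0, ac=120.0, bac=1000.0)
print(m)
```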
Zhu, Yuerong; Zhu, Yuelin; Xu, Wei
2008-01-01
Background Though microarray experiments are very popular in life science research, managing and analyzing microarray data are still challenging tasks for many biologists. Most microarray programs require users to have sophisticated knowledge of mathematics, statistics and computer skills for usage. With accumulating microarray data deposited in public databases, easy-to-use programs to re-analyze previously published microarray data are in high demand. Results EzArray is a web-based Affymetrix expression array data management and analysis system for researchers who need to organize microarray data efficiently and get data analyzed instantly. EzArray organizes microarray data into projects that can be analyzed online with predefined or custom procedures. EzArray performs data preprocessing and detection of differentially expressed genes with statistical methods. All analysis procedures are optimized and highly automated so that even novice users with limited pre-knowledge of microarray data analysis can complete initial analysis quickly. Since all input files, analysis parameters, and executed scripts can be downloaded, EzArray provides maximum reproducibility for each analysis. In addition, EzArray integrates with Gene Expression Omnibus (GEO) and allows instantaneous re-analysis of published array data. Conclusion EzArray is a novel Affymetrix expression array data analysis and sharing system. EzArray provides easy-to-use tools for re-analyzing published microarray data and will help both novice and experienced users perform initial analysis of their microarray data from the location of data storage. We believe EzArray will be a useful system for facilities with microarray services and laboratories with multiple members involved in microarray data analysis. EzArray is freely available from . PMID:18218103
Exploring and Analyzing Climate Variations Online by Using NASA MERRA-2 Data at GES DISC
NASA Technical Reports Server (NTRS)
Shen, Suhung; Ostrenga, Dana M.; Vollmer, Bruce E.; Kempler, Steven J.
2016-01-01
NASA Giovanni (Goddard Interactive Online Visualization ANd aNalysis Infrastructure) (http://giovanni.sci.gsfc.nasa.gov/giovanni/) is a web-based data visualization and analysis system developed by the Goddard Earth Sciences Data and Information Services Center (GES DISC). Current data analysis functions include lat-lon maps, time series, scatter plots, correlation maps, differences, cross-sections, vertical profiles, and animations. The system enables basic statistical analysis and comparisons of multiple variables. This web-based tool facilitates data discovery, exploration, and analysis of large amounts of global and regional remote sensing and model data sets from a number of NASA data centers. Long-term global assimilated atmospheric, land, and ocean data have been integrated into the system, enabling quick exploration and analysis of climate data without having to download, preprocess, or learn the data formats. Example data include climate reanalysis data from NASA Modern-Era Retrospective analysis for Research and Applications, Version 2 (MERRA-2), which provides data from 1980 to the present; land data from the NASA Global Land Data Assimilation System (GLDAS), which assimilates data from 1948 to 2012; and ocean biological data from the NASA Ocean Biogeochemical Model (NOBM), which provides data from 1998 to 2012. This presentation uses surface air temperature, precipitation, ozone, aerosols, and other variables from MERRA-2 to demonstrate climate variation analysis with Giovanni over selected regions.
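Giovanni's internal algorithms are not described here. As a sketch of what an area-averaged time series over a selected region involves, the code below assumes a regular lat-lon grid, weights each cell by the cosine of its latitude (grid cells shrink toward the poles), and skips missing values. Function names and values are hypothetical.

```python
import math


def area_weighted_mean(grid, lats):
    """grid[i][j] is the value at latitude lats[i] (degrees), longitude index j.

    Cells are weighted by cos(latitude); None marks a missing value.
    """
    num = den = 0.0
    for row, lat in zip(grid, lats):
        w = math.cos(math.radians(lat))
        for v in row:
            if v is None:  # skip missing (fill) values
                continue
            num += w * v
            den += w
    return num / den


def area_averaged_time_series(fields, lats):
    """One area-weighted mean per time step."""
    return [area_weighted_mean(g, lats) for g in fields]


# Hypothetical two-step series on a 2 x 2 grid at 0 and 60 degrees latitude.
fields = [[[10, 10], [20, 20]], [[12, 12], [20, None]]]
print(area_averaged_time_series(fields, [0.0, 60.0]))
```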
Exploratory Climate Data Visualization and Analysis Using DV3D and UVCDAT
NASA Technical Reports Server (NTRS)
Maxwell, Thomas
2012-01-01
Earth system scientists are being inundated by an explosion of data generated by ever-increasing resolution in both global models and remote sensors. Advanced tools for accessing, analyzing, and visualizing very large and complex climate data are required to maintain rapid progress in Earth system research. To meet this need, NASA, in collaboration with the Ultra-scale Visualization Climate Data Analysis Tools (UV-CDAT) consortium, is developing exploratory climate data analysis and visualization tools which provide data analysis capabilities for the Earth System Grid (ESG). This paper describes DV3D, a UV-CDAT package that enables exploratory analysis of climate simulation and observation datasets. DV3D provides user-friendly interfaces for visualization and analysis of climate data at a level appropriate for scientists. It features workflow interfaces, interactive 4D data exploration, hyperwall and stereo visualization, automated provenance generation, and parallel task execution. DV3D's integration with CDAT's climate data management system (CDMS) and other climate data analysis tools provides a wide range of high performance climate data analysis operations. DV3D expands the scientists' toolbox by incorporating a suite of rich new exploratory visualization and analysis methods for addressing the complexity of climate datasets.
40 CFR 92.131 - Smoke, data analysis.
Code of Federal Regulations, 2010 CFR
2010-07-01
... 40 Protection of Environment 20 2010-07-01 2010-07-01 false Smoke, data analysis. 92.131 Section... analysis. The following procedure shall be used to analyze the smoke test data: (a) Locate each throttle... performed by direct analysis of the recorder traces, or by computer analysis of data collected by automatic...
Putative regulatory sites unraveled by network-embedded thermodynamic analysis of metabolome data
Kümmel, Anne; Panke, Sven; Heinemann, Matthias
2006-01-01
As one of the most recent members of the omics family, large-scale quantitative metabolomics data are currently complementing our systems biology data pool and offer the chance to integrate the metabolite level into the functional analysis of cellular networks. Network-embedded thermodynamic analysis (NET analysis) is presented as a framework for mechanistic and model-based analysis of these data. By coupling the data to an operating metabolic network via the second law of thermodynamics and the metabolites' Gibbs energies of formation, NET analysis allows functional principles to be inferred from quantitative metabolite data; for example, it identifies reactions that are subject to active allosteric or genetic regulation, as exemplified with quantitative metabolite data from Escherichia coli and Saccharomyces cerevisiae. Moreover, the optimization framework of NET analysis was demonstrated to be a valuable tool to systematically investigate data sets for consistency, to extend sub-omic metabolome data sets, and to resolve intracompartmental concentrations from cell-averaged metabolome data. Without requiring any kind of kinetic modeling, NET analysis represents a perfectly scalable and unbiased approach to uncover insights from quantitative metabolome data. PMID:16788595
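The thermodynamic constraint behind this approach is that a reaction can carry forward flux only if its Gibbs energy of reaction is negative, with ΔrG = ΔrG°' + RT Σ νᵢ ln cᵢ. The sketch below evaluates that for one reaction and bounds ΔrG when each concentration varies independently within a range. NET analysis itself solves a coupled optimization across the whole network, where shared metabolites tie reactions together, so this per-reaction bound is only a relaxation. Activities are approximated by concentrations, and the ΔrG°' value and concentrations are hypothetical.

```python
import math

R = 8.314e-3  # gas constant, kJ/(mol*K)


def reaction_dg(dg0, stoich, conc, T=298.15):
    """dG = dG0' + RT * sum(nu_i * ln c_i), in kJ/mol.

    stoich maps metabolite -> stoichiometric coefficient (negative for substrates);
    conc maps metabolite -> concentration in M.
    """
    return dg0 + R * T * sum(nu * math.log(conc[m]) for m, nu in stoich.items())


def dg_bounds(dg0, stoich, ranges, T=298.15):
    """(min, max) dG when each concentration varies independently in (low, high)."""
    most_favorable, least_favorable = {}, {}
    for m, nu in stoich.items():
        lo, hi = ranges[m]
        # dG is lowest with products at their minimum and substrates at their maximum.
        most_favorable[m] = lo if nu > 0 else hi
        least_favorable[m] = hi if nu > 0 else lo
    return (reaction_dg(dg0, stoich, most_favorable, T),
            reaction_dg(dg0, stoich, least_favorable, T))


# Hypothetical reaction A -> B with dG0' = +5 kJ/mol.
stoich = {"A": -1, "B": 1}
print(reaction_dg(5.0, stoich, {"A": 1e-2, "B": 1e-4}))
print(dg_bounds(5.0, stoich, {"A": (1e-3, 1e-2), "B": (1e-4, 1e-3)}))
```

A reaction whose maximum ΔrG is still negative can only run forward; one whose bounds straddle zero, as in this example, is not fixed in direction by thermodynamics alone.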
CADDIS Volume 4. Data Analysis: Getting Started
Assembling data for an ecological causal analysis, matching biological and environmental samples in time and space, organizing data along conceptual causal pathways, data quality and quantity requirements, Data Analysis references.
Bayesian data analysis for newcomers.
Kruschke, John K; Liddell, Torrin M
2018-02-01
This article explains the foundational concepts of Bayesian data analysis using virtually no mathematical notation. Bayesian ideas already match your intuitions from everyday reasoning and from traditional data analysis. Simple examples of Bayesian data analysis are presented that illustrate how the information delivered by a Bayesian analysis can be directly interpreted. Bayesian approaches to null-value assessment are discussed. The article clarifies misconceptions about Bayesian methods that newcomers might have acquired elsewhere. We discuss prior distributions and explain how they are not a liability but an important asset. We discuss the relation of Bayesian data analysis to Bayesian models of mind, and we briefly discuss what methodological problems Bayesian data analysis is not meant to solve. After you have read this article, you should have a clear sense of how Bayesian data analysis works and the sort of information it delivers, and why that information is so intuitive and useful for drawing conclusions from data.
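As a small illustration of how a Bayesian analysis delivers directly interpretable information, the sketch below uses the textbook beta-binomial case: a Beta(a, b) prior on a success probability θ, binomial data, and the conjugate Beta posterior. It reports the posterior mean and P(θ > 0.5) by numerical integration. This generic example is not taken from the article, which treats null-value assessment in more depth; the data are hypothetical.

```python
import math


def update(a, b, successes, trials):
    """Conjugate update: Beta(a, b) prior plus binomial data gives a Beta posterior."""
    return a + successes, b + trials - successes


def beta_pdf(x, a, b):
    """Density of Beta(a, b) at 0 < x < 1."""
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1) * math.log(x) + (b - 1) * math.log(1 - x) - log_beta)


def prob_above(threshold, a, b, steps=100_000):
    """P(theta > threshold) under Beta(a, b), via the midpoint rule."""
    h = (1 - threshold) / steps
    return h * sum(beta_pdf(threshold + (i + 0.5) * h, a, b) for i in range(steps))


# Uniform Beta(1, 1) prior; hypothetical data: 7 successes in 10 trials.
a, b = update(1, 1, successes=7, trials=10)
print(a, b, a / (a + b))  # posterior parameters and posterior mean
print(round(prob_above(0.5, a, b), 3))
```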
Visualization techniques to aid in the analysis of multi-spectral astrophysical data sets
NASA Technical Reports Server (NTRS)
Brugel, Edward W.; Domik, Gitta O.; Ayres, Thomas R.
1993-01-01
The goal of this project was to support the scientific analysis of multi-spectral astrophysical data by means of scientific visualization. Scientific visualization offers its greatest value if it is used not as a method separate from, or alternative to, other data analysis methods but rather in addition to them. Together with quantitative analysis of data, such as that offered by statistical analysis and image or signal processing, visualization attempts to explore all information inherent in astrophysical data in the most effective way. Data visualization is one aspect of data analysis. Our taxonomy as developed in Section 2 includes identification of and access to existing information, preprocessing and quantitative analysis of data, visual representation, and the user interface as major components of the software environment for astrophysical data analysis. In pursuing our goal to provide methods and tools for scientific visualization of multi-spectral astrophysical data, we therefore looked at scientific data analysis as one whole process, adding visualization tools to an already existing environment and integrating the various components that define a scientific data analysis environment. As long as the software development process of each component is separate from all other components, users of data analysis software are constantly interrupted in their scientific work in order to convert from one data format to another, to move from one storage medium to another, or to switch from one user interface to another. We also took an in-depth look at scientific visualization and its underlying concepts, current visualization systems, their contributions, and their shortcomings. The role of data visualization is to stimulate mental processes different from those of quantitative data analysis, such as the perception of spatial relationships or the discovery of patterns or anomalies while browsing through large data sets.
Visualization often leads to an intuitive understanding of the meaning of data values and their relationships at the cost of accuracy in interpreting the data values. To be accurate in the interpretation, data values need to be measured, computed on, and compared to theoretical or empirical models (quantitative analysis). If visualization software hampers quantitative analysis (which happens with some commercial visualization products), its usefulness for astrophysical data analysis is greatly diminished. The software system STAR (Scientific Toolkit for Astrophysical Research) was developed as a prototype during the course of the project to better understand the pragmatic concerns raised in the project. STAR led to a better understanding of the importance of collaboration between astrophysicists and computer scientists.
Exploring and Analyzing Climate Variations Online by Using MERRA-2 data at GES DISC
NASA Astrophysics Data System (ADS)
Shen, S.; Ostrenga, D.; Vollmer, B.; Kempler, S.
2016-12-01
NASA Giovanni (Geospatial Interactive Online Visualization ANd aNalysis Infrastructure) (http://giovanni.sci.gsfc.nasa.gov/giovanni/) is a web-based data visualization and analysis system developed by the Goddard Earth Sciences Data and Information Services Center (GES DISC). Current data analysis functions include lat-lon maps, time series, scatter plots, correlation maps, differences, cross-sections, vertical profiles, and animations. The system enables basic statistical analysis and comparisons of multiple variables. This web-based tool facilitates data discovery, exploration, and analysis of large amounts of global and regional remote sensing and model data sets from a number of NASA data centers. Recently, long-term global assimilated atmospheric, land, and ocean data have been integrated into the system, enabling quick exploration and analysis of climate data without downloading and preprocessing the data. Example data include climate reanalysis from NASA Modern-Era Retrospective analysis for Research and Applications, Version 2 (MERRA-2), which provides data from 1980 to the present; land data from the NASA Global Land Data Assimilation System (GLDAS), which assimilates data from 1948 to 2012; and ocean biological data from the NASA Ocean Biogeochemical Model (NOBM), which assimilates data from 1998 to 2012. This presentation uses surface air temperature, precipitation, ozone, aerosols, and other variables from MERRA-2 to demonstrate climate variation analysis with Giovanni over selected regions.
Anima: Modular Workflow System for Comprehensive Image Data Analysis
Rantanen, Ville; Valori, Miko; Hautaniemi, Sampsa
2014-01-01
Modern microscopes produce vast amounts of image data, and computational methods are needed to analyze and interpret these data. Furthermore, a single image analysis project may require tens or hundreds of analysis steps, starting with data import and preprocessing, continuing through segmentation and statistical analysis, and ending with visualization and reporting. To manage such large-scale image data analysis projects, we present here a modular workflow system called Anima. Anima is designed for comprehensive and efficient image data analysis development, and it contains several features that are crucial in high-throughput image data analysis: programming language independence, batch processing, easily customized data processing, interoperability with other software via application programming interfaces, and advanced multivariate statistical analysis. The utility of Anima is shown with two case studies focusing on testing different algorithms developed in different imaging platforms and on automated prediction of alive/dead C. elegans worms by integrating several analysis environments. Anima is fully open source and available with documentation at www.anduril.org/anima. PMID:25126541
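To illustrate the modular-workflow idea (steps declared with their dependencies, executed in dependency order, and repeated over a batch of inputs), here is a minimal sketch using Python's standard `graphlib` (Python 3.9+). It is not Anima's or Anduril's API; the step names, the normalize-then-threshold "segmentation", and the tiny pixel lists standing in for images are all hypothetical.

```python
from graphlib import TopologicalSorter


def run_workflow(steps, deps, item):
    """Run every step after its dependencies for one input item.

    steps: {name: callable(item, upstream)}, where upstream maps each
           dependency name to that step's result.
    deps:  {name: set of dependency names}; every step must appear as a key.
    """
    results = {}
    for name in TopologicalSorter(deps).static_order():
        upstream = {d: results[d] for d in deps[name]}
        results[name] = steps[name](item, upstream)
    return results


def run_batch(steps, deps, items):
    """Batch processing: apply the same workflow to each item."""
    return [run_workflow(steps, deps, item) for item in items]


def load(img, up):
    return list(img)


def normalize(img, up):
    peak = max(up["load"])
    return [p / peak for p in up["load"]]


def segment(img, up):
    return [p > 0.5 for p in up["normalize"]]  # naive threshold segmentation


def count_foreground(img, up):
    return sum(up["segment"])


STEPS = {"load": load, "normalize": normalize, "segment": segment, "stats": count_foreground}
DEPS = {"load": set(), "normalize": {"load"}, "segment": {"normalize"}, "stats": {"segment"}}

images = [[0, 5, 10], [2, 8, 4, 10]]
print([r["stats"] for r in run_batch(STEPS, DEPS, images)])
```

Declaring dependencies rather than hard-coding call order means a step can be swapped for an alternative implementation without touching the rest of the workflow.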
Full Life Cycle of Data Analysis with Climate Model Diagnostic Analyzer (CMDA)
NASA Astrophysics Data System (ADS)
Lee, S.; Zhai, C.; Pan, L.; Tang, B.; Zhang, J.; Bao, Q.; Malarout, N.
2017-12-01
We have developed a system that supports the full life cycle of a data analysis process: data discovery, data customization, analysis, reanalysis, publication, and reproduction. The system, called Climate Model Diagnostic Analyzer (CMDA), is designed to demonstrate that the full life cycle of data analysis can be supported within one integrated system for climate model diagnostic evaluation with global observational and reanalysis datasets. CMDA has four highly integrated subsystems that support the analysis life cycle. The Data System manages the datasets used by CMDA analysis tools; the Analysis System manages the CMDA analysis tools, all of which are web services; the Provenance System manages the metadata of CMDA datasets and the provenance of CMDA analysis history; and the Recommendation System extracts knowledge from CMDA usage history and recommends datasets and analysis tools to users. These four subsystems are not only highly integrated but also easily expandable. New datasets can be added to the Data System and scanned to make them visible to the other subsystems. New analysis tools can be registered to make them available in the Analysis System and Provenance System. With CMDA, a user can start a data analysis process by discovering datasets relevant to their research topic using the Recommendation System. The user can then customize the discovered datasets for their scientific use (e.g., anomaly calculation, regridding) and carry out their analysis (e.g., conditional sampling, time averaging, spatial averaging) with tools in the Analysis System. Next, the user can reanalyze the datasets based on the analysis provenance previously stored in the Provenance System. Further, they can publish their analysis process and results to the Provenance System to share with other users. Finally, any user can reproduce the published analysis process and results. 
By supporting the full life cycle of climate data analysis, CMDA improves the research productivity and collaboration of its users.
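Anomaly calculation, one of the customization steps listed above, usually means subtracting a climatology (the long-term mean for each calendar month) from each value. A minimal sketch, assuming a 1-D monthly series with matching calendar-month labels; it is not CMDA's web-service code, whose interface the abstract does not describe.

```python
import numpy as np

def monthly_anomalies(values, months):
    """Subtract the long-term mean of each calendar month (the climatology).

    values: 1-D array of monthly values.
    months: 1-D array of calendar months (1-12), same length as values.
    """
    values = np.asarray(values, dtype=float)
    months = np.asarray(months)
    anomalies = np.empty_like(values)
    for m in np.unique(months):
        mask = months == m
        # Climatology for month m: mean over all years, ignoring NaN.
        anomalies[mask] = values[mask] - np.nanmean(values[mask])
    return anomalies
```

For example, `monthly_anomalies([1, 2, 3, 5], [1, 2, 1, 2])` removes a January mean of 2 and a February mean of 3.5.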
TRAPR: R Package for Statistical Analysis and Visualization of RNA-Seq Data.
Lim, Jae Hyun; Lee, Soo Youn; Kim, Ju Han
2017-03-01
High-throughput transcriptome sequencing, also known as RNA sequencing (RNA-Seq), is a standard technology for measuring gene expression with unprecedented accuracy. Numerous Bioconductor packages have been developed for the statistical analysis of RNA-Seq data. However, these tools focus on specific aspects of the data analysis pipeline and are difficult to integrate with one another because of their disparate data structures and processing methods. They also lack visualization methods to confirm the integrity of the data and the process. In this paper, we propose an R-based RNA-Seq analysis pipeline called TRAPR, an integrated tool that facilitates the statistical analysis and visualization of RNA-Seq expression data. TRAPR provides functions for data management, filtering of low-quality data, normalization, transformation, statistical analysis, data visualization, and result visualization that allow researchers to build customized analysis pipelines.
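To make the filtering, normalization, and transformation steps concrete, the sketch below applies one common combination: counts-per-million (CPM) scaling, a low-expression filter, and a log2 transform. TRAPR itself is an R package; this Python version does not reproduce its functions, and the thresholds `min_cpm` and `min_samples` are hypothetical defaults.

```python
import numpy as np

def filter_and_normalize(counts, min_cpm=1.0, min_samples=2):
    """Filter low-expression genes, then return (log2(CPM + 1), keep mask).

    counts: array shaped (genes, samples) of raw read counts.
    A gene is kept if its CPM >= min_cpm in at least min_samples samples.
    """
    counts = np.asarray(counts, dtype=float)
    lib_sizes = counts.sum(axis=0)            # total reads per sample
    cpm = counts / lib_sizes * 1e6            # library-size scaling
    keep = (cpm >= min_cpm).sum(axis=1) >= min_samples
    return np.log2(cpm[keep] + 1.0), keep
```

The pseudocount of 1 keeps zero counts finite after the log transform; other pipelines use different offsets or size-factor methods.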
DataHub: Knowledge-based data management for data discovery
NASA Astrophysics Data System (ADS)
Handley, Thomas H.; Li, Y. Philip
1993-08-01
Currently available database technology is largely designed for business data-processing applications and seems inadequate for scientific applications. The research described in this paper, the DataHub, will address the issues associated with this shortfall in technology utilization and development. The DataHub development addresses key issues in scientific data management: scientific database models and resource sharing in a geographically distributed, multi-disciplinary science research environment. Thus, the DataHub will act as a server between data suppliers and data consumers to facilitate data exchanges, to assist science data analysis, and to provide a systematic approach to science data management. More specifically, the DataHub's objectives are to provide support for (1) exploratory data analysis (i.e., data-driven analysis); (2) data transformations; (3) data semantics capture and usage; (4) analysis-related knowledge capture and usage; and (5) data discovery, ingestion, and extraction. Applying technologies ranging from deductive databases, semantic data models, data discovery, knowledge representation and inferencing, and exploratory data analysis techniques to modern man-machine interfaces, DataHub will provide a prototype, integrated environment to support research scientists' needs in multiple disciplines (i.e., oceanography, geology, and atmospheric science) while addressing the more general science data management issues. Additionally, the DataHub will provide data management services to exploratory data analysis applications such as LinkWinds and NCSA's XIMAGE.
Visualization techniques to aid in the analysis of multispectral astrophysical data sets
NASA Technical Reports Server (NTRS)
Brugel, E. W.; Domik, Gitta O.; Ayres, T. R.
1993-01-01
The goal of this project was to support the scientific analysis of multi-spectral astrophysical data by means of scientific visualization. Scientific visualization offers its greatest value when it is used not as a method separate from, or alternative to, other data analysis methods but in addition to them. Together with quantitative analysis of data, such as that offered by statistical analysis and image or signal processing, visualization attempts to explore all information inherent in astrophysical data in the most effective way. Data visualization is one aspect of data analysis. Our taxonomy, as developed in Section 2, includes identification of and access to existing information, preprocessing and quantitative analysis of data, visual representation, and the user interface as major components of the software environment for astrophysical data analysis. In pursuing our goal to provide methods and tools for scientific visualization of multi-spectral astrophysical data, we therefore looked at scientific data analysis as one whole process, adding visualization tools to an already existing environment and integrating the various components that define a scientific data analysis environment. As long as each component is developed separately from all the others, users of data analysis software are constantly interrupted in their scientific work to convert from one data format to another, to move from one storage medium to another, or to switch from one user interface to another. We also took an in-depth look at scientific visualization and its underlying concepts, current visualization systems, their contributions, and their shortcomings. The role of data visualization is to stimulate mental processes different from those of quantitative data analysis, such as the perception of spatial relationships or the discovery of patterns or anomalies while browsing through large data sets. 
Visualization often leads to an intuitive understanding of the meaning of data values and their relationships, but at the cost of accuracy in interpreting the data values. To interpret the data accurately, data values need to be measured, computed on, and compared with theoretical or empirical models (quantitative analysis). If visualization software hampers quantitative analysis (which happens with some commercial visualization products), its usefulness for astrophysical data analysis is greatly diminished. The software system STAR (Scientific Toolkit for Astrophysical Research) was developed as a prototype during the project to better understand the pragmatic concerns the project raised. STAR led to a better understanding of the importance of collaboration between astrophysicists and computer scientists. Twenty-one examples of the use of visualization for astrophysical data are included with this report. Sixteen publications related to efforts performed during or initiated through work on this project are listed at the end of this report.
Lee, Hyeongyu; Choi, Yosoon; Suh, Jangwon; Lee, Seung-Ho
2016-01-01
Understanding the spatial variation of potentially toxic trace elements (PTEs) in soil is necessary to identify the proper measures for preventing soil contamination at both operating and abandoned mining areas. Many studies worldwide have explored the spatial variation of PTEs and created soil contamination maps using geostatistical methods. However, they generally depend only on inductively coupled plasma atomic emission spectrometry (ICP-AES) analysis data, so they are limited by insufficient input data owing to the disadvantages of ICP-AES analysis, such as its costly operation and the lengthy period required for analysis. To overcome this limitation, this study used both ICP-AES and portable X-ray fluorescence (PXRF) analysis data, the latter with relatively low accuracy, to map copper and lead concentrations at a section of the Busan abandoned mine in Korea, and compared the prediction performance of four approaches: ordinary kriging applied to (1) ICP-AES analysis data, (2) PXRF analysis data, and (3) both ICP-AES and PXRF analysis data, with the PXRF data transformed using the correlation between the ICP-AES and PXRF analysis data; and (4) co-kriging applied to the ICP-AES (primary variable) and PXRF (secondary variable) analysis data. The results were compared using an independent validation data set. In this case study, applying ordinary kriging to both ICP-AES and transformed PXRF analysis data was the most accurate approach, considering both the spatial distribution of copper and lead contaminants in the soil and the estimation errors at 11 validation sampling points. Therefore, when generating soil contamination maps for an abandoned mine, it is beneficial to use the proposed approach, which incorporates the advantages of both ICP-AES and PXRF analysis data. PMID:27043594
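Ordinary kriging, the estimator compared in this study, predicts a value at an unsampled location as a weighted sum of nearby samples, with weights obtained from a variogram model and constrained to sum to one. A minimal single-point sketch, assuming a spherical variogram whose `nugget`, `sill`, and `vrange` parameters are hypothetical (in practice they are fitted to the data); transformed PXRF values would simply be appended to the sample arrays, but that calibration step is not shown.

```python
import numpy as np

def spherical_variogram(h, nugget, sill, vrange):
    """Spherical variogram; `sill` is the total sill (nugget included)."""
    h = np.asarray(h, dtype=float)
    ratio = h / vrange
    gamma = nugget + (sill - nugget) * (1.5 * ratio - 0.5 * ratio ** 3)
    gamma = np.where(h >= vrange, sill, gamma)   # flat beyond the range
    return np.where(h == 0, 0.0, gamma)          # gamma(0) = 0 by definition

def ordinary_kriging(xy, z, target, nugget=0.0, sill=1.0, vrange=100.0):
    """Estimate z at one target point; returns (estimate, kriging variance)."""
    xy = np.asarray(xy, dtype=float)
    z = np.asarray(z, dtype=float)
    n = len(z)
    dist = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    # Kriging system: variogram matrix bordered by the unbiasedness
    # constraint (weights sum to 1) with a Lagrange multiplier mu.
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = spherical_variogram(dist, nugget, sill, vrange)
    A[n, n] = 0.0
    d0 = np.linalg.norm(xy - np.asarray(target, dtype=float), axis=1)
    b = np.ones(n + 1)
    b[:n] = spherical_variogram(d0, nugget, sill, vrange)
    sol = np.linalg.solve(A, b)
    weights, mu = sol[:n], sol[n]
    return float(weights @ z), float(weights @ b[:n] + mu)
```

At a sampled location the estimator reproduces the measured value exactly, and at a point equidistant from two samples it returns their average.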
Pappas, Derek J; Marin, Wesley; Hollenbach, Jill A; Mack, Steven J
2016-03-01
Bridging ImmunoGenomic Data-Analysis Workflow Gaps (BIGDAWG) is an integrated data-analysis pipeline designed for the standardized analysis of highly polymorphic genetic data, specifically for the HLA and KIR genetic systems. Most modern genetic analysis programs are designed for the analysis of single nucleotide polymorphisms, but the highly polymorphic nature of HLA and KIR data requires specialized methods of data analysis. BIGDAWG performs case-control data analyses of highly polymorphic genotype data characteristic of the HLA and KIR loci. BIGDAWG performs tests for Hardy-Weinberg equilibrium, calculates allele frequencies and bins low-frequency alleles for k×2 and 2×2 chi-squared tests, and calculates odds ratios, confidence intervals, and p-values for each allele. When multi-locus genotype data are available, BIGDAWG estimates user-specified haplotypes and performs the same binning and statistical calculations for each haplotype. For the HLA loci, BIGDAWG performs the same analyses at the individual amino-acid level. Finally, BIGDAWG generates figures and tables for each of these comparisons. BIGDAWG obviates the error-prone reformatting needed to move data between multiple programs, and it streamlines and standardizes the data-analysis process for case-control studies of highly polymorphic data. BIGDAWG has been implemented as the bigdawg R package and as a free web application at bigdawg.immunogenomics.org. Copyright © 2015 American Society for Histocompatibility and Immunogenetics. Published by Elsevier Inc. All rights reserved.
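The per-allele statistics named above (odds ratio, confidence interval, 2×2 chi-squared test) can be illustrated for a single allele as follows. This is not code from the bigdawg R package: it assumes a Woolf (log-scale) 95% interval, a Pearson chi-squared test without continuity correction, and non-zero cell counts, and it omits the binning of low-frequency alleles.

```python
import math

def allele_2x2(case_with, case_total, ctrl_with, ctrl_total):
    """Odds ratio, Woolf 95% CI, Pearson chi-squared, and p-value (1 df).

    Counts are allele copies: copies of the tested allele versus all
    allele copies in cases and controls. All four cells must be non-zero.
    """
    a, b = case_with, case_total - case_with
    c, d = ctrl_with, ctrl_total - ctrl_with
    odds_ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # SE of log(OR)
    lo = math.exp(math.log(odds_ratio) - 1.96 * se)
    hi = math.exp(math.log(odds_ratio) + 1.96 * se)
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of chi-squared with 1 df: P(X >= chi2) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return odds_ratio, (lo, hi), chi2, p
```

For 20 of 100 case alleles and 10 of 100 control alleles, the sketch gives an odds ratio of 2.25 and a chi-squared value of about 3.92 (p of roughly 0.048).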
Kel, Alexander E
2017-02-01
Computational analysis of master regulators, through a search for transcription factor binding sites followed by analysis of the cell's signal transduction networks, is a new approach to causal analysis of multi-omics data. This paper contains the results of an analysis of multi-omics data, including transcriptomics, proteomics, and epigenomics data, from a methotrexate (MTX)-resistant colon cancer cell line. The data were used to analyze the mechanisms of resistance and to predict potential drug targets and promising compounds for reverting the MTX resistance of these cancer cells. We present all results of the analysis, including the lists of identified transcription factors and their binding sites in the genome and the list of predicted master regulators (potential drug targets). These data were generated in the study recently published in the article "Multi-omics "Upstream Analysis" of regulatory genomic regions helps identifying targets against methotrexate resistance of colon cancer" (Kel et al., 2016) [4]. These data are of interest to researchers in the field of multi-omics data analysis and to biologists interested in identifying novel drug targets against MTX resistance.
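A common way to search for transcription factor binding sites is to score every window of a sequence against a log-odds position weight matrix (PWM) and keep windows above a threshold. The sketch below is a generic, forward-strand-only illustration; the pseudocount, uniform background, and threshold are assumptions, and it is not the site-search method used in the cited study.

```python
import math

BASES = "ACGT"

def log_odds_matrix(counts, background=0.25, pseudocount=0.5):
    """Convert per-position base counts (list of {base: count}) to log2 odds."""
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2((column[b] + pseudocount) / total / background)
                    for b in BASES})
    return pwm

def scan(sequence, pwm, threshold):
    """Return (position, score) for each window scoring >= threshold."""
    width = len(pwm)
    hits = []
    for i in range(len(sequence) - width + 1):
        window = sequence[i:i + width]
        if any(ch not in BASES for ch in window):
            continue  # skip windows with N or other ambiguity codes
        score = sum(pwm[j][ch] for j, ch in enumerate(window))
        if score >= threshold:
            hits.append((i, score))
    return hits
```

A full search would also scan the reverse complement and calibrate thresholds per matrix to control false positives.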
Visual Data Analysis for Satellites
NASA Technical Reports Server (NTRS)
Lau, Yee; Bhate, Sachin; Fitzpatrick, Patrick
2008-01-01
The Visual Data Analysis Package is a collection of programs and scripts that facilitate visual analysis of data available from NASA and NOAA satellites, as well as dropsonde, buoy, and conventional in-situ observations. The package features utilities for data extraction, data quality control, statistical analysis, and data visualization. The Hierarchical Data Format (HDF) satellite data extraction routines from NASA's Jet Propulsion Laboratory were customized for specific spatial coverage and file input/output. Statistical analysis includes the calculation of the relative error, the absolute error, and the root mean square error. Other capabilities include curve fitting through the data points to fill in missing data points between satellite passes or where clouds obscure satellite data. For data visualization, the software provides customizable Generic Mapping Tools (GMT) scripts to generate difference maps, scatter plots, line plots, vector plots, histograms, time series, and color-fill images.
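The error statistics listed above can be computed as in the sketch below, assuming mean absolute and mean relative errors against a non-zero reference series (the package's exact definitions are not given). For gap filling it swaps in plain linear interpolation, a simpler technique than the curve fitting the package describes.

```python
import numpy as np

def error_stats(retrieved, reference):
    """Mean absolute error, mean relative error, and RMSE versus reference."""
    retrieved = np.asarray(retrieved, dtype=float)
    reference = np.asarray(reference, dtype=float)
    diff = retrieved - reference
    abs_err = np.mean(np.abs(diff))
    rel_err = np.mean(np.abs(diff) / np.abs(reference))  # reference must be non-zero
    rmse = np.sqrt(np.mean(diff ** 2))
    return abs_err, rel_err, rmse

def fill_gaps(times, values):
    """Fill NaN gaps by linear interpolation between valid neighbors.

    Gaps before the first or after the last valid point take the nearest
    valid value (np.interp holds the endpoints constant).
    """
    times = np.asarray(times, dtype=float)
    values = np.asarray(values, dtype=float)
    valid = ~np.isnan(values)
    return np.interp(times, times[valid], values[valid])
```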
Tools for T-RFLP data analysis using Excel.
Fredriksson, Nils Johan; Hermansson, Malte; Wilén, Britt-Marie
2014-11-08
Terminal restriction fragment length polymorphism (T-RFLP) analysis is a DNA-fingerprinting method that can be used to compare microbial community composition across a large number of samples. There is no consensus on how T-RFLP data should be treated and analyzed before comparisons between samples are made, and several different approaches have been proposed in the literature. The analysis of T-RFLP data can be cumbersome and time-consuming, and for large datasets manual data analysis is not feasible. The currently available tools for automated T-RFLP analysis, although valuable, offer little flexibility and few, if any, options regarding which methods to use. To enable comparisons and combinations of different data treatment methods, an analysis template and an extensive collection of macros for T-RFLP data analysis using Microsoft Excel were developed. The Tools for T-RFLP data analysis template provides procedures for the analysis of large T-RFLP datasets, including application of a noise baseline threshold and setting of the analysis range, normalization and alignment of replicate profiles, generation of consensus profiles, normalization and alignment of consensus profiles, and final analysis of the samples, including calculation of association coefficients and a diversity index. The procedures are designed so that various options are available in every analysis step, from the initial preparation of the data to the final comparison of the samples. The parameters for analysis range, noise baseline, T-RF alignment, and generation of consensus profiles are all set by the user, and several methods are available for normalizing the T-RF profiles. In each step, the user can also choose to base the calculations on either peak height or peak area data. The Tools for T-RFLP data analysis template enables an objective and flexible analysis of large T-RFLP datasets in a widely used spreadsheet application.
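Several of the steps above (noise baseline, normalization, diversity index, association coefficient) can be sketched compactly. The example assumes T-RF sizes are already aligned into shared bins, uses a fixed absolute baseline, and picks the Shannon index and Bray-Curtis similarity as one possible diversity index and association coefficient; the Excel template offers its own set of methods, which may differ.

```python
import math

def relative_profile(peaks, baseline):
    """Drop peaks below a noise baseline and normalize the rest to sum to 1.

    peaks: dict mapping aligned T-RF size (bp) to peak height or area.
    """
    kept = {size: h for size, h in peaks.items() if h >= baseline}
    total = sum(kept.values())
    return {size: h / total for size, h in kept.items()}

def shannon(profile):
    """Shannon diversity index H' = -sum(p * ln p)."""
    return -sum(p * math.log(p) for p in profile.values() if p > 0)

def bray_curtis_similarity(p, q):
    """Bray-Curtis similarity: 2 * sum(min) / (sum(p) + sum(q)); 1 = identical."""
    sizes = set(p) | set(q)
    shared = sum(min(p.get(s, 0.0), q.get(s, 0.0)) for s in sizes)
    return 2 * shared / (sum(p.values()) + sum(q.values()))
```

For example, `relative_profile({100: 50, 200: 50, 300: 5}, baseline=10)` drops the 300 bp peak and returns two fragments of relative abundance 0.5 each, whose Shannon index is ln 2.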
CADDIS Volume 4. Data Analysis: Exploratory Data Analysis
Intro to exploratory data analysis. Overview of variable distributions, scatter plots, correlation analysis, GIS datasets. Use of conditional probability to examine stressor levels and impairment. Exploring correlations among multiple stressors.
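Conditional probability analysis estimates the probability of observing biological impairment given that a stressor meets or exceeds some threshold; evaluating it over a range of thresholds shows where impairment becomes likely. A minimal unweighted sketch, assuming paired per-site stressor values and boolean impairment flags; the survey-design weights that probability-based monitoring data usually require are omitted.

```python
def conditional_probability(stressor, impaired, threshold):
    """Estimate P(impaired | stressor >= threshold) from paired site data."""
    exceed = [imp for s, imp in zip(stressor, impaired) if s >= threshold]
    if not exceed:
        return float("nan")  # no sites at or above this threshold
    return sum(exceed) / len(exceed)

def cp_curve(stressor, impaired, thresholds):
    """Conditional probability evaluated at each candidate threshold."""
    return [(t, conditional_probability(stressor, impaired, t)) for t in thresholds]
```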
Modern data science for analytical chemical data - A comprehensive review.
Szymańska, Ewa
2018-10-22
Efficient and reliable analysis of chemical analytical data is a great challenge due to increases in data size, variety, and velocity. New methodologies, approaches, and methods are being proposed not only by chemometrics but also by other data science communities to extract relevant information from big datasets and deliver its value to different applications. Besides the common goal of big data analysis, different perspectives and terms for big data are being discussed in the scientific literature and public media. The aim of this comprehensive review is to present common trends in the analysis of chemical analytical data across different data science fields, together with their data-type-specific and generic challenges. First, common data science terms used in different fields are summarized and discussed. Second, systematic methodologies to plan and run big data analysis projects are presented together with their steps. Moreover, analysis aspects such as assessing data quality, selecting data pre-processing strategies, data visualization, and model validation are considered in more detail. Finally, an overview of standard and new data analysis methods is provided, and their suitability for big analytical chemical datasets is briefly discussed. Copyright © 2018 Elsevier B.V. All rights reserved.
Schaub, Jochen; Clemens, Christoph; Kaufmann, Hitto; Schulz, Torsten W
2012-01-01
Development of efficient bioprocesses is essential for cost-effective manufacturing of recombinant therapeutic proteins. To achieve further process improvement and process rationalization, comprehensive analysis of both process data and phenotypic cell-level data is essential. Here, we present a framework for advanced bioprocess data analysis consisting of multivariate data analysis (MVDA), metabolic flux analysis (MFA), and pathway analysis for mapping large-scale gene expression data sets. This data analysis platform was applied in a process development project with an IgG-producing Chinese hamster ovary (CHO) cell line in which the maximal product titer could be increased from about 5 to 8 g/L. Principal component analysis (PCA), k-means clustering, and partial least-squares (PLS) models were applied to analyze the macroscopic bioprocess data. MFA and gene expression analysis revealed intracellular characteristics of high-performance cell cultivations. MVDA revealed, for example, correlations between several essential amino acids and the product concentration. It also separated the processes into a group driven mainly by cell-specific productivity and a group driven mainly by process control. MFA identified phenotypic characteristics in glycolysis, glutaminolysis, the pentose phosphate pathway, the citrate cycle, the coupling of amino acid metabolism to the citrate cycle, and the energy yield. Gene expression analysis identified 247 deregulated metabolic genes involved in, among other processes, amino acid metabolism, transport, and protein synthesis.
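Of the MVDA methods listed, PCA is the simplest to show: autoscale the run-by-variable matrix, take its singular value decomposition, and read off scores, loadings, and explained variance. The sketch assumes autoscaling (mean-centering plus unit variance) as the preprocessing and that no variable is constant; the study's own preprocessing and software are not specified in the abstract.

```python
import numpy as np

def pca(X, n_components=2):
    """PCA via SVD of the autoscaled data matrix.

    X: array shaped (runs, variables). Returns scores, loadings, and the
    fraction of total variance explained by each retained component.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                   # center each variable
    Xs = Xc / Xc.std(axis=0, ddof=1)          # unit variance (autoscaling)
    U, s, Vt = np.linalg.svd(Xs, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components].T
    explained = s ** 2 / np.sum(s ** 2)
    return scores, loadings, explained[:n_components]
```

Plotting the first two score columns against each other is the usual way to see groupings of cultivation runs, such as the productivity-driven versus control-driven split described above.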
NASA Technical Reports Server (NTRS)
Mason, P. W.; Harris, H. G.; Zalesak, J.; Bernstein, M.
1974-01-01
The NASA Structural Analysis System (NASTRAN) Model 1 finite element idealization, input data, and detailed analytical results are presented. The data presented include: substructuring analysis for normal modes, plots of member data, plots of symmetric free-free modes, plots of antisymmetric free-free modes, analysis of the wing, analysis of the cargo doors, analysis of the payload, and analysis of the orbiter.
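A normal-modes analysis of a finite element model solves the generalized eigenproblem K φ = ω² M φ for the stiffness matrix K and mass matrix M. The dense toy sketch below reduces it to a standard symmetric eigenproblem through a Cholesky factorization of M; it assumes small matrices and a positive-definite M, and it does not reflect how NASTRAN itself extracts eigenvalues for large models.

```python
import numpy as np

def normal_modes(K, M):
    """Solve K phi = omega^2 M phi for symmetric K and positive-definite M.

    Returns natural frequencies in rad/s (ascending) and mass-normalized
    mode shapes as columns (phi^T M phi = I).
    """
    K = np.asarray(K, dtype=float)
    M = np.asarray(M, dtype=float)
    L = np.linalg.cholesky(M)                  # M = L L^T
    Linv = np.linalg.inv(L)
    A = Linv @ K @ Linv.T                      # standard symmetric eigenproblem
    eigvals, Y = np.linalg.eigh(A)
    phi = Linv.T @ Y                           # back-transform: phi = L^-T y
    # Clip tiny negative eigenvalues from round-off (rigid-body modes).
    omega = np.sqrt(np.clip(eigvals, 0.0, None))
    return omega, phi
```

For an unconstrained (free-free) model, K is singular and the rigid-body modes appear as frequencies at or near zero, which is why the clipping step is needed.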
Mobile In Vivo Infrared Data Collection and Diagnoses Comparison System
NASA Technical Reports Server (NTRS)
Mintz, Frederick W. (Inventor); Gunapala, Sarath D. (Inventor); Moynihan, Philip I. (Inventor)
2013-01-01
Described is a mobile in vivo infrared brain scan and analysis system. The system includes a data collection subsystem and a data analysis subsystem. The data collection subsystem is a helmet with a plurality of infrared (IR) thermometer probes. Each of the IR thermometer probes includes an IR photodetector capable of detecting IR radiation generated by evoked potentials within a user's skull. The helmet is formed to collect brain data that is reflective of firing neurons in a mobile subject and transmit the brain data to the data analysis subsystem. The data analysis subsystem is configured to generate and display a three-dimensional image that depicts a location of the firing neurons. The data analysis subsystem is also configured to compare the brain data against a library of brain data to detect an anomaly in the brain data, and notify a user of any detected anomaly in the brain data.
Scientific Data Analysis Toolkit: A Versatile Add-in to Microsoft Excel for Windows
ERIC Educational Resources Information Center
Halpern, Arthur M.; Frye, Stephen L.; Marzzacco, Charles J.
2018-01-01
Scientific Data Analysis Toolkit (SDAT) is a rigorous, versatile, and user-friendly data analysis add-in application for Microsoft Excel for Windows (PC). SDAT uses the familiar Excel environment to carry out most of the analytical tasks used in data analysis. It has been designed for student use in manipulating and analyzing data encountered in…
Data extraction for complex meta-analysis (DECiMAL) guide.
Pedder, Hugo; Sarri, Grammati; Keeney, Edna; Nunes, Vanessa; Dias, Sofia
2016-12-13
As more complex meta-analytical techniques such as network and multivariate meta-analyses become increasingly common, further pressure is placed on reviewers to extract data in a systematic and consistent manner. Failing to do this appropriately wastes time and resources and jeopardises accuracy. This guide (data extraction for complex meta-analysis (DECiMAL)) suggests a number of points to consider when collecting data, primarily aimed at systematic reviewers preparing data for meta-analysis. Network meta-analysis (NMA), multiple outcomes analysis, and analyses combining different types of data are considered in a manner that can be useful across a range of data collection programmes. In a small pilot study, the guide was shown to be both easy to learn and useful.
Visualising nursing data using correspondence analysis.
Kokol, Peter; Blažun Vošner, Helena; Železnik, Danica
2016-09-01
Digitally stored, large healthcare datasets enable nurses to use 'big data' techniques and tools in nursing research. Big data is complex and multi-dimensional, so visualisation may be a preferable way to analyse and understand it. The aim is to demonstrate the visualisation of big data using a technique called correspondence analysis. In the authors' study, relations among data in a nursing dataset were shown visually in graphs using correspondence analysis. The case presented demonstrates that correspondence analysis is easy to use, shows relations between data visually in a form that is simple to interpret, and can reveal hidden associations between data. Correspondence analysis supports the discovery of new knowledge. Implications for practice: knowledge obtained using correspondence analysis can be transferred immediately into practice or used to foster further research.
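Simple correspondence analysis can be computed from a two-way contingency table through the singular value decomposition of its standardized residuals; the resulting row and column coordinates are what such graphs plot. A minimal sketch, assuming a table with no empty rows or columns and at least some departure from independence; the study does not name its software, so this is illustrative only.

```python
import numpy as np

def correspondence_analysis(table, n_dims=2):
    """Simple correspondence analysis of a two-way contingency table.

    Returns principal row and column coordinates and the share of total
    inertia carried by each retained dimension.
    """
    N = np.asarray(table, dtype=float)
    P = N / N.sum()                          # correspondence matrix
    r = P.sum(axis=1)                        # row masses
    c = P.sum(axis=0)                        # column masses
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)   # standardized residuals
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    rows = (U[:, :n_dims] * s[:n_dims]) / np.sqrt(r)[:, None]
    cols = (Vt.T[:, :n_dims] * s[:n_dims]) / np.sqrt(c)[:, None]
    inertia = s ** 2 / np.sum(s ** 2)
    return rows, cols, inertia[:n_dims]
```

The total inertia (the sum of squared singular values) equals the table's chi-squared statistic divided by its grand total, which links the plot back to a familiar test of association.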
Online data analysis using Web GDL
NASA Astrophysics Data System (ADS)
Jaffey, A.; Cheung, M.; Kobashi, A.
2008-12-01
The ever-improving capability of modern astronomical instruments to capture data at high spatial resolution and cadence is opening up unprecedented opportunities for scientific discovery. When data sets become so large that they cannot easily be transferred over the internet, researchers must find alternative ways to perform data analysis. One strategy is to bring the data analysis code to where the data resides. We present Web GDL, an implementation of GDL (GNU Data Language, an open-source incremental compiler compatible with IDL) that allows users to perform interactive data analysis within a web browser.
NASA Astrophysics Data System (ADS)
Zhu, F.; Yu, H.; Rilee, M. L.; Kuo, K. S.; Yu, L.; Pan, Y.; Jiang, H.
2017-12-01
Since the establishment of data archive centers and the standardization of file formats, scientists have had to search metadata catalogs for the data they need and download the data files to their local machines to carry out data analysis. This approach has facilitated data discovery and access for decades, but it inevitably requires transferring data from archive centers to scientists' computers over low-bandwidth Internet connections, and data transfer becomes a major performance bottleneck. Combined with generally constrained local compute and storage resources, this bottleneck limits the extent of scientists' studies and delays their outcomes. Thus, this conventional approach does not scale with either the volume or the variety of geoscience data. A much more viable solution is to couple analysis and storage systems to minimize data transfer. In our study, we compare loosely coupled approaches (exemplified by Spark and Hadoop) with tightly coupled approaches (exemplified by parallel distributed database management systems, e.g., SciDB). In particular, we investigate optimizing data placement and movement to tackle the variety challenge, and we promote parallelization to address the volume challenge. Our goal is to enable high-performance interactive analysis for a good portion of geoscience data analysis tasks. We show that tightly coupled approaches can concentrate data traffic between local storage systems and compute units, thereby optimizing bandwidth utilization and achieving better throughput. Based on our observations, we develop a geoscience data analysis system that tightly couples analysis engines with storage and has direct access to a detailed map of data partition locations. Through an innovative data partitioning and distribution scheme, our system has demonstrated scalable and interactive performance in real-world geoscience data analysis applications.
MONGKIE: an integrated tool for network analysis and visualization for multi-omics data.
Jang, Yeongjun; Yu, Namhee; Seo, Jihae; Kim, Sun; Lee, Sanghyuk
2016-03-18
Network-based integrative analysis is a powerful technique for extracting biological insights from multilayered omics data such as somatic mutations, copy number variations, and gene expression data. However, integrated analysis of multi-omics data is quite complicated and can hardly be done in an automated way. Thus, a powerful interactive visual mining tool supporting diverse analysis algorithms for identifying driver genes and regulatory modules is much needed. Here, we present a software platform that seamlessly integrates network visualization with omics data analysis tools. The visualization unit supports various options for displaying multi-omics data, as well as unique network models for describing sophisticated biological networks such as complex biomolecular reactions. In addition, we implemented diverse in-house algorithms for network analysis, including network clustering and over-representation analysis. Novel functions include easy definition and optimized visualization of subgroups, comparison of a series of data sets in an identical network by data-to-visual mapping and subsequent overlaying, and management of custom interaction networks. The utility of MONGKIE for network-based visual data mining of multi-omics data was demonstrated by analysis of the TCGA glioblastoma data. MONGKIE was developed in Java based on the NetBeans plugin architecture, making it OS-independent, with intrinsic support for module extension by third-party developers. We believe that MONGKIE will be a valuable addition to network analysis software by supporting many unique features and visualization options, especially for analysing multi-omics data sets in cancer and other diseases.
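Over-representation analysis asks whether a network cluster contains more genes annotated to a term than chance would predict. The abstract does not state which statistic MONGKIE's in-house implementation uses; the one-sided hypergeometric test sketched below is a common choice and is shown here as an assumption, in Python rather than MONGKIE's Java.

```python
from math import comb

def overrepresentation_p(k, n, K, N):
    """One-sided hypergeometric p-value P(X >= k).

    N: genes in the background; K: background genes annotated to the term;
    n: genes in the cluster; k: cluster genes annotated to the term.
    """
    total = comb(N, n)
    upper = min(n, K)
    # math.comb returns 0 when the lower index exceeds the upper one.
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total
```

When many clusters or terms are tested, the resulting p-values would normally be corrected for multiple testing (for example, by controlling the false discovery rate).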
Eijssen, Lars M T; Goelela, Varshna S; Kelder, Thomas; Adriaens, Michiel E; Evelo, Chris T; Radonjic, Marijana
2015-06-30
Illumina whole-genome expression bead arrays are a widely used platform for transcriptomics. Most of the tools available for analyzing the resulting data are not easy for less experienced users to apply. ArrayAnalysis.org provides researchers with an easy-to-use and comprehensive interface to the functionality of R and Bioconductor packages for microarray data analysis. As a modular open-source project, it allows developers to contribute modules that support additional types of data or extend workflows. To enable data analysis of Illumina bead arrays for a broad user community, we have developed a module for ArrayAnalysis.org that provides a free and user-friendly web interface for quality control and pre-processing of these arrays. This module can be used together with existing modules for statistical and pathway analysis to provide a full workflow for Illumina gene expression data analysis. The module accepts data exported from Illumina's GenomeStudio and provides the user with quality control plots and normalized data. The outputs are directly linked to the existing statistics module of ArrayAnalysis.org but can also be downloaded for further downstream analysis in third-party tools. The Illumina bead arrays analysis module is available at http://www.arrayanalysis.org . A user guide, a tutorial demonstrating the analysis of an example dataset, and R scripts are available. The module can be used as a starting point for the statistical evaluation and pathway analysis provided on the website or to generate processed input data for a broad range of applications in life sciences research.
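The abstract does not name the normalization method the module applies. Quantile normalization is a widely used option for expression arrays and is sketched here as an assumption: each array's values are replaced by the mean of the values at the same rank across all arrays, so every array ends up with the same intensity distribution. Ties are broken by position for simplicity, and input is typically log2-transformed intensities.

```python
import numpy as np

def quantile_normalize(X):
    """Quantile-normalize columns (arrays) of X to a shared distribution.

    X: array shaped (probes, arrays). Ties are broken by position, a
    simplification relative to implementations that average tied ranks.
    """
    X = np.asarray(X, dtype=float)
    order = np.argsort(X, axis=0)
    ranks = np.argsort(order, axis=0)              # rank of each value in its column
    reference = np.sort(X, axis=0).mean(axis=1)    # mean value at each rank
    return reference[ranks]
```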
SEURAT: visual analytics for the integrated analysis of microarray data.
Gribov, Alexander; Sill, Martin; Lück, Sonja; Rücker, Frank; Döhner, Konstanze; Bullinger, Lars; Benner, Axel; Unwin, Antony
2010-06-03
In translational cancer research, gene expression data are collected together with clinical data and genomic data arising from other chip-based high-throughput technologies. Software tools for the joint analysis of such high-dimensional data sets together with clinical data are required. We have developed an open source software tool which provides interactive visualization capability for the integrated analysis of high-dimensional gene expression data together with associated clinical data, array CGH data and SNP array data. The different data types are organized by a comprehensive data manager. Interactive tools are provided for all graphics: heatmaps, dendrograms, barcharts, histograms, eventcharts and a chromosome browser, which displays genetic variations along the genome. All graphics are dynamic and fully linked so that any object selected in a graphic will be highlighted in all other graphics. For exploratory data analysis, the software provides unsupervised data analytics such as clustering, seriation algorithms and biclustering algorithms. The SEURAT software meets the growing needs of researchers to perform joint analysis of gene expression, genomic and clinical data.
Statistical analysis and handling of missing data in cluster randomized trials: a systematic review.
Fiero, Mallorie H; Huang, Shuang; Oren, Eyal; Bell, Melanie L
2016-02-09
Cluster randomized trials (CRTs) randomize participants in groups rather than as individuals, and are key tools used to assess interventions in health research where treatment contamination is likely or where individual randomization is not feasible. Two major potential pitfalls exist regarding CRTs, namely handling missing data and not accounting for clustering in the primary analysis. The aim of this review was to evaluate approaches for handling missing data and statistical analysis with respect to the primary outcome in CRTs. We systematically searched for CRTs published between August 2013 and July 2014 using PubMed, Web of Science, and PsycINFO. For each trial, two independent reviewers assessed the extent of the missing data and method(s) used for handling missing data in the primary and sensitivity analyses. We evaluated the primary analysis and determined whether it was at the cluster or individual level. Of the 86 included CRTs, 80 (93%) trials reported some missing outcome data. Of those reporting missing data, the median percent of individuals with a missing outcome was 19% (range 0.5 to 90%). The most common way to handle missing data in the primary analysis was complete case analysis (44, 55%), whereas 18 (22%) used mixed models, six (8%) used single imputation, four (5%) used unweighted generalized estimating equations, and two (2%) used multiple imputation. Fourteen (16%) trials reported a sensitivity analysis for missing data, but most assumed the same missing data mechanism as in the primary analysis. Overall, 67 (78%) trials accounted for clustering in the primary analysis. High rates of missing outcome data are present in the majority of CRTs, yet handling missing data in practice remains suboptimal. Researchers and applied statisticians should use appropriate missing data methods that are valid under plausible assumptions, in order to increase statistical power in trials and reduce the possibility of bias.
Sensitivity analyses that weaken the assumptions about the missing data mechanism should be performed to explore the robustness of results reported in the primary analysis.
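To make concrete why single imputation can be problematic, here is a minimal sketch, on hypothetical toy values that ignore clustering entirely, contrasting complete case analysis with single mean imputation. Filling each missing outcome with the observed mean leaves the point estimate unchanged but shrinks the sample variance, so standard errors computed as if the imputed values had been observed tend to be too small. The helper names are illustrative and do not come from the review.

```python
from statistics import mean, variance
from typing import List, Optional, Sequence


def complete_cases(values: Sequence[Optional[float]]) -> List[float]:
    """Complete case analysis: keep only the observed outcomes."""
    return [v for v in values if v is not None]


def mean_impute(values: Sequence[Optional[float]]) -> List[float]:
    """Single imputation: replace each missing outcome with the observed mean."""
    observed_mean = mean(complete_cases(values))
    return [observed_mean if v is None else v for v in values]


outcomes = [1.0, None, 3.0, None, 5.0]  # hypothetical outcomes; None = missing
cc = complete_cases(outcomes)
imputed = mean_impute(outcomes)
print(mean(cc), variance(cc))            # complete cases
print(mean(imputed), variance(imputed))  # same mean, smaller variance
```

Here both approaches give a mean of 3.0, but the sample variance drops from 4.0 (complete cases) to 2.0 (mean-imputed). Multiple imputation avoids this understatement by propagating the uncertainty about the missing values.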
Reduction and analysis of data collected during the electromagnetic tornado experiment
NASA Technical Reports Server (NTRS)
Davisson, L. D.
1976-01-01
Techniques for data processing and analysis are described to support tornado detection by analysis of radio frequency interference in various frequency bands, and sea state determination from short pulse radar measurements. Activities include: strip chart recording of tornado data; the development and implementation of computer programs for digitization and analysis of the data; data reduction techniques for short pulse radar data; and the simulation of radar returns from the sea surface by computer models.
Solar Data Mining at Georgia State University
NASA Astrophysics Data System (ADS)
Angryk, R.; Martens, P. C.; Schuh, M.; Aydin, B.; Kempton, D.; Banda, J.; Ma, R.; Naduvil-Vadukootu, S.; Akkineni, V.; Küçük, A.; Filali Boubrahimi, S.; Hamdi, S. M.
2016-12-01
In this talk we give an overview of research projects related to solar data analysis that are conducted at Georgia State University. We will provide an update on multiple advances made by our research team in the analysis of image parameters, spatiotemporal pattern mining, temporal data analysis, and our experiences with the visualization, analysis, processing and storage of big, heterogeneous solar data. We will talk about up-to-date data mining methodologies and their importance for big-data-driven solar physics research.
Butensky, Samuel D; Sloan, Andrew P; Meyers, Eric; Carmel, Jason B
2017-07-15
Hand function is critical for independence, and neurological injury often impairs dexterity. To measure hand function in people or forelimb function in animals, sensors are employed to quantify manipulation. These sensors make assessment easier and more quantitative and allow automation of these tasks. While automated tasks improve objectivity and throughput, they also produce large amounts of data that can be burdensome to analyze. We created software called Dexterity that simplifies data analysis of automated reaching tasks. Dexterity is MATLAB software that enables quick analysis of data from forelimb tasks. Through a graphical user interface, files are loaded and data are identified and analyzed. These data can be annotated or graphed directly. Analysis is saved, and the graph and corresponding data can be exported. For additional analysis, Dexterity provides access to custom scripts created by other users. To determine the utility of Dexterity, we performed a study to evaluate the effects of task difficulty on the degree of impairment after injury. Dexterity analyzed two months of data and allowed new users to annotate the experiment, visualize results, and save and export data easily. Previously, these tasks were analyzed with custom data analysis code, which required expertise with analysis software. Dexterity made the tools required to analyze, visualize and annotate data easy for investigators without data science experience to use. Dexterity increases accessibility to automated tasks that measure dexterity by making analysis of large data intuitive, robust, and efficient. Copyright © 2017 Elsevier B.V. All rights reserved.
Reporting Data with "Over-the-Counter" Data Analysis Supports Increases Educators' Analysis Accuracy
ERIC Educational Resources Information Center
Rankin, Jenny Grant
2013-01-01
There is extensive research on the benefits of making data-informed decisions to improve learning, but these benefits rely on the data being effectively interpreted. Despite educators' above-average intellect and education levels, there is evidence many educators routinely misinterpret student data. Data analysis problems persist even at districts…
Velo and REXAN - Integrated Data Management and High Speed Analysis for Experimental Facilities
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kleese van Dam, Kerstin; Carson, James P.; Corrigan, Abigail L.
2013-01-10
The Chemical Imaging Initiative at the Pacific Northwest National Laboratory (PNNL) is creating a ‘Rapid Experimental Analysis’ (REXAN) Framework, based on the concept of reusable component libraries. REXAN allows developers to quickly compose and customize high-throughput analysis pipelines for a range of experiments, and also supports the creation of multi-modal analysis pipelines. In addition, PNNL has coupled REXAN with Velo, its collaborative data management and analysis environment, to create easy-to-use data management and analysis environments for experimental facilities. This paper will discuss the benefits of Velo and REXAN in the context of three examples: (1) PNNL high-resolution mass spectrometry, reducing analysis times from hours to seconds and enabling the analysis of much larger data samples (100 KB to 40 GB) at the same time; (2) ALS X-ray tomography, reducing analysis times of combined STXM and EM data collected at the ALS from weeks to minutes, decreasing manual work and increasing the data volumes that can be analysed in a single step; and (3) multi-modal nano-scale analysis of STXM and TEM data, providing a semi-automated process for particle detection. The creation of REXAN has significantly shortened the development time for these analysis pipelines. The integration of Velo and REXAN has significantly increased the scientific productivity of the instruments and their users by creating easy-to-use data management and analysis environments with greatly reduced analysis times and improved analysis capabilities.
Power Analysis for Anticipated Non-Response in Randomized Block Designs
ERIC Educational Resources Information Center
Pustejovsky, James E.
2011-01-01
Recent guidance on the treatment of missing data in experiments advocates the use of sensitivity analysis and worst-case bounds analysis for addressing non-ignorable missing data mechanisms; moreover, plans for the analysis of missing data should be specified prior to data collection (Puma et al., 2009). While these authors recommend only that…
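The simplest form of worst-case bounds for a binary outcome, sketched here under the assumption that each missing outcome could take either value, sets every missing outcome to failure for the lower bound and to success for the upper bound; bounds on a treatment effect then pair each arm's extreme with the other arm's opposite extreme. This is a generic illustration, not the specific procedure of Puma et al. (2009) or of this paper, and the function names and counts are hypothetical.

```python
from typing import Tuple

Bounds = Tuple[float, float]


def proportion_bounds(n_success: int, n_observed: int, n_missing: int) -> Bounds:
    """Worst-case bounds on a success proportion with missing binary outcomes."""
    n_total = n_observed + n_missing
    if n_missing < 0 or n_total == 0 or not 0 <= n_success <= n_observed:
        raise ValueError("inconsistent counts")
    lower = n_success / n_total                # all missing outcomes are failures
    upper = (n_success + n_missing) / n_total  # all missing outcomes are successes
    return lower, upper


def effect_bounds(treatment: Bounds, control: Bounds) -> Bounds:
    """Bounds on the difference in proportions (treatment minus control)."""
    return treatment[0] - control[1], treatment[1] - control[0]


# Hypothetical arms: 30 of 40 observed successes, 10 missing (treatment);
# 25 of 40 observed successes, 10 missing (control).
t = proportion_bounds(30, 40, 10)
c = proportion_bounds(25, 40, 10)
print(t, c, effect_bounds(t, c))
```

With these made-up counts the treatment proportion lies in [0.6, 0.8], the control proportion in [0.5, 0.7], and the effect in roughly [-0.1, 0.3]. Because the width of the bounds grows with the amount of non-response, anticipating that amount matters when planning sample size and power.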
18 CFR 357.3 - FERC Form No. 73, Oil Pipeline Data for Depreciation Analysis.
Code of Federal Regulations, 2010 CFR
2010-04-01
... Pipeline Data for Depreciation Analysis. 357.3 Section 357.3 Conservation of Power and Water Resources... No. 73, Oil Pipeline Data for Depreciation Analysis. (a) Who must file. Any oil pipeline company.... 73, Oil Pipeline Data for Depreciation Analysis, available for review at the Commission's Public...
Computers as an Instrument for Data Analysis. Technical Report No. 11.
ERIC Educational Resources Information Center
Muller, Mervin E.
A review of statistical data analysis involving computers as a multi-dimensional problem provides the perspective for consideration of the use of computers in statistical analysis and the problems associated with large data files. An overall description of STATJOB, a particular system for doing statistical data analysis on a digital computer,…
Beyond Constant Comparison Qualitative Data Analysis: Using NVivo
ERIC Educational Resources Information Center
Leech, Nancy L.; Onwuegbuzie, Anthony J.
2011-01-01
The purposes of this paper are to outline seven types of qualitative data analysis techniques, to present step-by-step guidance for conducting these analyses via a computer-assisted qualitative data analysis software program (i.e., NVivo9), and to present screenshots of the data analysis process. Specifically, the following seven analyses are…
An integrated data-analysis and database system for AMS 14C
NASA Astrophysics Data System (ADS)
Kjeldsen, Henrik; Olsen, Jesper; Heinemeier, Jan
2010-04-01
AMSdata is the name of a combined database and data-analysis system for AMS 14C and stable-isotope work that has been developed at Aarhus University. The system (1) contains routines for data analysis of AMS and MS data, (2) allows a flexible and accurate description of sample extraction and pretreatment, including when samples are split into several fractions, and (3) keeps track of all measured, calculated and attributed data. The structure of the database is flexible and allows an unlimited number of measurement and pretreatment procedures. The AMS 14C data analysis routine is fairly advanced and flexible, and it can be easily optimized for different kinds of measuring processes. Technically, the system is based on Microsoft SQL Server and includes stored SQL procedures for the data analysis. Microsoft Office Access is used for the (graphical) user interface, and Excel, Word and Origin are additionally used for input and output of data, e.g. for plotting data during data analysis.
HEP Software Foundation Community White Paper Working Group - Data Analysis and Interpretation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bauerdick, Lothar
At the heart of experimental high energy physics (HEP) is the development of facilities and instrumentation that provide sensitivity to new phenomena. Our understanding of nature at its most fundamental level is advanced through the analysis and interpretation of data from sophisticated detectors in HEP experiments. The goal of data analysis systems is to realize the maximum possible scientific potential of the data within the constraints of computing and human resources in the least time. To achieve this goal, future analysis systems should empower physicists to access the data with a high level of interactivity, reproducibility and throughput capability. As part of the HEP Software Foundation Community White Paper process, a working group on Data Analysis and Interpretation was formed to assess the challenges and opportunities in HEP data analysis and develop a roadmap for activities in this area over the next decade. In this report, the key findings and recommendations of the Data Analysis and Interpretation Working Group are presented.
Combining multiple imputation and meta-analysis with individual participant data
Burgess, Stephen; White, Ian R; Resche-Rigon, Matthieu; Wood, Angela M
2013-01-01
Multiple imputation is a strategy for the analysis of incomplete data such that the impact of the missingness on the power and bias of estimates is mitigated. When data from multiple studies are collated, we can propose both within-study and multilevel imputation models to impute missing data on covariates. It is not clear how to choose between imputation models or how to combine imputation and inverse-variance weighted meta-analysis methods. This is especially important as often different studies measure data on different variables, meaning that we may need to impute data on a variable which is systematically missing in a particular study. In this paper, we consider a simulation analysis of sporadically missing data in a single covariate with a linear analysis model and discuss how the results would be applicable to the case of systematically missing data. We find in this context that ensuring the congeniality of the imputation and analysis models is important to give correct standard errors and confidence intervals. For example, if the analysis model allows between-study heterogeneity of a parameter, then we should incorporate this heterogeneity into the imputation model to maintain the congeniality of the two models. In an inverse-variance weighted meta-analysis, we should impute missing data and apply Rubin's rules at the study level prior to meta-analysis, rather than meta-analyzing each of the multiple imputations and then combining the meta-analysis estimates using Rubin's rules. We illustrate the results using data from the Emerging Risk Factors Collaboration. PMID:23703895
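The recommended order (Rubin's rules within each study, then inverse-variance weighting across studies) can be sketched as follows. Rubin's rules take the mean of the m imputed-data estimates as the pooled estimate, and combine the average within-imputation variance W with the between-imputation variance B as T = W + (1 + 1/m)B. The meta-analysis step here is a fixed-effect model, so unlike the setting discussed in the abstract it does not model between-study heterogeneity; the function names are hypothetical.

```python
from statistics import mean, variance
from typing import List, Sequence, Tuple

Estimate = Tuple[float, float]  # (point estimate, variance)


def rubin_pool(estimates: Sequence[float], variances: Sequence[float]) -> Estimate:
    """Combine m imputed-data analyses of one study with Rubin's rules."""
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need >= 2 imputations with matching variances")
    q_bar = mean(estimates)    # pooled point estimate
    w = mean(variances)        # average within-imputation variance
    b = variance(estimates)    # between-imputation variance (denominator m - 1)
    return q_bar, w + (1 + 1 / m) * b


def iv_meta(estimates: Sequence[float], variances: Sequence[float]) -> Estimate:
    """Fixed-effect inverse-variance weighted meta-analysis."""
    weights = [1 / v for v in variances]
    total = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total
    return pooled, 1 / total


def impute_then_meta(per_study: List[Tuple[Sequence[float], Sequence[float]]]) -> Estimate:
    """Apply Rubin's rules within each study first, then meta-analyze the results."""
    pooled = [rubin_pool(est, var) for est, var in per_study]
    return iv_meta([p for p, _ in pooled], [v for _, v in pooled])
```

Each element of `per_study` holds one study's estimates and variances across its imputations. Pooling within studies first means each study enters the meta-analysis with a variance that already includes its imputation uncertainty.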
Inauen, A; Jenny, G J; Bauer, G F
2012-06-01
This article focuses on organizational analysis in workplace health promotion (WHP) projects. It shows how this analysis can be designed such that it provides rational data relevant to the further context-specific and goal-oriented planning of WHP and equally supports individual and organizational change processes implied by WHP. Design principles for organizational analysis were developed on the basis of a narrative review of the guiding principles of WHP interventions and organizational change as well as the scientific principles of data collection. Further, the practical experience of WHP consultants who routinely conduct organizational analysis was considered. This resulted in a framework with data-oriented and change-oriented design principles, addressing the following elements of organizational analysis in WHP: planning the overall procedure, data content, data-collection methods and information processing. Overall, the data-oriented design principles aim to produce valid, reliable and representative data, whereas the change-oriented design principles aim to promote motivation, coherence and a capacity for self-analysis. We expect that the simultaneous consideration of data- and change-oriented design principles for organizational analysis will strongly support the WHP process. We finally illustrate the applicability of the design principles to health promotion within a WHP case study.
NASA Astrophysics Data System (ADS)
Hendikawati, P.; Arifudin, R.; Zahid, M. Z.
2018-03-01
This study aims to design an Android statistics data analysis application that can be accessed through mobile devices, making it easier for users to access. The Statistics Data Analysis application covers various topics in basic statistics along with parametric statistical data analysis. The output of this application system is parametric statistical data analysis that can be used by students, lecturers, and other users who need the results of statistical calculations quickly and in an easily understood form. The Android application was developed in the Java programming language. The server uses PHP with the CodeIgniter framework, and the database is MySQL. The system development methodology is the Waterfall methodology, with stages of analysis, design, coding, testing, implementation, and system maintenance. This statistical data analysis application is expected to support statistics lecturing activities and make it easier for students to understand statistical analysis on mobile devices.
Shifting from Stewardship to Analytics of Massive Science Data
NASA Astrophysics Data System (ADS)
Crichton, D. J.; Doyle, R.; Law, E.; Hughes, S.; Huang, T.; Mahabal, A.
2015-12-01
Currently, the analysis of large data collections is executed through traditional computational and data analysis approaches, which require users to bring data to their desktops and perform local data analysis. Data collection, archiving and analysis from future remote sensing missions, be it from Earth science satellites, planetary robotic missions, or massive radio observatories, may not scale as more capable instruments stress existing architectural approaches and systems due to more continuous data streams, data from multiple observational platforms, and measurements and models from different agencies. A new paradigm is needed in order to increase the productivity and effectiveness of scientific data analysis. This paradigm must recognize that architectural choices, data processing, management, analysis, etc., are interrelated, and must be carefully coordinated in any system that aims to allow efficient, interactive scientific exploration and discovery to exploit massive data collections. Future observational systems, including satellite and airborne experiments, and research in climate modeling will significantly increase the size of the data, requiring new methodological approaches towards data analytics where users can more effectively interact with the data and apply automated mechanisms for data reduction and fusion across these massive data repositories. This presentation will discuss architecture, use cases, and approaches for developing a big data analytics strategy across multiple science disciplines.
Using MetaboAnalyst 3.0 for Comprehensive Metabolomics Data Analysis.
Xia, Jianguo; Wishart, David S
2016-09-07
MetaboAnalyst (http://www.metaboanalyst.ca) is a comprehensive Web application for metabolomic data analysis and interpretation. MetaboAnalyst handles most of the common metabolomic data types from most kinds of metabolomics platforms (MS and NMR) for most kinds of metabolomics experiments (targeted, untargeted, quantitative). In addition to providing a variety of data processing and normalization procedures, MetaboAnalyst also supports a number of data analysis and data visualization tasks using a range of univariate and multivariate methods such as PCA (principal component analysis), PLS-DA (partial least squares discriminant analysis), heatmap clustering and machine learning methods. MetaboAnalyst also offers a variety of tools for metabolomic data interpretation including MSEA (metabolite set enrichment analysis), MetPA (metabolite pathway analysis), and biomarker selection via ROC (receiver operating characteristic) curve analysis, as well as time series and power analysis. This unit provides an overview of the main functional modules and the general workflow of the latest version of MetaboAnalyst (MetaboAnalyst 3.0), followed by eight detailed protocols. © 2016 by John Wiley & Sons, Inc. Copyright © 2016 John Wiley & Sons, Inc.
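ROC curve analysis for biomarker selection is commonly summarized by the area under the curve (AUC), which equals the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counted as one half. MetaboAnalyst's own implementation is not shown here; the pairwise function below is a minimal, quadratic-time sketch of that definition, and its name is hypothetical.

```python
from typing import Sequence


def roc_auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """AUC as the probability that a positive outscores a negative (ties count 1/2)."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```

For instance, with positive scores `[0.8, 0.4]` and negative scores `[0.6, 0.2]`, three of the four positive/negative pairs are ordered correctly, giving an AUC of 0.75. A candidate biomarker with AUC near 0.5 separates the groups no better than chance.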
An interdisciplinary analysis of ERTS data for Colorado mountain environments using ADP Techniques
NASA Technical Reports Server (NTRS)
Hoffer, R. M. (Principal Investigator)
1972-01-01
The author identified significant preliminary results: analysis of the Ouachita portion of the Texoma frame of data indicates considerable potential for the analysis and interpretation of ERTS data. It is believed that one of the more significant aspects of this analysis sequence has been the investigation of a technique to relate ERTS analysis and surface observation analysis. At present, a sequence involving (1) preliminary analysis based solely upon the spectral characteristics of the data, followed by (2) a surface observation mission to obtain visual information and oblique photography of particular points of interest in the test site area, appears to provide an extremely efficient technique for obtaining particularly meaningful surface observation data. Following such a procedure permits concentration on particular points of interest in the entire ERTS frame and thereby makes the surface observation data obtained particularly significant and meaningful. The analysis of the Texoma frame has also been significant from the standpoint of demonstrating a fast-turnaround analysis capability. Additionally, the analysis has shown the potential accuracy and degree of complexity of features that can be identified and mapped using ERTS data.
SUSHI: an exquisite recipe for fully documented, reproducible and reusable NGS data analysis.
Hatakeyama, Masaomi; Opitz, Lennart; Russo, Giancarlo; Qi, Weihong; Schlapbach, Ralph; Rehrauer, Hubert
2016-06-02
Next generation sequencing (NGS) produces massive datasets consisting of billions of reads and up to thousands of samples. Subsequent bioinformatic analysis is typically done with the help of open source tools, where each application performs a single step towards the final result. This situation leaves bioinformaticians with the tasks of combining the tools, managing the data files and meta-information, documenting the analysis, and ensuring reproducibility. We present SUSHI, an agile data analysis framework that relieves bioinformaticians of the administrative challenges of their data analysis. SUSHI lets users build reproducible data analysis workflows from individual applications and manages the input data, the parameters, meta-information with user-driven semantics, and the job scripts. As distinguishing features, SUSHI provides an expert command line interface as well as a convenient web interface to run bioinformatics tools. SUSHI datasets are self-contained and self-documented on the file system. This makes them fully reproducible and ready to be shared. Because the associated meta-information is formatted as plain text tables, the datasets can be readily analyzed and interpreted further outside SUSHI. SUSHI provides an exquisite recipe for analysing NGS data. By following the SUSHI recipe, data analysis becomes straightforward, and SUSHI takes care of documentation and administration tasks. Thus, users can fully dedicate their time to the analysis itself. SUSHI is suitable for use by bioinformaticians as well as life science researchers. It is targeted at, but by no means constrained to, NGS data analysis. Our SUSHI instance is in productive use and has served as the data analysis interface for more than 1000 data analysis projects. SUSHI source code as well as a demo server are freely available.
Data Analysis of Film from AFGL Rocket A31.603.
1979-08-20
AD-A092 705. Photometrics Inc., Lexington, MA (F/G 22/2). Data Analysis of Film from AFGL Rocket A31.603 (U). M. T. Chamberlain, August 1979. Contract F19628-79M010... Contents include Section II, Film Analysis (Previous Documentation; Film Frame Identification; Camera...). Section I, Overview of the Data Analysis: This report describes analysis of the film data from a split field
Relating interesting quantitative time series patterns with text events and text features
NASA Astrophysics Data System (ADS)
Wanner, Franz; Schreck, Tobias; Jentner, Wolfgang; Sharalieva, Lyubka; Keim, Daniel A.
2013-12-01
In many application areas, the key to successful data analysis is the integrated analysis of heterogeneous data. One example is the financial domain, where time-dependent, high-frequency quantitative data (e.g., trading volume and price information) and textual data (e.g., economic and political news reports) need to be considered jointly. Data analysis tools need to support an integrated analysis that allows studying the relationships between textual news documents and quantitative properties of the stock market price series. In this paper, we describe a workflow and tool that allow a flexible formation of hypotheses about text features and their combinations, which reflect quantitative phenomena observed in stock data. To support such an analysis, we combine the analysis steps of frequent quantitative and text-oriented data using an existing a-priori method. First, we use heuristics to extract interesting intervals and patterns in large time series data. The visual analysis supports the analyst in exploring parameter combinations and their results. The identified time series patterns are then input for the second analysis step, in which all identified intervals of interest are analyzed for frequent patterns co-occurring with financial news. An a-priori method supports the discovery of such sequential temporal patterns. Then, various text features, such as the degree of sentence nesting, noun phrase complexity, and vocabulary richness, are extracted from the news to obtain meta patterns. Meta patterns are defined by a specific combination of text features that differ significantly from the text features of the remaining news data. Our approach combines a portfolio of visualization and analysis techniques, including time-, cluster- and sequence visualization and analysis functionality. We provide two case studies showing the effectiveness of our combined quantitative and textual analysis workflow.
The workflow can also be generalized to other application domains such as data analysis of smart grids, cyber physical systems or the security of critical infrastructure, where the data consists of a combination of quantitative and textual time series data.
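The a-priori method mentioned above builds on the classic Apriori idea: an itemset can only be frequent if all of its subsets are frequent, so candidates of size k are generated from frequent itemsets of size k - 1 and pruned before counting. The sketch below mines plain frequent itemsets rather than the sequential temporal patterns used in the paper; treating each interval of interest as a "transaction" of co-occurring news features is an illustrative assumption, and the function name is hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Hashable, Iterable, Set


def apriori(transactions: Iterable[Set[Hashable]],
            min_support: int) -> Dict[FrozenSet[Hashable], int]:
    """Return every itemset whose support count is >= min_support."""
    baskets = [frozenset(t) for t in transactions]
    # Level 1: count individual items.
    counts: Dict[FrozenSet[Hashable], int] = {}
    for basket in baskets:
        for item in basket:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        prev = list(frequent)
        # Join step: unions of two frequent (k-1)-itemsets that have size k.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: all (k-1)-subsets of a candidate must themselves be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
        }
        # Count step: support = number of baskets containing the candidate.
        counts = {c: sum(1 for basket in baskets if c <= basket) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result
```

On five toy transactions `{a,b,c}, {a,b}, {a,c}, {b,c}, {a,b,c}` with `min_support=3`, every single item (support 4) and every pair (support 3) is frequent, while `{a,b,c}` (support 2) is not. The pruning step is what keeps the candidate set small as k grows.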
Correlating Detergent Fiber Analysis and Dietary Fiber Analysis Data for Corn Stover
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wolfrum, E. J.; Lorenz, A. J.; deLeon, N.
There exist large amounts of detergent fiber analysis data [neutral detergent fiber (NDF), acid detergent fiber (ADF), acid detergent lignin (ADL)] for many different potential cellulosic ethanol feedstocks, since these techniques are widely used for the analysis of forages. Researchers working in the area of cellulosic ethanol are interested in the structural carbohydrates in a feedstock (principally glucan and xylan), which are typically determined by acid hydrolysis of the structural fraction after multiple extractions of the biomass. These so-called dietary fiber analysis methods are significantly more involved than detergent fiber analysis methods. The purpose of this study was to determine whether it is feasible to correlate detergent fiber analysis values to glucan and xylan content determined by dietary fiber analysis methods for corn stover. In the detergent fiber analysis literature cellulose is often estimated as the difference between ADF and ADL, while hemicellulose is often estimated as the difference between NDF and ADF. Examination of a corn stover dataset containing both detergent fiber analysis data and dietary fiber analysis data predicted using near infrared spectroscopy shows that correlations between structural glucan measured using dietary fiber techniques and cellulose estimated using detergent techniques, and between structural xylan measured using dietary fiber techniques and hemicellulose estimated using detergent techniques are high, but are driven largely by the underlying correlation between total extractives measured by fiber analysis and NDF/ADF. That is, detergent analysis data is correlated to dietary fiber analysis data for structural carbohydrates, but only indirectly; the main correlation is between detergent analysis data and solvent extraction data produced during the dietary fiber analysis procedure.
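The detergent-based estimates described above are simple differences: cellulose is ADF minus ADL, and hemicellulose is NDF minus ADF. A minimal sketch, assuming all three values are reported on the same basis (e.g., percent of dry matter) and using a hypothetical class name:

```python
from dataclasses import dataclass


@dataclass
class DetergentFiber:
    """Detergent fiber values, assumed to share one basis (e.g., % of dry matter)."""
    ndf: float  # neutral detergent fiber
    adf: float  # acid detergent fiber
    adl: float  # acid detergent lignin

    def __post_init__(self) -> None:
        # The fractions are nested residues, so NDF >= ADF >= ADL is expected.
        if not self.ndf >= self.adf >= self.adl >= 0:
            raise ValueError("expected NDF >= ADF >= ADL >= 0")

    def cellulose(self) -> float:
        """Cellulose estimated as ADF - ADL."""
        return self.adf - self.adl

    def hemicellulose(self) -> float:
        """Hemicellulose estimated as NDF - ADF."""
        return self.ndf - self.adf


sample = DetergentFiber(ndf=70.0, adf=40.0, adl=5.0)  # hypothetical values
print(sample.cellulose(), sample.hemicellulose())
```

With these made-up inputs the estimates are 35.0 (cellulose) and 30.0 (hemicellulose). As the abstract notes, such estimates correlate with dietary-fiber glucan and xylan largely through their shared dependence on extractives, so they are not a direct substitute.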
Weniger, Markus; Engelmann, Julia C; Schultz, Jörg
2007-01-01
Background Regulation of gene expression is relevant to many areas of biology and medicine, in the study of treatments, diseases, and developmental stages. Microarrays can be used to measure the expression levels of thousands of mRNAs at the same time, allowing insight into or comparison of different cellular conditions. Data derived from microarray experiments are high-dimensional and often noisy, and interpretation of the results can become intricate. Although programs for the statistical analysis of microarray data exist, most of them lack an integration of analysis results and biological interpretation. Results We have developed GEPAT, Genome Expression Pathway Analysis Tool, offering an analysis of gene expression data in genomic, proteomic and metabolic context. We provide an integration of statistical methods for data import and data analysis together with a biological interpretation for subsets of probes or single probes on the chip. GEPAT imports various types of oligonucleotide and cDNA array data formats. Different normalization methods can be applied to the data, after which data annotation is performed. After import, GEPAT offers various statistical data analysis methods, such as hierarchical, k-means and PCA clustering, a linear-model-based t-test, or chromosomal profile comparison. The results of the analysis can be interpreted by enrichment of biological terms, pathway analysis or interaction networks. Different biological databases are included to provide various information for each probe on the chip. GEPAT does not impose a linear workflow, but allows any subset of probes and samples to be used as the start of a new data analysis. GEPAT relies on established data analysis packages, offers a modular approach for easy extension, and can be run on a computer grid to allow a large number of users. It is freely available under the LGPL open source license for academic and commercial users at .
Conclusion GEPAT is a modular, scalable and professional-grade software integrating analysis and interpretation of microarray gene expression data. An installation available for academic users can be found at . PMID:17543125
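k-means clustering, one of the methods GEPAT offers, alternates between assigning each expression profile to its nearest centroid and recomputing each centroid as the mean of its assigned profiles, until the assignments stop changing. The sketch below is a generic illustration of Lloyd's algorithm, not GEPAT's code; initializing the centroids with the first k points is a simplification made to keep it deterministic (practical tools typically use random or k-means++ initialization), and the function name is hypothetical.

```python
from math import dist
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Sequence[float]], k: int,
           max_iter: int = 100) -> Tuple[List[int], List[Point]]:
    """Lloyd's k-means; returns (cluster label per point, centroids)."""
    if not 0 < k <= len(points) or max_iter < 1:
        raise ValueError("need 1 <= k <= number of points and max_iter >= 1")
    # Simplification: use the first k points as the initial centroids.
    centroids: List[Point] = [tuple(p) for p in points[:k]]
    labels: Optional[List[int]] = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid by Euclidean distance.
        new_labels = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: centroid = mean of its members (kept as-is if a cluster empties).
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return labels, centroids
```

For a microarray, each point would be one gene's (or sample's) expression vector. On the toy points `(0, 0), (10, 10), (0, 1), (10, 11)` with `k=2`, the sketch returns labels `[0, 1, 0, 1]` and centroids `(0.0, 0.5)` and `(10.0, 10.5)`.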
High-Level Overview of Data Needs for RE Analysis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lopez, Anthony
2016-12-22
This presentation provides a high-level overview of analysis topics and associated data needs. Types of renewable energy analysis are grouped into two buckets: first, analysis for renewable energy potential, and second, analysis for other goals. Data requirements for the two are similar and build upon one another.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ruebel, Oliver
2009-11-20
Knowledge discovery from today's large and complex collections of scientific datasets is a challenging task. With the ability to measure and simulate more processes at increasingly finer spatial and temporal scales, the growing number of data dimensions and data objects presents tremendous challenges for data analysis and for effective data exploration methods and tools. Researchers are overwhelmed with data, and standard tools are often insufficient to enable effective data analysis and knowledge discovery. The main objective of this thesis is to provide important new capabilities to accelerate scientific knowledge discovery from large, complex, and multivariate scientific data. The research covered in this thesis addresses these scientific challenges using a combination of scientific visualization, information visualization, automated data analysis, and other enabling technologies, such as efficient data management. The effectiveness of the proposed analysis methods is demonstrated via applications in two distinct scientific research fields, namely developmental biology and high-energy physics. Advances in microscopy, image analysis, and embryo registration enable, for the first time, measurement of gene expression at cellular resolution for entire organisms. Analysis of high-dimensional spatial gene expression datasets is a challenging task. By integrating data clustering and visualization, analysis of complex, time-varying, spatial gene expression patterns and their formation becomes possible. The analysis framework MATLAB and the visualization have been integrated, making advanced analysis tools accessible to biologists and enabling bioinformatics researchers to integrate their analysis directly with the visualization. Laser wakefield particle accelerators (LWFAs) promise to be a new compact source of high-energy particles and radiation, with wide applications ranging from medicine to physics. 
To gain insight into the complex physical processes of particle acceleration, physicists model LWFAs computationally. The datasets produced by LWFA simulations are (i) extremely large, (ii) of varying spatial and temporal resolution, (iii) heterogeneous, and (iv) high-dimensional, making analysis and knowledge discovery from complex LWFA simulation data a challenging task. To address these challenges, this thesis describes the integration of the visualization system VisIt and the state-of-the-art index/query system FastBit, enabling interactive visual exploration of extremely large three-dimensional particle datasets. Researchers are especially interested in beams of high-energy particles formed during the course of a simulation. This thesis describes novel methods for automatic detection and analysis of particle beams, enabling a more accurate and efficient data analysis process. By integrating these automated analysis methods with visualization, this research enables more accurate, efficient, and effective analysis of LWFA simulation data than was previously possible.
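The idea behind index/query systems such as FastBit, answering range queries over large particle datasets with bitmap indexes instead of full scans, can be sketched with a toy binned bitmap index. This is an assumption-laden illustration, not FastBit's API: it uses uncompressed Python integers as bitsets (FastBit uses compressed bitmaps), the bin edges are chosen by hand, and the class and method names are hypothetical. Bins lying entirely above the threshold are answered from the bitmaps alone; only the boundary bin needs a check against the raw values.

```python
from bisect import bisect_right


def _rows(bits):
    """Yield the row ids whose bits are set, in ascending order."""
    row = 0
    while bits:
        if bits & 1:
            yield row
        bits >>= 1
        row += 1


class BinnedBitmapIndex:
    """One bitmap per value bin; bin i covers [edges[i], edges[i + 1]).

    Values outside [edges[0], edges[-1]) are not indexed in this toy version.
    """

    def __init__(self, values, edges):
        self.values = values
        self.edges = edges
        self.bitmaps = [0] * (len(edges) - 1)
        for row, v in enumerate(values):
            b = bisect_right(edges, v) - 1
            if 0 <= b < len(self.bitmaps):
                self.bitmaps[b] |= 1 << row  # Python int used as a bitset

    def query_greater(self, threshold):
        """Return sorted row ids whose value is > threshold."""
        result = 0
        for b, bits in enumerate(self.bitmaps):
            lo, hi = self.edges[b], self.edges[b + 1]
            if lo > threshold:
                result |= bits  # whole bin qualifies: no raw-data access
            elif hi > threshold:
                # Boundary bin: candidate check against the raw values.
                for row in _rows(bits):
                    if self.values[row] > threshold:
                        result |= 1 << row
        return list(_rows(result))


# Hypothetical per-particle momentum values.
momentum = [0.5, 2.5, 7.0, 3.1, 9.9]
index = BinnedBitmapIndex(momentum, [0.0, 2.0, 4.0, 6.0, 8.0, 10.0])
print(index.query_greater(3.0))
```

Selecting particles above a threshold therefore touches raw data only for the rows in one bin, which is what makes interactive selection practical on very large datasets.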
Interfaces between statistical analysis packages and the ESRI geographic information system
NASA Technical Reports Server (NTRS)
Masuoka, E.
1980-01-01
This paper describes interfaces, written to facilitate the statistical analysis and display of spatially referenced multivariable data, between ESRI's geographic information system (GIS) data files and real-valued data files. An example of data analysis that utilized the GIS and the statistical analysis system is presented to illustrate the utility of combining the analytic capability of a statistical package with the data management and display features of the GIS.
ERIC Educational Resources Information Center
Simmonds, Mark C.; Higgins, Julian P. T.; Stewart, Lesley A.
2013-01-01
Meta-analysis of time-to-event data has proved difficult in the past because consistent summary statistics often cannot be extracted from published results. The use of individual patient data allows for the re-analysis of each study in a consistent fashion and thus makes meta-analysis of time-to-event data feasible. Time-to-event data can be…
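Once each study has been re-analysed consistently from individual patient data, one common way to combine the results is two-stage fixed-effect inverse-variance pooling of per-study log hazard ratios. The sketch assumes each study has already produced a log hazard ratio and its standard error (for example, from a Cox model fitted per study); it is an illustration, not the specific method of the cited paper, and the function names are hypothetical.

```python
import math


def pool_fixed_effect(log_hrs, std_errors):
    """Inverse-variance fixed-effect pooling of per-study log hazard ratios.

    Each study is weighted by 1 / SE^2. Returns (pooled log HR, its SE).
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, log_hrs)) / total_weight
    return pooled, math.sqrt(1.0 / total_weight)


def hazard_ratio_ci(pooled, se, z=1.96):
    """Back-transform the pooled log HR and its approximate 95% interval."""
    return math.exp(pooled), math.exp(pooled - z * se), math.exp(pooled + z * se)


# Hypothetical results from three consistently re-analysed studies.
pooled, se = pool_fixed_effect([math.log(0.8), math.log(0.7), math.log(0.9)],
                               [0.10, 0.15, 0.20])
print(hazard_ratio_ci(pooled, se))
```

A random-effects model would add a between-study variance term to each weight when the studies are heterogeneous.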
Qualitative case study data analysis: an example from practice.
Houghton, Catherine; Murphy, Kathy; Shaw, David; Casey, Dympna
2015-05-01
To illustrate an approach to data analysis in qualitative case study methodology. There is often little detail in case study research about how data were analysed. However, it is important that comprehensive analysis procedures are used because there are often large sets of data from multiple sources of evidence. Furthermore, the ability to describe in detail how the analysis was conducted ensures rigour in reporting qualitative research. The research example used is a multiple case study that explored the role of the clinical skills laboratory in preparing students for the real world of practice. Data analysis was conducted using a framework guided by the four stages of analysis outlined by Morse (1994): comprehending, synthesising, theorising and recontextualising. The specific strategies for analysis in these stages centred on the work of Miles and Huberman (1994), which has been successfully used in case study research. The data were managed using NVivo software. Literature examining qualitative data analysis was reviewed and strategies illustrated by the case study example provided. Discussion Each stage of the analysis framework is described with illustration from the research example for the purpose of highlighting the benefits of a systematic approach to handling large data sets from multiple sources. By providing an example of how each stage of the analysis was conducted, it is hoped that researchers will be able to consider the benefits of such an approach to their own case study analysis. This paper illustrates specific strategies that can be employed when conducting data analysis in case study research and other qualitative research designs.
Analysis techniques for residual acceleration data
NASA Technical Reports Server (NTRS)
Rogers, Melissa J. B.; Alexander, J. Iwan D.; Snyder, Robert S.
1990-01-01
Various aspects of residual acceleration data are of interest to low-gravity experimenters. Maximum and mean values and various other statistics can be obtained from data as collected in the time domain. Additional information may be obtained through manipulation of the data. Fourier analysis is discussed as a means of obtaining information about the dominant frequency components of a given data window. Transformation of data into different coordinate axes is useful in the analysis of experiments with different orientations and can be achieved by the use of a transformation matrix. Applying such analysis techniques to residual acceleration data provides more information than a time history alone and increases the effectiveness of post-flight analysis of low-gravity experiments.
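The two manipulations described above can be sketched in a few lines: a direct discrete Fourier transform that reports the dominant non-DC frequency in a data window, and a 3x3 matrix transformation of an acceleration vector into another set of axes. This assumes evenly sampled data and a known transformation matrix; a production analysis would typically use an FFT library and apply windowing, and the function names are hypothetical.

```python
import cmath
import math


def dominant_frequency(samples, sample_rate_hz):
    """Frequency (Hz) of the largest non-DC DFT component in a data window.

    Uses a direct O(n^2) DFT for clarity; only bins up to Nyquist are checked.
    """
    n = len(samples)
    best_k, best_mag = 0, -1.0
    for k in range(1, n // 2 + 1):  # skip k = 0, the mean (DC) term
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples)
        )
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    return best_k * sample_rate_hz / n


def transform_axes(vector, matrix):
    """Express a 3-component acceleration vector in another frame: a' = R a,
    where R is a 3x3 transformation matrix given as a list of rows."""
    return [sum(r * v for r, v in zip(row, vector)) for row in matrix]


# Hypothetical 1-second window sampled at 64 Hz: a 5 Hz component plus a
# weaker 12 Hz component.
window = [math.sin(2 * math.pi * 5 * t / 64) + 0.2 * math.sin(2 * math.pi * 12 * t / 64)
          for t in range(64)]
print(dominant_frequency(window, 64.0))

# Rotation by 90 degrees about the z axis.
R_z90 = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]
print(transform_axes([1, 0, 0], R_z90))
```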
Using Framework Analysis in nursing research: a worked example.
Ward, Deborah J; Furber, Christine; Tierney, Stephanie; Swallow, Veronica
2013-11-01
To demonstrate Framework Analysis using a worked example and to illustrate how criticisms of qualitative data analysis including issues of clarity and transparency can be addressed. Critics of the analysis of qualitative data sometimes cite lack of clarity and transparency about analytical procedures; this can deter nurse researchers from undertaking qualitative studies. Framework Analysis is flexible, systematic, and rigorous, offering clarity, transparency, an audit trail, an option for theme-based and case-based analysis and for readily retrievable data. This paper offers further explanation of the process undertaken which is illustrated with a worked example. Data were collected from 31 nursing students in 2009 using semi-structured interviews. The data collected are not reported directly here but used as a worked example for the five steps of Framework Analysis. Suggestions are provided to guide researchers through essential steps in undertaking Framework Analysis. The benefits and limitations of Framework Analysis are discussed. Nurses increasingly use qualitative research methods and need to use an analysis approach that offers transparency and rigour which Framework Analysis can provide. Nurse researchers may find the detailed critique of Framework Analysis presented in this paper a useful resource when designing and conducting qualitative studies. Qualitative data analysis presents challenges in relation to the volume and complexity of data obtained and the need to present an 'audit trail' for those using the research findings. Framework Analysis is an appropriate, rigorous and systematic method for undertaking qualitative analysis. © 2013 Blackwell Publishing Ltd.
CM-DataONE: A Framework for collaborative analysis of climate model output
NASA Astrophysics Data System (ADS)
Xu, Hao; Bai, Yuqi; Li, Sha; Dong, Wenhao; Huang, Wenyu; Xu, Shiming; Lin, Yanluan; Wang, Bin
2015-04-01
CM-DataONE is a distributed collaborative analysis framework for climate model data that aims to break through the data access barriers posed by increasing file sizes and to accelerate the research process. As the data volume involved in projects such as the fifth Coupled Model Intercomparison Project (CMIP5) has reached petabytes, conventional methods for the analysis and diagnosis of model output have become time-consuming and redundant. CM-DataONE is developed for data publishers and researchers in relevant areas. It enables easy access to distributed data and provides extensible analysis functions based on tools such as the NCAR Command Language, NetCDF Operators (NCO), and Climate Data Operators (CDO). CM-DataONE can be easily installed, configured, and maintained. The main web application has two separate parts that communicate with each other through HTTP-based APIs. The analytic server is designed to be installed on each data node, while a data portal can be configured anywhere and connects to the nearest node. The data portal provides end users with functions such as data query, analytic task submission, status monitoring, visualization, and product downloading. Data conforming to the CMIP5 Model Output Format on each peer node can be scanned by the server and mapped to a global information database. A scheduler included in the server is responsible for task decomposition, distribution, and consolidation. Analysis functions are always executed where the data reside. The server's analysis function package provides commonly used functions such as EOF analysis, trend analysis, and time-series analysis. Functions are coupled with data through XML descriptions and can be easily extended. Users can obtain various types of results for further study. This framework has significantly decreased the amount of data to be transmitted and improved efficiency in model intercomparison jobs by supporting online analysis and multi-node collaboration. 
To end users, data query is therefore accelerated and the size of data to be downloaded is reduced. Methodology can be easily shared among scientists, avoiding unnecessary replication. Currently, a prototype of CM-DataONE has been deployed on two data nodes of Tsinghua University.
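The principle that analysis runs where the data reside can be sketched with a decomposed mean: each data node reduces its local values to a small partial result, and a scheduler consolidates the partials. This is a hypothetical illustration of decomposition and consolidation in general, not CM-DataONE's scheduler or API; the names `node_partial` and `consolidate` are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Partial:
    total: float
    count: int


def node_partial(values):
    # Runs on the data node: reduce the local data to (sum, count).
    return Partial(sum(values), len(values))


def consolidate(partials):
    # Runs at the scheduler/portal: combine node partials into a global mean.
    total = sum(p.total for p in partials)
    count = sum(p.count for p in partials)
    if count == 0:
        raise ValueError("no data on any node")
    return total / count


# Hypothetical values held on two data nodes.
node_a = [1.0, 2.0, 3.0]
node_b = [4.0, 5.0]
print(consolidate([node_partial(node_a), node_partial(node_b)]))
```

Only two numbers per node cross the network instead of the full arrays, which is the kind of reduction in transmitted data the framework describes.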
PREdator: a python based GUI for data analysis, evaluation and fitting
2014-01-01
The analysis of a series of experimental data is an essential procedure in virtually every field of research. The information contained in the data is extracted by fitting the experimental data to a mathematical model. The type of mathematical model (linear, exponential, logarithmic, etc.) reflects the physical laws that underlie the experimental data. Here, we aim to provide a readily accessible, user-friendly Python script for data analysis, evaluation and fitting. PREdator is presented using NMR paramagnetic relaxation enhancement analysis as an example.
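As an illustration of fitting data to a model, an exponential decay such as a relaxation curve can be fitted by linear least squares on the logarithm of the signal. PREdator's own fitting routines are not described in this abstract, so this is a swapped-in, assumption-based sketch: it requires strictly positive intensities, and because taking logarithms reweights the residuals, a nonlinear least-squares fit is usually preferred for noisy data. The function names are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def fit_exponential_decay(times, intensities):
    """Fit I(t) = I0 * exp(-R * t) by regressing ln(I) on t.

    Requires all intensities > 0. Returns (I0, R).
    """
    a, b = fit_line(times, [math.log(i) for i in intensities])
    return math.exp(a), -b


# Hypothetical noise-free decay with I0 = 100 and R = 25 s^-1.
ts = [0.0, 0.01, 0.02, 0.04, 0.08]
signal = [100.0 * math.exp(-25.0 * t) for t in ts]
print(fit_exponential_decay(ts, signal))
```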
Use of direct gradient analysis to uncover biological hypotheses in 16s survey data and beyond.
Erb-Downward, John R; Sadighi Akha, Amir A; Wang, Juan; Shen, Ning; He, Bei; Martinez, Fernando J; Gyetko, Margaret R; Curtis, Jeffrey L; Huffnagle, Gary B
2012-01-01
This study investigated the use of direct gradient analysis of bacterial 16S pyrosequencing surveys to identify relevant bacterial community signals in the midst of a "noisy" background, and to facilitate hypothesis testing both within and beyond the realm of ecological surveys. The results, obtained with three different real-world data sets, demonstrate the utility of adding direct gradient analysis to any analysis that draws conclusions from indirect methods such as Principal Component Analysis (PCA) and Principal Coordinates Analysis (PCoA). Direct gradient analysis produces testable models and can identify significant patterns in the midst of noisy data. Additionally, we demonstrate that direct gradient analysis can be used with other kinds of multivariate data sets, such as flow cytometric data, to identify differentially expressed populations. The results of this study demonstrate the utility of direct gradient analysis in microbial ecology and in other areas of research where large multivariate data sets are involved.
Remote Sensing Data Visualization, Fusion and Analysis via Giovanni
NASA Technical Reports Server (NTRS)
Leptoukh, G.; Zubko, V.; Gopalan, A.; Khayat, M.
2007-01-01
We describe Giovanni, the NASA Goddard-developed online visualization and analysis tool that allows users to explore various phenomena without learning remote sensing data formats or downloading voluminous data. Using MODIS aerosol data as an example, we formulate an approach to data fusion for Giovanni to further enrich online multi-sensor remote sensing data comparison and analysis.
14 CFR 302.713 - DOT analysis of data for submission of answers thereto.
Code of Federal Regulations, 2010 CFR
2010-01-01
... 14 Aeronautics and Space 4 2010-01-01 2010-01-01 false DOT analysis of data for submission of... Mail Rate Proceedings and Mail Contracts Informal Mail Rate Conference Procedure § 302.713 DOT analysis of data for submission of answers thereto. After a careful analysis of these data, the DOT employees...
Conducting Qualitative Data Analysis: Qualitative Data Analysis as a Metaphoric Process
ERIC Educational Resources Information Center
Chenail, Ronald J.
2012-01-01
In the second of a series of "how-to" essays on conducting qualitative data analysis, Ron Chenail argues the process can best be understood as a metaphoric process. From this orientation he suggests researchers follow Kenneth Burke's notion of metaphor and see qualitative data analysis as the analyst systematically considering the "this-ness" of…
Statistical Analysis of Research Data | Center for Cancer Research
Recent advances in cancer biology have resulted in the need for increased statistical analysis of research data. The Statistical Analysis of Research Data (SARD) course will be held on April 5-6, 2018, from 9 a.m. to 5 p.m. at the National Institutes of Health's Natcher Conference Center, Balcony C, on the Bethesda Campus. SARD is designed to provide an overview of the general principles of statistical analysis of research data. The first day will feature univariate data analysis, including descriptive statistics, probability distributions, and one- and two-sample inferential statistics.
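As a small example of two-sample inferential statistics, the sketch below computes Welch's t statistic and the Welch-Satterthwaite degrees of freedom with the Python standard library. It is an illustration only, not course material; a p-value would additionally require the t distribution, for example via `scipy.stats.ttest_ind(a, b, equal_var=False)`.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's two-sample t statistic and Welch-Satterthwaite degrees of freedom.

    Does not assume equal variances; each sample needs at least two values.
    """
    na, nb = len(sample_a), len(sample_b)
    va = variance(sample_a) / na  # squared standard error of mean(sample_a)
    vb = variance(sample_b) / nb
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


# Hypothetical measurements from two groups.
print(welch_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6]))
```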
Objective analysis of observational data from the FGGE observing systems
NASA Technical Reports Server (NTRS)
Baker, W.; Edelmann, D.; Iredell, M.; Han, D.; Jakkempudi, S.
1981-01-01
An objective analysis procedure for updating the GLAS second- and fourth-order general atmospheric circulation models using observational data from the First GARP Global Experiment is described. The objective analysis procedure is based on a successive corrections method, and the model is updated in a data assimilation cycle. Preparation of the observational data for analysis and the objective analysis scheme are described. The organization of the program and descriptions of the required data sets are presented. The program logic and detailed descriptions of each subroutine are given.
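A successive corrections scheme can be sketched in one dimension: on each pass, the analysis at every grid point is corrected by a distance-weighted average of the observation innovations (observation minus current analysis at the observation location), with the influence radius typically shrinking from pass to pass. The sketch assumes Cressman-type weights and linear interpolation from the grid to the observation locations; the abstract does not state the weighting used in the GLAS procedure, and all names here are hypothetical.

```python
def cressman_weight(r, radius):
    """Cressman-type weight: (R^2 - r^2) / (R^2 + r^2) inside the radius, else 0."""
    if r >= radius:
        return 0.0
    return (radius ** 2 - r ** 2) / (radius ** 2 + r ** 2)


def interp(xs, ys, x):
    """Linear interpolation on a sorted grid, clamped at the ends."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            f = (x - xs[i]) / (xs[i + 1] - xs[i])
            return ys[i] + f * (ys[i + 1] - ys[i])


def successive_corrections(grid_x, background, obs, radii):
    """1-D successive-corrections objective analysis.

    grid_x: grid coordinates; background: first-guess values on the grid;
    obs: list of (x, value) observations; radii: one influence radius per pass.
    """
    analysis = list(background)
    for radius in radii:
        # Innovations: observation minus the current analysis at that location.
        innovations = [(x, v - interp(grid_x, analysis, x)) for x, v in obs]
        updated = []
        for gx, ga in zip(grid_x, analysis):
            weighted = [(cressman_weight(abs(gx - x), radius), d) for x, d in innovations]
            wsum = sum(w for w, _ in weighted)
            correction = sum(w * d for w, d in weighted) / wsum if wsum > 0 else 0.0
            updated.append(ga + correction)
        analysis = updated
    return analysis


# Hypothetical example: zero first guess and one observation at x = 2.
print(successive_corrections([0.0, 1.0, 2.0, 3.0, 4.0], [0.0] * 5,
                             [(2.0, 1.0)], [1.5]))
```

Grid points beyond the influence radius keep their first-guess value, while points within it are pulled toward the observation.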
38 CFR 75.115 - Risk analysis.
Code of Federal Regulations, 2014 CFR
2014-07-01
... preparation of the risk analysis may include data mining if necessary for the development of relevant...) INFORMATION SECURITY MATTERS Data Breaches § 75.115 Risk analysis. If a data breach involving sensitive... possible after the data breach, a non-VA entity with relevant expertise in data breach assessment and risk...
38 CFR 75.115 - Risk analysis.
Code of Federal Regulations, 2012 CFR
2012-07-01
... preparation of the risk analysis may include data mining if necessary for the development of relevant...) INFORMATION SECURITY MATTERS Data Breaches § 75.115 Risk analysis. If a data breach involving sensitive... possible after the data breach, a non-VA entity with relevant expertise in data breach assessment and risk...
38 CFR 75.115 - Risk analysis.
Code of Federal Regulations, 2013 CFR
2013-07-01
... preparation of the risk analysis may include data mining if necessary for the development of relevant...) INFORMATION SECURITY MATTERS Data Breaches § 75.115 Risk analysis. If a data breach involving sensitive... possible after the data breach, a non-VA entity with relevant expertise in data breach assessment and risk...
Simple Numerical Analysis of Longboard Speedometer Data
ERIC Educational Resources Information Center
Hare, Jonathan
2013-01-01
Simple numerical data analysis is described, using a standard spreadsheet program, to determine distance, velocity (speed) and acceleration from voltage data generated by a skateboard/longboard speedometer (Hare 2012 "Phys. Educ." 47 409-17). This simple analysis is an introduction to data processing including scaling data as well as…
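The spreadsheet steps described above (scaling voltage to speed, then deriving distance and acceleration) can be mirrored in a short Python sketch. It assumes a linear voltage-to-speed calibration with a hypothetical constant, uses cumulative trapezoidal integration for distance, and uses forward differences for acceleration; the article's exact spreadsheet formulas may differ.

```python
def process_speedometer(times_s, volts, metres_per_second_per_volt):
    """Convert sensor voltages to speed, then integrate for distance and
    difference for acceleration.

    metres_per_second_per_volt is a hypothetical linear calibration constant;
    a real sensor needs its own calibration.
    """
    speeds = [v * metres_per_second_per_volt for v in volts]
    # Distance: cumulative trapezoidal integration of speed over time.
    distance = [0.0]
    for i in range(1, len(times_s)):
        dt = times_s[i] - times_s[i - 1]
        distance.append(distance[-1] + 0.5 * (speeds[i] + speeds[i - 1]) * dt)
    # Acceleration: forward difference between consecutive samples.
    accel = [
        (speeds[i + 1] - speeds[i]) / (times_s[i + 1] - times_s[i])
        for i in range(len(times_s) - 1)
    ]
    return speeds, distance, accel


# Hypothetical readings taken once per second.
print(process_speedometer([0, 1, 2, 3], [0.0, 1.0, 2.0, 3.0], 2.0))
```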
38 CFR 75.115 - Risk analysis.
Code of Federal Regulations, 2011 CFR
2011-07-01
... preparation of the risk analysis may include data mining if necessary for the development of relevant...) INFORMATION SECURITY MATTERS Data Breaches § 75.115 Risk analysis. If a data breach involving sensitive... possible after the data breach, a non-VA entity with relevant expertise in data breach assessment and risk...
A Disciplined Architectural Approach to Scaling Data Analysis for Massive, Scientific Data
NASA Astrophysics Data System (ADS)
Crichton, D. J.; Braverman, A. J.; Cinquini, L.; Turmon, M.; Lee, H.; Law, E.
2014-12-01
Data collections across remote sensing and ground-based instruments in astronomy, Earth science, and planetary science are outpacing scientists' ability to analyze them. Furthermore, the distribution, structure, and heterogeneity of the measurements themselves pose challenges that limit the scalability of data analysis using traditional approaches. Developing science data processing pipelines, distributing scientific datasets, and performing analysis will require innovative approaches that integrate cyber-infrastructure, algorithms, and data into more systematic approaches that can compute and reduce data more efficiently, particularly distributed data. This requires the integration of computer science, machine learning, statistics, and domain expertise to identify scalable architectures for data analysis. The size of data returned from Earth Science observing satellites and the magnitude of data from climate model output are predicted to grow into the tens of petabytes, challenging current data analysis paradigms. The same kind of growth is present in astronomy and planetary science data. One of the major challenges in data science and related disciplines is defining new approaches to scaling systems and analysis in order to increase scientific productivity and yield. Specific needs include: 1) identification of optimized system architectures for analyzing massive, distributed data sets; 2) algorithms for systematic analysis of massive data sets in distributed environments; and 3) the development of software infrastructures capable of performing massive, distributed data analysis across a comprehensive data science framework. NASA/JPL has begun an initiative in data science to address these challenges. 
Our goal is to evaluate how scientific productivity can be improved through optimized architectural topologies that identify how to deploy and manage the access, distribution, computation, and reduction of massive, distributed data, while managing the uncertainties of scientific conclusions derived from such capabilities. This talk will provide an overview of JPL's efforts in developing a comprehensive architectural approach to data science.
Leveraging Data Analysis for Domain Experts: An Embeddable Framework for Basic Data Science Tasks
ERIC Educational Resources Information Center
Lohrer, Johannes-Y.; Kaltenthaler, Daniel; Kröger, Peer
2016-01-01
In this paper, we describe a framework for data analysis that can be embedded into a base application. Since it is important to analyze the data directly inside the application where the data is entered, a tool that allows the scientists to easily work with their data, supports and motivates the execution of further analysis of their data, which…
Increasing Transparency Through a Multiverse Analysis.
Steegen, Sara; Tuerlinckx, Francis; Gelman, Andrew; Vanpaemel, Wolf
2016-09-01
Empirical research inevitably includes constructing a data set by processing raw data into a form ready for statistical analysis. Data processing often involves choices among several reasonable options for excluding, transforming, and coding data. We suggest that instead of performing only one analysis, researchers could perform a multiverse analysis, which involves performing all analyses across the whole set of alternatively processed data sets corresponding to a large set of reasonable scenarios. Using an example focusing on the effect of fertility on religiosity and political attitudes, we show that analyzing a single data set can be misleading and propose a multiverse analysis as an alternative practice. A multiverse analysis offers an idea of how much the conclusions change because of arbitrary choices in data construction and gives pointers as to which choices are most consequential in the fragility of the result. © The Author(s) 2016.
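The mechanics of a multiverse analysis can be sketched as a loop over the Cartesian product of processing choices, running the same analysis on each resulting data set. The data, the choice names, and the analysis (a difference in group means) are all hypothetical and chosen only to show how an arbitrary processing choice can change a result; they do not reproduce the fertility example.

```python
import math
from itertools import product
from statistics import mean


def multiverse(rows, choices, analyse):
    """Run `analyse` on every combination of data-processing options.

    choices maps each choice name to {option label: function(rows) -> rows};
    processing steps are applied in the order the choices are listed.
    Returns {tuple of option labels: analysis result}.
    """
    results = {}
    for combo in product(*(opts.items() for opts in choices.values())):
        data = rows
        for _label, step in combo:
            data = step(data)
        results[tuple(label for label, _ in combo)] = analyse(data)
    return results


def group_mean_difference(rows):
    # Analysis applied in every universe: mean of group "a" minus group "b".
    a = [v for g, v in rows if g == "a"]
    b = [v for g, v in rows if g == "b"]
    return mean(a) - mean(b)


# Hypothetical raw data: (group, measurement)
RAW = [("a", 1.0), ("a", 2.0), ("a", 30.0), ("b", 2.0), ("b", 3.0), ("b", 4.0)]

CHOICES = {
    "exclusion": {
        "keep_all": lambda r: r,
        "drop_gt_10": lambda r: [(g, v) for g, v in r if v <= 10],
    },
    "transform": {
        "raw": lambda r: r,
        "log": lambda r: [(g, math.log(v)) for g, v in r],
    },
}

results = multiverse(RAW, CHOICES, group_mean_difference)
for labels, estimate in sorted(results.items()):
    print(labels, round(estimate, 3))
```

Here the outlier-exclusion choice alone flips the sign of the estimated difference under both transformations, which is the kind of fragility a multiverse analysis is meant to expose.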
NASA Technical Reports Server (NTRS)
Zoladz, T.; Earhart, E.; Fiorucci, T.
1995-01-01
Utilizing high-frequency data from a highly instrumented rotor assembly, seeded bearing defect signatures are characterized using both conventional linear approaches, such as power spectral density analysis, and recently developed nonlinear techniques such as bicoherence analysis. Traditional low-frequency (less than 20 kHz) analysis and high-frequency envelope analysis of both accelerometer and acoustic emission data are used to recover characteristic bearing distress information buried deeply in acquired data. The successful coupling of newly developed nonlinear signal analysis with recovered wideband envelope data from accelerometers and acoustic emission sensors is the innovative focus of this research.
PIVOT: platform for interactive analysis and visualization of transcriptomics data.
Zhu, Qin; Fisher, Stephen A; Dueck, Hannah; Middleton, Sarah; Khaladkar, Mugdha; Kim, Junhyong
2018-01-05
Many R packages have been developed for transcriptome analysis, but their use often requires familiarity with R, and integrating the results of different packages requires scripts to wrangle the data types. Furthermore, exploratory data analyses often generate multiple derived datasets, such as data subsets or data transformations, which can be difficult to track. Here we present PIVOT, an R-based platform that wraps open source transcriptome analysis packages with a uniform user interface and graphical data management that allows non-programmers to interactively explore transcriptomics data. PIVOT supports more than 40 popular open source packages for transcriptome analysis and provides an extensive set of tools for statistical data manipulations. A graph-based visual interface is used to represent the links between derived datasets, allowing easy tracking of data versions. PIVOT further supports automatic report generation, publication-quality plots, and program/data state saving, so that all analyses can be saved, shared, and reproduced. PIVOT will allow researchers from a broad range of backgrounds to easily access sophisticated transcriptome analysis tools and interactively explore transcriptome datasets.
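The idea of tracking derived datasets as a graph can be sketched with a minimal lineage structure in which every derived dataset records its parent and the operation that produced it, so its full history can be replayed. This is the simplest case (a tree); PIVOT's own data structures are not described in the abstract, and the class names, dataset names, and operation labels below are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class DatasetNode:
    name: str
    operation: str  # how this dataset was produced from its parent
    # Links are excluded from repr/eq to avoid infinite parent/child recursion.
    parent: Optional["DatasetNode"] = field(default=None, repr=False, compare=False)
    children: List["DatasetNode"] = field(default_factory=list, repr=False, compare=False)


class Lineage:
    """Tree of datasets: the imported data at the root, derived versions below."""

    def __init__(self, root_name: str):
        self.nodes = {root_name: DatasetNode(root_name, "import")}

    def derive(self, parent_name: str, new_name: str, operation: str) -> DatasetNode:
        if new_name in self.nodes:
            raise ValueError(f"dataset {new_name!r} already exists")
        parent = self.nodes[parent_name]
        node = DatasetNode(new_name, operation, parent)
        parent.children.append(node)
        self.nodes[new_name] = node
        return node

    def history(self, name: str) -> List[Tuple[str, str]]:
        """(dataset, operation) pairs from the root to `name`, oldest first."""
        steps = []
        node: Optional[DatasetNode] = self.nodes[name]
        while node is not None:
            steps.append((node.name, node.operation))
            node = node.parent
        return steps[::-1]


lineage = Lineage("raw_counts")
lineage.derive("raw_counts", "filtered", "filter low-count genes")
lineage.derive("filtered", "normalized", "normalize")
lineage.derive("raw_counts", "subset_a", "select samples")
print(lineage.history("normalized"))
```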
A Mobile Computing Solution for Collecting Functional Analysis Data on a Pocket PC
Jackson, James; Dixon, Mark R
2007-01-01
The present paper provides a task analysis for creating a computerized data system using a Pocket PC and Microsoft Visual Basic. With Visual Basic software and any handheld device running the Windows Mobile operating system, this task analysis will allow behavior analysts to program and customize their own functional analysis data-collection system. The program will allow the user to select the type of behavior to be recorded, choose between interval and frequency data collection, and summarize data for graphing and analysis. We also provide suggestions for customizing the data-collection system for idiosyncratic research and clinical needs. PMID:17624078
iMetaLab 1.0: A web platform for metaproteomics data analysis.
Liao, Bo; Ning, Zhibin; Cheng, Kai; Zhang, Xu; Li, Leyuan; Mayne, Janice; Figeys, Daniel
2018-06-15
The human gut microbiota, a complex, dynamic and biodiverse community, has been increasingly shown to influence many aspects of health and disease. Metaproteomic analysis has proven to be a powerful approach to study the functionality of the microbiota. However, the processing and analysis of metaproteomic mass spectrometry (MS) data remain a daunting task. We developed iMetaLab, a web-based platform that provides a user-friendly and comprehensive data analysis pipeline, with a focus on lowering the technical barrier for metaproteomics data analysis. iMetaLab is freely available at http://imetalab.ca. Supplementary data are available at Bioinformatics online.
Data warehouse model design technology analysis and research
NASA Astrophysics Data System (ADS)
Jiang, Wenhua; Li, Qingshui
2012-01-01
Existing data storage formats cannot meet the needs of information analysis, which has brought the data warehouse onto the historical stage. A data warehouse is a specially designed data collection created to support business decision-making. With a data warehouse, a company stores all of the information it collects in the warehouse. The data warehouse is organized according to certain principles, making the information easy to access and valuable. This paper focuses on the establishment, analysis, and design of data warehouses, presents two barrier models, and compares them.
Interoperability Outlook in the Big Data Future
NASA Astrophysics Data System (ADS)
Kuo, K. S.; Ramachandran, R.
2015-12-01
The establishment of distributed active archive centers (DAACs) as data warehouses and the standardization of file formats by NASA's Earth Observing System Data and Information System (EOSDIS) undoubtedly propelled the interoperability of NASA Earth science data to unprecedented heights in the 1990s. Two decades later, however, interoperability clearly still falls short. We believe this inadequate interoperability results from the current practice in which data are first packaged into files before distribution, and only the metadata of these files are cataloged into databases and made searchable. Data therefore cannot be efficiently filtered. Any extensive study thus requires downloading large volumes of data files to a local system for processing and analysis. The need to download data not only creates duplication and inefficiency but also further impedes interoperability, because the analysis has to be performed locally by individual researchers at individual institutions. Each institution or researcher often has its own preferences in data management practice and programming language. Analysis results (derived data) produced this way are thus subject to the differences among these practices, which later form formidable barriers to interoperability. A number of Big Data technologies are currently being examined and tested to address Big Earth Data issues. These technologies share one common characteristic: exploiting compute and storage affinity to analyze large volumes and great varieties of data more efficiently. Distributed active "archive" centers are likely to evolve into distributed active "analysis" centers, which not only archive data but also provide analysis services right where the data reside. "Analysis" will become the more visible function of these centers. It is thus reasonable to expect interoperability to improve, because analysis, in addition to data, becomes more centralized. 
Within a "distributed active analysis center" interoperability is almost guaranteed because data, analysis, and results all can be readily shared and reused. Effectively, with the establishment of "distributed active analysis centers", interoperation turns from a many-to-many problem into a less complicated few-to-few problem and becomes easier to solve.
Statistical analysis of life history calendar data.
Eerola, Mervi; Helske, Satu
2016-04-01
The life history calendar is a data-collection tool for obtaining reliable retrospective data about life events. To illustrate the analysis of such data, we compare model-based probabilistic event history analysis with a model-free data mining method, sequence analysis. In event history analysis, instead of transition hazards, we estimate the cumulative prediction probabilities of life events over the entire trajectory. In sequence analysis, we compare several dissimilarity metrics and contrast data-driven and user-defined substitution costs. As an example, we study young adults' transition to adulthood as a sequence of events in three life domains. The events define the multistate event history model and the parallel life domains in multidimensional sequence analysis. The relationship between life trajectories and excess depressive symptoms in middle age is further studied by their joint prediction in the multistate model and by regressing the symptom scores on individual-specific cluster indices. The two approaches complement each other in life course analysis: sequence analysis can effectively find typical and atypical life patterns, while event history analysis is needed for causal inquiries. © The Author(s) 2012.
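The dissimilarity step of sequence analysis can be illustrated with optimal matching, an edit distance between state sequences computed by dynamic programming. The sketch assumes a constant substitution cost of 2 and an insertion/deletion cost of 1, a common default; the paper compares several metrics and both data-driven and user-defined substitution costs, which this sketch does not reproduce. The state codes are hypothetical.

```python
def optimal_matching(seq_a, seq_b, sub_cost=2.0, indel_cost=1.0):
    """Optimal-matching distance between two state sequences, using a
    constant substitution cost and a constant insertion/deletion cost."""
    n, m = len(seq_a), len(seq_b)
    # d[i][j]: minimum cost of turning seq_a[:i] into seq_b[:j]
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * indel_cost
    for j in range(1, m + 1):
        d[0][j] = j * indel_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = seq_a[i - 1] == seq_b[j - 1]
            d[i][j] = min(
                d[i - 1][j] + indel_cost,                       # deletion
                d[i][j - 1] + indel_cost,                       # insertion
                d[i - 1][j - 1] + (0.0 if same else sub_cost),  # match or substitution
            )
    return d[n][m]


# Hypothetical yearly states: E = education, W = work, P = parenthood.
print(optimal_matching(["E", "E", "W", "W"], ["E", "W", "W", "P"]))
```

Pairwise distances computed this way form the dissimilarity matrix that is then clustered to find typical and atypical trajectories.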
Alternatives to current flow cytometry data analysis for clinical and research studies.
Gondhalekar, Carmen; Rajwa, Bartek; Patsekin, Valery; Ragheb, Kathy; Sturgis, Jennifer; Robinson, J Paul
2018-02-01
Flow cytometry has well-established methods for data analysis based on traditional data collection techniques. These techniques typically involved manual insertion of tube samples into an instrument that, historically, could measure only 1-3 colors. The field has since evolved to incorporate new technologies for faster and highly automated sample preparation and data collection. For example, microwell plates on benchtop instruments are now standard on virtually every new instrument, so users can quickly accumulate multiple data sets. Further, because the user must carefully define the layout of the plate, this information is already available when the analytical process is considered, expanding the opportunities for automated analysis. Advances in multi-parametric data collection, as demonstrated by the development of hyperspectral flow cytometry, 20-40 color polychromatic flow cytometry, and mass cytometry (CyTOF), are game-changing. As data and assay complexity increase, so too does the complexity of data analysis. Complex data analysis is already a challenge for traditional flow cytometry software. New methods for reviewing large and complex data sets can provide rapid insight into processes that are difficult to define without more advanced analytical tools. In settings such as clinical labs, where rapid and accurate data analysis is a priority, fast, efficient, and intuitive software is needed. This paper outlines opportunities for the analysis of complex data sets using examples of multiplexed bead-based assays, drug screens, and cell cycle analysis, any of which could become integrated into the clinical environment. Copyright © 2017. Published by Elsevier Inc.
SEURAT: Visual analytics for the integrated analysis of microarray data
2010-01-01
Background: In translational cancer research, gene expression data are collected together with clinical data and genomic data arising from other chip-based high-throughput technologies. Software tools for the joint analysis of such high-dimensional data sets together with clinical data are required. Results: We have developed an open-source software tool that provides interactive visualization for the integrated analysis of high-dimensional gene expression data together with associated clinical data, array CGH data and SNP array data. The different data types are organized by a comprehensive data manager. Interactive tools are provided for all graphics: heatmaps, dendrograms, bar charts, histograms, event charts and a chromosome browser, which displays genetic variations along the genome. All graphics are dynamic and fully linked, so that any object selected in one graphic is highlighted in all other graphics. For exploratory data analysis, the software provides unsupervised methods such as clustering, seriation and biclustering algorithms. Conclusions: The SEURAT software meets the growing needs of researchers to perform joint analysis of gene expression, genomic and clinical data. PMID:20525257
Architectural Strategies for Enabling Data-Driven Science at Scale
NASA Astrophysics Data System (ADS)
Crichton, D. J.; Law, E. S.; Doyle, R. J.; Little, M. M.
2017-12-01
The analysis of large data collections from NASA or other agencies is often executed through traditional computational and data analysis approaches, which require users to bring data to their desktops and perform local data analysis. Alternatively, data are hauled to large computational environments that provide centralized data analysis via traditional High Performance Computing (HPC). Scientific data archives, however, are not only growing massive but also becoming highly distributed. Neither traditional approach offers a good solution for optimizing analysis in the future. Assumptions across the NASA mission and science data lifecycle, which historically held that all data can be collected, transmitted, processed, and archived, will not scale as more capable instruments stress legacy-based systems. New paradigms are needed to increase the productivity and effectiveness of scientific data analysis. These paradigms must recognize that architectural and analytical choices are interrelated and must be carefully coordinated in any system that aims to enable efficient, interactive scientific exploration and discovery across massive data collections, from the point of collection (e.g., onboard) to analysis and decision support. The most effective approach to analyzing a distributed set of massive data may involve some exploration and iteration, putting a premium on the flexibility afforded by the architectural framework. The framework should enable scientist users to assemble workflows efficiently, manage the uncertainties related to data analysis and inference, and optimize deep-dive analytics to enhance scalability. In many cases, this "data ecosystem" needs to integrate multiple observing assets, ground environments, archives, and analytics, evolving from stewardship of measurements to using computational methodologies that better derive insight from data that may be fused with other data sets. 
This presentation will discuss architectural strategies, including a 2015-2016 NASA AIST Study on Big Data, for evolving scientific research towards massively distributed data-driven discovery. It will include example use cases across earth science, planetary science, and other disciplines.
GEOS-2 C-band radar system project. Spectral analysis as related to C-band radar data analysis
NASA Technical Reports Server (NTRS)
1972-01-01
Work performed on spectral analysis of data from the C-band radars tracking GEOS-2, and on the development of a data compaction method for the GEOS-2 C-band radar data, is described. The purposes of the spectral analysis study were to determine the optimum data recording and sampling rates for C-band radar data and to determine the optimum method of filtering and smoothing the data. The optimum data recording and sampling rate is defined as the rate that gives the best compromise between serial correlation and the effects of frequency folding (aliasing). The goal in developing a data compaction method was to minimize the amount of data stored while retaining all of the statistical information content of the non-compacted data. A digital computer program for computing estimates of the power spectral density function of sampled data was used to perform the spectral analysis study.
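The abstract does not say which power spectral density estimator the 1972 program used. As a minimal sketch, assuming a raw one-sided periodogram, the code below estimates the PSD of a uniformly sampled signal and shows frequency folding: a component above half the sampling rate (the Nyquist frequency) reappears at a lower, aliased frequency.

```python
import numpy as np


def periodogram(x, fs):
    """One-sided periodogram estimate of the power spectral density of x.

    x is uniformly sampled at fs samples per unit time; returns (freqs, psd).
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                      # remove the mean (zero-frequency) term
    n = x.size
    psd = np.abs(np.fft.rfft(x)) ** 2 / (fs * n)
    # Double every bin except zero frequency (and the Nyquist bin when n is
    # even) so the one-sided estimate carries the power of negative frequencies.
    if n % 2 == 0:
        psd[1:-1] *= 2
    else:
        psd[1:] *= 2
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    return freqs, psd


fs = 10.0                                 # sampling rate; Nyquist frequency is fs / 2 = 5
t = np.arange(100) / fs                   # 10 time units of samples
for f_true in (3.0, 7.0):
    freqs, psd = periodogram(np.sin(2 * np.pi * f_true * t), fs)
    peak = round(float(freqs[np.argmax(psd)]), 6)
    print(f"{f_true} -> peak at {peak}")  # the 7.0 component folds back to 3.0
```

Both signals peak at 3, because 7 lies above the Nyquist frequency and folds to 10 - 7 = 3. This is why the choice of sampling rate trades off against aliasing.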
GIS-Based crash referencing and analysis system
DOT National Transportation Integrated Search
1999-02-01
One area where a Geographic Information System (GIS) has yet to be extensively applied is in the analysis of crash data. Computerized crash analysis systems in which crash data, roadway inventory data, and traffic operations data can be merged are us...
Chapple, Christopher R; Cardozo, Linda; Snijder, Robert; Siddiqui, Emad; Herschorn, Sender
2016-12-15
Patient-level data are available for 11 randomized, controlled, Phase III/Phase IV solifenacin clinical trials. Meta-analyses were conducted to interrogate the data and to broaden knowledge about solifenacin and overactive bladder (OAB) in general. Before integration, datasets from individual studies were mapped to a single format using methodology developed by the Clinical Data Interchange Standards Consortium (CDISC). Initially, the data structure was harmonized, to ensure identical categorization, using the CDISC Study Data Tabulation Model (SDTM). To allow for patient-level meta-analysis, data were integrated and mapped to analysis datasets. Mapping included adding derived and categorical variables and followed the standards described in the Analysis Data Model (ADaM). Mapping to both SDTM and ADaM was performed twice by two independent programming teams; the results were compared and inconsistencies corrected in the final output. ADaM analysis sets included assignments of patients to the Safety Analysis Set and the Full Analysis Set. There were three analysis groupings: Analysis group 1 (placebo-controlled, monotherapy, fixed-dose studies, n = 3011); Analysis group 2 (placebo-controlled, monotherapy, pooled, fixed- and flexible-dose, n = 5379); Analysis group 3 (all solifenacin monotherapy-treated patients, n = 6539). Treatment groups were: solifenacin 5 mg fixed dose, solifenacin 5/10 mg flexible dose, solifenacin 10 mg fixed dose and overall solifenacin. Patients were similar enough for data pooling to be acceptable. Creating ADaM datasets provided significant information about individual studies and the derivation decisions made in each study; validated ADaM datasets now exist for medical history, efficacy and AEs. Results from these meta-analyses were similar over time.
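The double-programming step (two teams derive the same datasets independently, then compare) can be sketched as a record-by-record diff keyed on the subject identifier. This is a minimal illustration, not the teams' actual tooling: rows are plain dictionaries, and the variable names `USUBJID`, `AGEGR1` and `SAFFL` are used only as illustrative CDISC-style column names with made-up values.

```python
def compare_derivations(team_a, team_b, key="USUBJID"):
    """Compare two independently programmed datasets (lists of dict rows).

    Returns (subject, description) tuples for every discrepancy found.
    """
    a = {row[key]: row for row in team_a}
    b = {row[key]: row for row in team_b}
    issues = []
    # Subjects that only one team produced a record for
    for subj in sorted(a.keys() ^ b.keys()):
        issues.append((subj, "record present in only one dataset"))
    # Variable-by-variable comparison for subjects both teams derived
    for subj in sorted(a.keys() & b.keys()):
        for var in sorted(a[subj].keys() | b[subj].keys()):
            va, vb = a[subj].get(var), b[subj].get(var)
            if va != vb:
                issues.append((subj, f"{var}: {va!r} vs {vb!r}"))
    return issues


# Hypothetical derived records from two programming teams
team_a = [{"USUBJID": "01-001", "AGEGR1": "<65", "SAFFL": "Y"},
          {"USUBJID": "01-002", "AGEGR1": ">=65", "SAFFL": "Y"}]
team_b = [{"USUBJID": "01-001", "AGEGR1": "<65", "SAFFL": "Y"},
          {"USUBJID": "01-002", "AGEGR1": "<65", "SAFFL": "Y"},
          {"USUBJID": "01-003", "AGEGR1": "<65", "SAFFL": "N"}]

for issue in compare_derivations(team_a, team_b):
    print(issue)
```

Every reported discrepancy would then be resolved by the teams before the final datasets are released.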
Saramago, Pedro; Woods, Beth; Weatherly, Helen; Manca, Andrea; Sculpher, Mark; Khan, Kamran; Vickers, Andrew J; MacPherson, Hugh
2016-10-06
Network meta-analysis methods, which extend the standard pairwise synthesis framework, allow for the simultaneous comparison of multiple interventions and consideration of the entire body of evidence in a single statistical model. There are well-established advantages to using individual patient data to perform network meta-analysis, and methods for network meta-analysis of individual patient data have already been developed for dichotomous and time-to-event data. This paper describes appropriate methods for the network meta-analysis of individual patient data on continuous outcomes, introducing models based on the analysis of covariance framework. These are compared with change score and final score only approaches, which are frequently used and have been proposed in the methodological literature. A motivating example on the effectiveness of acupuncture for chronic pain is used to demonstrate the methods. Individual patient data from 28 randomised controlled trials were synthesised. Consistency of endpoints across the evidence base was achieved through standardisation and mapping exercises. The availability of individual patient data avoided the use of non-baseline-adjusted models, allowing analysis of covariance models to be applied instead, thus improving the precision of treatment effect estimates while adjusting for baseline imbalance. Network meta-analysis of individual patient data using the analysis of covariance approach is advocated as the most appropriate modelling approach for network meta-analysis of continuous outcomes, particularly in the presence of baseline imbalance. Further methodological developments are required to address the challenge of analysing aggregate-level data in the presence of baseline imbalance.
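Why baseline adjustment matters can be seen in a single two-arm trial, well short of the network model described above. The minimal sketch below estimates a treatment effect three ways (final score, change score, ANCOVA) on noise-free toy data. In this data the outcome is 1 + 0.5 × baseline − 2 × treated, and the treated arm starts 2 units higher at baseline. The data and coefficients are invented for illustration.

```python
import numpy as np


def treatment_effects(baseline, final, treated):
    """Estimate a two-arm treatment effect on a continuous outcome three ways."""
    x = np.asarray(baseline, dtype=float)
    y = np.asarray(final, dtype=float)
    t = np.asarray(treated, dtype=bool)
    # Final score: difference in mean outcome, ignoring baseline
    final_score = y[t].mean() - y[~t].mean()
    # Change score: difference in mean (final - baseline)
    change = y - x
    change_score = change[t].mean() - change[~t].mean()
    # ANCOVA: regress final outcome on intercept, treatment and baseline
    design = np.column_stack([np.ones_like(x), t.astype(float), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return {"final": final_score, "change": change_score, "ancova": coef[1]}


# Toy trial with baseline imbalance (treated arm starts higher)
baseline = np.array([4.0, 5.0, 6.0, 6.0, 7.0, 8.0])
treated = np.array([0, 0, 0, 1, 1, 1])
final = 1.0 + 0.5 * baseline - 2.0 * treated   # true effect is -2

print(treatment_effects(baseline, final, treated))
# final ≈ -1.0, change ≈ -3.0, ancova ≈ -2.0
```

With a baseline difference Δ between arms and a baseline coefficient b, the final-score estimate is shifted by bΔ and the change-score estimate by (b − 1)Δ. Only ANCOVA recovers the true effect here.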
Finak, Greg; Frelinger, Jacob; Jiang, Wenxin; Newell, Evan W.; Ramey, John; Davis, Mark M.; Kalams, Spyros A.; De Rosa, Stephen C.; Gottardo, Raphael
2014-01-01
Flow cytometry is used increasingly in clinical research for cancer, immunology and vaccines. Technological advances in cytometry instrumentation are increasing the size and dimensionality of data sets, posing a challenge for traditional data management and analysis. Automated analysis methods, despite a general consensus of their importance to the future of the field, have been slow to gain widespread adoption. Here we present OpenCyto, a new BioConductor infrastructure and data analysis framework designed to lower the barrier of entry to automated flow data analysis algorithms by addressing key areas that we believe have held back wider adoption of automated approaches. OpenCyto supports end-to-end data analysis that is robust and reproducible while generating results that are easy to interpret. We have improved the existing, widely used core BioConductor flow cytometry infrastructure by allowing analysis to scale in a memory-efficient manner to the large flow data sets that arise in clinical trials, and by integrating domain-specific knowledge into the pipeline through the hierarchical relationships among cell populations. Pipelines are defined through a text-based CSV file, limiting the need to write data-specific code, and are data-agnostic to simplify repetitive analysis for core facilities. We demonstrate how to analyze two large cytometry data sets: an intracellular cytokine staining (ICS) data set from a published HIV vaccine trial focused on detecting rare, antigen-specific T-cell populations, where we identify a new subset of CD8 T-cells with a vaccine-regimen-specific response that could not be identified through manual analysis; and a CyTOF T-cell phenotyping data set, where a large staining panel and many cell populations are a challenge for traditional analysis. The substantial improvements to the core BioConductor flow cytometry packages give OpenCyto the potential for wide adoption. 
It can rapidly leverage new developments in computational cytometry and facilitate reproducible analysis in a unified environment. PMID:25167361
Chipster: user-friendly analysis software for microarray and other high-throughput data.
Kallio, M Aleksi; Tuimala, Jarno T; Hupponen, Taavi; Klemelä, Petri; Gentile, Massimiliano; Scheinin, Ilari; Koski, Mikko; Käki, Janne; Korpelainen, Eija I
2011-10-14
The growth of high-throughput technologies such as microarrays and next-generation sequencing has been accompanied by active research in data analysis methodology, producing new analysis methods at a rapid pace. While most of the newly developed methods are freely available, their use requires substantial computational skills. To enable non-programming biologists to benefit from method development in a timely manner, we have created the Chipster software. Chipster (http://chipster.csc.fi/) brings a powerful collection of data analysis methods within reach of bioscientists via its intuitive graphical user interface. Users can analyze and integrate different data types such as gene expression, miRNA and aCGH. The analysis functionality is complemented with rich interactive visualizations, allowing users to select data points and create new gene lists based on these selections. Importantly, users can save the performed analysis steps as reusable, automatic workflows, which can also be shared with other users. Being a versatile and easily extendable platform, Chipster can be used for microarray, proteomics and sequencing data. In this article we describe its comprehensive collection of analysis and visualization tools for microarray data using three case studies. Chipster is user-friendly analysis software for high-throughput data. Its intuitive graphical user interface enables biologists to access a powerful collection of data analysis and integration tools, and to visualize data interactively. Users can collaborate by sharing analysis sessions and workflows. Chipster is open source, and the server installation package is freely available. PMID:21999641
Geophysical data analysis and visualization using the Grid Analysis and Display System
NASA Technical Reports Server (NTRS)
Doty, Brian E.; Kinter, James L., III
1995-01-01
Several problems posed by the rapidly growing volume of geophysical data are described, and a selected set of existing solutions to these problems is outlined. A recently developed desktop software tool called the Grid Analysis and Display System (GrADS) is presented. The GrADS user interface is a natural extension of the standard procedures scientists apply to their geophysical data analysis problems. The basic GrADS operations have defaults that map naturally to data analysis actions, and there is a programmable interface for customizing data access and manipulation. The fundamental concept of the GrADS dimension environment, which defines both the space in which the geophysical data reside and the 'slice' of data being analyzed at a given time, is explained. The GrADS data storage and access model is described. An argument is made in favor of describable data formats rather than standard data formats. The manner in which GrADS users may perform operations on their data and display the results is also described. It is argued that two-dimensional graphics provide a powerful quantitative data analysis tool whose value is underestimated in the current development environment, which emphasizes three-dimensional structure modeling.
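The idea of a dimension environment works like this. Each dimension is either fixed at one value or allowed to vary over a range, and the varying dimensions determine the shape of the slice that is analyzed or displayed (in GrADS this is set with commands such as `set lev` and `set t`). The sketch below expresses the same idea over a NumPy array. It is a minimal analogy, not GrADS's implementation, and it assumes data stored as (time, lev, lat, lon) with monotonic coordinates. The function name `select` is hypothetical.

```python
import numpy as np

DIMS = ("time", "lev", "lat", "lon")   # assumed axis order of the data array


def select(data, coords, env):
    """Return the slice of `data` described by a dimension environment `env`.

    env maps a dimension name to either a single value (dimension fixed at
    the nearest grid point) or a (lo, hi) tuple (dimension varies over that
    range). Dimensions missing from env vary over their full extent.
    """
    index = []
    for dim in DIMS:
        c = np.asarray(coords[dim], dtype=float)
        spec = env.get(dim)
        if spec is None:
            index.append(slice(None))                       # varies fully
        elif isinstance(spec, tuple):
            lo, hi = spec
            inside = np.flatnonzero((c >= lo) & (c <= hi))
            if inside.size == 0:
                raise ValueError(f"no {dim} grid points in [{lo}, {hi}]")
            index.append(slice(inside[0], inside[-1] + 1))  # varies over range
        else:
            index.append(int(np.abs(c - spec).argmin()))    # fixed: nearest point
    return data[tuple(index)]


coords = {"time": [0, 6], "lev": [1000, 850, 500],
          "lat": [-30, -10, 10, 30], "lon": [0, 72, 144, 216, 288]}
data = np.arange(2 * 3 * 4 * 5, dtype=float).reshape(2, 3, 4, 5)

# Time and level fixed: a 2-D latitude/longitude map at 500
map_500 = select(data, coords, {"time": 0, "lev": 500})
# Only latitude varies: a 1-D line along a meridian
line = select(data, coords, {"time": 6, "lev": 850, "lon": 144, "lat": (-10, 30)})
print(map_500.shape, line.shape)   # (4, 5) (3,)
```

The number of varying dimensions decides the kind of display: two give a map or cross-section, one gives a line plot.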
Tornado detection data reduction and analysis
NASA Technical Reports Server (NTRS)
Davisson, L. D.
1977-01-01
Data processing and analysis were provided in support of tornado detection by analysis of radio frequency interference in various frequency bands. Sea state determination data from short pulse radar measurements were also processed and analyzed. A backscatter simulation was implemented to predict radar performance as a function of wind velocity. Computer programs were developed for the various data processing and analysis goals of the effort.
A program to form a multidisciplinary data base and analysis for dynamic systems
NASA Technical Reports Server (NTRS)
Taylor, L. W.; Suit, W. T.; Mayo, M. H.
1984-01-01
Diverse sets of experimental data and analysis programs have been assembled for the purpose of facilitating research in systems identification, parameter estimation and state estimation techniques. The data base analysis programs are organized to make it easy to compare alternative approaches. Additional data and alternative forms of analysis will be included as they become available.
HEPDOOP: High-Energy Physics Analysis using Hadoop
NASA Astrophysics Data System (ADS)
Bhimji, W.; Bristow, T.; Washbrook, A.
2014-06-01
We perform an LHC data analysis workflow using tools and data formats that are commonly used in the "Big Data" community outside High Energy Physics (HEP). These include Apache Avro for serialisation to binary files, Pig and Hadoop for mass data processing and Python scikit-learn for multivariate analysis. Comparison is made with the same analysis performed with current HEP tools in ROOT.
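Hadoop-style processing splits a job into map, shuffle and reduce steps. The minimal plain-Python sketch below shows that pattern filling an event histogram. It does not use the Hadoop or Pig APIs. The event fields `mass` and `pt`, the 20-unit cut and the 10-unit bin width are hypothetical choices.

```python
from collections import defaultdict
from itertools import chain


def mapper(event, bin_width=10.0, min_pt=20.0):
    """Map step: apply a selection cut and emit (histogram bin, 1)."""
    if event["pt"] > min_pt:
        yield int(event["mass"] // bin_width), 1


def shuffle(pairs):
    """Shuffle step: group the emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key, values):
    """Reduce step: sum the counts for one bin."""
    return key, sum(values)


def histogram(events):
    mapped = chain.from_iterable(mapper(e) for e in events)
    return dict(reducer(k, v) for k, v in shuffle(mapped).items())


# Hypothetical reconstructed events
events = [{"mass": 91.0, "pt": 30.0}, {"mass": 95.5, "pt": 25.0},
          {"mass": 125.0, "pt": 40.0}, {"mass": 12.0, "pt": 5.0}]
print(histogram(events))   # {9: 2, 12: 1}
```

Each mapper call sees only one event, and each reducer call sees only one key. That independence is what lets a framework such as Hadoop spread the work across many nodes.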
Bergin, Michael
2011-01-01
Qualitative data analysis is a complex process and demands clear thinking on the part of the analyst. However, a number of deficiencies may obstruct the analyst during the process, leading to inconsistencies. This paper reflects on the use of a qualitative data analysis program, NVivo 8, and its usefulness in identifying consistency and inconsistency during the coding process. The author was conducting a large-scale study of providers and users of mental health services in Ireland. He used NVivo 8 to store, code and analyse the data, and this paper reflects some of his observations during the study. The paper highlights the demands placed on the analyst in trying to balance the mechanics of working through a qualitative data analysis program while simultaneously remaining conscious of the value of all sources. NVivo 8 is a challenging but valuable means of advancing the robustness of qualitative research. Pitfalls can be avoided by running queries as the analyst progresses from tree node to tree node, rather than waiting until data analysis is well advanced.
Computer-assisted qualitative data analysis software.
Cope, Diane G
2014-05-01
Advances in technology have provided new approaches for data collection methods and analysis for researchers. Data collection is no longer limited to paper-and-pencil format, and numerous methods are now available through Internet and electronic resources. With these techniques, researchers are not burdened with entering data manually, and data analysis is facilitated by software programs. Quantitative research is supported by computer software, which eases the management of large data sets and enables rapid application of numerical statistical methods. New technologies are emerging to support qualitative research with the availability of computer-assisted qualitative data analysis software (CAQDAS). CAQDAS will be presented with a discussion of advantages, limitations, controversial issues, and recommendations for this type of software use.
ERIC Educational Resources Information Center
Qin, Jian; Jurisica, Igor; Liddy, Elizabeth D.; Jansen, Bernard J; Spink, Amanda; Priss, Uta; Norton, Melanie J.
2000-01-01
These six articles discuss knowledge discovery in databases (KDD). Topics include data mining; knowledge management systems; applications of knowledge discovery; text and Web mining; text mining and information retrieval; user search patterns through Web log analysis; concept analysis; data collection; and data structure inconsistency. (LRW)
A Data Warehouse Architecture for DoD Healthcare Performance Measurements.
1999-09-01
design, develop, implement, and apply statistical analysis and data mining tools to a Data Warehouse of healthcare metrics. With the DoD healthcare...framework, this thesis defines a methodology to design, develop, implement, and apply statistical analysis and data mining tools to a Data Warehouse...21 F. INABILITY TO CONDUCT HEALTHCARE ANALYSIS
Multimedia Exploratory Data Analysis for Geospatial Data Mining: The Case for Augmented Seriation.
ERIC Educational Resources Information Center
Gluck, Myke
2001-01-01
Reviews the role of exploratory data analysis (EDA) for spatial data mining and presents a case study addressing environmental risk assessments in New York State to illustrate the feasibility and usability of augmenting seriation for spatial data analysis. Describes augmentation with multimedia tools to understand relationships among spatial,…
Quantitative Data Analysis--In the Graduate Curriculum
ERIC Educational Resources Information Center
Albers, Michael J.
2017-01-01
A quantitative research study collects numerical data that must be analyzed to help draw the study's conclusions. Teaching quantitative data analysis is not teaching number crunching, but teaching a way of critical thinking for how to analyze the data. The goal of data analysis is to reveal the underlying patterns, trends, and relationships of a…
mESAdb: microRNA Expression and Sequence Analysis Database
Kaya, Koray D.; Karakülah, Gökhan; Yakıcıer, Cengiz M.; Acar, Aybar C.; Konu, Özlen
2011-01-01
microRNA expression and sequence analysis database (http://konulab.fen.bilkent.edu.tr/mirna/) (mESAdb) is a regularly updated database for the multivariate analysis of sequences and expression of microRNAs from multiple taxa. mESAdb is modular and has a user interface implemented in PHP and JavaScript and coupled with statistical analysis and visualization packages written for the R language. The database primarily comprises mature microRNA sequences and their target data, along with selected human, mouse and zebrafish expression data sets. mESAdb analysis modules allow (i) mining of microRNA expression data sets for subsets of microRNAs selected manually or by motif; (ii) pair-wise multivariate analysis of expression data sets within and between taxa; and (iii) association of microRNA subsets with annotation databases, HUGE Navigator, KEGG and GO. The use of existing and customized R packages facilitates future addition of data sets and analysis tools. Furthermore, the ability to upload and analyze user-specified data sets makes mESAdb an interactive and expandable analysis tool for microRNA sequence and expression data. PMID:21177657
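One of mESAdb's modules selects microRNAs "manually or by motif". The abstract does not give the matching rules, so the minimal sketch below assumes one plausible rule: a motif matched within the seed region (nucleotides 2-8 of the mature sequence), with a full-sequence option. DNA input (T) is converted to RNA (U). The microRNA names and sequences are hypothetical toy values, not mESAdb records.

```python
def seed(seq, start=2, end=8):
    """Return the seed region (1-based positions start..end) of a mature microRNA."""
    return seq[start - 1:end]


def select_by_motif(mirnas, motif, seed_only=True):
    """Return the names of microRNAs whose seed (or full sequence) contains motif."""
    motif = motif.upper().replace("T", "U")   # accept DNA-style motifs
    hits = []
    for name, seq in mirnas.items():
        target = seed(seq) if seed_only else seq
        if motif in target.upper():
            hits.append(name)
    return sorted(hits)


# Hypothetical mature sequences
mirnas = {
    "mir-a": "UGAGGUAGUAGGUUGUAUAGUU",
    "mir-b": "UAGCUUAUCAGACUGAUGUUGA",
    "mir-c": "AGAGGUAGCAUUCGAAAAAAA",
}
print(select_by_motif(mirnas, "GAGGUAG"))                    # ['mir-a', 'mir-c']
print(select_by_motif(mirnas, "ACTGAT", seed_only=False))    # ['mir-b']
```

Matching only in the seed groups microRNAs that may share targets. Searching the full sequence is a looser, purely sequence-based filter.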
Ozone data and mission sampling analysis
NASA Technical Reports Server (NTRS)
Robbins, J. L.
1980-01-01
A methodology was developed to analyze discrete data obtained from the global distribution of ozone. Statistical analysis techniques were applied to describe the distribution of data variance in terms of empirical orthogonal functions and components of spherical harmonic models. The effects of uneven data distribution and missing data were considered. Data fill based on the autocorrelation structure of the data is described. Computer coding of the analysis techniques is included.
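Describing variance with empirical orthogonal functions (EOFs) amounts to a principal component decomposition of the space-time anomaly matrix. The minimal NumPy sketch below computes EOFs with a singular value decomposition. It assumes a complete, regularly gridded field stored as (times, points). It does not handle the uneven sampling, missing data, or spherical harmonic components the abstract also discusses. The toy field is constructed so that the answer is known in advance.

```python
import numpy as np


def eofs(field, n_modes=2):
    """EOF analysis of field with shape (n_times, n_points).

    Returns spatial patterns (orthonormal rows), principal component time
    series, and the fraction of total variance explained by each mode.
    """
    anomalies = field - field.mean(axis=0)        # remove the time mean at each point
    u, s, vt = np.linalg.svd(anomalies, full_matrices=False)
    variance = s ** 2 / np.sum(s ** 2)
    patterns = vt[:n_modes]                       # spatial EOFs
    pcs = u[:, :n_modes] * s[:n_modes]            # amplitude of each EOF over time
    return patterns, pcs, variance[:n_modes]


# Toy field: two orthogonal spatial patterns with uncorrelated time series
t = np.arange(24)
a = 3.0 * np.sin(2 * np.pi * t / 12)              # strong annual-like mode
b = np.cos(2 * np.pi * t / 6)                     # weaker mode
p = np.array([1.0, 1.0, 0.0, 0.0]) / np.sqrt(2)
q = np.array([0.0, 0.0, 1.0, -1.0]) / np.sqrt(2)
field = 10.0 + np.outer(a, p) + np.outer(b, q)

patterns, pcs, var = eofs(field)
print(np.round(var, 3))                           # approximately [0.9, 0.1]
```

The signs of EOFs are arbitrary, so compare patterns up to sign. For real data the anomalies are often area-weighted, for example by the square root of the cosine of latitude, before the decomposition.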
Multivariate data analysis methods for the interpretation of microbial flow cytometric data.
Davey, Hazel M; Davey, Christopher L
2011-01-01
Flow cytometry is an important technique in cell biology and immunology and has been applied by many groups to the analysis of microorganisms. This has been made possible by developments in hardware that is now sensitive enough to be used routinely for analysis of microbes. However, in contrast to advances in the technology that underpins flow cytometry, there has not been concomitant progress in the software tools required to analyse, display and disseminate the data, and manual analysis of individual samples remains a limiting aspect of the technology. We present two new data sets that illustrate common applications of flow cytometry in microbiology and demonstrate the application of manual data analysis, automated visualisation (including the first description of a new piece of software we are developing to facilitate this), genetic programming, principal components analysis and artificial neural nets to these data. The data analysis methods described here are equally applicable to flow cytometric applications with other cell types.
Clustering and Network Analysis of Reverse Phase Protein Array Data.
Byron, Adam
2017-01-01
Molecular profiling of proteins and phosphoproteins using a reverse phase protein array (RPPA) platform, with a panel of target-specific antibodies, enables the parallel, quantitative proteomic analysis of many biological samples in a microarray format. Hence, RPPA analysis can generate a high volume of multidimensional data that must be effectively interrogated and interpreted. A range of computational techniques for data mining can be applied to detect and explore data structure and to form functional predictions from large datasets. Here, two approaches for the computational analysis of RPPA data are detailed: the identification of similar patterns of protein expression by hierarchical cluster analysis and the modeling of protein interactions and signaling relationships by network analysis. The protocols use freely available, cross-platform software, are easy to implement, and do not require any programming expertise. Serving as data-driven starting points for further in-depth analysis, validation, and biological experimentation, these and related bioinformatic approaches can accelerate the functional interpretation of RPPA data.
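The protocol relies on existing cross-platform software, which the abstract does not name. To show what hierarchical cluster analysis does to an RPPA matrix, here is a minimal pure-NumPy sketch of agglomerative clustering. The Euclidean distance and average linkage are assumptions (correlation-based distances are also common). The toy matrix of samples by antibodies is invented.

```python
import numpy as np


def average_linkage(data, n_clusters):
    """Agglomerative clustering of rows with average linkage on Euclidean distance.

    Repeatedly merges the two clusters whose members have the smallest mean
    pairwise distance until n_clusters remain; returns sorted index lists.
    """
    x = np.asarray(data, dtype=float)
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    clusters = [[i] for i in range(len(x))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = dist[np.ix_(clusters[i], clusters[j])].mean()
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]   # merge j into i (i < j)
        del clusters[j]
    return sorted(sorted(c) for c in clusters)


# Hypothetical RPPA signals: rows are samples, columns are two antibodies
rppa = [[5.0, 0.1], [4.8, 0.2], [5.1, 0.0],   # high on antibody 1
        [0.2, 4.9], [0.0, 5.2]]               # high on antibody 2
print(average_linkage(rppa, 2))                # [[0, 1, 2], [3, 4]]
```

Transposing the matrix clusters antibodies instead of samples. Recording the merge order rather than stopping at a fixed number of clusters gives the dendrogram usually drawn beside an RPPA heatmap.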
Analysis of longitudinal data from animals with missing values using SPSS.
Duricki, Denise A; Soleman, Sara; Moon, Lawrence D F
2016-06-01
Testing of therapies for disease or injury often involves the analysis of longitudinal data from animals. Modern analytical methods have advantages over conventional methods (particularly when some data are missing), yet they are not widely used by preclinical researchers. Here we provide an easy-to-use protocol for the analysis of longitudinal data from animals, and we present a click-by-click guide for performing suitable analyses using the statistical package IBM SPSS Statistics (SPSS). We guide readers through the analysis of a real-life data set obtained when testing a therapy for brain injury (stroke) in elderly rats. If a few data points are missing, as in this example data set (for example, because of animal dropout), repeated-measures analysis of covariance may fail to detect a treatment effect. Alternative methods, such as linear models (with various covariance structures) fitted by restricted maximum likelihood estimation (to include all available data), can better detect treatment effects. This protocol takes 2 h to carry out.
NASA Technical Reports Server (NTRS)
Johnson, S. C.
1982-01-01
An interface system for passing data between a relational information management (RIM) data base complex and the engineering analysis language (EAL), a finite element structural analysis program, is documented. The interface system, implemented on a CDC Cyber computer, is composed of two FORTRAN programs called RIM2EAL and EAL2RIM. RIM2EAL reads model definition data from RIM and creates a file of EAL commands to define the model. EAL2RIM reads model definition and EAL-generated analysis data from EAL's data library and stores these data directly in a RIM data base. These two interface programs and the format for the RIM data complex are described.
Data handling and analysis for the 1971 corn blight watch experiment.
NASA Technical Reports Server (NTRS)
Anuta, P. E.; Phillips, T. L.; Landgrebe, D. A.
1972-01-01
Review of the data handling and analysis methods used in the near-operational test of remote sensing systems provided by the 1971 corn blight watch experiment. The general data analysis techniques and, particularly, the statistical multispectral pattern recognition methods for automatic computer analysis of aircraft scanner data are described. Some of the results obtained are examined, and the implications of the experiment for future data communication requirements of earth resource survey systems are discussed.
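The abstract does not give the classifier used. As a rough illustration of statistical multispectral pattern recognition, the sketch below implements a Gaussian maximum-likelihood pixel classifier. It makes a simplifying assumption of diagonal covariance, meaning bands are treated as independent, whereas classical implementations use the full covariance matrix. The class names and band values are made up.

```python
import math


def train(samples_by_class):
    """Estimate per-band mean and variance for each class from training pixels."""
    stats = {}
    for label, pixels in samples_by_class.items():
        n = len(pixels)
        bands = len(pixels[0])
        means = [sum(p[b] for p in pixels) / n for b in range(bands)]
        variances = [sum((p[b] - means[b]) ** 2 for p in pixels) / (n - 1)
                     for b in range(bands)]
        stats[label] = (means, variances)
    return stats


def log_likelihood(pixel, means, variances):
    """Gaussian log-likelihood of a pixel, assuming independent bands."""
    return sum(-0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
               for x, m, v in zip(pixel, means, variances))


def classify(pixel, stats):
    """Assign the pixel to the class with the highest likelihood."""
    return max(stats, key=lambda label: log_likelihood(pixel, *stats[label]))


# Hypothetical 3-band training pixels for two field classes.
TRAINING = {
    "healthy": [[30, 60, 40], [32, 62, 41], [29, 58, 39]],
    "blighted": [[45, 40, 50], [47, 42, 52], [44, 39, 49]],
}
STATS = train(TRAINING)
print(classify([31, 61, 40], STATS), classify([46, 41, 51], STATS))
```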
Szymańska, Ewa; Tinnevelt, Gerjen H; Brodrick, Emma; Williams, Mark; Davies, Antony N; van Manen, Henk-Jan; Buydens, Lutgarde M C
2016-08-05
Current challenges of clinical breath analysis include large data size and non-clinically relevant variations observed in exhaled breath measurements, which should be urgently addressed with competent scientific data tools. In this study, three different baseline correction methods are evaluated within a previously developed data size reduction strategy for multi capillary column - ion mobility spectrometry (MCC-IMS) datasets. Introduced for the first time in breath data analysis, the Top-hat method is presented as the optimum baseline correction method. A refined data size reduction strategy is employed in the analysis of a large breathomic dataset on a healthy and respiratory disease population. New insights into MCC-IMS spectra differences associated with respiratory diseases are provided, demonstrating the additional value of the refined data analysis strategy in clinical breath analysis. Copyright © 2016 Elsevier B.V. All rights reserved.
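The Top-hat baseline correction named above is a morphological filter. The corrected signal equals the signal minus its morphological opening, which is an erosion (moving minimum) followed by a dilation (moving maximum) with a flat structuring element. The sketch below applies this to a single 1-D spectrum. The half-width `width` is an assumed parameter and should be wider than the peaks that need to be preserved. The authors' parameter choices for MCC-IMS data are not given in the abstract.

```python
def erode(signal, width):
    """Moving minimum over a flat window of half-width `width` (clipped at the edges)."""
    n = len(signal)
    return [min(signal[max(0, i - width):min(n, i + width + 1)]) for i in range(n)]


def dilate(signal, width):
    """Moving maximum over a flat window of half-width `width` (clipped at the edges)."""
    n = len(signal)
    return [max(signal[max(0, i - width):min(n, i + width + 1)]) for i in range(n)]


def top_hat(signal, width):
    """White top-hat: signal minus its opening, i.e. the estimated baseline is removed."""
    opening = dilate(erode(signal, width), width)
    return [s - o for s, o in zip(signal, opening)]


# Hypothetical spectrum: flat baseline of 10 with a narrow peak.
SPECTRUM = [10.0] * 21
SPECTRUM[9:12] = [12.0, 15.0, 12.0]
print(top_hat(SPECTRUM, 3))
```

Because the window is wider than the peak, the opening follows the baseline but not the peak. Subtracting it therefore leaves the peak on a zero baseline.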
An interactive web-based application for Comprehensive Analysis of RNAi-screen Data.
Dutta, Bhaskar; Azhir, Alaleh; Merino, Louis-Henri; Guo, Yongjian; Revanur, Swetha; Madhamshettiwar, Piyush B; Germain, Ronald N; Smith, Jennifer A; Simpson, Kaylene J; Martin, Scott E; Buehler, Eugen; Fraser, Iain D C
2016-02-23
RNAi screens are widely used in functional genomics. Although the screen data can be susceptible to a number of experimental biases, many of these can be corrected by computational analysis. For this purpose, here we have developed a web-based platform for integrated analysis and visualization of RNAi screen data named CARD (for Comprehensive Analysis of RNAi Data; available at https://card.niaid.nih.gov). CARD allows the user to seamlessly carry out sequential steps in a rigorous data analysis workflow, including normalization, off-target analysis, integration of gene expression data, optimal thresholds for hit selection and network/pathway analysis. To evaluate the utility of CARD, we describe analysis of three genome-scale siRNA screens and demonstrate: (i) a significant increase both in selection of subsequently validated hits and in rejection of false positives, (ii) an increased overlap of hits from independent screens of the same biology and (iii) insight into microRNA (miRNA) activity based on siRNA seed enrichment.
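The normalization and hit-selection steps in such a workflow can be sketched as follows. The abstract does not say which methods CARD uses. This example assumes a robust z-score based on the median and the median absolute deviation (MAD), which is a common choice for screen data. It also assumes a fixed cutoff of -3, meaning that a lower readout counts as a hit. The siRNA IDs and readouts are hypothetical.

```python
import statistics


def robust_z_scores(values):
    """Median/MAD robust z-scores; the factor 1.4826 scales MAD to a normal SD."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    return [(v - med) / (1.4826 * mad) for v in values]


def select_hits(readouts, threshold=-3.0):
    """Return IDs whose robust z-score is at or below the (assumed) threshold."""
    ids = list(readouts)
    z = robust_z_scores([readouts[i] for i in ids])
    return [i for i, score in zip(ids, z) if score <= threshold]


# Hypothetical per-well readouts from one plate.
READOUTS = {"siA": 100, "siB": 98, "siC": 103, "siD": 101,
            "siE": 97, "siF": 99, "siG": 40}
print(select_hits(READOUTS))
```

The median and MAD are used here so that strong hits do not inflate the spread estimate, which an ordinary mean and standard deviation would allow them to do.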
Analysis of airborne MAIS imaging spectrometric data for mineral exploration
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wang Jinnian; Zheng Lanfen; Tong Qingxi
1996-11-01
The high spectral resolution imaging spectrometer system made quantitative analysis and mapping of surface composition possible. The key issue is a quantitative approach to analyzing surface parameters from imaging spectrometer data. This paper describes the methods and stages of quantitative analysis. (1) Extracting surface reflectance from the imaging spectrometer image. Laboratory and in-flight field measurements are conducted to calibrate the imaging spectrometer data, and atmospheric correction using the empirical line method and radiative transfer modeling is applied to obtain ground reflectance. (2) Determining the quantitative relationship between absorption band parameters from the imaging spectrometer data and the chemical composition of minerals. (3) Comparing spectra from a spectral library with spectra derived from the imagery. Wavelet analysis-based spectrum-matching techniques for quantitative analysis of imaging spectrometer data have been developed. Airborne MAIS imaging spectrometer data were analyzed, and the results have been applied to mineral and petroleum exploration in the Tarim Basin area, China. 8 refs., 8 figs.
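The empirical line method mentioned in step (1) fits a per-band linear relation between sensor values and field-measured reflectance of calibration targets, then applies that relation to every pixel. Below is a minimal single-band sketch using ordinary least squares. The target DN and reflectance values are hypothetical, and the paper's actual calibration targets and band handling are not described in the abstract.

```python
def fit_empirical_line(dn, reflectance):
    """Least-squares gain and offset so that reflectance ~= gain * DN + offset (one band)."""
    n = len(dn)
    mx = sum(dn) / n
    my = sum(reflectance) / n
    sxx = sum((x - mx) ** 2 for x in dn)
    sxy = sum((x - mx) * (y - my) for x, y in zip(dn, reflectance))
    gain = sxy / sxx
    offset = my - gain * mx
    return gain, offset


def to_reflectance(band_dn, gain, offset):
    """Apply the fitted line to every pixel value in the band."""
    return [gain * v + offset for v in band_dn]


# Hypothetical dark, medium, and bright calibration targets for one band.
GAIN, OFFSET = fit_empirical_line([50, 450, 850], [0.05, 0.45, 0.85])
print(to_reflectance([250, 650], GAIN, OFFSET))
```

In practice this fit is repeated independently for each spectral band.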
Query-Driven Visualization and Analysis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ruebel, Oliver; Bethel, E. Wes; Prabhat, Mr.
2012-11-01
This report focuses on an approach to high performance visualization and analysis, termed query-driven visualization and analysis (QDV). QDV aims to reduce the amount of data that needs to be processed by the visualization, analysis, and rendering pipelines. The goal of the data reduction process is to separate out data that is "scientifically interesting" and to focus visualization, analysis, and rendering on that interesting subset. The premise is that for any given visualization or analysis task, the data subset of interest is much smaller than the larger, complete data set. This strategy, extracting smaller data subsets of interest and focusing the visualization processing on these subsets, is complementary to the approach of increasing the capacity of the visualization, analysis, and rendering pipelines through parallelism. This report discusses the fundamental concepts in QDV and their relationship to different stages in the visualization and analysis pipelines, and presents QDV's application to problems in diverse areas, ranging from forensic cybersecurity to high energy physics.
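A query-driven selection can be sketched with a simple bitmap index, which is one common way to answer multivariate range queries quickly. Each variable is binned, and each bin gets a bitmap (here a Python `int` used as a bit set). A compound query ORs the bitmaps of the bins that fall inside each range and then ANDs across variables. This is an uncompressed toy and does not represent any specific library. It also assumes that query bounds align with bin edges, so no candidate check is needed. The variable names and values are hypothetical.

```python
def build_bitmap_index(values, bin_edges):
    """One bitmap per bin: bit i is set if record i falls in [edge_b, edge_b+1)."""
    bitmaps = [0] * (len(bin_edges) - 1)
    for i, v in enumerate(values):
        for b in range(len(bin_edges) - 1):
            if bin_edges[b] <= v < bin_edges[b + 1]:
                bitmaps[b] |= 1 << i
                break
    return bitmaps


def range_query(bitmaps, bin_edges, low, high):
    """OR together bins fully inside [low, high); assumes bounds align with bin edges."""
    result = 0
    for b in range(len(bitmaps)):
        if bin_edges[b] >= low and bin_edges[b + 1] <= high:
            result |= bitmaps[b]
    return result


def selected_records(bitset):
    """Indices of the set bits, i.e. the records that satisfy the query."""
    return [i for i in range(bitset.bit_length()) if (bitset >> i) & 1]


# Hypothetical per-record variables.
ENERGY, E_EDGES = [0.5, 2.5, 7.0, 9.5, 3.0], [0, 2, 4, 6, 8, 10]
DENSITY, D_EDGES = [10, 80, 90, 20, 70], [0, 25, 50, 75, 100]

# Query: energy in [6, 10) AND density in [50, 100).
hits = selected_records(
    range_query(build_bitmap_index(ENERGY, E_EDGES), E_EDGES, 6, 10)
    & range_query(build_bitmap_index(DENSITY, D_EDGES), D_EDGES, 50, 100))
print(hits)
```

Only the records returned by the query would then be passed to the visualization and rendering stages.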
An interactive web-based application for Comprehensive Analysis of RNAi-screen Data
Dutta, Bhaskar; Azhir, Alaleh; Merino, Louis-Henri; Guo, Yongjian; Revanur, Swetha; Madhamshettiwar, Piyush B.; Germain, Ronald N.; Smith, Jennifer A.; Simpson, Kaylene J.; Martin, Scott E.; Buehler, Eugen; Fraser, Iain D. C.
2016-01-01
RNAi screens are widely used in functional genomics. Although the screen data can be susceptible to a number of experimental biases, many of these can be corrected by computational analysis. For this purpose, here we have developed a web-based platform for integrated analysis and visualization of RNAi screen data named CARD (for Comprehensive Analysis of RNAi Data; available at https://card.niaid.nih.gov). CARD allows the user to seamlessly carry out sequential steps in a rigorous data analysis workflow, including normalization, off-target analysis, integration of gene expression data, optimal thresholds for hit selection and network/pathway analysis. To evaluate the utility of CARD, we describe analysis of three genome-scale siRNA screens and demonstrate: (i) a significant increase both in selection of subsequently validated hits and in rejection of false positives, (ii) an increased overlap of hits from independent screens of the same biology and (iii) insight into microRNA (miRNA) activity based on siRNA seed enrichment. PMID:26902267
MANTiS: a program for the analysis of X-ray spectromicroscopy data.
Lerotic, Mirna; Mak, Rachel; Wirick, Sue; Meirer, Florian; Jacobsen, Chris
2014-09-01
Spectromicroscopy combines spectral data with microscopy, where typical datasets consist of a stack of images taken across a range of energies over a microscopic region of the sample. Manual analysis of these complex datasets can be time-consuming and can miss important features in the data. With this in mind, we have developed MANTiS, an open-source Python tool for spectromicroscopy data analysis. The backbone of the package involves principal component analysis and cluster analysis, classifying pixels according to spectral similarity. Our goal is to provide a data analysis tool which is comprehensive, yet intuitive and easy to use. MANTiS is designed to lead the user through the analysis using storyboards that describe each step in detail, so that both experienced users and beginners are able to analyze their own data independently. These capabilities are illustrated through analysis of hard X-ray imaging of iron in Roman ceramics, and soft X-ray imaging of a malaria-infected red blood cell.
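The pixel-classification idea, which groups pixels by spectral similarity, can be sketched with plain k-means on per-pixel spectra. In MANTiS the clustering follows a principal component analysis step. This sketch skips PCA and clusters the raw spectra using Euclidean distance. It seeds the centroids with the first k spectra, which is a simplifying assumption. The spectra are hypothetical, and this is not MANTiS code.

```python
import math


def kmeans(spectra, k, iterations=20):
    """Plain k-means on pixel spectra (Euclidean distance), seeded with the first k spectra."""
    centroids = [list(s) for s in spectra[:k]]
    labels = [0] * len(spectra)
    for _ in range(iterations):
        # Assign each pixel spectrum to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(s, centroids[c])) for s in spectra]
        # Move each centroid to the mean spectrum of its members.
        for c in range(k):
            members = [s for s, lab in zip(spectra, labels) if lab == c]
            if members:
                centroids[c] = [sum(vals) / len(members) for vals in zip(*members)]
    return labels, centroids


# Hypothetical 3-energy spectra for five pixels.
PIXELS = [[1, 1, 1], [9, 9, 9], [1.2, 0.9, 1.1], [8.8, 9.1, 9.0], [1.0, 1.1, 0.9]]
print(kmeans(PIXELS, 2)[0])
```

Clustering on a few leading principal-component scores instead of raw spectra, as MANTiS does, reduces noise and computation when there are many energies.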
NASA Technical Reports Server (NTRS)
Tiao, G. C.
1992-01-01
Work performed during the project period July 1, 1990 to June 30, 1992 on the statistical analysis of stratospheric temperature data, rawinsonde temperature data, and ozone profile data for the detection of trends is described. Our principal topics of research are trend analysis of NOAA stratospheric temperature data over the period 1978-1989; trend analysis of rawinsonde temperature data for the period 1964-1988; trend analysis of Umkehr ozone profile data for the period 1977-1991; and comparison of observed ozone and temperature trends in the lower stratosphere. Analysis of NOAA stratospheric temperature data indicates large negative trends at the 0.4 mb level, with magnitudes increasing with latitude away from the equator. Trend analysis of rawinsonde temperature data from 184 stations shows significant positive trends of about 0.2 °C per decade in the surface to 500 mb range, decreasing to negative trends of about -0.3 °C in the 100 to 50 mb range, and increasing slightly at the 30 mb level. There is little evidence of seasonal variation in trends. Analysis of Umkehr ozone data for 12 Northern Hemisphere stations shows significant negative trends of about -0.5 percent per year in Umkehr layers 7-9 and layer 3, but somewhat less negative trends in layers 4-6. There is no pronounced seasonal variation in trends, especially in layers 4-9. Empirical temperature trends from rawinsonde data in the lower stratosphere were compared with temperature changes determined from a one-dimensional radiative transfer calculation that prescribed a given ozone change over the altitude region from the surface to 50 km, obtained from trend analysis of ozonesonde and Umkehr profile data. The empirical and calculated temperature trends are in substantive agreement in profile shape and magnitude.
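Trends such as "0.2 °C per decade" are slopes of a regression of the data on time, rescaled to a decade. The sketch below computes a plain ordinary least-squares slope. It ignores the autocorrelated errors, seasonal terms, and missing-data handling that a full trend analysis of this kind would normally model. The abstract does not describe the exact statistical model, and the series used here is synthetic.

```python
def trend_per_decade(years, values):
    """Ordinary least-squares slope of values against time (in years), scaled to units per decade."""
    n = len(years)
    mt = sum(years) / n
    mv = sum(values) / n
    slope = sum((t - mt) * (v - mv) for t, v in zip(years, values)) / sum(
        (t - mt) ** 2 for t in years)
    return slope * 10.0


# Synthetic annual anomalies rising 0.02 degrees per year over 1964-1988.
YEARS = list(range(1964, 1989))
ANOMALIES = [0.02 * (y - 1964) for y in YEARS]
print(round(trend_per_decade(YEARS, ANOMALIES), 3))
```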
Rocketdyne automated dynamics data analysis and management system
NASA Technical Reports Server (NTRS)
Tarn, Robert B.
1988-01-01
An automated dynamics data analysis and management system implemented on a DEC VAX minicomputer cluster is described. Multichannel acquisition, Fast Fourier Transform analysis, and an online database have significantly improved the analysis of wideband transducer responses from Space Shuttle Main Engine testing. Leakage error correction to recover sinusoid amplitudes and correct for frequency slewing is described. The phase errors caused by FM recorder/playback head misalignment are automatically measured and used to correct the data. Data compression methods are described and compared. The system hardware is described. Applications using the data base are introduced, including software for power spectral density, instantaneous time history, amplitude histogram, fatigue analysis, and rotordynamics expert system analysis.
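The abstract does not describe how the leakage correction works. One standard ingredient of recovering sinusoid amplitudes from a spectrum is to apply a window to reduce leakage and then divide by the window's coherent gain, which is the sum of its weights. The sketch below does this with an assumed periodic Hann window. It computes the DFT by direct summation, which is O(N²), for clarity rather than using an FFT. It only recovers exact amplitudes for tones centred on a bin. Off-bin (scalloping) correction and frequency-slewing correction are not shown.

```python
import cmath
import math


def hann(n_points):
    """Periodic Hann window."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n_points) for i in range(n_points)]


def windowed_amplitude_spectrum(samples):
    """Single-sided amplitude spectrum, corrected for the Hann window's coherent gain.

    The DC and Nyquist bins are not rescaled separately in this sketch.
    """
    n = len(samples)
    w = hann(n)
    xw = [s * wi for s, wi in zip(samples, w)]
    gain = sum(w)  # coherent gain: the window's total weight
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(xw))
        spectrum.append(2 * abs(acc) / gain)
    return spectrum


# A 3-unit sinusoid centred exactly on bin 8 of a 64-point record.
TONE = [3 * math.cos(2 * math.pi * 8 * i / 64) for i in range(64)]
print(round(windowed_amplitude_spectrum(TONE)[8], 6))
```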
Multivariate analysis for scanning tunneling spectroscopy data
NASA Astrophysics Data System (ADS)
Yamanishi, Junsuke; Iwase, Shigeru; Ishida, Nobuyuki; Fujita, Daisuke
2018-01-01
We applied principal component analysis (PCA) to two-dimensional tunneling spectroscopy (2DTS) data obtained on a Si(111)-(7 × 7) surface to explore the effectiveness of multivariate analysis for interpreting 2DTS data. We demonstrated that several components that originated mainly from specific atoms at the Si(111)-(7 × 7) surface can be extracted by PCA. Furthermore, we showed that hidden components in the tunneling spectra can be decomposed (peak separation), which is difficult to achieve with normal 2DTS analysis without the support of theoretical calculations. Our analysis showed that multivariate analysis can be an additional powerful way to analyze 2DTS data and extract hidden information from a large amount of spectroscopic data.
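Applying PCA to 2DTS data amounts to treating each pixel's tunneling spectrum as one observation and finding the directions of greatest variance across pixels. The sketch below extracts only the leading component. It builds the covariance matrix and uses power iteration, then returns each pixel's score. Library SVD routines are the usual choice for extracting several components. The "spectra" are synthetic, and the authors' preprocessing is not described in the abstract.

```python
import math


def first_principal_component(spectra, iterations=200):
    """Leading eigenvector of the spectra's covariance matrix (power iteration) and pixel scores."""
    n, m = len(spectra), len(spectra[0])
    means = [sum(s[j] for s in spectra) / n for j in range(m)]
    centered = [[s[j] - means[j] for j in range(m)] for s in spectra]
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(m)]
           for a in range(m)]
    vec = [1.0] * m  # assumed start vector; must not be orthogonal to the leading component
    for _ in range(iterations):
        nxt = [sum(cov[a][b] * vec[b] for b in range(m)) for a in range(m)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    scores = [sum(r[j] * vec[j] for j in range(m)) for r in centered]
    return vec, scores


# Synthetic 3-point "spectra" for five pixels, varying along one direction.
ROWS = [[t + 1, 2 * t + 3, 5.0] for t in (-2, -1, 0, 1, 2)]
print(first_principal_component(ROWS)[0])
```

The resulting pixel scores can be mapped back onto the scan grid to show where a component comes from on the surface.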
Chang, Cheng; Xu, Kaikun; Guo, Chaoping; Wang, Jinxia; Yan, Qi; Zhang, Jian; He, Fuchu; Zhu, Yunping
2018-05-22
Compared with the numerous software tools developed for identification and quantification of -omics data, there remains a lack of suitable tools for both downstream analysis and data visualization. To help researchers better understand the biological meaning of their -omics data, we present an easy-to-use tool, named PANDA-view, for both statistical analysis and visualization of quantitative proteomics data and other -omics data. PANDA-view contains various kinds of analysis methods such as normalization, missing value imputation, statistical tests, clustering and principal component analysis, as well as the most commonly used data visualization methods, including an interactive volcano plot. Additionally, it provides user-friendly interfaces for protein-peptide-spectrum representation of the quantitative proteomics data. PANDA-view is freely available at https://sourceforge.net/projects/panda-view/. Contact: 1987ccpacer@163.com and zhuyunping@gmail.com. Supplementary data are available at Bioinformatics online.
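A volcano plot places each protein at (log2 fold change, -log10 p-value), and proteins beyond both cutoffs are flagged. The sketch below computes those coordinates. It assumes the p-values were already obtained from some statistical test and that fold changes are taken on linear-scale group means. Both cutoffs are assumed defaults, and the protein names and intensities are hypothetical. This is not PANDA-view's implementation.

```python
import math


def volcano_points(group_a, group_b, p_values, fc_cutoff=1.0, p_cutoff=0.05):
    """Return (protein, log2 fold change, -log10 p, significant?) for each protein."""
    points = []
    for protein, p in p_values.items():
        mean_a = sum(group_a[protein]) / len(group_a[protein])
        mean_b = sum(group_b[protein]) / len(group_b[protein])
        log2_fc = math.log2(mean_b / mean_a)
        significant = abs(log2_fc) >= fc_cutoff and p < p_cutoff
        points.append((protein, log2_fc, -math.log10(p), significant))
    return points


# Hypothetical intensities (three replicates per group) and precomputed p-values.
GROUP_A = {"PRT1": [100, 110, 90], "PRT2": [50, 55, 45]}
GROUP_B = {"PRT1": [400, 420, 380], "PRT2": [52, 50, 48]}
P_VALUES = {"PRT1": 0.001, "PRT2": 0.6}
print(volcano_points(GROUP_A, GROUP_B, P_VALUES))
```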
NASA Technical Reports Server (NTRS)
Davis, Frank W.; Quattrochi, Dale A.; Ridd, Merrill K.; Lam, Nina S.-N.; Walsh, Stephen J.
1991-01-01
This paper discusses some basic scientific issues and research needs in the joint processing of remotely sensed and GIS data for environmental analysis. Two general topics are treated in detail: (1) scale dependence of geographic data and the analysis of multiscale remotely sensed and GIS data, and (2) data transformations and information flow during data processing. The discussion of scale dependence focuses on the theory and applications of spatial autocorrelation, geostatistics, and fractals for characterizing and modeling spatial variation. Data transformations during processing are described within the larger framework of geographical analysis, encompassing sampling, cartography, remote sensing, and GIS. Development of better user interfaces between image processing, GIS, database management, and statistical software is needed to expedite research on these and other impediments to integrated analysis of remotely sensed and GIS data.
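Spatial autocorrelation, one of the tools discussed for characterizing scale dependence, is often summarized with Moran's I:

I = (N / W) · Σᵢ Σⱼ wᵢⱼ (xᵢ − x̄)(xⱼ − x̄) / Σᵢ (xᵢ − x̄)²

Here W is the sum of all weights wᵢⱼ. The sketch below evaluates I for a raster using assumed binary rook weights, where each cell's four edge-sharing neighbours get weight 1. The example grid is a synthetic checkerboard, which is perfectly negatively autocorrelated.

```python
def morans_i(grid):
    """Global Moran's I on a 2-D grid with binary rook (4-neighbour) weights."""
    rows, cols = len(grid), len(grid[0])
    values = [v for row in grid for v in row]
    n = len(values)
    mean = sum(values) / n
    numerator = 0.0
    weight_sum = 0
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    numerator += (grid[r][c] - mean) * (grid[rr][cc] - mean)
                    weight_sum += 1
    denominator = sum((v - mean) ** 2 for v in values)
    return (n / weight_sum) * numerator / denominator


CHECKERBOARD = [[(r + c) % 2 for c in range(4)] for r in range(4)]
print(morans_i(CHECKERBOARD))
```

Values near +1 indicate clustering of similar values, values near 0 indicate no spatial pattern, and -1 indicates strict alternation. Recomputing I after aggregating a raster to coarser cells is one simple way to see scale dependence.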
Long-term Preservation of Data Analysis Capabilities
NASA Astrophysics Data System (ADS)
Gabriel, C.; Arviset, C.; Ibarra, A.; Pollock, A.
2015-09-01
While the long-term preservation of scientific data obtained by large astrophysics missions is ensured through science archives, the issue of data analysis software preservation has hardly been addressed. Efforts by large data centres have so far helped maintain some instrument- or mission-specific data reduction packages on top of high-level general-purpose data analysis software. However, it is always difficult to keep software alive without support and maintenance once the active phase of a mission is over. This is especially difficult in the budgetary model followed by space agencies. We discuss the importance of extending the lifetime of dedicated data analysis packages and review diverse strategies under development at ESA using new paradigms such as Virtual Machines, Cloud Computing, and Software as a Service for making possible full availability of data analysis and calibration software for decades at minimal cost.
7 CFR 275.15 - Data management.
Code of Federal Regulations, 2012 CFR
2012-01-01
... Regulations of the Department of Agriculture (Continued) FOOD AND NUTRITION SERVICE, DEPARTMENT OF AGRICULTURE FOOD STAMP AND FOOD DISTRIBUTION PROGRAM PERFORMANCE REPORTING SYSTEM Data Analysis and Evaluation § 275.15 Data management. (a) Analysis. Analysis is the process of classifying data, such as by areas of...
7 CFR 275.15 - Data management.
Code of Federal Regulations, 2014 CFR
2014-01-01
... Regulations of the Department of Agriculture (Continued) FOOD AND NUTRITION SERVICE, DEPARTMENT OF AGRICULTURE FOOD STAMP AND FOOD DISTRIBUTION PROGRAM PERFORMANCE REPORTING SYSTEM Data Analysis and Evaluation § 275.15 Data management. (a) Analysis. Analysis is the process of classifying data, such as by areas of...
7 CFR 275.15 - Data management.
Code of Federal Regulations, 2011 CFR
2011-01-01
... Regulations of the Department of Agriculture (Continued) FOOD AND NUTRITION SERVICE, DEPARTMENT OF AGRICULTURE FOOD STAMP AND FOOD DISTRIBUTION PROGRAM PERFORMANCE REPORTING SYSTEM Data Analysis and Evaluation § 275.15 Data management. (a) Analysis. Analysis is the process of classifying data, such as by areas of...
7 CFR 275.15 - Data management.
Code of Federal Regulations, 2013 CFR
2013-01-01
... Regulations of the Department of Agriculture (Continued) FOOD AND NUTRITION SERVICE, DEPARTMENT OF AGRICULTURE FOOD STAMP AND FOOD DISTRIBUTION PROGRAM PERFORMANCE REPORTING SYSTEM Data Analysis and Evaluation § 275.15 Data management. (a) Analysis. Analysis is the process of classifying data, such as by areas of...
Ofner, Johannes; Kamilli, Katharina A; Eitenberger, Elisabeth; Friedbacher, Gernot; Lendl, Bernhard; Held, Andreas; Lohninger, Hans
2015-09-15
The chemometric analysis of multisensor hyperspectral data allows a comprehensive image-based analysis of precipitated atmospheric particles. Atmospheric particulate matter was precipitated on aluminum foils and analyzed by Raman microspectroscopy and subsequently by electron microscopy and energy dispersive X-ray spectroscopy. All obtained images were of the same spot of an area of 100 × 100 μm². The two hyperspectral data sets and the high-resolution scanning electron microscope images were fused into a combined multisensor hyperspectral data set. This multisensor data cube was analyzed using principal component analysis, hierarchical cluster analysis, k-means clustering, and vertex component analysis. The detailed chemometric analysis of the multisensor data allowed an extensive chemical interpretation of the precipitated particles, and their structure and composition led to a comprehensive understanding of atmospheric particulate matter.
CADDIS Volume 4. Data Analysis: Basic Principles & Issues
Use of inferential statistics in causal analysis, introduction to data independence and autocorrelation, methods for identifying and controlling for confounding variables, and references for the Basic Principles section of Data Analysis.
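Autocorrelation is the main threat to the independence assumption behind many inferential tests. A quick diagnostic for a series ordered in time or space is the sample lag-1 autocorrelation, shown below. It is a minimal sketch, and the series used are synthetic. Values far from zero suggest that neighbouring observations are not independent.

```python
def lag1_autocorrelation(series):
    """Sample lag-1 autocorrelation: lagged cross-products over the total sum of squares."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    num = sum(dev[t] * dev[t + 1] for t in range(n - 1))
    den = sum(d * d for d in dev)
    return num / den


print(lag1_autocorrelation([1, -1, 1, -1]))  # strongly alternating series
print(lag1_autocorrelation([1, 2, 3, 4, 5, 6]))  # smooth trend
```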
2015-09-30
TREX13 data analysis /modeling Dajun (DJ) Tang Applied Physics Laboratory, University of Washington 1013 NE 40th Street, Seattle, WA 98105...accuracy in those predictions. With extensive TREX13 data in hand, the objective now shifts to realizing the long-term goals using data analysis and...be quantitatively addressed. The approach to analysis can be summarized into the following steps: 1. Based on measurements, assess to what degree
NASA Technical Reports Server (NTRS)
1972-01-01
An economic analysis of space tug operations is presented. The subjects discussed are: (1) data base for orbit injection stages, (2) data base for reusable space tug, (3) performance equations, (4) data integration and interpretation, (5) tug performance and mission model accommodation, (6) total program cost, (7) payload analysis, (8) computer software, and (9) comparison of tug concepts.
[Preliminary application of content analysis to qualitative nursing data].
Liang, Shu-Yuan; Chuang, Yeu-Hui; Wu, Shu-Fang
2012-10-01
Content analysis is a methodology for objectively and systematically studying the content of communication in various formats. Content analysis in nursing research and nursing education is called qualitative content analysis. Qualitative content analysis is frequently applied to nursing research, as it allows researchers to determine categories inductively and deductively. This article examines qualitative content analysis in nursing research from theoretical and practical perspectives. We first describe how content analysis concepts such as unit of analysis, meaning unit, code, category, and theme are used. Next, we describe the basic steps involved in using content analysis, including data preparation, data familiarization, analysis unit identification, creating tentative coding categories, category refinement, and establishing category integrity. Finally, this paper introduces the concept of content analysis rigor, including dependability, confirmability, credibility, and transferability. This article elucidates the content analysis method in order to help professionals conduct systematic research that generates data that are informative and useful in practical application.
Qualitative Data Analysis for Health Services Research: Developing Taxonomy, Themes, and Theory
Bradley, Elizabeth H; Curry, Leslie A; Devers, Kelly J
2007-01-01
Objective To provide practical strategies for conducting and evaluating analyses of qualitative data applicable for health services researchers. Data Sources and Design We draw on extant qualitative methodological literature to describe practical approaches to qualitative data analysis. Approaches to data analysis vary by discipline and analytic tradition; however, we focus on qualitative data analysis that has as a goal the generation of taxonomy, themes, and theory germane to health services research. Principal Findings We describe an approach to qualitative data analysis that applies the principles of inductive reasoning while also employing predetermined code types to guide data analysis and interpretation. These code types (conceptual, relationship, perspective, participant characteristics, and setting codes) define a structure that is appropriate for generation of taxonomy, themes, and theory. Conceptual codes and subcodes facilitate the development of taxonomies. Relationship and perspective codes facilitate the development of themes and theory. Intersectional analyses with data coded for participant characteristics and setting codes can facilitate comparative analyses. Conclusions Qualitative inquiry can improve the description and explanation of complex, real-world phenomena pertinent to health services research. Greater understanding of the processes of qualitative data analysis can be helpful for health services researchers as they use these methods themselves or collaborate with qualitative researchers from a wide range of disciplines. PMID:17286625
Qualitative data analysis for health services research: developing taxonomy, themes, and theory.
Bradley, Elizabeth H; Curry, Leslie A; Devers, Kelly J
2007-08-01
To provide practical strategies for conducting and evaluating analyses of qualitative data applicable for health services researchers. DATA SOURCES AND DESIGN: We draw on extant qualitative methodological literature to describe practical approaches to qualitative data analysis. Approaches to data analysis vary by discipline and analytic tradition; however, we focus on qualitative data analysis that has as a goal the generation of taxonomy, themes, and theory germane to health services research. We describe an approach to qualitative data analysis that applies the principles of inductive reasoning while also employing predetermined code types to guide data analysis and interpretation. These code types (conceptual, relationship, perspective, participant characteristics, and setting codes) define a structure that is appropriate for generation of taxonomy, themes, and theory. Conceptual codes and subcodes facilitate the development of taxonomies. Relationship and perspective codes facilitate the development of themes and theory. Intersectional analyses with data coded for participant characteristics and setting codes can facilitate comparative analyses. Qualitative inquiry can improve the description and explanation of complex, real-world phenomena pertinent to health services research. Greater understanding of the processes of qualitative data analysis can be helpful for health services researchers as they use these methods themselves or collaborate with qualitative researchers from a wide range of disciplines.
A new metaphor for projection-based visual analysis and data exploration
NASA Astrophysics Data System (ADS)
Schreck, Tobias; Panse, Christian
2007-01-01
In many important application domains such as Business and Finance, Process Monitoring, and Security, huge and quickly increasing volumes of complex data are collected. Strong efforts are underway developing automatic and interactive analysis tools for mining useful information from these data repositories. Many data analysis algorithms require an appropriate definition of similarity (or distance) between data instances to allow meaningful clustering, classification, and retrieval, among other analysis tasks. Projection-based data visualization is highly interesting (a) for visual discrimination analysis of a data set within a given similarity definition, and (b) for comparative analysis of similarity characteristics of a given data set represented by different similarity definitions. We introduce an intuitive and effective novel approach for projection-based similarity visualization for interactive discrimination analysis, data exploration, and visual evaluation of metric space effectiveness. The approach is based on the convex hull metaphor for visually aggregating sets of points in projected space, and it can be used with a variety of different projection techniques. The effectiveness of the approach is demonstrated by application on two well-known data sets. Statistical evidence supporting the validity of the hull metaphor is presented. We advocate the hull-based approach over the standard symbol-based approach to projection visualization, as it allows a more effective perception of similarity relationships and class distribution characteristics.
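The hull metaphor summarizes each class's projected points by their convex hull. The sketch below groups 2-D projected points by class label and computes one hull per class using Andrew's monotone chain algorithm. The projection step (for example PCA or MDS) is assumed to have happened already, and the points must be hashable tuples. The paper does not state which hull algorithm it uses, so this is one standard choice and not necessarily the authors'.

```python
def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise, collinear points dropped."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); > 0 means a left turn.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def class_hulls(projected, labels):
    """Group 2-D projected points by class label and compute one hull per class."""
    groups = {}
    for point, label in zip(projected, labels):
        groups.setdefault(label, []).append(point)
    return {label: convex_hull(pts) for label, pts in groups.items()}


print(convex_hull([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]))
```

Drawing one filled polygon per class, instead of thousands of symbols, is what makes overlap between classes easy to judge visually.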
Aviation Data Integration System
NASA Technical Reports Server (NTRS)
Kulkarni, Deepak; Wang, Yao; Windrem, May; Patel, Hemil; Keller, Richard
2003-01-01
During the analysis of flight data and safety reports done in ASAP and FOQA programs, airline personnel are not able to access relevant aviation data for a variety of reasons. We have developed the Aviation Data Integration System (ADIS), a software system that provides integrated heterogeneous data to support safety analysis. Types of data available in ADIS include weather, D-ATIS, RVR, radar data, Jeppesen charts, and flight data. We developed three versions of ADIS to support airlines. The first version supports ASAP teams. A second version supports FOQA teams, and it integrates aviation data with flight data while keeping identification information inaccessible. Finally, we developed a prototype that demonstrates the integration of aviation data into flight data analysis programs. The initial feedback from airlines is that ADIS is very useful in FOQA and ASAP analysis.
Integrative Analysis of Omics Big Data.
Yu, Xiang-Tian; Zeng, Tao
2018-01-01
The diversity and sheer volume of omics data have taken biology and biomedicine research and applications into a big data era, much as happened in wider society a decade ago. They pose a new challenge, moving from horizontal data ensembles (e.g., similar types of data collected from different labs or companies) to vertical data ensembles (e.g., different types of data collected for a group of individuals with matched information). This shift requires integrative analysis in biology and biomedicine and calls for the rapid development of data integration methods to address the move from population-guided to individual-guided investigations. Data integration is an effective concept for solving complex problems and understanding complicated systems. Several benchmark studies have revealed the heterogeneity and trade-offs that exist in the analysis of omics data. Integrative analysis can combine and investigate many datasets in a cost-effective, reproducible way. Current integration approaches for biological data have two modes: a "bottom-up integration" mode with follow-up manual integration, and a "top-down integration" mode with follow-up in silico integration. This paper first summarizes combinatory analysis approaches to suggest candidate protocols for biological experiment design in integrative genomics studies, and then surveys data fusion approaches to offer guidance on computational model development for detecting biological significance; these approaches have also provided new data resources and analysis tools to support precision medicine based on big biomedical data. Finally, the problems and future directions for integrative analysis of omics big data are highlighted.
PROS: An IRAF based system for analysis of x ray data
NASA Technical Reports Server (NTRS)
Conroy, M. A.; Deponte, J.; Moran, J. F.; Orszak, J. S.; Roberts, W. P.; Schmidt, D.
1992-01-01
PROS is an IRAF based software package for the reduction and analysis of x-ray data. The use of a standard, portable, integrated environment provides for both multi-frequency and multi-mission analysis. The analysis of x-ray data differs from optical analysis due to the nature of the x-ray data and its acquisition during constantly varying conditions. The scarcity of data, the low signal-to-noise ratio and the large gaps in exposure time make data screening and masking an important part of the analysis. PROS was developed to support the analysis of data from the ROSAT and Einstein missions but many of the tasks have been used on data from other missions. IRAF/PROS provides a complete end-to-end system for x-ray data analysis: (1) a set of tools for importing and exporting data via FITS format -- in particular, IRAF provides a specialized event-list format, QPOE, that is compatible with its IMAGE (2-D array) format; (2) a powerful set of IRAF system capabilities for both temporal and spatial event filtering; (3) full set of imaging and graphics tasks; (4) specialized packages for scientific analysis such as spatial, spectral and timing analysis -- these consist of both general and mission specific tasks; and (5) complete system support including ftp and magnetic tape releases, electronic and conventional mail hotline support, electronic mail distribution of solutions to frequently asked questions and current known bugs. We will discuss the philosophy, architecture and development environment used by PROS to generate a portable, multimission software environment. PROS is available on all platforms that support IRAF, including Sun/Unix, VAX/VMS, HP, and DECstations. It is available on request at no charge.
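Temporal and spatial event filtering on an event list comes down to keeping only the photon events that fall inside accepted time intervals and inside a spatial region. The sketch below represents events as hypothetical (time, x, y) tuples, time filters as a list of [start, stop) good-time intervals, and the spatial filter as a single circle. It does not use the IRAF/PROS or QPOE interfaces, which are not reproduced here.

```python
import math


def filter_events(events, good_time_intervals, center, radius):
    """Keep events inside any [start, stop) interval and within a circular region."""
    cx, cy = center
    kept = []
    for t, x, y in events:
        in_time = any(start <= t < stop for start, stop in good_time_intervals)
        in_region = math.hypot(x - cx, y - cy) <= radius
        if in_time and in_region:
            kept.append((t, x, y))
    return kept


# Hypothetical events (time in s, detector x, y) and good-time intervals.
EVENTS = [(10.0, 100, 100), (55.0, 101, 99), (120.0, 300, 300), (130.0, 102, 100)]
GTIS = [(0, 50), (100, 200)]
print(filter_events(EVENTS, GTIS, (100, 100), 5))
```

The second event is rejected by time screening (it falls in an exposure gap), and the third is rejected by the spatial region.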
Meta-analysis of pathway enrichment: combining independent and dependent omics data sets.
Kaever, Alexander; Landesfeind, Manuel; Feussner, Kirstin; Morgenstern, Burkhard; Feussner, Ivo; Meinicke, Peter
2014-01-01
A major challenge in current systems biology is the combination and integrative analysis of large data sets obtained from different high-throughput omics platforms, such as mass spectrometry based Metabolomics and Proteomics or DNA microarray or RNA-seq-based Transcriptomics. Especially in the case of non-targeted Metabolomics experiments, where it is often impossible to unambiguously map ion features from mass spectrometry analysis to metabolites, the integration of more reliable omics technologies is highly desirable. A popular method for the knowledge-based interpretation of single data sets is the (Gene) Set Enrichment Analysis. In order to combine the results from different analyses, we introduce a methodical framework for the meta-analysis of p-values obtained from Pathway Enrichment Analysis (Set Enrichment Analysis based on pathways) of multiple dependent or independent data sets from different omics platforms. For dependent data sets, e.g. obtained from the same biological samples, the framework utilizes a covariance estimation procedure based on the nonsignificant pathways in single data set enrichment analysis. The framework is evaluated and applied in the joint analysis of Metabolomics mass spectrometry and Transcriptomics DNA microarray data in the context of plant wounding. In extensive studies of simulated data set dependence, the introduced correlation could be fully reconstructed by means of the covariance estimation based on pathway enrichment. By restricting the range of p-values of pathways considered in the estimation, the overestimation of correlation, which is introduced by the significant pathways, could be reduced. When applying the proposed methods to the real data sets, the meta-analysis was shown not only to be a powerful tool to investigate the correlation between different data sets and summarize the results of multiple analyses but also to distinguish experiment-specific key pathways.
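Fisher's method is one standard way to combine p-values for the same pathway from several independent data sets. Under the null hypothesis, X = -2 Σ ln pᵢ follows a chi-square distribution with 2k degrees of freedom. For even degrees of freedom, the tail probability has the closed form e^(−x/2) Σᵢ₌₀^(k−1) (x/2)ⁱ / i!. The sketch below covers only the independent case. The paper's own framework, including its covariance-based handling of dependent data sets, is not reproduced here.

```python
import math


def fisher_combined_p(p_values):
    """Fisher's method: X = -2 * sum(ln p) ~ chi-square with 2k d.o.f. under H0."""
    k = len(p_values)
    x = -2.0 * sum(math.log(p) for p in p_values)
    # Closed-form chi-square survival function for even degrees of freedom 2k.
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Hypothetical enrichment p-values for one pathway in two independent data sets.
print(round(fisher_combined_p([0.05, 0.05]), 5))
```

Two individually borderline results combine to a smaller p-value, but only if the data sets really are independent. Correlated data sets would make this combined value too optimistic, which is the problem the paper's dependence correction targets.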
An interactive environment for the analysis of large Earth observation and model data sets
NASA Technical Reports Server (NTRS)
Bowman, Kenneth P.; Walsh, John E.; Wilhelmson, Robert B.
1993-01-01
We propose to develop an interactive environment for the analysis of large Earth science observation and model data sets. We will use a standard scientific data storage format and a large capacity (greater than 20 GB) optical disk system for data management; develop libraries for coordinate transformation and regridding of data sets; modify the NCSA X Image and X DataSlice software for typical Earth observation data sets by including map transformations and missing data handling; develop analysis tools for common mathematical and statistical operations; integrate the components described above into a system for the analysis and comparison of observations and model results; and distribute software and documentation to the scientific community.
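The proposal does not say which regridding scheme the libraries would use. The Python sketch below assumes bilinear interpolation on a rectilinear grid, a common choice, and uses NaN as the missing-data marker. It propagates missing values rather than filling them. Map projections are out of scope, and the function names are hypothetical.

```python
import math
from bisect import bisect_right


def bilinear(xs, ys, grid, x, y):
    """Interpolate grid at (x, y); grid[j][i] holds the value at (xs[i], ys[j]).

    xs and ys must be strictly ascending. Returns NaN outside the grid or when
    any of the four surrounding grid points is missing (NaN).
    """
    if not (xs[0] <= x <= xs[-1] and ys[0] <= y <= ys[-1]):
        return math.nan
    # Lower-left corner of the enclosing cell; clamp so the top/right edge
    # still maps onto the last cell.
    i = min(bisect_right(xs, x) - 1, len(xs) - 2)
    j = min(bisect_right(ys, y) - 1, len(ys) - 2)
    v00, v10 = grid[j][i], grid[j][i + 1]
    v01, v11 = grid[j + 1][i], grid[j + 1][i + 1]
    if any(math.isnan(v) for v in (v00, v10, v01, v11)):
        return math.nan  # propagate missing data instead of inventing a value
    tx = (x - xs[i]) / (xs[i + 1] - xs[i])
    ty = (y - ys[j]) / (ys[j + 1] - ys[j])
    lower = v00 * (1 - tx) + v10 * tx
    upper = v01 * (1 - tx) + v11 * tx
    return lower * (1 - ty) + upper * ty


def regrid(xs, ys, grid, new_xs, new_ys):
    """Resample grid onto the target rectilinear grid (new_xs, new_ys)."""
    return [[bilinear(xs, ys, grid, x, y) for x in new_xs] for y in new_ys]
```

For a field that is linear in both coordinates, bilinear interpolation reproduces it exactly. This makes such a field a convenient sanity check for a regridding library.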
An interactive environment for the analysis of large Earth observation and model data sets
NASA Technical Reports Server (NTRS)
Bowman, Kenneth P.; Walsh, John E.; Wilhelmson, Robert B.
1992-01-01
We propose to develop an interactive environment for the analysis of large Earth science observation and model data sets. We will use a standard scientific data storage format and a large capacity (greater than 20 GB) optical disk system for data management; develop libraries for coordinate transformation and regridding of data sets; modify the NCSA X Image and X Data Slice software for typical Earth observation data sets by including map transformations and missing data handling; develop analysis tools for common mathematical and statistical operations; integrate the components described above into a system for the analysis and comparison of observations and model results; and distribute software and documentation to the scientific community.
Data analysis and software support for the Earth radiation budget experiment
NASA Technical Reports Server (NTRS)
Edmonds, W.; Natarajan, S.
1987-01-01
Computer programming and data analysis efforts were performed in support of the Earth Radiation Budget Experiment (ERBE) at NASA/Langley. A brief description of the ERBE followed by sections describing software development and data analysis for both prelaunch and postlaunch instrument data are presented.
NASA Technical Reports Server (NTRS)
Wilson, J. L.
1974-01-01
A user's guide to the Sampled Data Stability Analysis Program (SADSAP) is provided. This program is a general-purpose sampled-data stability analysis program capable of providing frequency response on root locus data.
Post-test navigation data analysis techniques for the shuttle ALT
NASA Technical Reports Server (NTRS)
1975-01-01
Postflight test analysis data processing techniques for shuttle approach and landing tests (ALT) navigation data are defined. Postflight test processor requirements are described along with operational and design requirements, data input requirements, and software test requirements. The postflight test data processing is described based on the natural test sequence: quick-look analysis, postflight navigation processing, and error isolation processing. Emphasis is placed on the tradeoffs that must remain open and subject to analysis until final definition is achieved in the shuttle data processing system and the overall ALT plan. A development plan for the implementation of the ALT postflight test navigation data processing system is presented. Conclusions are presented.
Shulman, Nick; Bellew, Matthew; Snelling, George; Carter, Donald; Huang, Yunda; Li, Hongli; Self, Steven G.; McElrath, M. Juliana; De Rosa, Stephen C.
2008-01-01
Background Intracellular cytokine staining (ICS) by multiparameter flow cytometry is one of the primary methods for determining T cell immunogenicity in HIV-1 clinical vaccine trials. Data analysis requires considerable expertise and time. The amount of data is increasing quickly as more and larger trials are performed, so there is a critical need for high-throughput methods of data analysis. Methods A web-based flow cytometric analysis system, LabKey Flow, was developed for analyses of data from standardized ICS assays. A gating template was created manually in commercially available flow cytometric analysis software. Using this template, the system automatically compensated and analyzed all data sets. Quality control queries were designed to identify potentially incorrect sample collections. Results Comparison of the semi-automated analysis performed by LabKey Flow and the manual analysis performed using FlowJo software demonstrated excellent concordance (concordance correlation coefficient >0.990). Manual inspection of the analyses performed by LabKey Flow for 8-color ICS data files from several clinical vaccine trials indicates that template gates can appropriately be used for most data sets. Conclusions The semi-automated LabKey Flow analysis system can accurately analyze large ICS data files. Routine use of the system does not require specialized expertise. This high-throughput analysis will provide great utility for rapid evaluation of complex multiparameter flow cytometric measurements collected from large clinical trials. PMID:18615598
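The agreement figure above is a concordance correlation coefficient. The Python sketch below shows Lin's estimator, which is the usual definition. Unlike Pearson's r, it penalizes any systematic offset between the two methods as well as scatter. The function name `concordance_ccc` is hypothetical, and the sketch does not claim to reproduce the study's exact computation.

```python
from statistics import fmean


def concordance_ccc(x, y):
    """Lin's concordance correlation coefficient for paired measurements."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 pairs")
    n = len(x)
    mx, my = fmean(x), fmean(y)
    # Lin's estimator uses population (divide-by-n) moments.
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    # The squared mean difference in the denominator penalizes constant bias.
    return 2 * sxy / (sxx + syy + (mx - my) ** 2)
```

For example, shifting every value by a constant leaves Pearson's r at 1 but lowers the concordance coefficient below 1. This makes the coefficient a stricter test of whether an automated gating method reproduces manual results.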
49 CFR 1244.8 - Analysis of waybill data.
Code of Federal Regulations, 2010 CFR
2010-10-01
... 49 Transportation 9 2010-10-01 2010-10-01 false Analysis of waybill data. 1244.8 Section 1244.8... OF TRANSPORTATION (CONTINUED) ACCOUNTS, RECORDS AND REPORTS WAYBILL ANALYSIS OF TRANSPORTATION OF PROPERTY-RAILROADS § 1244.8 Analysis of waybill data. Users of the waybill sample when presenting waybill...
Hasson, Uri; Skipper, Jeremy I; Wilde, Michael J; Nusbaum, Howard C; Small, Steven L
2008-01-15
The increasingly complex research questions addressed by neuroimaging research impose substantial demands on computational infrastructures. These infrastructures need to support management of massive amounts of data in a way that affords rapid and precise data analysis, to allow collaborative research, and to achieve these aims securely and with minimum management overhead. Here we present an approach that overcomes many current limitations in data analysis and data sharing. This approach is based on open source database management systems that support complex data queries as an integral part of data analysis, flexible data sharing, and parallel and distributed data processing using cluster computing and Grid computing resources. We assess the strengths of these approaches as compared to current frameworks based on storage of binary or text files. We then describe in detail the implementation of such a system and provide a concrete description of how it was used to enable a complex analysis of fMRI time series data.
Hasson, Uri; Skipper, Jeremy I.; Wilde, Michael J.; Nusbaum, Howard C.; Small, Steven L.
2007-01-01
The increasingly complex research questions addressed by neuroimaging research impose substantial demands on computational infrastructures. These infrastructures need to support management of massive amounts of data in a way that affords rapid and precise data analysis, to allow collaborative research, and to achieve these aims securely and with minimum management overhead. Here we present an approach that overcomes many current limitations in data analysis and data sharing. This approach is based on open source database management systems that support complex data queries as an integral part of data analysis, flexible data sharing, and parallel and distributed data processing using cluster computing and Grid computing resources. We assess the strengths of these approaches as compared to current frameworks based on storage of binary or text files. We then describe in detail the implementation of such a system and provide a concrete description of how it was used to enable a complex analysis of fMRI time series data. PMID:17964812
Content analysis and thematic analysis: Implications for conducting a qualitative descriptive study.
Vaismoradi, Mojtaba; Turunen, Hannele; Bondas, Terese
2013-09-01
Qualitative content analysis and thematic analysis are two commonly used approaches in data analysis of nursing research, but boundaries between the two have not been clearly specified. In other words, they are being used interchangeably and it seems difficult for the researcher to choose between them. In this respect, this paper describes and discusses the boundaries between qualitative content analysis and thematic analysis and presents implications to improve the consistency between the purpose of related studies and the method of data analysis. This is a discussion paper, comprising an analytical overview and discussion of the definitions, aims, philosophical background, data gathering, and analysis of content analysis and thematic analysis, and addressing their methodological subtleties. It is concluded that, despite many similarities between the approaches, including cutting across data and searching for patterns and themes, their main difference lies in the opportunity for quantification of data: in content analysis, the frequency of different categories and themes can be measured and, with caution, used as a proxy for significance. © 2013 Wiley Publishing Asia Pty Ltd.
Research on the raw data processing method of the hydropower construction project
NASA Astrophysics Data System (ADS)
Tian, Zhichao
2018-01-01
Based on the characteristics of the fixed data, this paper compares various mathematical statistical analysis methods and chooses an improved Grubbs criterion to analyze the data; through analysis of the data processing, the data processing method is found to be unsuitable. It is shown that this method can be applied to the processing of fixed raw data. This paper provides a reference for reasonably determining the effective quota analysis data.
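The abstract does not describe what its "improved" variant of the Grubbs criterion changes. The Python sketch below therefore shows only the classic two-sided Grubbs outlier test, applied iteratively. The default critical value uses `scipy.stats.t.ppf` (SciPy is assumed installed only if that default is used). You can pass any `g_crit(n)` callable instead, for example one that looks up published tables. The function names are hypothetical.

```python
import statistics


def grubbs_critical(n, alpha=0.05):
    """Two-sided critical value of the classic Grubbs test (requires SciPy)."""
    from scipy.stats import t  # lazy import: only needed for this default
    # t is the upper critical value of Student's t at alpha/(2n), n - 2 dof.
    tq = t.ppf(1.0 - alpha / (2 * n), n - 2)
    return (n - 1) / n ** 0.5 * (tq ** 2 / (n - 2 + tq ** 2)) ** 0.5


def grubbs_outliers(data, g_crit=grubbs_critical):
    """Repeatedly drop the most extreme value while G = max|x - mean| / s > g_crit(n).

    Returns (kept_values, removed_values).
    """
    values = list(data)
    removed = []
    while len(values) > 2:
        mean = statistics.fmean(values)
        sd = statistics.stdev(values)  # sample standard deviation (n - 1)
        if sd == 0:
            break  # all values identical: nothing to flag
        idx = max(range(len(values)), key=lambda i: abs(values[i] - mean))
        g = abs(values[idx] - mean) / sd
        if g <= g_crit(len(values)):
            break
        removed.append(values.pop(idx))
    return values, removed
```

Each pass recomputes the mean and standard deviation from the remaining values, so a single gross error cannot mask a second, smaller one indefinitely.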
Tidal analysis and Arrival Process Mining Using Automatic Identification System (AIS) Data
2017-01-01
files, organized by location. The data were processed using the Python programming language (van Rossum and Drake 2001), the Pandas data analysis... ERDC/CHL TR-17-2, Coastal Inlets Research Program, January 2017. Tidal Analysis and Arrival Process Mining Using Automatic Identification System (AIS) Data. Brandan M. Scully, Coastal and...
Shuttle Electrical Power Analysis Program (SEPAP); single string circuit analysis report
NASA Technical Reports Server (NTRS)
Murdock, C. R.
1974-01-01
An evaluation is reported of the data obtained from an analysis of the distribution network characteristics of the shuttle during a spacelab mission. A description of the approach utilized in the development of the computer program and data base is provided and conclusions are drawn from the analysis of the data. Data sheets are provided for information to support the detailed discussion on each computer run.
An R package for the integrated analysis of metabolomics and spectral data.
Costa, Christopher; Maraschin, Marcelo; Rocha, Miguel
2016-06-01
Recently, there has been a growing interest in the field of metabolomics, reflected in a remarkable growth in experimental techniques, available data and related biological applications. Indeed, techniques such as nuclear magnetic resonance, gas or liquid chromatography, mass spectrometry, infrared and UV-visible spectroscopies have provided extensive datasets that can help in tasks such as biological and biomedical discovery, biotechnology and drug development. However, as it happens with other omics data, the analysis of metabolomics datasets provides multiple challenges, both in terms of methodologies and in the development of appropriate computational tools. Indeed, none of the available software tools addresses the multiplicity of existing techniques and data analysis tasks. In this work, we make available a novel R package, named specmine, which provides a set of methods for metabolomics data analysis, including data loading in different formats, pre-processing, metabolite identification, univariate and multivariate data analysis, machine learning, and feature selection. Importantly, the implemented methods provide adequate support for the analysis of data from diverse experimental techniques, integrating a large set of functions from several R packages in a powerful, yet simple to use environment. The package, already available on CRAN, is accompanied by a web site where users can deposit datasets, scripts and analysis reports to be shared with the community, promoting the efficient sharing of metabolomics data analysis pipelines. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Solutions for Mining Distributed Scientific Data
NASA Astrophysics Data System (ADS)
Lynnes, C.; Pham, L.; Graves, S.; Ramachandran, R.; Maskey, M.; Keiser, K.
2007-12-01
Researchers at the University of Alabama in Huntsville (UAH) and the Goddard Earth Sciences Data and Information Services Center (GES DISC) are working on approaches and methodologies facilitating the analysis of large amounts of distributed scientific data. Despite the existence of full-featured analysis tools, such as the Algorithm Development and Mining (ADaM) toolkit from UAH, and data repositories, such as the GES DISC, that provide online access to large amounts of data, there remain obstacles to getting the analysis tools and the data together in a workable environment. Does one bring the data to the tools or deploy the tools close to the data? The large size of many current Earth science datasets incurs significant overhead in network transfer for analysis workflows, even with the advanced networking capabilities that are available between many educational and government facilities. The UAH and GES DISC team are developing a capability to define analysis workflows using distributed services and online data resources. We are developing two solutions for this problem that address different analysis scenarios. The first is a Data Center Deployment of the analysis services for large data selections, orchestrated by a remotely defined analysis workflow. The second is a Data Mining Center approach of providing a cohesive analysis solution for smaller subsets of data. The two approaches can be complementary and thus provide flexibility for researchers to exploit the best solution for their data requirements. The Data Center Deployment of the analysis services has been implemented by deploying ADaM web services at the GES DISC so they can access the data directly, without the need for network transfers. Using the Mining Workflow Composer, a user can define an analysis workflow that is then submitted through a Web Services interface to the GES DISC for execution by a processing engine.
The workflow definition is composed, maintained and executed at a distributed location, but most of the actual services comprising the workflow are available locally at the GES DISC data repository. Additional refinements will ultimately provide a package that is easily implemented and configured at additional data centers for analysis of additional science data sets. Enhancements to the ADaM toolkit allow the staging of distributed data wherever the services are deployed, to support a Data Mining Center that can provide additional computational resources, large storage of output, easier addition and updates to available services, and access to data from multiple repositories. The Data Mining Center case provides researchers more flexibility to quickly try different workflow configurations and refine the process, using smaller amounts of data that can likely be transferred from distributed online repositories. This environment is sufficient for some analyses, but can also be used as an initial sandbox to test and refine a solution before staging the execution at a Data Center Deployment. Detection of airborne dust both over water and land in MODIS imagery using mining services for both solutions will be presented. The dust detection is just one possible example of the mining and analysis capabilities the proposed mining services solutions will provide to the science community. More information about the available services and the current status of this project is available at http://www.itsc.uah.edu/mws/
40 CFR 92.131 - Smoke, data analysis.
Code of Federal Regulations, 2013 CFR
2013-07-01
... 40 Protection of Environment 21 2013-07-01 2013-07-01 false Smoke, data analysis. 92.131 Section...) CONTROL OF AIR POLLUTION FROM LOCOMOTIVES AND LOCOMOTIVE ENGINES Test Procedures § 92.131 Smoke, data analysis. The following procedure shall be used to analyze the smoke test data: (a) Locate each throttle...
40 CFR 92.131 - Smoke, data analysis.
Code of Federal Regulations, 2012 CFR
2012-07-01
... 40 Protection of Environment 21 2012-07-01 2012-07-01 false Smoke, data analysis. 92.131 Section...) CONTROL OF AIR POLLUTION FROM LOCOMOTIVES AND LOCOMOTIVE ENGINES Test Procedures § 92.131 Smoke, data analysis. The following procedure shall be used to analyze the smoke test data: (a) Locate each throttle...
40 CFR 92.131 - Smoke, data analysis.
Code of Federal Regulations, 2014 CFR
2014-07-01
... 40 Protection of Environment 20 2014-07-01 2013-07-01 true Smoke, data analysis. 92.131 Section 92...) CONTROL OF AIR POLLUTION FROM LOCOMOTIVES AND LOCOMOTIVE ENGINES Test Procedures § 92.131 Smoke, data analysis. The following procedure shall be used to analyze the smoke test data: (a) Locate each throttle...
40 CFR 92.131 - Smoke, data analysis.
Code of Federal Regulations, 2011 CFR
2011-07-01
... 40 Protection of Environment 20 2011-07-01 2011-07-01 false Smoke, data analysis. 92.131 Section...) CONTROL OF AIR POLLUTION FROM LOCOMOTIVES AND LOCOMOTIVE ENGINES Test Procedures § 92.131 Smoke, data analysis. The following procedure shall be used to analyze the smoke test data: (a) Locate each throttle...
Methods for Mediation Analysis with Missing Data
ERIC Educational Resources Information Center
Zhang, Zhiyong; Wang, Lijuan
2013-01-01
Despite wide applications of both mediation models and missing data techniques, formal discussion of mediation analysis with missing data is still rare. We introduce and compare four approaches to dealing with missing data in mediation analysis including listwise deletion, pairwise deletion, multiple imputation (MI), and a two-stage maximum…
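The Python sketch below illustrates the first two approaches named above. It is not the authors' code, and the function names are hypothetical. The covariance matrix of (X, M, Y) is built either by listwise deletion (drop any row with a missing value) or by pairwise deletion (use every row complete for each pair of variables). The sketch then computes the indirect effect a·b from that matrix. `None` marks a missing value. Note that pairwise covariance matrices are not guaranteed to be positive definite.

```python
def _cov(pairs):
    """Sample covariance (n - 1 denominator) of a list of (a, b) pairs."""
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete pairs")
    ma = sum(p[0] for p in pairs) / n
    mb = sum(p[1] for p in pairs) / n
    return sum((a - ma) * (b - mb) for a, b in pairs) / (n - 1)


def cov_matrix(rows, method="listwise"):
    """Covariance matrix of rows (X, M, Y, ...) with None marking missing values."""
    if method == "listwise":
        rows = [r for r in rows if all(v is not None for v in r)]
    elif method != "pairwise":
        raise ValueError("method must be 'listwise' or 'pairwise'")
    k = len(rows[0])
    return [[_cov([(r[i], r[j]) for r in rows
                   if r[i] is not None and r[j] is not None])
             for j in range(k)] for i in range(k)]


def indirect_effect(s):
    """a*b for the simple mediation model X -> M -> Y (order: X=0, M=1, Y=2)."""
    vx, vm = s[0][0], s[1][1]
    cxm, cxy, cmy = s[0][1], s[0][2], s[1][2]
    a = cxm / vx  # slope of M on X
    b = (cmy * vx - cxm * cxy) / (vm * vx - cxm ** 2)  # slope of Y on M given X
    return a * b
```

Listwise deletion discards the whole case if any value is missing. Pairwise deletion keeps more data but mixes estimates computed on different subsets. This trade-off is what motivates likelihood-based and imputation approaches.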
Cross-Cutting Interoperability in an Earth Science Collaboratory
NASA Technical Reports Server (NTRS)
Lynnes, Christopher; Ramachandran, Rahul; Kuo, Kuo-Sen
2011-01-01
An Earth Science Collaboratory is a rich data analysis environment with: (1) access to a wide spectrum of Earth Science data; (3) a diverse set of science analysis services and tools; (4) a means to collaborate on data, tools and analysis; and (5) support for sharing of data, tools, results and knowledge.
Federal Register 2010, 2011, 2012, 2013, 2014
2012-08-08
... Awards; Technical Assistance on State Data Collection, Analysis, and Reporting--National IDEA Technical Assistance Center on Early Childhood Longitudinal Data Systems; Notice Federal Register / Vol. 77, No... for New Awards; Technical Assistance on State Data Collection, Analysis, and Reporting--National IDEA...
Intelligent Data Analysis in the 21st Century
NASA Astrophysics Data System (ADS)
Cohen, Paul; Adams, Niall
When IDA began, data sets were small and clean, data provenance and management were not significant issues, workflows and grid computing and cloud computing didn’t exist, and the world was not populated with billions of cellphone and computer users. The original conception of intelligent data analysis, automating some of the reasoning of skilled data analysts, has not been updated to account for the dramatic changes in what skilled data analysis means today. IDA might update its mission to address pressing problems in areas such as climate change, habitat loss, education, and medicine. It might anticipate data analysis opportunities five to ten years out, such as customizing educational trajectories to individual students, and personalizing medical protocols. Such developments will elevate the conference and our community by shifting our focus from arbitrary measures of the performance of isolated algorithms to the practical, societal value of intelligent data analysis systems.
Parallel line analysis: multifunctional software for the biomedical sciences
NASA Technical Reports Server (NTRS)
Swank, P. R.; Lewis, M. L.; Damron, K. L.; Morrison, D. R.
1990-01-01
An easy-to-use, interactive FORTRAN program for analyzing the results of parallel line assays is described. The program is menu driven and consists of five major components: data entry, data editing, manual analysis, manual plotting, and automatic analysis and plotting. Data can be entered from the terminal or from previously created data files. The data editing portion of the program is used to inspect and modify data and to statistically identify outliers. The manual analysis component is used to test the assumptions necessary for parallel line assays using analysis of covariance techniques and to determine potency ratios with confidence limits. The manual plotting component provides a graphic display of the data on the terminal screen or on a standard line printer. The automatic portion runs through multiple analyses without operator input. Data may be saved in a special file to expedite input at a future time.
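The abstract mentions determining potency ratios from parallel line assays. The Python sketch below shows the point estimate under the usual assumptions: response is linear in log dose, and the standard and test preparations share a common (pooled) slope. It omits the assumption tests and the confidence limits the program above provides. The function name `potency_ratio` is hypothetical, and natural logs are assumed.

```python
import math
from statistics import fmean


def potency_ratio(std_doses, std_resp, test_doses, test_resp):
    """Point estimate of relative potency (test vs. standard), parallel-line model.

    Model: y = alpha_S + beta * ln(dose) for the standard and
           y = alpha_T + beta * ln(dose) for the test, with a common slope beta.
    """
    def summaries(doses, resp):
        x = [math.log(d) for d in doses]
        mx, my = fmean(x), fmean(resp)
        sxx = sum((a - mx) ** 2 for a in x)
        sxy = sum((a - mx) * (b - my) for a, b in zip(x, resp))
        return mx, my, sxx, sxy

    mxs, mys, sxxs, sxys = summaries(std_doses, std_resp)
    mxt, myt, sxxt, sxyt = summaries(test_doses, test_resp)
    slope = (sxys + sxyt) / (sxxs + sxxt)  # pooled within-preparation slope
    # Horizontal distance between the two parallel lines on the log-dose axis.
    log_ratio = (myt - mys) / slope - (mxt - mxs)
    return math.exp(log_ratio)
```

A ratio of 2 means a dose d of the test preparation gives the same response as a dose 2d of the standard.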
LANDSAT-4 image data quality analysis for energy related applications. [nuclear power plant sites]
NASA Technical Reports Server (NTRS)
Wukelic, G. E. (Principal Investigator)
1983-01-01
No usable LANDSAT 4 TM data were obtained for the Hanford site in the Columbia Plateau region, but TM simulator data for a Virginia Electric Company nuclear power plant were used to test image processing algorithms. Principal component analyses of this data set clearly indicated that thermal plumes in surface waters used for reactor cooling would be discernible. Image processing and analysis programs were successfully tested using the 7-band Arkansas test scene, and preliminary analysis of TM data for the Savannah River Plant shows that current interactive image enhancement, analysis, and integration techniques can be effectively used for LANDSAT 4 data. Thermal band data appear adequate for gross estimates of thermal changes occurring near operating nuclear facilities, especially in surface water bodies being used for reactor cooling purposes. Additional image processing software was written and tested which provides for more rapid and effective analysis of the 7-band TM data.
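The abstract does not say how its principal component analyses were computed. The Python sketch below illustrates the idea on multiband pixel vectors. It builds the band covariance matrix, finds the leading component by power iteration, and projects the pixels onto it to form a PC1 image. In practice a library eigen-solver would be used; the function names here are hypothetical.

```python
from statistics import fmean


def band_covariance(pixels):
    """Sample covariance matrix of a list of multiband pixel vectors."""
    n, k = len(pixels), len(pixels[0])
    means = [fmean([p[b] for p in pixels]) for b in range(k)]
    return [[sum((p[i] - means[i]) * (p[j] - means[j]) for p in pixels) / (n - 1)
             for j in range(k)] for i in range(k)]


def first_principal_component(pixels, iterations=500):
    """Leading eigenvector and eigenvalue of the band covariance (power iteration)."""
    c = band_covariance(pixels)
    k = len(c)
    v = [1.0] * k  # assumed not orthogonal to the leading eigenvector
    for _ in range(iterations):
        w = [sum(c[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    # Eigenvectors are defined only up to sign; make the largest entry positive.
    if max(v, key=abs) < 0:
        v = [-x for x in v]
    eigenvalue = sum(v[i] * sum(c[i][j] * v[j] for j in range(k)) for i in range(k))
    return v, eigenvalue


def pc1_scores(pixels, v):
    """Project mean-centred pixels onto the component, giving the PC1 image values."""
    k = len(v)
    means = [fmean([p[b] for p in pixels]) for b in range(k)]
    return [sum((p[b] - means[b]) * v[b] for b in range(k)) for p in pixels]
```

Because strongly correlated bands collapse onto the first component, the remaining components can make localized anomalies stand out, such as a warm plume that differs from its surroundings in only some bands.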
Big data analysis framework for healthcare and social sectors in Korea.
Song, Tae-Min; Ryu, Seewon
2015-01-01
We reviewed applications of big data analysis of healthcare and social services in developed countries, and subsequently devised a framework for such an analysis in Korea. We reviewed the status of implementing big data analysis of health care and social services in developed countries, and strategies used by the Ministry of Health and Welfare of Korea (Government 3.0). We formulated a conceptual framework of big data in the healthcare and social service sectors at the national level. As a specific case, we designed a process and method of social big data analysis on suicide buzz. Developed countries (e.g., the United States, the UK, Singapore, Australia, and even the OECD and the EU) are emphasizing the potential of big data, and using it as a tool to solve their long-standing problems. Big data strategies for the healthcare and social service sectors were formulated based on an ICT-based policy of the current government and the strategic goals of the Ministry of Health and Welfare. We suggest separate frameworks of big data analysis for the healthcare and welfare service sectors and assign them tentative names: 'health risk analysis center' and 'integrated social welfare service network'. A framework of social big data analysis is presented by applying it to the prevention and proactive detection of suicide in Korea. There are some concerns with the utilization of big data in the healthcare and social welfare sectors. Thus, research on these issues must be conducted so that sophisticated and practical solutions can be reached.
Can You Fathom This? Connecting Data Analysis, Algebra, and Geometry with Probability Simulation
ERIC Educational Resources Information Center
Edwards, Michael Todd; Phelps, Steve
2008-01-01
Data analysis plays a prominent role in various facets of modern life: Schools evaluate and revise programs on the basis of test scores; policymakers make decisions on the basis of information gleaned from polling data; supermarkets stock shelves on the basis of data collected at checkout lanes. Data analysis provides teachers with new tools and…
BiNA: A Visual Analytics Tool for Biological Network Data
Gerasch, Andreas; Faber, Daniel; Küntzer, Jan; Niermann, Peter; Kohlbacher, Oliver; Lenhof, Hans-Peter; Kaufmann, Michael
2014-01-01
Interactive visual analysis of biological high-throughput data in the context of the underlying networks is an essential task in modern biomedicine with applications ranging from metabolic engineering to personalized medicine. The complexity and heterogeneity of data sets require flexible software architectures for data analysis. Concise and easily readable graphical representation of data and interactive navigation of large data sets are essential in this context. We present BiNA - the Biological Network Analyzer - a flexible open-source software for analyzing and visualizing biological networks. Highly configurable visualization styles for regulatory and metabolic network data offer sophisticated drawings and intuitive navigation and exploration techniques using hierarchical graph concepts. The generic projection and analysis framework provides powerful functionalities for visual analyses of high-throughput omics data in the context of networks, in particular for the differential analysis and the analysis of time series data. A direct interface to an underlying data warehouse provides fast access to a wide range of semantically integrated biological network databases. A plugin system allows simple customization and integration of new analysis algorithms or visual representations. BiNA is available under the 3-clause BSD license at http://bina.unipax.info/. PMID:24551056
Carroll, Regina A; Kodak, Tiffany
2014-01-01
The type of procedure used to measure a target behavior may directly influence the perceived treatment outcomes. In the present study, we examined the influence of different data-analysis procedures on the outcomes of two commonly used treatments on the vocal stereotypy of 2 children with an autism spectrum disorder. In Study 1, we compared an interrupted and uninterrupted data-analysis procedure to measure vocal stereotypy during the implementation of response interruption and redirection (RIRD). The results showed that the interrupted data-analysis procedure overestimated the effectiveness of RIRD. In Study 2, we examined the influence of different data-analysis procedures on the interpretation of the relative effects of 2 different treatments for vocal stereotypy. Specifically, we compared interrupted and uninterrupted data-analysis procedures during the implementation of RIRD and noncontingent reinforcement (NCR) as a treatment for vocal stereotypy. The results showed that, as in Study 1, the interrupted data-analysis procedure overestimated the effectiveness of RIRD; however, this effect was not apparent with NCR. These findings suggest that different types of data analysis can influence the perceived success of a treatment. © Society for the Experimental Analysis of Behavior.
Digital processing of mesoscale analysis and space sensor data
NASA Technical Reports Server (NTRS)
Hickey, J. S.; Karitani, S.
1985-01-01
The mesoscale analysis and space sensor (MASS) data management and analysis system on the research computer system is presented. The MASS data base management and analysis system was implemented on the research computer system, which provides a wide range of capabilities for processing and displaying large volumes of conventional and satellite derived meteorological data. The research computer system consists of three primary computers (HP-1000F, Harris/6, and Perkin-Elmer 3250), each of which performs a specific function according to its unique capabilities. The overall tasks performed concerning the software, data base management, and display capabilities of the research computer system, in terms of providing a very effective interactive research tool for the digital processing of mesoscale analysis and space sensor data, are described.
The statistics of identifying differentially expressed genes in Expresso and TM4: a comparison
Sioson, Allan A; Mane, Shrinivasrao P; Li, Pinghua; Sha, Wei; Heath, Lenwood S; Bohnert, Hans J; Grene, Ruth
2006-01-01
Background Analysis of DNA microarray data takes as input spot intensity measurements from scanner software and returns differential expression of genes between two conditions, together with a statistical significance assessment. This process typically consists of two steps: data normalization and identification of differentially expressed genes through statistical analysis. The Expresso microarray experiment management system implements these steps with a two-stage, log-linear ANOVA mixed model technique, tailored to individual experimental designs. The complement of tools in TM4, on the other hand, is based on a number of preset design choices that limit its flexibility. In the TM4 microarray analysis suite, normalization, filter, and analysis methods form an analysis pipeline. TM4 computes integrated intensity values (IIV) from the average intensities and spot pixel counts returned by the scanner software as input to its normalization steps. By contrast, Expresso can use either IIV data or median intensity values (MIV). Here, we compare Expresso and TM4 analysis of two experiments and assess the results against qRT-PCR data. Results The Expresso analysis using MIV data consistently identifies more genes as differentially expressed, when compared to Expresso analysis with IIV data. The typical TM4 normalization and filtering pipeline corrects systematic intensity-specific bias on a per microarray basis. Subsequent statistical analysis with Expresso or a TM4 t-test can effectively identify differentially expressed genes. The best agreement with qRT-PCR data is obtained through the use of Expresso analysis and MIV data. Conclusion The results of this research are of practical value to biologists who analyze microarray data sets. The TM4 normalization and filtering pipeline corrects microarray-specific systematic bias and complements the normalization stage in Expresso analysis. The results of Expresso using MIV data have the best agreement with qRT-PCR results. 
In one experiment, MIV is a better choice than IIV as input to data normalization and statistical analysis methods, as it yields a greater number of statistically significant differentially expressed genes; TM4 does not support the choice of MIV input data. Overall, the more flexible and extensive statistical models of Expresso achieve more accurate analytical results, when judged by the yardstick of qRT-PCR data, in the context of an experimental design of modest complexity. PMID:16626497
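The two spot summaries compared above can be illustrated with a few lines of Python. Following the text, IIV is taken as average spot intensity times pixel count, and MIV as the median pixel intensity. The function names and the per-spot pixel lists are hypothetical, and real scanner software applies background correction that is omitted here.

```python
import math
from statistics import fmean, median


def integrated_intensity(pixels):
    """IIV as described in the text: average spot intensity times pixel count."""
    return fmean(pixels) * len(pixels)


def median_intensity(pixels):
    """MIV: median pixel intensity of the spot."""
    return median(pixels)


def log2_ratio(channel_a, channel_b, summary=median_intensity):
    """log2 expression ratio of one spot between two channels."""
    return math.log2(summary(channel_a) / summary(channel_b))
```

A single saturated or dust-contaminated pixel can dominate the integrated value but barely moves the median. This illustrates one reason the input summary can change which genes are called differentially expressed.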
Ye, Hao; Luo, Heng; Ng, Hui Wen; Meehan, Joe; Ge, Weigong; Tong, Weida; Hong, Huixiao
2016-01-01
ToxCast data have been used to develop models for predicting in vivo toxicity. To predict the in vivo toxicity of a new chemical using a ToxCast data based model, its ToxCast bioactivity data are needed but not normally available. The capability of predicting ToxCast bioactivity data is necessary to fully utilize ToxCast data in the risk assessment of chemicals. We aimed to understand and elucidate the relationships between the chemicals and bioactivity data of the assays in ToxCast and to develop a network analysis based method for predicting ToxCast bioactivity data. We conducted modularity analysis on a quantitative network constructed from ToxCast data to explore the relationships between the assays and chemicals. We further developed Nebula (neighbor-edges based and unbiased leverage algorithm) for predicting ToxCast bioactivity data. Modularity analysis on the network constructed from ToxCast data yielded seven modules. Assays and chemicals in the seven modules were distinct. Leave-one-out cross-validation yielded a Q² of 0.5416, indicating that ToxCast bioactivity data can be predicted by Nebula. Prediction domain analysis showed that some types of ToxCast assay data could be more reliably predicted by Nebula than others. Network analysis is a promising approach to understand ToxCast data. Nebula is an effective algorithm for predicting ToxCast bioactivity data, helping fully utilize ToxCast data in the risk assessment of chemicals. Published by Elsevier Ltd.
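The Q² reported above comes from leave-one-out cross-validation. The Python sketch below shows the common definition, Q² = 1 − PRESS / SS_total, where PRESS sums the squared errors of predictions made with each observation held out. A simple straight-line fit stands in for the predictor; it is a placeholder, not Nebula. The function names are hypothetical.

```python
from statistics import fmean


def fit_line(xs, ys):
    """Least-squares slope and intercept (placeholder model, not Nebula)."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def loo_q2(xs, ys):
    """Leave-one-out Q^2 = 1 - PRESS / SS_total for the placeholder model."""
    xs, ys = list(xs), list(ys)
    press = 0.0
    for i in range(len(xs)):
        # Refit without observation i, then predict it.
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    my = fmean(ys)
    ss_total = sum((y - my) ** 2 for y in ys)
    return 1.0 - press / ss_total
```

Q² equals 1 for perfect held-out predictions and can drop below 0 when the model predicts worse than the overall mean.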
Rethinking Meta-Analysis: Applications for Air Pollution Data and Beyond
Goodman, Julie E; Petito Boyce, Catherine; Sax, Sonja N; Beyer, Leslie A; Prueitt, Robyn L
2015-01-01
Meta-analyses offer a rigorous and transparent systematic framework for synthesizing data that can be used for a wide range of research areas, study designs, and data types. Both the outcome of meta-analyses and the meta-analysis process itself can yield useful insights for answering scientific questions and making policy decisions. Development of the National Ambient Air Quality Standards illustrates many potential applications of meta-analysis. These applications demonstrate the strengths and limitations of meta-analysis, issues that arise in various data realms, how meta-analysis design choices can influence interpretation of results, and how meta-analysis can be used to address bias and heterogeneity. Reviewing available data from a meta-analysis perspective can provide a useful framework and impetus for identifying and refining strategies for future research. Moreover, increased pervasiveness of a meta-analysis mindset—focusing on how the pieces of the research puzzle fit together—would benefit scientific research and data syntheses regardless of whether or not a quantitative meta-analysis is undertaken. While an individual meta-analysis can only synthesize studies addressing the same research question, the results of separate meta-analyses can be combined to address a question encompassing multiple data types. This observation applies to any scientific or policy area where information from a variety of disciplines must be considered to address a broader research question. PMID:25969128
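One common quantitative building block of the meta-analyses described above is inverse-variance pooling. The sketch below, assuming each study reports an effect estimate and its standard error, computes a fixed-effect pooled estimate plus Cochran's Q and I² as simple heterogeneity measures. It is a generic illustration, not the procedure used in the NAAQS applications, and the function name is hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def fixed_effect_meta(effects: Sequence[float], std_errors: Sequence[float]) -> Tuple[float, float, float, float]:
    """Inverse-variance fixed-effect pooling; returns (pooled, pooled_se, Cochran's Q, I^2)."""
    weights: List[float] = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    # Cochran's Q: weighted squared deviations of study effects from the pooled effect.
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    # I^2: share of total variation attributed to between-study heterogeneity (floored at 0).
    i_squared = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, pooled_se, q, i_squared
```

An approximate 95% confidence interval is pooled ± 1.96 × pooled_se; a large I² would usually prompt a random-effects model rather than this fixed-effect one.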
Proteomics wants cRacker: automated standardized data analysis of LC-MS derived proteomic data.
Zauber, Henrik; Schulze, Waltraud X
2012-11-02
The large-scale analysis of thousands of proteins under various experimental conditions or in mutant lines has gained more and more importance in hypothesis-driven scientific research and systems biology in recent years. Quantitative analysis by large-scale proteomics using modern mass spectrometry usually results in long lists of peptide ion intensities. The main interest for most researchers, however, is to draw conclusions at the protein level. Postprocessing and combining peptide intensities of a proteomic data set requires expert knowledge, and the often repetitive and standardized manual calculations can be time-consuming. The analysis of complex samples can result in very large data sets (lists with several thousand to 100,000 entries of different peptides) that cannot easily be analyzed using standard spreadsheet programs. To improve speed and consistency of the data analysis of LC-MS derived proteomic data, we developed cRacker. cRacker is an R-based program for automated downstream proteomic data analysis including data normalization strategies for metabolic labeling and label-free quantitation. In addition, cRacker includes basic statistical analysis, such as clustering of data, or ANOVA and t tests for comparison between treatments. Results are presented in editable graphic formats and in list files.
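cRacker itself is written in R and its exact algorithms are not given here. As a hedged Python sketch of the kind of downstream steps the abstract lists, the code below assumes total-intensity normalization per sample, sums peptide intensities to the protein level via a peptide-to-protein map, and compares treatments with Welch's t statistic. All function names are hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean, variance
from typing import Dict, Sequence


def normalize_total(sample: Dict[str, float]) -> Dict[str, float]:
    """Total-intensity normalization: scale one sample's peptide intensities to sum to 1."""
    total = sum(sample.values())
    return {peptide: value / total for peptide, value in sample.items()}


def to_protein_level(sample: Dict[str, float], peptide_to_protein: Dict[str, str]) -> Dict[str, float]:
    """Sum (normalized) peptide intensities for each protein they map to."""
    protein_totals: Dict[str, float] = defaultdict(float)
    for peptide, value in sample.items():
        protein_totals[peptide_to_protein[peptide]] += value
    return dict(protein_totals)


def welch_t(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Welch's t statistic for one protein's abundance in two treatments (sample variances)."""
    var_a, var_b = variance(group_a), variance(group_b)
    return (mean(group_a) - mean(group_b)) / math.sqrt(var_a / len(group_a) + var_b / len(group_b))
```

A p-value would additionally require the t distribution (for example `scipy.stats.ttest_ind(a, b, equal_var=False)`), which is omitted here to keep the sketch dependency-free.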
Application of Ontology Technology in Health Statistic Data Analysis.
Guo, Minjiang; Hu, Hongpu; Lei, Xingyun
2017-01-01
Research purpose: to establish a health management ontology for the analysis of health statistics data. Proposed methods: this paper established a health management ontology based on an analysis of the concepts in the China Health Statistics Yearbook, and used Protégé to define the syntactic and semantic structure of health statistics data. Six classes of top-level ontology concepts and their subclasses were extracted, and object properties and data properties were defined to establish the structure of these classes. Through ontology instantiation, multi-source heterogeneous data can be integrated, enabling administrators to gain an overall understanding and analysis of health statistics data. Ontology technology provides a comprehensive and unified information integration structure for the health management domain and lays a foundation for the efficient analysis of multi-source, heterogeneous health system management data and for improved management efficiency.
Comprehensive Occupational Data Analysis Programs 80 (CODAP80) User’s Manual.
1984-01-01
AD-A144 125. Comprehensive Occupational Data Analysis Programs 80 (CODAP80): User's Manual. Navy Occupational Development and Analysis Center, January 1984.
A Conceptual Model for Multidimensional Analysis of Documents
NASA Astrophysics Data System (ADS)
Ravat, Franck; Teste, Olivier; Tournier, Ronan; Zurfluh, Gilles
Data warehousing and OLAP are mainly used for the analysis of transactional data. Nowadays, with the evolution of the Internet and the development of semi-structured data exchange formats (such as XML), it is possible to consider entire fragments of data, such as documents, as analysis sources. As a consequence, an adapted multidimensional analysis framework is needed. In this paper, we introduce an OLAP multidimensional conceptual model without facts. This model is based on the single concept of dimensions and is adapted for multidimensional document analysis. We also provide a set of manipulation operations.
Collection Analysis: Powerful Ways To Collect, Analyze, and Present Your Data.
ERIC Educational Resources Information Center
Hart, Amy
2003-01-01
Discussion of collection analysis in school libraries focuses on the kinds of data used and how to use library automation software to collect the data. Describes the use of Microsoft Excel and its chart-making capabilities to enhance the presentation of the analysis and suggests ways to use collection analysis output. (LRW)
Analysis of Doppler radar windshear data
NASA Technical Reports Server (NTRS)
Williams, F.; Mckinney, P.; Ozmen, F.
1989-01-01
The objective of this analysis is to process Lincoln Laboratory Doppler radar data obtained during FLOWS testing at Huntsville, Alabama, in the summer of 1986, to characterize windshear events. The processing includes plotting velocity and F-factor profiles, histogram analysis to summarize statistics, and correlation analysis to demonstrate any correlation between different data fields.
The Interaction between Multimedia Data Analysis and Theory Development in Design Research
ERIC Educational Resources Information Center
van Nes, Fenna; Doorman, Michiel
2010-01-01
Mathematics education researchers conducting instruction experiments using a design research methodology are challenged with the analysis of often complex and large amounts of qualitative data. In this paper, we present two case studies that show how multimedia analysis software can greatly support video data analysis and theory development in…
18 CFR 300.12 - Analysis of supporting data.
Code of Federal Regulations, 2011 CFR
2011-04-01
... 18 Conservation of Power and Water Resources ... APPROVAL OF THE RATES OF FEDERAL POWER MARKETING ADMINISTRATIONS Filing Requirements § 300.12 Analysis of supporting data. (a) An analysis of the data provided under § 300.11 must be supported by an appropriate...
18 CFR 300.12 - Analysis of supporting data.
Code of Federal Regulations, 2010 CFR
2010-04-01
... 18 Conservation of Power and Water Resources ... APPROVAL OF THE RATES OF FEDERAL POWER MARKETING ADMINISTRATIONS Filing Requirements § 300.12 Analysis of supporting data. (a) An analysis of the data provided under § 300.11 must be supported by an appropriate...
ERIC Educational Resources Information Center
Putten, Jim Vander; Nolen, Amanda L.
2010-01-01
This study compared qualitative research results obtained by manual constant comparative analysis with results obtained by computer software analysis of the same data. An investigation into issues of trustworthiness and accuracy ensued. Results indicated that the inductive constant comparative data analysis generated 51 codes and two coding levels…
Neo: an object model for handling electrophysiology data in multiple formats
Garcia, Samuel; Guarino, Domenico; Jaillet, Florent; Jennings, Todd; Pröpper, Robert; Rautenberg, Philipp L.; Rodgers, Chris C.; Sobolev, Andrey; Wachtler, Thomas; Yger, Pierre; Davison, Andrew P.
2014-01-01
Neuroscientists use many different software tools to acquire, analyze and visualize electrophysiological signals. However, incompatible data models and file formats make it difficult to exchange data between these tools. This reduces scientific productivity, renders potentially useful analysis methods inaccessible and impedes collaboration between labs. A common representation of the core data would improve interoperability and facilitate data-sharing. To that end, we propose here a language-independent object model, named “Neo,” suitable for representing data acquired from electroencephalographic, intracellular, or extracellular recordings, or generated from simulations. As a concrete instantiation of this object model we have developed an open source implementation in the Python programming language. In addition to representing electrophysiology data in memory for the purposes of analysis and visualization, the Python implementation provides a set of input/output (IO) modules for reading/writing the data from/to a variety of commonly used file formats. Support is included for formats produced by most of the major manufacturers of electrophysiology recording equipment and also for more generic formats such as MATLAB. Data representation and data analysis are conceptually separate: it is easier to write robust analysis code if it is focused on analysis and relies on an underlying package to handle data representation. For that reason, and also to be as lightweight as possible, the Neo object model and the associated Python package are deliberately limited to representation of data, with no functions for data analysis or visualization. Software for neurophysiology data analysis and visualization built on top of Neo automatically gains the benefits of interoperability, easier data sharing and automatic format conversion; there is already a burgeoning ecosystem of such tools. We intend that Neo should become the standard basis for Python tools in neurophysiology. 
PMID:24600386
Data for Renewable Energy Planning, Policy, and Investment
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cox, Sarah L
Reliable, robust, and validated data are critical for informed planning, policy development, and investment in the clean energy sector. The Renewable Energy Data Explorer (RE Data Explorer) was developed to support data-driven renewable energy analysis that can inform key renewable energy decisions globally. This document presents the types of geospatial and other data at the core of renewable energy analysis and decision making. Individual data sets used to inform decisions vary in spatial and temporal resolution, quality, and overall usefulness. From Data to Decisions, a complementary geospatial data and analysis decision guide, provides an in-depth view of these and other considerations to enable data-driven planning, policymaking, and investment. Data support a wide variety of renewable energy analyses and decisions, including technical and economic potential assessment, renewable energy zone analysis, grid integration, risk and resiliency identification, electrification, and distributed solar photovoltaic potential. This fact sheet provides information on the types of data that are important for renewable energy decision making using the RE Data Explorer or similar geospatial analysis tools.
Data handling and analysis for the 1971 corn blight watch experiment
NASA Technical Reports Server (NTRS)
Anuta, P. E.; Phillips, T. L.
1973-01-01
The overall data flow of the corn blight watch experiment is described, and the organization of the LARS/Purdue data center is discussed. Data analysis techniques are discussed in general, and the use of statistical multispectral pattern recognition methods for automatic computer analysis of aircraft scanner data is described. Some of the results obtained are presented, along with the implications of the experiment for future data communication requirements of earth resource survey systems.
A statistical package for computing time and frequency domain analysis
NASA Technical Reports Server (NTRS)
Brownlow, J.
1978-01-01
The spectrum analysis (SPA) program is a general purpose digital computer program designed to aid in data analysis. The program does time and frequency domain statistical analyses as well as some preanalysis data preparation. The capabilities of the SPA program include linear trend removal and/or digital filtering of data, plotting and/or listing of both filtered and unfiltered data, time domain statistical characterization of data, and frequency domain statistical characterization of data.
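The SPA capabilities listed above, linear trend removal followed by frequency-domain characterization, can be sketched as follows. This is an illustrative assumption about the processing, not the SPA program's code: it removes a least-squares line and then computes a one-sided periodogram |X_k|²/n with a direct DFT, which costs O(n²) and is only suited to short records. The function names are hypothetical.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def remove_linear_trend(samples: Sequence[float]) -> List[float]:
    """Subtract the least-squares straight line (in sample index t) from the record."""
    n = len(samples)
    t_mean = (n - 1) / 2
    y_mean = sum(samples) / n
    stt = sum((t - t_mean) ** 2 for t in range(n))
    sty = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(samples))
    slope = sty / stt
    intercept = y_mean - slope * t_mean
    return [y - (intercept + slope * t) for t, y in enumerate(samples)]


def periodogram(samples: Sequence[float], sample_rate: float) -> List[Tuple[float, float]]:
    """One-sided periodogram via a direct DFT: (frequency, |X_k|^2 / n) for k = 0..n//2."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        x_k = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples))
        spectrum.append((k * sample_rate / n, abs(x_k) ** 2 / n))
    return spectrum
```

Detrending first matters because an uncorrected ramp leaks power into the lowest frequency bins; for long records, an FFT would replace the direct DFT.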
A method for data base management and analysis for wind tunnel data
NASA Technical Reports Server (NTRS)
Biser, Aileen O.
1987-01-01
To respond to the need for improved data base management and analysis capabilities for wind-tunnel data at the Langley 16-Foot Transonic Tunnel, research was conducted into current methods of managing wind-tunnel data and a method was developed as a solution to this need. This paper describes the development of the data base management and analysis method for wind-tunnel data. The design and implementation of the software system are discussed and examples of its use are shown.
Test data analysis for concentrating photovoltaic arrays
NASA Astrophysics Data System (ADS)
Maish, A. B.; Cannon, J. E.
A test data analysis approach for use with steady-state efficiency measurements taken on concentrating photovoltaic arrays is presented. The analysis procedures can be used to identify biased and erroneous data. The steps involved in analyzing the test data are screening the data, developing coefficients for the performance equation, analyzing statistics to ensure adequacy of the regression fit to the data, and plotting the data. In addition, this paper analyzes the sources and magnitudes of the precision and bias errors that affect measurement accuracy.
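The screening and regression steps listed above can be sketched for a hypothetical one-variable performance equation, with efficiency modeled as a linear function of a single operating variable x; the paper's actual performance equation and screening criteria are not given in the abstract. The physical screening rule (efficiency must lie strictly between 0 and 1), the 3-standard-error residual cutoff, and the function names are all assumptions.

```python
import math
from typing import List, Sequence, Tuple


def linear_fit(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float, float, List[float]]:
    """Ordinary least squares y = a + b*x; returns (a, b, R^2, residuals)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return intercept, slope, 1.0 - ss_res / ss_tot, residuals


def screen_and_fit(xs: Sequence[float], ys: Sequence[float], z_limit: float = 3.0) -> Tuple[float, float, float, int]:
    """Screen, fit, drop large-residual points, refit; returns (a, b, R^2, points dropped)."""
    # Screen 1 (hypothetical rule): an efficiency outside (0, 1) is physically impossible.
    points = [(x, y) for x, y in zip(xs, ys) if 0.0 < y < 1.0]
    _, _, _, residuals = linear_fit([x for x, _ in points], [y for _, y in points])
    # Standard error of the regression (n - 2 degrees of freedom for a two-parameter fit).
    s = math.sqrt(sum(r * r for r in residuals) / (len(residuals) - 2))
    # Screen 2: drop points whose residual exceeds z_limit standard errors.
    kept = [p for p, r in zip(points, residuals) if s == 0.0 or abs(r) <= z_limit * s]
    intercept, slope, r_squared, _ = linear_fit([x for x, _ in kept], [y for _, y in kept])
    return intercept, slope, r_squared, len(xs) - len(kept)
```

With only a handful of points, a single gross outlier inflates the standard error enough to hide itself, so residual screening of this kind works best on reasonably large test runs.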
NGNP Data Management and Analysis System Analysis and Web Delivery Capabilities
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cynthia D. Gentillon
2011-09-01
Projects for the Very High Temperature Reactor (VHTR) Technology Development Office provide data in support of Nuclear Regulatory Commission licensing of the very high temperature reactor. Fuel and materials to be used in the reactor are tested and characterized to quantify performance in high-temperature and high-fluence environments. The NGNP Data Management and Analysis System (NDMAS) at the Idaho National Laboratory has been established to ensure that VHTR data are (1) qualified for use, (2) stored in a readily accessible electronic form, and (3) analyzed to extract useful results. This document focuses on the third NDMAS objective. It describes capabilities for displaying the data in meaningful ways and for data analysis to identify useful relationships among the measured quantities. The capabilities are described from the perspective of NDMAS users, starting with those who just view experimental data and analytical results on the INL NDMAS web portal. Web display and delivery capabilities are described in detail. The current web pages that show Advanced Gas Reactor, Advanced Graphite Capsule, and High Temperature Materials test results are also itemized. Capabilities available to NDMAS developers are more extensive and are described using a second series of examples. Much of the data analysis effort focuses on understanding how thermocouple measurements relate to simulated temperatures and other experimental parameters. Statistical control charts and correlation monitoring provide an ongoing assessment of instrument accuracy. Data analysis capabilities are virtually unlimited for those who use the NDMAS web data download capabilities and the analysis software of their choice. Overall, the NDMAS provides convenient data analysis and web delivery capabilities for studying a very large and rapidly increasing database of well-documented, pedigreed data.
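The abstract mentions statistical control charts for ongoing assessment of instrument accuracy. A minimal sketch, assuming a Shewhart individuals chart applied to a series such as measured-minus-simulated thermocouple temperatures: the center line is the series mean, sigma is estimated as the average moving range divided by d2 = 1.128, and points beyond ±3 sigma are flagged. NDMAS's actual chart type and limits are not specified, and the function name is hypothetical.

```python
from typing import List, Sequence, Tuple

D2_FOR_PAIRS = 1.128  # bias-correction constant d2 for moving ranges of two consecutive points


def individuals_chart(values: Sequence[float]) -> Tuple[float, float, float, List[int]]:
    """Shewhart individuals chart: return (center, LCL, UCL, indices outside the limits)."""
    center = sum(values) / len(values)
    # Moving ranges between consecutive observations estimate short-term variation.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    sigma_hat = (sum(moving_ranges) / len(moving_ranges)) / D2_FOR_PAIRS
    lcl = center - 3.0 * sigma_hat
    ucl = center + 3.0 * sigma_hat
    flagged = [i for i, v in enumerate(values) if v < lcl or v > ucl]
    return center, lcl, ucl, flagged
```

The moving-range estimate is used instead of the overall standard deviation so that a slow drift or a single spike inflates the limits less.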
Reduction and analysis of data collected during the electromagnetic tornado experiment
NASA Technical Reports Server (NTRS)
Davisson, L. D.; Bradbury, J.
1975-01-01
Progress is reviewed on the reduction and analysis of tornado data collected on analog tape. The strip chart recording of 7 tracks from all available analog data for quick look analysis is emphasized.
Hopkins, Jesse Bennett; Gillilan, Richard E; Skou, Soren
2017-10-01
BioXTAS RAW is a graphical-user-interface-based, free, open-source Python program for the reduction and analysis of small-angle X-ray solution scattering (SAXS) data. The software is designed for biological SAXS data and enables creation and plotting of one-dimensional scattering profiles from two-dimensional detector images; standard data operations such as averaging and subtraction; analysis of radius of gyration and molecular weight; and advanced analysis such as calculation of inverse Fourier transforms and envelopes. It also allows easy processing of inline size-exclusion chromatography coupled SAXS data and data deconvolution using the evolving factor analysis method. It provides an alternative to closed-source programs such as Primus and ScÅtter for primary data analysis. Because it can calibrate, mask and integrate images, it also provides an alternative to synchrotron beamline pipelines that scientists can install on their own computers and use both at home and at the beamline.
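The radius-of-gyration analysis mentioned above is usually done with the Guinier approximation, ln I(q) ≈ ln I(0) - (Rg²/3)q², which is commonly applied only where q·Rg ≲ 1.3 for compact particles. The sketch below fits a straight line to ln I versus q². It is a simplified illustration, not RAW's implementation (which, among other things, selects the fitting range); the function name is hypothetical and all intensities must be positive.

```python
import math
from typing import Sequence, Tuple


def guinier_fit(q: Sequence[float], intensity: Sequence[float]) -> Tuple[float, float, float]:
    """Fit ln I = ln I(0) - (Rg^2 / 3) q^2; return (Rg, I(0), q_max * Rg)."""
    x = [qi ** 2 for qi in q]
    y = [math.log(ii) for ii in intensity]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sum((a - mean_x) ** 2 for a in x)
    intercept = mean_y - slope * mean_x
    rg = math.sqrt(-3.0 * slope)  # requires a negative slope (intensity falling with q)
    i0 = math.exp(intercept)
    return rg, i0, max(q) * rg
```

Returning q_max·Rg lets the caller check whether the chosen q range stays inside the usual Guinier limit; with q in Å⁻¹, Rg comes out in Å.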
Anticipated Changes in Conducting Scientific Data-Analysis Research in the Big-Data Era
NASA Astrophysics Data System (ADS)
Kuo, Kwo-Sen; Seablom, Michael; Clune, Thomas; Ramachandran, Rahul
2014-05-01
A Big-Data environment is one that is capable of orchestrating quick-turnaround analyses involving large volumes of data for numerous simultaneous users. Based on our experiences with a prototype Big-Data analysis environment, we anticipate some important changes in research behaviors and processes while conducting scientific data-analysis research in the near future as such Big-Data environments become the mainstream. The first anticipated change will be the reduced effort and difficulty in most parts of the data management process. A Big-Data analysis environment is likely to house most of the data required for a particular research discipline along with appropriate analysis capabilities. This will reduce the need for researchers to download local copies of data. In turn, this also reduces the need for compute and storage procurement by individual researchers or groups, as well as associated maintenance and management afterwards. It is almost certain that Big-Data environments will require a different "programming language" to fully exploit the latent potential. In addition, the process of extending the environment to provide new analysis capabilities will likely be more involved than, say, compiling a piece of new or revised code. We thus anticipate that researchers will require support from dedicated organizations associated with the environment that are composed of professional software engineers and data scientists. A major benefit will likely be that such extensions are of higher-quality and broader applicability than ad hoc changes by physical scientists. Another anticipated significant change is improved collaboration among the researchers using the same environment. Since the environment is homogeneous within itself, many barriers to collaboration are minimized or eliminated. For example, data and analysis algorithms can be seamlessly shared, reused and re-purposed. 
In conclusion, we will be able to achieve a new level of scientific productivity in the Big-Data analysis environments.
Tool for Rapid Analysis of Monte Carlo Simulations
NASA Technical Reports Server (NTRS)
Restrepo, Carolina; McCall, Kurt E.; Hurtado, John E.
2011-01-01
Designing a spacecraft, or any other complex engineering system, requires extensive simulation and analysis work. Oftentimes, the large amounts of simulation data generated are very difficult and time-consuming to analyze, with the added risk of overlooking potentially critical problems in the design. The authors have developed a generic data analysis tool that can quickly sort through large data sets and point an analyst to the areas in the data set that cause specific types of failures. The Tool for Rapid Analysis of Monte Carlo simulations (TRAM) has been used in recent design and analysis work for the Orion vehicle, greatly decreasing the time it takes to evaluate performance requirements. A previous version of this tool was developed to automatically identify driving design variables in Monte Carlo data sets. This paper describes a new, parallel version of TRAM implemented on a graphics processing unit, and presents analysis results for NASA's Orion Monte Carlo data to demonstrate its capabilities.
Damming the genomic data flood using a comprehensive analysis and storage data structure
Bouffard, Marc; Phillips, Michael S.; Brown, Andrew M.K.; Marsh, Sharon; Tardif, Jean-Claude; van Rooij, Tibor
2010-01-01
Data generation, driven by rapid advances in genomic technologies, is fast outpacing our analysis capabilities. Faced with this flood of data, more hardware and software resources are added to accommodate data sets whose structure has not specifically been designed for analysis. This leads to unnecessarily lengthy processing times and excessive data handling and storage costs. Current efforts to address this have centered on developing new indexing schemas and analysis algorithms, whereas the root of the problem lies in the format of the data itself. We have developed a new data structure for storing and analyzing genotype and phenotype data. By leveraging data normalization techniques, database management system capabilities and the use of a novel multi-table, multidimensional database structure we have eliminated the following: (i) unnecessarily large data set size due to high levels of redundancy, (ii) sequential access to these data sets and (iii) common bottlenecks in analysis times. The resulting novel data structure horizontally divides the data to circumvent traditional problems associated with the use of databases for very large genomic data sets. The resulting data set required 86% less disk space and performed analytical calculations 6248 times faster compared to a standard approach without any loss of information. Database URL: http://castor.pharmacogenomics.ca PMID:21159730
Cloud-Based Orchestration of a Model-Based Power and Data Analysis Toolchain
NASA Technical Reports Server (NTRS)
Post, Ethan; Cole, Bjorn; Dinkel, Kevin; Kim, Hongman; Lee, Erich; Nairouz, Bassem
2016-01-01
The proposed Europa Mission concept contains many engineering and scientific instruments that consume varying amounts of power and produce varying amounts of data throughout the mission. System-level power and data usage must be well understood and analyzed to verify design requirements. Numerous cross-disciplinary tools and analysis models are used to simulate the system-level spacecraft power and data behavior. This paper addresses the problem of orchestrating a consistent set of models, tools, and data in a unified analysis toolchain when ownership is distributed among numerous domain experts. An analysis and simulation environment was developed as a way to manage the complexity of the power and data analysis toolchain and to reduce the simulation turnaround time. A system model data repository is used as the trusted store of high-level inputs and results while other remote servers are used for archival of larger data sets and for analysis tool execution. Simulation data passes through numerous domain-specific analysis tools and end-to-end simulation execution is enabled through a web-based tool. The use of a cloud-based service facilitates coordination among distributed developers, enables scalable computation and storage, and ensures a consistent execution environment. Configuration management is emphasized to maintain traceability between current and historical simulation runs and their corresponding versions of models, tools and data.
Data engineering systems: Computerized modeling and data bank capabilities for engineering analysis
NASA Technical Reports Server (NTRS)
Kopp, H.; Trettau, R.; Zolotar, B.
1984-01-01
The Data Engineering System (DES) is a computer-based system that organizes technical data and provides automated mechanisms for storage, retrieval, and engineering analysis. The DES combines the benefits of a structured data base system with automated links to large-scale analysis codes. While the DES provides the user with many of the capabilities of a computer-aided design (CAD) system, the systems are actually quite different in several respects. A typical CAD system emphasizes interactive graphics capabilities and organizes data in a manner that optimizes these graphics. On the other hand, the DES is a computer-aided engineering system intended for the engineer who must operationally understand an existing or planned design or who desires to carry out additional technical analysis based on a particular design. The DES emphasizes data retrieval in a form that not only provides the engineer access to search and display the data but also links the data automatically with the computer analysis codes.
Maximum unbiased validation (MUV) data sets for virtual screening based on PubChem bioactivity data.
Rohrer, Sebastian G; Baumann, Knut
2009-02-01
Refined nearest neighbor analysis was recently introduced for the analysis of virtual screening benchmark data sets. It constitutes a technique from the field of spatial statistics and provides a mathematical framework for the nonparametric analysis of mapped point patterns. Here, refined nearest neighbor analysis is used to design benchmark data sets for virtual screening based on PubChem bioactivity data. A workflow is devised that purges data sets of compounds active against pharmaceutically relevant targets from unselective hits. Topological optimization using experimental design strategies monitored by refined nearest neighbor analysis functions is applied to generate corresponding data sets of actives and decoys that are unbiased with regard to analogue bias and artificial enrichment. These data sets provide a tool for Maximum Unbiased Validation (MUV) of virtual screening methods. The data sets and a software package implementing the MUV design workflow are freely available at http://www.pharmchem.tu-bs.de/lehre/baumann/MUV.html.
GenePublisher: Automated analysis of DNA microarray data.
Knudsen, Steen; Workman, Christopher; Sicheritz-Ponten, Thomas; Friis, Carsten
2003-07-01
GenePublisher, a system for automatic analysis of data from DNA microarray experiments, has been implemented with a web interface at http://www.cbs.dtu.dk/services/GenePublisher. Raw data are uploaded to the server together with a specification of the data. The server performs normalization, statistical analysis and visualization of the data. The results are run against databases of signal transduction pathways, metabolic pathways and promoter sequences in order to extract more information. The results of the entire analysis are summarized in report form and returned to the user.
NASA Technical Reports Server (NTRS)
Smith, D. R.; Leslie, F. W.
1984-01-01
The Purdue Regional Objective Analysis of the Mesoscale (PROAM) is a successive correction type scheme for the analysis of surface meteorological data. The scheme is subjected to a series of experiments to evaluate its performance under a variety of analysis conditions. The tests include use of a known analytic temperature distribution to quantify error bounds for the scheme. Similar experiments were conducted using actual atmospheric data. Results indicate that the multiple pass technique increases the accuracy of the analysis. Furthermore, the tests suggest appropriate values for the analysis parameters in resolving disturbances for the data set used in this investigation.
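A successive-correction analysis can be sketched in a few lines of Python. The sketch assumes Cressman-style weights w = (R² - d²)/(R² + d²) inside an influence radius R (zero outside it) and one shrinking radius per pass; PROAM's actual weighting function and parameter values are not given in the abstract, so these are illustrative choices. Each pass spreads the remaining observation residuals onto the grid, which is why additional passes can recover finer-scale structure.

```python
def cressman_weight(d2, radius):
    """Cressman-style weight for squared distance d2 (zero outside the radius)."""
    r2 = radius * radius
    return (r2 - d2) / (r2 + d2) if d2 < r2 else 0.0

def successive_correction(obs, grid, radii, background=0.0):
    """obs: list of (x, y, value); grid: list of (x, y); radii: one radius per pass."""
    grid_est = [background] * len(grid)
    obs_est = [background] * len(obs)  # current analysis evaluated at each observation site

    def correction(px, py, residuals, radius):
        # Weighted mean of the residuals of all observations within the radius
        num = den = 0.0
        for (ox, oy, _), res in zip(obs, residuals):
            w = cressman_weight((px - ox) ** 2 + (py - oy) ** 2, radius)
            num += w * res
            den += w
        return num / den if den > 0 else 0.0

    for radius in radii:
        # Residual = observation minus the current analysis at that site
        residuals = [v - e for (_, _, v), e in zip(obs, obs_est)]
        grid_est = [g + correction(x, y, residuals, radius) for g, (x, y) in zip(grid_est, grid)]
        obs_est = [e + correction(x, y, residuals, radius) for e, (x, y, _) in zip(obs_est, obs)]
    return grid_est

print(successive_correction([(0.0, 0.0, 10.0)], [(0.0, 0.0), (5.0, 5.0)], radii=[2.0, 1.0]))
# prints [10.0, 0.0]
```

Evaluating the analysis directly at the observation sites, rather than interpolating it back from the grid, is a simplification chosen to keep the sketch short.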
Integrative sparse principal component analysis of gene expression data.
Liu, Mengque; Fan, Xinyan; Fang, Kuangnan; Zhang, Qingzhao; Ma, Shuangge
2017-12-01
In the analysis of gene expression data, dimension reduction techniques have been extensively adopted. Perhaps the most popular is PCA (principal component analysis). To generate more reliable and more interpretable results, the SPCA (sparse PCA) technique has been developed. Because gene expression data have the "small sample size, high dimensionality" characteristic, the analysis results generated from a single dataset are often unsatisfactory. In contexts other than dimension reduction, integrative analysis techniques, which jointly analyze the raw data of multiple independent datasets, have been developed and shown to outperform "classic" meta-analysis, other multi-dataset techniques, and single-dataset analysis. In this study, we conduct integrative analysis by developing the iSPCA (integrative SPCA) method. iSPCA achieves the selection and estimation of sparse loadings using a group penalty. To take advantage of the similarity across datasets and generate more accurate results, we further impose contrasted penalties. Different penalties are proposed to accommodate different data conditions. Extensive simulations show that iSPCA outperforms the alternatives under a wide spectrum of settings. The analysis of breast cancer and pancreatic cancer data further shows iSPCA's satisfactory performance. © 2017 WILEY PERIODICALS, INC.
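To make "sparse loadings" concrete, the Python sketch below computes a single-dataset sparse leading loading with a soft-thresholded power iteration, a common way of obtaining sparse PCA loadings. It is not iSPCA: the group and contrasted penalties that couple several datasets are left out, and the penalty value `lam` and the toy data are hypothetical.

```python
import math

def soft_threshold(z, lam):
    """Shrink z towards zero by lam; values within [-lam, lam] become exactly 0."""
    return math.copysign(max(abs(z) - lam, 0.0), z)

def sparse_leading_loading(X, lam, iters=100):
    """Leading sparse loading vector of a column-centred n x p data matrix X."""
    n, p = len(X), len(X[0])
    # Sample covariance S = X^T X / n
    S = [[sum(X[k][i] * X[k][j] for k in range(n)) / n for j in range(p)] for i in range(p)]
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        u = [sum(S[i][j] * v[j] for j in range(p)) for i in range(p)]  # power step
        u = [soft_threshold(x, lam) for x in u]                          # sparsity step
        norm = math.sqrt(sum(x * x for x in u))
        if norm == 0.0:
            return [0.0] * p  # penalty too strong: every loading shrunk to zero
        v = [x / norm for x in u]
    return v

# Genes 0 and 1 co-vary strongly; gene 2 is weak noise (columns already centred).
X = [[2.0, 2.0, 0.1], [-2.0, -2.0, -0.1], [1.0, 1.0, -0.1], [-1.0, -1.0, 0.1]]
print(sparse_leading_loading(X, lam=0.5))  # the loading on gene 2 is exactly zero
```

Larger values of `lam` zero out more loadings, trading explained variance for interpretability.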
Investigation of Antarctic crust and upper mantle using MAGSAT and other geophysical data
NASA Technical Reports Server (NTRS)
Bentley, C. R. (Principal Investigator)
1981-01-01
Progress in processing and analysis of Investigator B MAGSAT data is reported. Data processing tasks required prior to data analysis, including translation and reformatting of tapes and development of computer routines, were performed. A scalar anomaly map of Antarctica is near completion. Data analysis included a qualitative correlation of NASA's 4/81 scalar map of Antarctica with other geopotential data and correlation of POGO and continental scale gravity data with MAGSAT data. A magnetic high was found to exist over the Ross Embayment.
Head Start Program and Cost Data Analysis: Final Report - Volume II.
ERIC Educational Resources Information Center
Cordes, Joseph; And Others
This second volume of the Head Start Program and Cost Data Analysis Final Report analyzes data from sources other than the Head Start Program Information Report (PIR). The report is divided into three sections: Distributional Impact of Head Start Financing, Pilot Study of Program Compliance, and Recommendations for Secondary Data Analysis. The…
NASA Technical Reports Server (NTRS)
Melnick, Gary J.
1990-01-01
The Mission Operations and Data Analysis Plan is presented for the Submillimeter Wave Astronomy Satellite (SWAS) Project. It defines organizational responsibilities, discusses target selection and navigation, specifies instrument command and data requirements, defines data reduction and analysis hardware and software requirements, and discusses mission operations center staffing requirements.
Maturity Curve of Systems Engineering
2008-12-01
b. Analysis of Data ... 4. Fuzzy Logic ... the collection and analysis of data (Hart, 1998) ... 1. Methodology Overview: A qualitative approach in acquiring and managing the data was used ... for this analysis. A quantitative tool was used to examine and evaluate the data. The qualitative approach was intended to sort the acquired traits
Code of Federal Regulations, 2014 CFR
2014-07-01
..., or interpretation of any geological data and information. Initial analysis and processing are the stages of analysis or processing where the data and information first become available for in-house... geochemical) data and information describing each operation of analysis, processing, and interpretation; (2...
Code of Federal Regulations, 2012 CFR
2012-07-01
..., or interpretation of any geological data and information. Initial analysis and processing are the stages of analysis or processing where the data and information first become available for in-house... geochemical) data and information describing each operation of analysis, processing, and interpretation; (2...
Code of Federal Regulations, 2013 CFR
2013-07-01
..., or interpretation of any geological data and information. Initial analysis and processing are the stages of analysis or processing where the data and information first become available for in-house... geochemical) data and information describing each operation of analysis, processing, and interpretation; (2...
Conducting Qualitative Data Analysis: Managing Dynamic Tensions within
ERIC Educational Resources Information Center
Chenail, Ronald J.
2012-01-01
In the third of a series of "how-to" essays on conducting qualitative data analysis, Ron Chenail examines the dynamic tensions within the process of qualitative data analysis that qualitative researchers must manage in order to produce credible and creative results. These tensions include (a) the qualities of the data and the qualitative data…
JOINT AND INDIVIDUAL VARIATION EXPLAINED (JIVE) FOR INTEGRATED ANALYSIS OF MULTIPLE DATA TYPES.
Lock, Eric F; Hoadley, Katherine A; Marron, J S; Nobel, Andrew B
2013-03-01
Research in several fields now requires the analysis of datasets in which multiple high-dimensional types of data are available for a common set of objects. In particular, The Cancer Genome Atlas (TCGA) includes data from several diverse genomic technologies on the same cancerous tumor samples. In this paper we introduce Joint and Individual Variation Explained (JIVE), a general decomposition of variation for the integrated analysis of such datasets. The decomposition consists of three terms: a low-rank approximation capturing joint variation across data types, low-rank approximations for structured variation individual to each data type, and residual noise. JIVE quantifies the amount of joint variation between data types, reduces the dimensionality of the data, and provides new directions for the visual exploration of joint and individual structure. The proposed method represents an extension of Principal Component Analysis and has clear advantages over popular two-block methods such as Canonical Correlation Analysis and Partial Least Squares. A JIVE analysis of gene expression and miRNA data on Glioblastoma Multiforme tumor samples reveals gene-miRNA associations and provides better characterization of tumor types.
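Written out, the three-term decomposition described above reads as follows, assuming each data type X_i is a d_i × n matrix (features by the n shared samples), J_i is the block of the joint matrix J belonging to data type i, A_i is the individual structure and E_i is residual noise. The orthogonality condition J A_i^T = 0, which keeps joint and individual structure from absorbing each other, reflects our reading of the JIVE formulation rather than anything stated in this abstract.

```latex
X_i = J_i + A_i + E_i, \qquad i = 1, \dots, k,
\qquad
J = \begin{bmatrix} J_1 \\ \vdots \\ J_k \end{bmatrix}, \quad
\operatorname{rank}(J) = r, \quad
\operatorname{rank}(A_i) = r_i, \quad
J A_i^{\top} = 0 .
```

The "amount of joint variation" can then be summarized per data type as the share of the squared Frobenius norm of X_i captured by J_i.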
Exploring NASA and ESA Atmospheric Data Using GIOVANNI, the Online Visualization and Analysis Tool
NASA Technical Reports Server (NTRS)
Leptoukh, Gregory
2007-01-01
Giovanni, the NASA Goddard online visualization and analysis tool (http://giovanni.gsfc.nasa.gov), allows users to explore various atmospheric phenomena without learning remote sensing data formats or downloading voluminous data. Using NASA MODIS (Terra and Aqua) and ESA MERIS (ENVISAT) aerosol data as an example, we demonstrate Giovanni usage for online multi-sensor remote sensing data comparison and analysis.
Zackay, Arie; Steinhoff, Christine
2010-12-15
Exploration of DNA methylation and its impact on various regulatory mechanisms has become a very active field of research. At the same time, there is a growing need for tools to process and analyse the data, together with statistical investigation and visualisation. MethVisual is a new application that enables exploratory analysis and intuitive visualization of DNA methylation data as typically generated by bisulfite sequencing. The package allows the import of DNA methylation sequences, aligns them and performs quality control comparison. It comprises basic analysis steps such as lollipop visualization, co-occurrence display of methylation of neighbouring and distant CpG sites, summary statistics on methylation status, clustering and correspondence analysis. The package has been developed for methylation data but can also be used for other data types for which binary coding can be inferred. The application of the package, as well as a comparison to existing DNA methylation analysis tools and its workflow based on two datasets, is presented in this paper. The R package MethVisual offers various analysis procedures for data that can be binarized, in particular for bisulfite sequenced methylation data. R/Bioconductor has become one of the most important environments for statistical analysis of various types of biological and medical data. Therefore, any data analysis within R that allows the integration of various data types as provided by different technological platforms is convenient. It is the first and so far the only specific package for DNA methylation analysis, in particular for bisulfite sequenced data, available in the R/Bioconductor environment. The package is available for free at http://methvisual.molgen.mpg.de/ and from the Bioconductor Consortium http://www.bioconductor.org.
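MethVisual is an R/Bioconductor package, so the Python below is not its code; it only illustrates, under assumed inputs, why binary coding makes a co-occurrence summary straightforward. Each clone is assumed to be a 0/1 vector over the same CpG sites, and co-occurrence is taken, as an illustrative choice, to mean the fraction of clones in which both sites of a pair are methylated.

```python
from itertools import combinations

def methylation_cooccurrence(clones):
    """clones: equal-length 0/1 lists, one per sequenced clone (1 = CpG site methylated).

    Returns {(i, j): fraction of clones in which CpG sites i and j are both methylated}.
    """
    n_sites = len(clones[0])
    result = {}
    for i, j in combinations(range(n_sites), 2):
        both = sum(1 for clone in clones if clone[i] and clone[j])
        result[(i, j)] = both / len(clones)
    return result

clones = [[1, 1, 0], [1, 1, 1], [0, 0, 1], [1, 0, 0]]
print(methylation_cooccurrence(clones))
# prints {(0, 1): 0.5, (0, 2): 0.25, (1, 2): 0.25}
```

Because only 0/1 values are assumed, the same function applies unchanged to any other data type that can be binarized.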
2010-01-01
Background Exploration of DNA methylation and its impact on various regulatory mechanisms has become a very active field of research. At the same time, there is a growing need for tools to process and analyse the data, together with statistical investigation and visualisation. Findings MethVisual is a new application that enables exploratory analysis and intuitive visualization of DNA methylation data as typically generated by bisulfite sequencing. The package allows the import of DNA methylation sequences, aligns them and performs quality control comparison. It comprises basic analysis steps such as lollipop visualization, co-occurrence display of methylation of neighbouring and distant CpG sites, summary statistics on methylation status, clustering and correspondence analysis. The package has been developed for methylation data but can also be used for other data types for which binary coding can be inferred. The application of the package, as well as a comparison to existing DNA methylation analysis tools and its workflow based on two datasets, is presented in this paper. Conclusions The R package MethVisual offers various analysis procedures for data that can be binarized, in particular for bisulfite sequenced methylation data. R/Bioconductor has become one of the most important environments for statistical analysis of various types of biological and medical data. Therefore, any data analysis within R that allows the integration of various data types as provided by different technological platforms is convenient. It is the first and so far the only specific package for DNA methylation analysis, in particular for bisulfite sequenced data, available in the R/Bioconductor environment. The package is available for free at http://methvisual.molgen.mpg.de/ and from the Bioconductor Consortium http://www.bioconductor.org. PMID:21159174
Titaley, Ivan A; Ogba, O Maduka; Chibwe, Leah; Hoh, Eunha; Cheong, Paul H-Y; Simonich, Staci L Massey
2018-03-16
Non-targeted analysis of environmental samples using comprehensive two-dimensional gas chromatography coupled with time-of-flight mass spectrometry (GC × GC/ToF-MS) poses significant data analysis challenges due to the large number of possible analytes. Non-targeted data analysis of complex mixtures is prone to human bias and is laborious, particularly for comparative environmental samples such as contaminated soil pre- and post-bioremediation. To address this research bottleneck, we developed OCTpy, a Python™ script that acts as a data reduction filter to automate GC × GC/ToF-MS data analysis from LECO® ChromaTOF® software and facilitates selection of analytes of interest based on peak area comparison between comparative samples. We used data from polycyclic aromatic hydrocarbon (PAH) contaminated soil, pre- and post-bioremediation, to assess the effectiveness of OCTpy in facilitating the selection of analytes that have formed or degraded following treatment. Using datasets from the soil extracts pre- and post-bioremediation, OCTpy selected, on average, 18% of the initial suggested analytes generated by the Statistical Compare feature of the LECO® ChromaTOF® software. Based on this list, 63-100% of the candidate analytes identified by a highly trained individual were also selected by OCTpy. This process was accomplished in several minutes per sample, whereas manual data analysis took several hours per sample. OCTpy automates the analysis of complex mixtures of comparative samples, reduces the potential for human error during heavy data handling and decreases data analysis time by at least tenfold. Copyright © 2018 Elsevier B.V. All rights reserved.
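OCTpy's actual filtering rules are not spelled out in the abstract; the Python sketch below only illustrates the general idea of a peak-area comparison filter between a pre- and a post-treatment sample. The dict input format, the fold-change threshold of 2, and the analyte names are all hypothetical, standing in for a peak table exported from the instrument software.

```python
def compare_peak_areas(pre, post, fold=2.0):
    """Flag analytes whose peak area changed by at least `fold` between two samples.

    pre, post: dicts mapping analyte name -> peak area (missing means not detected).
    Returns (increased, decreased) as sorted lists of analyte names.
    """
    increased, decreased = [], []
    for name in sorted(set(pre) | set(post)):
        a, b = pre.get(name, 0.0), post.get(name, 0.0)
        if a == 0 and b > 0:
            increased.append(name)   # only detected after treatment
        elif b == 0 and a > 0:
            decreased.append(name)   # no longer detected after treatment
        elif a > 0 and b / a >= fold:
            increased.append(name)
        elif b > 0 and a / b >= fold:
            decreased.append(name)
    return increased, decreased

pre = {"analyte_A": 100.0, "analyte_B": 50.0, "analyte_C": 10.0}
post = {"analyte_A": 10.0, "analyte_B": 55.0, "analyte_D": 5.0}
print(compare_peak_areas(pre, post))
# prints (['analyte_D'], ['analyte_A', 'analyte_C'])
```

Analytes with only a small change (analyte_B here) are dropped, which is the data-reduction effect: the analyst reviews a short list of candidates that plausibly formed or degraded.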
Tool for Rapid Analysis of Monte Carlo Simulations
NASA Technical Reports Server (NTRS)
Restrepo, Carolina; McCall, Kurt E.; Hurtado, John E.
2013-01-01
Designing a spacecraft, or any other complex engineering system, requires extensive simulation and analysis work. Oftentimes, the large amounts of simulation data generated are very difficult and time-consuming to analyze, with the added risk of overlooking potentially critical problems in the design. The authors have developed a generic data analysis tool that can quickly sort through large data sets and point an analyst to the areas in the data set that cause specific types of failures. The first version of this tool was a serial code and the current version is a parallel code, which has greatly increased the analysis capabilities. This paper describes the new implementation of this analysis tool on a graphics processing unit (GPU), and presents analysis results for NASA's Orion Monte Carlo data to demonstrate its capabilities.
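The tool described above is not reproduced here. As a hedged illustration of the general idea of pointing an analyst to failure-prone regions, the serial Python sketch below bins each dispersed input parameter and reports the bin with the highest failure rate. The binning approach, the parameter names, and the toy data are assumptions, not the authors' method.

```python
def worst_bin_per_parameter(runs, failed, n_bins=4):
    """runs: list of dicts (parameter name -> sampled value), one per Monte Carlo run.
    failed: list of bools, True where that run failed.
    Returns {parameter: (bin_low, bin_high, failure_rate)} for the bin with the
    highest failure rate (ties go to the higher bin).
    """
    result = {}
    for name in runs[0]:
        values = [run[name] for run in runs]
        lo, hi = min(values), max(values)
        width = (hi - lo) / n_bins or 1.0  # guard against a constant parameter
        counts = [[0, 0] for _ in range(n_bins)]  # [failures, runs] per bin
        for value, fail in zip(values, failed):
            b = min(int((value - lo) / width), n_bins - 1)
            counts[b][0] += int(fail)
            counts[b][1] += 1
        rate, b = max((f / n if n else 0.0, i) for i, (f, n) in enumerate(counts))
        result[name] = (lo + b * width, lo + (b + 1) * width, rate)
    return result

# Toy data: runs fail only when the (hypothetical) "mass" parameter is large.
runs = [{"mass": float(m), "angle": float(m % 2)} for m in range(8)]
failed = [m >= 6 for m in range(8)]
print(worst_bin_per_parameter(runs, failed))
# prints {'mass': (5.25, 7.0, 1.0), 'angle': (0.75, 1.0, 0.25)}
```

The per-parameter loop is independent across parameters (and across runs within a bin count), which is the kind of work that lends itself to parallel hardware.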
Cao, Hongbao; Duan, Junbo; Lin, Dongdong; Shugart, Yin Yao; Calhoun, Vince; Wang, Yu-Ping
2014-11-15
Integrative analysis of multiple data types can take advantage of their complementary information and may therefore provide higher power to identify potential biomarkers that would be missed by analyzing each data type individually. Because different data modalities differ in nature, data integration is challenging. Here we address the data integration problem by developing a generalized sparse model (GSM) that uses weighting factors to integrate multi-modality data for biomarker selection. As an example, we applied the GSM model to a joint analysis of two types of schizophrenia data sets: 759,075 SNPs and 153,594 functional magnetic resonance imaging (fMRI) voxels in 208 subjects (92 cases/116 controls). To solve this small-sample, large-variable problem, we developed a novel sparse representation based variable selection (SRVS) algorithm, with the primary aim of identifying biomarkers associated with schizophrenia. To validate the effectiveness of the selected variables, we performed multivariate classification followed by ten-fold cross-validation. We compared our proposed SRVS algorithm with an earlier sparse model based variable selection algorithm for integrated analysis. In addition, we compared it with traditional statistical methods for univariate data analysis (the chi-squared test for SNP data and ANOVA for fMRI data). Results showed that our proposed SRVS method can identify novel biomarkers that show stronger capability in distinguishing schizophrenia patients from healthy controls. Moreover, better classification ratios were achieved using biomarkers from both types of data, suggesting the importance of integrative analysis. Copyright © 2014 Elsevier Inc. All rights reserved.
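The univariate SNP baseline mentioned above can be made concrete with a short Python sketch of the Pearson chi-squared statistic for a case/control by genotype contingency table. The genotype counts are invented for illustration; converting the statistic to a p-value would need the chi-squared distribution (for example from a statistics library), which is left out to keep the sketch dependency-free.

```python
def chi_squared(table):
    """Pearson chi-squared statistic and degrees of freedom for an r x c table of counts.

    Assumes every row total and column total is non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count expected under independence
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

# Hypothetical genotype counts (AA, Aa, aa) for one SNP: 92 cases vs 116 controls.
print(chi_squared([[30, 40, 22], [50, 50, 16]]))
```

Running such a test SNP by SNP is what makes it univariate: each variable is scored in isolation, so information shared across SNPs or with the fMRI voxels is not used.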
DataSHIELD: taking the analysis to the data, not the data to the analysis.
Gaye, Amadou; Marcon, Yannick; Isaeva, Julia; LaFlamme, Philippe; Turner, Andrew; Jones, Elinor M; Minion, Joel; Boyd, Andrew W; Newby, Christopher J; Nuotio, Marja-Liisa; Wilson, Rebecca; Butters, Oliver; Murtagh, Barnaby; Demir, Ipek; Doiron, Dany; Giepmans, Lisette; Wallace, Susan E; Budin-Ljøsne, Isabelle; Oliver Schmidt, Carsten; Boffetta, Paolo; Boniol, Mathieu; Bota, Maria; Carter, Kim W; deKlerk, Nick; Dibben, Chris; Francis, Richard W; Hiekkalinna, Tero; Hveem, Kristian; Kvaløy, Kirsti; Millar, Sean; Perry, Ivan J; Peters, Annette; Phillips, Catherine M; Popham, Frank; Raab, Gillian; Reischl, Eva; Sheehan, Nuala; Waldenberger, Melanie; Perola, Markus; van den Heuvel, Edwin; Macleod, John; Knoppers, Bartha M; Stolk, Ronald P; Fortier, Isabel; Harris, Jennifer R; Woffenbuttel, Bruce H R; Murtagh, Madeleine J; Ferretti, Vincent; Burton, Paul R
2014-12-01
Research in modern biomedicine and social science requires sample sizes so large that they can often only be achieved through a pooled co-analysis of data from several studies. But the pooling of information from individuals in a central database that may be queried by researchers raises important ethico-legal questions and can be controversial. In the UK this has been highlighted by recent debate and controversy relating to the UK's proposed 'care.data' initiative, and these issues reflect important societal and professional concerns about privacy, confidentiality and intellectual property. DataSHIELD provides a novel technological solution that can circumvent some of the most basic challenges in facilitating the access of researchers and other healthcare professionals to individual-level data. Commands are sent from a central analysis computer (AC) to several data computers (DCs) storing the data to be co-analysed. The data sets are analysed simultaneously but in parallel. The separate parallelized analyses are linked by non-disclosive summary statistics and commands transmitted back and forth between the DCs and the AC. This paper describes the technical implementation of DataSHIELD using a modified R statistical environment linked to an Opal database deployed behind the computer firewall of each DC. Analysis is controlled through a standard R environment at the AC. Based on this Opal/R implementation, DataSHIELD is currently used by the Healthy Obese Project and the Environmental Core Project (BioSHaRE-EU) for the federated analysis of 10 data sets across eight European countries, and this illustrates the opportunities and challenges presented by the DataSHIELD approach. 
DataSHIELD facilitates important research in settings where: (i) a co-analysis of individual-level data from several studies is scientifically necessary but governance restrictions prohibit the release or sharing of some of the required data, and/or render data access unacceptably slow; (ii) a research group (e.g. in a developing nation) is particularly vulnerable to loss of intellectual property: the researchers want to fully share the information held in their data with national and international collaborators, but do not wish to hand over the physical data themselves; and (iii) a data set is to be included in an individual-level co-analysis but the physical size of the data precludes direct transfer to a new site for analysis. © The Author 2014; all rights reserved. Published by Oxford University Press on behalf of the International Epidemiological Association.
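The idea of linking parallel analyses through non-disclosive summary statistics can be sketched in Python for the simplest case, a pooled mean and variance. This is not DataSHIELD's R/Opal API: the function names are hypothetical, and the `min_count` cut-off is a hypothetical stand-in for whatever disclosure rules a real deployment enforces. Each data computer (DC) returns only a count, a sum, and a sum of squares; the analysis computer (AC) combines them.

```python
from dataclasses import dataclass

@dataclass
class Summary:
    n: int
    total: float
    total_sq: float

def local_summary(values, min_count=5):
    """Runs on each data computer; only these aggregates leave the site."""
    if len(values) < min_count:
        raise ValueError("too few records to release a non-disclosive summary")
    return Summary(len(values), sum(values), sum(v * v for v in values))

def pooled_mean_var(summaries):
    """Runs on the analysis computer, which never sees individual-level data."""
    n = sum(s.n for s in summaries)
    total = sum(s.total for s in summaries)
    total_sq = sum(s.total_sq for s in summaries)
    mean = total / n
    var = (total_sq - total * total / n) / (n - 1)  # pooled sample variance
    return mean, var

dc1 = local_summary([1.0, 2.0, 3.0], min_count=3)
dc2 = local_summary([4.0, 5.0, 6.0, 7.0, 8.0], min_count=3)
print(pooled_mean_var([dc1, dc2]))  # prints (4.5, 6.0)
```

The result equals what a central analysis of all eight records would give, yet no individual value leaves either site. The sum-of-squares formula is the simplest to pool but can lose precision for large values; pooling per-site means and centred sums of squares is a more robust alternative.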
Analysis of FARS data on state highways in Oklahoma.
DOT National Transportation Integrated Search
2012-11-01
Analysis of fatality automobile accident data can be challenging in rural areas, where a relatively small number of such accidents occurs on specific sections of highways. Combining crash data for 1998 to 2011 from the Fatality Analysis Reporting ...
Ulfenborg, Benjamin; Karlsson, Alexander; Riveiro, Maria; Améen, Caroline; Åkesson, Karolina; Andersson, Christian X.; Sartipy, Peter; Synnergren, Jane
2017-01-01
The development of high-throughput biomolecular technologies has resulted in the generation of vast amounts of omics data at an unprecedented rate. This is transforming biomedical research into a big data discipline, where the main challenges relate to the analysis and interpretation of data into new biological knowledge. The aim of this study was to develop a framework for biomedical big data analytics, and to apply it to the analysis of transcriptomics time series data from the early differentiation of human pluripotent stem cells towards the mesoderm and cardiac lineages. To this end, transcriptome profiling by microarray was performed on differentiating human pluripotent stem cells sampled on eleven consecutive days. The gene expression data were analyzed using the five-stage analysis framework proposed in this study: data preparation, exploratory data analysis, confirmatory analysis, biological knowledge discovery, and visualization of the results. Clustering analysis revealed several distinct expression profiles during differentiation. Genes with an early transient response were strongly related to embryonic and mesendoderm development, for example CER1 and NODAL. Pluripotency genes, such as NANOG and SOX2, exhibited substantial downregulation shortly after the onset of differentiation. Rapid induction of genes related to metal ion response, cardiac tissue development, and muscle contraction was observed around days five and six. Several transcription factors were identified as potential regulators of these processes, e.g. POU1F1, TCF4 and TBP for muscle contraction genes. Pathway analysis revealed temporal activity of several signaling pathways, for example the inhibition of WNT signaling on day 2 and its reactivation on day 4. This study provides a comprehensive characterization of biological events and key regulators of the early differentiation of human pluripotent stem cells towards the mesoderm and cardiac lineages.
The proposed analysis framework can be used to structure data analysis in future research, both in stem cell differentiation, and more generally, in biomedical big data analytics. PMID:28654683
Semantic integration of gene expression analysis tools and data sources using software connectors
2013-01-01
Background The study and analysis of gene expression measurements is the primary focus of functional genomics. Once expression data are available, biologists are faced with the task of extracting (new) knowledge associated with the underlying biological phenomenon. Most often, in order to perform this task, biologists execute a number of analysis activities on the available gene expression dataset rather than a single analysis activity. The integration of heterogeneous tools and data sources to create an integrated analysis environment represents a challenging and error-prone task. Semantic integration enables the assignment of unambiguous meanings to data shared among different applications in an integrated environment, allowing the exchange of data in a semantically consistent and meaningful way. This work aims to develop an ontology-based methodology for the semantic integration of gene expression analysis tools and data sources. The proposed methodology relies on software connectors to support not only the access to heterogeneous data sources but also the definition of transformation rules on exchanged data. Results We have studied the different challenges involved in the integration of computer systems and the role software connectors play in this task. We have also studied a number of gene expression technologies, analysis tools and related ontologies in order to devise basic integration scenarios and propose a reference ontology for the gene expression domain. Then, we have defined a number of activities and associated guidelines to prescribe how the development of connectors should be carried out. Finally, we have applied the proposed methodology in the construction of three different integration scenarios involving the use of different tools for the analysis of different types of gene expression data.
Conclusions The proposed methodology facilitates the development of connectors capable of semantically integrating different gene expression analysis tools and data sources. The methodology can be used in the development of connectors supporting both simple and nontrivial processing requirements, thus assuring accurate data exchange and information interpretation from exchanged data. PMID:24341380
Ulfenborg, Benjamin; Karlsson, Alexander; Riveiro, Maria; Améen, Caroline; Åkesson, Karolina; Andersson, Christian X; Sartipy, Peter; Synnergren, Jane
2017-01-01
The development of high-throughput biomolecular technologies has resulted in the generation of vast amounts of omics data at an unprecedented rate. This is transforming biomedical research into a big data discipline, where the main challenges relate to the analysis and interpretation of data into new biological knowledge. The aim of this study was to develop a framework for biomedical big data analytics, and to apply it to the analysis of transcriptomics time series data from the early differentiation of human pluripotent stem cells towards the mesoderm and cardiac lineages. To this end, transcriptome profiling by microarray was performed on differentiating human pluripotent stem cells sampled on eleven consecutive days. The gene expression data were analyzed using the five-stage analysis framework proposed in this study: data preparation, exploratory data analysis, confirmatory analysis, biological knowledge discovery, and visualization of the results. Clustering analysis revealed several distinct expression profiles during differentiation. Genes with an early transient response were strongly related to embryonic and mesendoderm development, for example CER1 and NODAL. Pluripotency genes, such as NANOG and SOX2, exhibited substantial downregulation shortly after the onset of differentiation. Rapid induction of genes related to metal ion response, cardiac tissue development, and muscle contraction was observed around days five and six. Several transcription factors were identified as potential regulators of these processes, e.g. POU1F1, TCF4 and TBP for muscle contraction genes. Pathway analysis revealed temporal activity of several signaling pathways, for example the inhibition of WNT signaling on day 2 and its reactivation on day 4. This study provides a comprehensive characterization of biological events and key regulators of the early differentiation of human pluripotent stem cells towards the mesoderm and cardiac lineages.
The proposed analysis framework can be used to structure data analysis in future research, both in stem cell differentiation, and more generally, in biomedical big data analytics.
Semantic integration of gene expression analysis tools and data sources using software connectors.
Miyazaki, Flávia A; Guardia, Gabriela D A; Vêncio, Ricardo Z N; de Farias, Cléver R G
2013-10-25
The study and analysis of gene expression measurements is the primary focus of functional genomics. Once expression data are available, biologists are faced with the task of extracting (new) knowledge associated with the underlying biological phenomenon. Most often, in order to perform this task, biologists execute a number of analysis activities on the available gene expression dataset rather than a single analysis activity. The integration of heterogeneous tools and data sources to create an integrated analysis environment represents a challenging and error-prone task. Semantic integration enables the assignment of unambiguous meanings to data shared among different applications in an integrated environment, allowing the exchange of data in a semantically consistent and meaningful way. This work aims to develop an ontology-based methodology for the semantic integration of gene expression analysis tools and data sources. The proposed methodology relies on software connectors to support not only the access to heterogeneous data sources but also the definition of transformation rules on exchanged data. We have studied the different challenges involved in the integration of computer systems and the role software connectors play in this task. We have also studied a number of gene expression technologies, analysis tools and related ontologies in order to devise basic integration scenarios and propose a reference ontology for the gene expression domain. Then, we have defined a number of activities and associated guidelines to prescribe how the development of connectors should be carried out. Finally, we have applied the proposed methodology in the construction of three different integration scenarios involving the use of different tools for the analysis of different types of gene expression data. The proposed methodology facilitates the development of connectors capable of semantically integrating different gene expression analysis tools and data sources.
The methodology can be used in the development of connectors supporting both simple and nontrivial processing requirements, thus assuring accurate data exchange and information interpretation from exchanged data.
Database integration for investigative data visualization with the Temporal Analysis System
NASA Astrophysics Data System (ADS)
Barth, Stephen W.
1997-02-01
This paper describes an effort to provide mechanisms for integrating existing law enforcement databases with the temporal analysis system (TAS), an application for analysis and visualization of military intelligence data. Such integration mechanisms are essential for bringing advanced military intelligence data handling software applications to bear on the analysis of data used in criminal investigations. Our approach involved applying a software application for intelligence message handling to the problem of database conversion. This application provides mechanisms for distributed processing and delivery of converted data records to an end-user application. It also provides a flexible graphical user interface for development and customization in the field.
Measuring hospital efficiency--comparing four European countries.
Mateus, Céu; Joaquim, Inês; Nunes, Carla
2015-02-01
Performing international comparisons of efficiency usually has two main drawbacks: the lack of comparability of data from different countries, and the appropriateness and adequacy of the data selected for efficiency measurement. By using inpatient discharges for four countries, some of the data comparability problems usually found in international comparisons were mitigated. The objectives are to assess and compare hospital efficiency levels within and between countries, using stochastic frontier analysis with both cross-sectional and panel data. Data on English (2005-2008), Portuguese (2002-2009), Spanish (2003-2009) and Slovenian (2005-2009) hospital discharges and characteristics are used. Weighted hospital discharges were considered as outputs, while the numbers of employees, physicians, nurses and beds were selected as inputs of the production function. Stochastic frontier analysis using both cross-sectional and panel data was performed, as well as ordinary least squares (OLS) analysis. The adequacy of the data was assessed with Kolmogorov-Smirnov and Breusch-Pagan/Cook-Weisberg tests. Data available results were redundant to perform efficiency measurements using stochastic frontier analysis with cross-sectional data. The likelihood ratio test reveals that, for cross-sectional data, stochastic frontier analysis (SFA) is not statistically different from OLS for the Portuguese data, while SFA and OLS estimates are statistically different for the Spanish, Slovenian and English data. In the panel data, the inefficiency term is statistically different from 0 in all four countries analysed, though for Portugal it is still close to 0. Panel data are preferred over cross-section analysis because the results are more robust. For all countries except Slovenia, beds and employees are relevant inputs for the production process. © The Author 2015. Published by Oxford University Press on behalf of the European Public Health Association. All rights reserved.
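For readers unfamiliar with stochastic frontier analysis, a standard cross-sectional formulation is given below. The abstract does not state the functional form or the distribution of the inefficiency term, so the Cobb-Douglas (log-linear) production function and the half-normal u_i are assumptions borrowed from common textbook practice. Here y_i is the weighted discharges of hospital i, x_{ki} are its inputs (employees, physicians, nurses, beds), v_i is symmetric noise, u_i ≥ 0 is inefficiency, and TE_i is technical efficiency.

```latex
\begin{aligned}
\ln y_i &= \beta_0 + \sum_{k=1}^{K} \beta_k \ln x_{ki} + v_i - u_i,
  \qquad v_i \sim \mathcal{N}(0, \sigma_v^2), \quad u_i \sim \mathcal{N}^{+}(0, \sigma_u^2),\\
\mathrm{TE}_i &= \exp(-u_i),\\
\mathrm{LR} &= -2\left[\ln L_{\mathrm{OLS}} - \ln L_{\mathrm{SFA}}\right].
\end{aligned}
```

When σ_u² = 0 the inefficiency term vanishes and the model reduces to OLS, which is the comparison the likelihood ratio test makes; because σ_u² = 0 lies on the boundary of the parameter space, the LR statistic is usually compared with a mixed chi-squared distribution rather than a plain one.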
[Big data analysis and evidence-based medicine: controversy or cooperation].
Chen, Xinzu; Hu, Jiankun
2016-01-01
The development of evidence-based medicine should be an important milestone in the shift from empirical medicine to evidence-driven modern medicine. With the explosion in biomedical data, the rise of big data analysis can efficiently address exploratory questions and decision-making issues in biomedicine and healthcare. The current problem in China is that big data analysis is still not well conducted or applied to problems such as clinical decision-making and public health policy; there should be no debate over whether big data analysis can replace evidence-based medicine. Therefore, we should clearly understand that, whether for evidence-based medicine or for big data analysis, the most critical infrastructure must be substantial work on the design, construction and collection of original databases in China.
NASA Technical Reports Server (NTRS)
Wilmington, R. P.; Klute, Glenn K. (Editor); Carroll, Amy E. (Editor); Stuart, Mark A. (Editor); Poliner, Jeff (Editor); Rajulu, Sudhakar (Editor); Stanush, Julie (Editor)
1992-01-01
Kinematics, the study of motion exclusive of the influences of mass and force, is one of the primary methods used for the analysis of human biomechanical systems as well as other types of mechanical systems. The Anthropometry and Biomechanics Laboratory (ABL) in the Crew Interface Analysis section of the Man-Systems Division performs both human body kinematics and mechanical system kinematics using the Ariel Performance Analysis System (APAS). The APAS supports both the analysis of analog signals (e.g. force plate data collection) and the digitization and analysis of video data. The current evaluations address several methodology issues concerning the accuracy of the kinematic data collection and analysis used in the ABL. This document describes a series of evaluations performed to gain quantitative data pertaining to position and constant angular velocity movements under several operating conditions. Two-dimensional as well as three-dimensional data collection and analyses were completed in a controlled laboratory environment using typical hardware setups. In addition, an evaluation was performed to assess the impact of a single-axis camera offset on accuracy. Segment length and positional data exhibited errors within 3 percent when using three-dimensional analysis and within 8 percent when using two-dimensional analysis (Direct Linear Software). Peak angular velocities displayed errors within 6 percent through three-dimensional analyses and errors of 12 percent when using two-dimensional analysis (Direct Linear Software). The specific results from this series of evaluations and their impacts on the methodology issues of kinematic data collection and analyses are presented in detail. The accuracy levels observed in these evaluations are also presented.
Concepts of formal concept analysis
NASA Astrophysics Data System (ADS)
Žáček, Martin; Homola, Dan; Miarka, Rostislav
2017-07-01
The aim of this article is to apply Formal Concept Analysis to the concept of world. Formal concept analysis (FCA), as a methodology of data analysis, information management and knowledge representation, has the potential to be applied to a variety of linguistic problems. FCA is a mathematical theory of concepts and concept hierarchies that reflects an understanding of concept. Formal concept analysis explicitly formalizes the extension and intension of a concept and their mutual relationships. A distinguishing feature of FCA is an inherent integration of three components of conceptual processing of data and knowledge, namely, the discovery of and reasoning with concepts in data, the discovery of and reasoning with dependencies in data, and the visualization of data, concepts, and dependencies with folding/unfolding capabilities.
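"Extension" and "intension" can be made concrete with a small example. The minimal sketch below enumerates every formal concept of a tiny object-attribute context by closing each subset of attributes. The animal context is invented for illustration. Brute-force enumeration is only practical for very small contexts; dedicated FCA tools use algorithms such as Ganter's NextClosure instead.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Set, Tuple

Concept = Tuple[FrozenSet[str], FrozenSet[str]]  # (extent, intent)

# Hypothetical toy context: each object maps to the attributes it has.
CONTEXT: Dict[str, FrozenSet[str]] = {
    "dog": frozenset({"mammal", "domestic"}),
    "cat": frozenset({"mammal", "domestic"}),
    "wolf": frozenset({"mammal", "wild"}),
    "eagle": frozenset({"bird", "wild"}),
}


def extent(context: Dict[str, FrozenSet[str]], attrs: FrozenSet[str]) -> FrozenSet[str]:
    """Objects having every attribute in attrs (the derivation B')."""
    return frozenset(obj for obj, has in context.items() if attrs <= has)


def intent(context: Dict[str, FrozenSet[str]], objs: FrozenSet[str],
           all_attrs: FrozenSet[str]) -> FrozenSet[str]:
    """Attributes shared by every object in objs (the derivation A');
    by convention, all attributes when objs is empty."""
    shared = set(all_attrs)
    for obj in objs:
        shared &= context[obj]
    return frozenset(shared)


def formal_concepts(context: Dict[str, FrozenSet[str]]) -> Set[Concept]:
    """Every pair (B', B'') is a concept, and every concept arises this way."""
    all_attrs = frozenset().union(*context.values())
    concepts: Set[Concept] = set()
    for r in range(len(all_attrs) + 1):
        for subset in combinations(sorted(all_attrs), r):
            ext = extent(context, frozenset(subset))
            concepts.add((ext, intent(context, ext, all_attrs)))
    return concepts


concepts = formal_concepts(CONTEXT)
for ext, itt in sorted(concepts, key=lambda c: (-len(c[0]), sorted(c[1]))):
    print(sorted(ext), "<->", sorted(itt))
```

Each printed pair is one node of the concept lattice. The pair `['cat', 'dog'] <-> ['domestic', 'mammal']`, for example, shows an extension and its intension determining each other.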
Jesse, Stephen; Kalinin, Sergei V
2009-02-25
An approach for the analysis of multi-dimensional, spectroscopic-imaging data based on principal component analysis (PCA) is explored. PCA selects and ranks relevant response components based on variance within the data. It is shown that for examples with small relative variations between spectra, the first few PCA components closely coincide with results obtained using model fitting, and this is achieved at rates approximately four orders of magnitude faster. For cases with strong response variations, PCA allows an effective approach to rapidly process, de-noise, and compress data. The prospects for PCA combined with correlation function analysis of component maps as a universal tool for data analysis and representation in microscopy are discussed.
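The variance-based ranking described above can be sketched in dependency-free Python. The steps are: center the spectra, form the covariance matrix, and extract the leading components by power iteration with deflation. This is an illustrative sketch, not the authors' implementation. It assumes rows are spectra and columns are channels, and it uses synthetic toy data. In practice, an SVD routine from a numerical library would be the usual choice.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def covariance_matrix(rows: Matrix) -> Tuple[Matrix, Matrix]:
    """Sample covariance of the columns, plus the mean-centered rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)] for i in range(d)]
    return cov, centered


def power_iteration(matrix: Matrix, iters: int = 1000) -> Tuple[float, List[float]]:
    """Leading eigenvalue/eigenvector of a symmetric positive semidefinite matrix."""
    d = len(matrix)
    v = [1.0 + 0.1 * i for i in range(d)]  # arbitrary, non-symmetric start vector
    for _ in range(iters):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0, v
        v = [x / norm for x in w]
    mv = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
    return sum(a * b for a, b in zip(v, mv)), v  # Rayleigh quotient


def pca(rows: Matrix, k: int):
    """Top-k components as (variance, direction), plus per-row scores."""
    cov, centered = covariance_matrix(rows)
    d = len(cov)
    components = []
    for _ in range(k):
        val, vec = power_iteration(cov)
        components.append((val, vec))
        # Deflate so the next power iteration finds the next-largest variance direction.
        cov = [[cov[i][j] - val * vec[i] * vec[j] for j in range(d)] for i in range(d)]
    scores = [[sum(a * b for a, b in zip(c, vec)) for _, vec in components] for c in centered]
    return components, scores


# Synthetic "spectra": 5 pixels x 3 channels built from one strong and one weak response shape.
t_vals = [-2, -1, 0, 1, 2]
s_vals = [1, -1, 0, -1, 1]
spectra = [[t + 0.1 * s, 2.0 * t, t - 0.1 * s] for t, s in zip(t_vals, s_vals)]
components, scores = pca(spectra, k=2)
total_variance = sum(covariance_matrix(spectra)[0][i][i] for i in range(3))
for rank, (var, _) in enumerate(components, start=1):
    print(f"PC{rank}: variance {var:.4f}, share {var / total_variance:.4f}")
```

Here the first component carries almost all of the variance. Truncating to it is the kind of de-noising and compression the abstract describes.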
NASA Astrophysics Data System (ADS)
Arnaud, Keith A.; Smith, R. K.; Siemiginowska, A.; Edgar, R. J.; Grant, C. E.; Kuntz, K. D.; Schwartz, D. A.
2011-09-01
This poster advertises a book to be published in September 2011 by Cambridge University Press. Written for graduate students, professional astronomers and researchers who want to start working in this field, this book is a practical guide to x-ray astronomy. The handbook begins with x-ray optics, basic detector physics and CCDs, before focussing on data analysis. It introduces the reduction and calibration of x-ray data, scientific analysis, archives, statistical issues and the particular problems of highly extended sources. The book describes the main hardware used in x-ray astronomy, emphasizing the implications for data analysis. The concepts behind common x-ray astronomy data analysis software are explained. The appendices present reference material often required during data analysis.
Integrative Analysis of “-Omics” Data Using Penalty Functions
Zhao, Qing; Shi, Xingjie; Huang, Jian; Liu, Jin; Li, Yang; Ma, Shuangge
2014-01-01
In the analysis of omics data, integrative analysis provides an effective way of pooling information across multiple datasets or multiple correlated responses, and can be more effective than single-dataset (response) analysis. Multiple families of integrative analysis methods have been proposed in the literature. The current review focuses on the penalization methods. Special attention is paid to sparse meta-analysis methods that pool summary statistics across datasets, and integrative analysis methods that pool raw data across datasets. We discuss their formulation and rationale. Beyond “standard” penalized selection, we also review contrasted penalization and Laplacian penalization which accommodate finer data structures. The computational aspects, including computational algorithms and tuning parameter selection, are examined. This review concludes with possible limitations and extensions. PMID:25691921
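The idea of pooling raw data across datasets through a penalty can be illustrated with one member of the penalization family. In a group-lasso-type penalty, each gene's coefficients across all datasets form one group, so a gene is selected in every dataset or in none, while its effect sizes may differ. This is our own minimal sketch, not a specific method from the review. It assumes every column is centered and scaled so that x^T x / n = 1. Under that assumption, each block update is an exact group soft-thresholding step. The data are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows = samples, columns = genes (same genes in every dataset)


def integrative_group_lasso(xs: List[Matrix], ys: List[List[float]],
                            lam: float, n_iter: int = 100) -> List[List[float]]:
    """Block coordinate descent for
        sum_m ||y_m - X_m b_m||^2 / (2 n_m) + lam * sum_j ||(b_1j, ..., b_Mj)||_2.
    Assumes every column satisfies x^T x / n_m == 1 (standardized)."""
    n_sets, p = len(xs), len(xs[0][0])
    beta = [[0.0] * p for _ in range(n_sets)]
    resid = [list(y) for y in ys]  # residuals y_m - X_m b_m
    for _ in range(n_iter):
        for j in range(p):
            # Unpenalized per-dataset update for gene j, other genes held fixed.
            z = []
            for m in range(n_sets):
                n = len(ys[m])
                z.append(sum(xs[m][i][j] * resid[m][i] for i in range(n)) / n + beta[m][j])
            norm = math.sqrt(sum(v * v for v in z))
            # Group soft-thresholding: the whole group is zero unless its norm exceeds lam.
            shrink = max(0.0, 1.0 - lam / norm) if norm > 0.0 else 0.0
            for m in range(n_sets):
                delta = shrink * z[m] - beta[m][j]
                for i in range(len(ys[m])):
                    resid[m][i] -= xs[m][i][j] * delta
                beta[m][j] += delta
    return beta


# Toy example: two datasets sharing an orthogonal, standardized design (hypothetical data).
X = [[1.0, 1.0, 1.0], [-1.0, 1.0, -1.0], [1.0, -1.0, -1.0], [-1.0, -1.0, 1.0]]
y1 = [2.1, -1.9, 1.9, -2.1]  # mostly gene 0, a little gene 1
y2 = [1.4, -1.4, 1.6, -1.6]  # mostly gene 0, a little gene 2
beta = integrative_group_lasso([X, X], [y1, y2], lam=0.5)
print(beta)
```

With `lam=0.5`, gene 0 is kept in both datasets (with shrunken, dataset-specific effects) and the weak genes are dropped jointly. The tuning-parameter selection discussed in the review would choose `lam` from the data, for example by cross-validation.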
2016-09-01
Securing Healthcare’s Quantified-Self Data: A Comparative Analysis Versus Personal Financial Account Aggregators Based on Porter’s Five Forces Framework for... Distribution is unlimited.
Applying Dataflow Architecture and Visualization Tools to In Vitro Pharmacology Data Automation.
Pechter, David; Xu, Serena; Kurtz, Marc; Williams, Steven; Sonatore, Lisa; Villafania, Artjohn; Agrawal, Sony
2016-12-01
The pace and complexity of modern drug discovery place ever-increasing demands on scientists for data analysis and interpretation. Data flow programming and modern visualization tools address these demands directly. Three different requirements (one for allosteric modulator analysis, one for a specialized clotting analysis, and one for enzyme global progress curve analysis) are reviewed, and their execution in a combined data flow/visualization environment is outlined. © 2016 Society for Laboratory Automation and Screening.
A Mobile Computing Solution for Collecting Functional Analysis Data on a Pocket PC
ERIC Educational Resources Information Center
Jackson, James; Dixon, Mark R.
2007-01-01
The present paper provides a task analysis for creating a computerized data system using a Pocket PC and Microsoft Visual Basic. With Visual Basic software and any handheld device running the Windows Mobile operating system, this task analysis will allow behavior analysts to program and customize their own functional analysis data-collection…
MeDICi Software Superglue for Data Analysis Pipelines
Ian Gorton
2017-12-09
The Middleware for Data-Intensive Computing (MeDICi) Integration Framework is an integrated middleware platform developed to meet the data analysis and processing needs of scientists across many domains. MeDICi is scalable, easily modified, and robust to multiple languages, protocols, and hardware platforms, and is in use today by PNNL scientists for bioinformatics, power grid failure analysis, and text analysis.
ANALYSIS/PLOT: a graphics package for use with the SORT/ANALYSIS data bases
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sady, C.A.
1983-08-01
This report describes a graphics package that is used with the SORT/ANALYSIS data bases. The data listed by the SORT/ANALYSIS program can be presented in pie, bar, line, or Gantt chart form. Instructions for the use of the plotting program and descriptions of the subroutines are given in the report.
Air Traffic Complexity Measurement Environment (ACME): Software User's Guide
NASA Technical Reports Server (NTRS)
1996-01-01
A user's guide for the Air Traffic Complexity Measurement Environment (ACME) software is presented. The ACME consists of two major components: a complexity analysis tool and a user interface. The Complexity Analysis Tool (CAT) analyzes complexity off-line, producing data files which may be examined interactively via the Complexity Data Analysis Tool (CDAT). The Complexity Analysis Tool is composed of three independently executing processes that communicate via PVM (Parallel Virtual Machine) and Unix sockets. The Runtime Data Management and Control process (RUNDMC) extracts flight plan and track information from a SAR input file, and sends the information to GARP (Generate Aircraft Routes Process) and CAT (Complexity Analysis Task). GARP in turn generates aircraft trajectories, which are utilized by CAT to calculate sector complexity. CAT writes flight plan, track and complexity data to an output file, which can be examined interactively. The Complexity Data Analysis Tool (CDAT) provides an interactive graphic environment for examining the complexity data produced by the Complexity Analysis Tool (CAT). CDAT can also play back track data extracted from System Analysis Recording (SAR) tapes. The CDAT user interface consists of a primary window, a controls window, and miscellaneous pop-ups. Aircraft track and position data is displayed in the main viewing area of the primary window. The controls window contains miscellaneous control and display items. Complexity data is displayed in pop-up windows. CDAT plays back sector complexity and aircraft track and position data as a function of time. Controls are provided to start and stop playback, adjust the playback rate, and reposition the display to a specified time.
Rigbolt, Kristoffer T G; Vanselow, Jens T; Blagoev, Blagoy
2011-08-01
Recent technological advances have made it possible to identify and quantify thousands of proteins in a single proteomics experiment. As a result of these developments, the analysis of data has become the bottleneck of proteomics experiments. To provide the proteomics community with a user-friendly platform for comprehensive analysis, inspection and visualization of quantitative proteomics data, we developed the Graphical Proteomics Data Explorer (GProX)(1). The program requires no special bioinformatics training, as all functions of GProX are accessible within its graphical user-friendly interface, which will be intuitive to most users. Basic features facilitate the uncomplicated management and organization of large data sets and complex experimental setups, as well as the inspection and graphical plotting of quantitative data. These are complemented by readily available high-level analysis options such as database querying, clustering based on abundance ratios, feature enrichment tests (e.g., for GO terms) and pathway analysis tools. A number of plotting options for the visualization of quantitative proteomics data are available, and most analysis functions in GProX create customizable high-quality graphical displays in both vector and bitmap formats. The generic import requirements allow data originating from essentially all mass spectrometry platforms, quantitation strategies and software to be analyzed in the program. GProX represents a powerful approach to proteomics data analysis, providing proteomics experimenters with a toolbox for bioinformatics analysis of quantitative proteomics data. The program is released as open-source and can be freely downloaded from the project webpage at http://gprox.sourceforge.net.
Big Data Analysis Framework for Healthcare and Social Sectors in Korea
Song, Tae-Min
2015-01-01
Objectives We reviewed applications of big data analysis in healthcare and social services in developed countries, and subsequently devised a framework for such an analysis in Korea. Methods We reviewed the status of big data analysis implementation in health care and social services in developed countries, and the strategies used by the Ministry of Health and Welfare of Korea (Government 3.0). We formulated a conceptual framework of big data in the healthcare and social service sectors at the national level. As a specific case, we designed a process and method of social big data analysis on suicide buzz. Results Developed countries (e.g., the United States, the UK, Singapore, Australia, and even the OECD and EU) are emphasizing the potential of big data and using it as a tool to solve their long-standing problems. Big data strategies for the healthcare and social service sectors were formulated based on the ICT-based policy of the current government and the strategic goals of the Ministry of Health and Welfare. We suggest separate frameworks of big data analysis for the healthcare and welfare service sectors and assign them tentative names: 'health risk analysis center' and 'integrated social welfare service network'. A framework of social big data analysis is presented by applying it to the prevention and proactive detection of suicide in Korea. Conclusions There are some concerns with the utilization of big data in the healthcare and social welfare sectors. Thus, research on these issues must be conducted so that sophisticated and practical solutions can be reached. PMID:25705552
Rigbolt, Kristoffer T. G.; Vanselow, Jens T.; Blagoev, Blagoy
2011-01-01
Recent technological advances have made it possible to identify and quantify thousands of proteins in a single proteomics experiment. As a result of these developments, the analysis of data has become the bottleneck of proteomics experiments. To provide the proteomics community with a user-friendly platform for comprehensive analysis, inspection and visualization of quantitative proteomics data, we developed the Graphical Proteomics Data Explorer (GProX)(1). The program requires no special bioinformatics training, as all functions of GProX are accessible within its graphical user-friendly interface, which will be intuitive to most users. Basic features facilitate the uncomplicated management and organization of large data sets and complex experimental setups, as well as the inspection and graphical plotting of quantitative data. These are complemented by readily available high-level analysis options such as database querying, clustering based on abundance ratios, feature enrichment tests (e.g., for GO terms) and pathway analysis tools. A number of plotting options for the visualization of quantitative proteomics data are available, and most analysis functions in GProX create customizable high-quality graphical displays in both vector and bitmap formats. The generic import requirements allow data originating from essentially all mass spectrometry platforms, quantitation strategies and software to be analyzed in the program. GProX represents a powerful approach to proteomics data analysis, providing proteomics experimenters with a toolbox for bioinformatics analysis of quantitative proteomics data. The program is released as open-source and can be freely downloaded from the project webpage at http://gprox.sourceforge.net. PMID:21602510
DuVernet, Amy M; Dierdorff, Erich C; Wilson, Mark A
2015-09-01
Work analysis is fundamental to designing effective human resource systems. The current investigation extends previous research by identifying the differential effects of common design decisions, purposes, and organizational contexts on the data generated by work analyses. The effects of 19 distinct factors that span choices of descriptor, collection method, rating scale, and data source, as well as project purpose and organizational features, are explored. Meta-analytic results cumulated from 205 articles indicate that many of these variables hold significant consequences for work analysis data. Factors pertaining to descriptor choice, collection method, rating scale, and the purpose for conducting the work analysis each showed strong associations with work analysis data. The source of the work analysis information and the organizational context in which it was conducted displayed fewer relationships. Findings can be used to inform the choices that work analysts make about methodology and postcollection evaluations of work analysis information. (c) 2015 APA, all rights reserved.
Chen, Yi-An; Tripathi, Lokesh P; Mizuguchi, Kenji
2016-01-01
Data analysis is one of the most critical and challenging steps in drug discovery and disease biology. A user-friendly resource to visualize and analyse high-throughput data provides a powerful medium for both experimental and computational biologists to understand vastly different biological data types and obtain a concise, simplified and meaningful output for better knowledge discovery. We have previously developed TargetMine, an integrated data warehouse optimized for target prioritization. Here we describe how upgraded and newly modelled data types in TargetMine can now survey the wider biological and chemical data space relevant to drug discovery and development. To enhance the scope of TargetMine from target prioritization to broad-based knowledge discovery, we have also developed a new auxiliary toolkit to assist with data analysis and visualization in TargetMine. This toolkit features interactive data analysis tools to query and analyse the biological data compiled within the TargetMine data warehouse. The enhanced system enables users to discover new hypotheses interactively by performing complicated searches without programming and to obtain the results in an easy-to-comprehend output format. Database URL: http://targetmine.mizuguchilab.org. © The Author(s) 2016. Published by Oxford University Press.
Chen, Yi-An; Tripathi, Lokesh P.; Mizuguchi, Kenji
2016-01-01
Data analysis is one of the most critical and challenging steps in drug discovery and disease biology. A user-friendly resource to visualize and analyse high-throughput data provides a powerful medium for both experimental and computational biologists to understand vastly different biological data types and obtain a concise, simplified and meaningful output for better knowledge discovery. We have previously developed TargetMine, an integrated data warehouse optimized for target prioritization. Here we describe how upgraded and newly modelled data types in TargetMine can now survey the wider biological and chemical data space relevant to drug discovery and development. To enhance the scope of TargetMine from target prioritization to broad-based knowledge discovery, we have also developed a new auxiliary toolkit to assist with data analysis and visualization in TargetMine. This toolkit features interactive data analysis tools to query and analyse the biological data compiled within the TargetMine data warehouse. The enhanced system enables users to discover new hypotheses interactively by performing complicated searches without programming and to obtain the results in an easy-to-comprehend output format. Database URL: http://targetmine.mizuguchilab.org PMID:26989145
McKenna, Thomas M; Bawa, Gagandeep; Kumar, Kamal; Reifman, Jaques
2007-04-01
The physiology analysis system (PAS) was developed as a resource to support the efficient warehousing, management, and analysis of physiology data, particularly continuous time-series data that may be extensive, of variable quality, and distributed across many files. The PAS incorporates time-series data collected by many types of data-acquisition devices, and it is designed to free users from data management burdens. This Web-based system allows both discrete (attribute) and time-series (ordered) data to be manipulated, visualized, and analyzed via a client's Web browser. All processes occur on a server, so the client does not have to download data or any application programs, and the PAS is independent of the client's computer operating system. The PAS contains a library of functions, written in different computer languages, which the client can add to and use to perform specific data operations. Functions from the library are sequentially inserted into a function chain-based logical structure to construct sophisticated data operators from simple function building blocks, affording ad hoc query and analysis of time-series data. These features support advanced mining of physiology data.
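The function-chain idea, building a sophisticated operator by sequencing simple building blocks, can be illustrated with plain function composition. This is a minimal sketch under stated assumptions. The operator names (`detrend`, `clip`, `moving_average`, `build_chain`) and the heart-rate samples are hypothetical; they are not part of the PAS function library, whose actual interface the abstract does not describe.

```python
from functools import reduce
from typing import Callable, List, Sequence

Series = List[float]
Op = Callable[[Series], Series]


def detrend(x: Series) -> Series:
    """Subtract the mean of the series (hypothetical building block)."""
    m = sum(x) / len(x)
    return [v - m for v in x]


def clip(lo: float, hi: float) -> Op:
    """Return an operator that limits every sample to [lo, hi]."""
    return lambda x: [min(max(v, lo), hi) for v in x]


def moving_average(window: int) -> Op:
    """Return an operator computing a trailing moving average of length `window`."""
    def op(x: Series) -> Series:
        return [sum(x[i - window + 1: i + 1]) / window for i in range(window - 1, len(x))]
    return op


def build_chain(ops: Sequence[Op]) -> Op:
    """Apply the operators left to right, mirroring a sequential function chain."""
    return lambda x: reduce(lambda acc, f: f(acc), ops, x)


raw = [60.0, 62.0, 250.0, 64.0, 66.0]  # hypothetical heart-rate samples with one artifact spike
chain = build_chain([clip(40.0, 200.0), moving_average(3), detrend])
result = chain(raw)
print(result)
```

Because every block has the same series-in, series-out shape, new operators can be added to the toolbox and combined in any order without changing `build_chain`.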
Analysis of ChIP-seq Data in R/Bioconductor.
de Santiago, Ines; Carroll, Thomas
2018-01-01
The development of novel high-throughput sequencing methods for ChIP (chromatin immunoprecipitation) has provided a very powerful tool to study gene regulation in multiple conditions at unprecedented resolution and scale. Proactive quality control and appropriate data analysis techniques are of critical importance for extracting the most meaningful results from the data. In recent years, an array of R/Bioconductor tools has been developed that allows researchers to process and analyze ChIP-seq data. This chapter provides an overview of the methods available to analyze ChIP-seq data based primarily on software packages from the open-source Bioconductor project. Protocols described in this chapter cover basic steps including data alignment, peak calling, quality control and data visualization, as well as more complex methods such as the identification of differentially bound regions and functional analyses to annotate regulatory regions. The steps in the data analysis process are demonstrated on publicly available data sets and serve as a demonstration of the computational procedures routinely used for the analysis of ChIP-seq data in R/Bioconductor, from which readers can construct their own analysis pipelines.
Barton, G; Abbott, J; Chiba, N; Huang, DW; Huang, Y; Krznaric, M; Mack-Smith, J; Saleem, A; Sherman, BT; Tiwari, B; Tomlinson, C; Aitman, T; Darlington, J; Game, L; Sternberg, MJE; Butcher, SA
2008-01-01
Background Microarray experimentation requires the application of complex analysis methods as well as the use of non-trivial computer technologies to manage the resultant large data sets. This, together with the proliferation of tools and techniques for microarray data analysis, makes it very challenging for a laboratory scientist to keep up-to-date with the latest developments in this field. Our aim was to develop a distributed e-support system for microarray data analysis and management. Results EMAAS (Extensible MicroArray Analysis System) is a multi-user rich internet application (RIA) providing simple, robust access to up-to-date resources for microarray data storage and analysis, combined with integrated tools to optimise real time user support and training. The system leverages the power of distributed computing to perform microarray analyses, and provides seamless access to resources located at various remote facilities. The EMAAS framework allows users to import microarray data from several sources to an underlying database, to pre-process, quality assess and analyse the data, to perform functional analyses, and to track data analysis steps, all through a single easy to use web portal. This interface offers distance support to users both in the form of video tutorials and via live screen feeds using the web conferencing tool EVO. A number of analysis packages, including R-Bioconductor and Affymetrix Power Tools have been integrated on the server side and are available programmatically through the Postgres-PLR library or on grid compute clusters. Integrated distributed resources include the functional annotation tool DAVID, GeneCards and the microarray data repositories GEO, CELSIUS and MiMiR. EMAAS currently supports analysis of Affymetrix 3' and Exon expression arrays, and the system is extensible to cater for other microarray and transcriptomic platforms. 
Conclusion EMAAS enables users to track and perform microarray data management and analysis tasks through a single easy-to-use web application. The system architecture is flexible and scalable to allow new array types, analysis algorithms and tools to be added with relative ease and to cope with large increases in data volume. PMID:19032776
SciDAC-Data, A Project to Enabling Data Driven Modeling of Exascale Computing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mubarak, M.; Ding, P.; Aliaga, L.
The SciDAC-Data project is a DOE-funded initiative to analyze and exploit two decades of information and analytics that have been collected by the Fermilab Data Center on the organization, movement, and consumption of High Energy Physics (HEP) data. The project will analyze the analysis patterns and data organization that have been used by the NOvA, MicroBooNE, MINERvA and other experiments to develop realistic models of HEP analysis workflows and data processing. The SciDAC-Data project aims to provide both realistic input vectors and corresponding output data that can be used to optimize and validate simulations of HEP analysis. These simulations are designed to address questions of data handling, cache optimization and workflow structures that are the prerequisites for modern HEP analysis chains to be mapped and optimized to run on the next generation of leadership-class exascale computing facilities. We will address the use of the SciDAC-Data distributions, acquired from the Fermilab Data Center’s analysis workflows and corresponding to around 71,000 HEP jobs, as the input to detailed queuing simulations that model the expected data consumption and caching behaviors of the work running in HPC environments. In particular, we describe in detail how the Sequential Access via Metadata (SAM) data handling system, in combination with the dCache/Enstore-based data archive facilities, has been analyzed to develop radically different models of HEP data analysis. We present how the simulation may be used to analyze the impact of design choices in archive facilities.
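A toy version of the caching questions above replays a file-access trace through a fixed-capacity cache and reports the hit rate. This minimal sketch makes three assumptions. The trace and capacities are hypothetical. Least-recently-used (LRU) eviction was chosen for illustration. Nothing here describes how SAM or dCache/Enstore actually manage their caches.

```python
from collections import OrderedDict
from collections.abc import Hashable, Iterable


def lru_hit_rate(trace: Iterable[Hashable], capacity: int) -> float:
    """Replay an access trace through an LRU cache; return the fraction of hits."""
    cache: OrderedDict = OrderedDict()
    hits = total = 0
    for item in trace:
        total += 1
        if item in cache:
            hits += 1
            cache.move_to_end(item)  # mark as most recently used
        else:
            cache[item] = None
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used entry
    return hits / total if total else 0.0


# Hypothetical trace of file requests from analysis jobs.
trace = ["a", "b", "a", "c", "a", "b", "d", "a"]
for capacity in (2, 3):
    print(capacity, lru_hit_rate(trace, capacity))
```

In a study like the one described, the trace would come from recorded job histories. Comparing hit rates across capacities and policies is one simple way to weigh cache-design choices.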
A Proposed Data Fusion Architecture for Micro-Zone Analysis and Data Mining
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kevin McCarthy; Milos Manic
Data fusion requires the ability to combine or “fuse” data from multiple data sources. Time series analysis is a data mining technique used to predict future values from a data set based upon past values. Unlike other data mining techniques, however, time series analysis places special emphasis on periodicity and on how seasonal and other time-based factors tend to affect trends over time. One of the difficulties encountered in developing generic time series techniques is the wide variability of the data sets available for analysis. This presents challenges all the way from the data gathering stage to results presentation. This paper presents an architecture designed and used to facilitate the collection of disparate data sets well suited to time series analysis as well as other predictive data mining techniques. Results show that this architecture provides a flexible, dynamic framework for the capture and storage of a myriad of dissimilar data sets and can serve as a foundation from which to build a complete data fusion architecture.
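The emphasis on periodicity can be illustrated with two of the simplest seasonal tools. One is a per-phase seasonal profile (the average deviation from the overall mean at each position in the cycle). The other is a seasonal-naive forecast that repeats the last observed cycle. Both are standard baselines chosen here for illustration. The paper does not say which time-series models its architecture feeds, and the quarterly series is hypothetical.

```python
from typing import List, Sequence


def seasonal_profile(history: Sequence[float], period: int) -> List[float]:
    """Average deviation from the overall mean at each phase of the cycle.
    Phase 0 is the first sample, so the series is assumed to start at a cycle boundary."""
    if len(history) < period:
        raise ValueError("need at least one full cycle of history")
    mean = sum(history) / len(history)
    sums, counts = [0.0] * period, [0] * period
    for i, value in enumerate(history):
        sums[i % period] += value - mean
        counts[i % period] += 1
    return [s / c for s, c in zip(sums, counts)]


def seasonal_naive_forecast(history: Sequence[float], period: int, horizon: int) -> List[float]:
    """Predict each future value as the value observed exactly one cycle earlier."""
    if len(history) < period:
        raise ValueError("need at least one full cycle of history")
    last_cycle = list(history[-period:])
    return [last_cycle[h % period] for h in range(horizon)]


quarterly = [10.0, 20.0, 30.0, 15.0, 12.0, 22.0, 32.0, 17.0]  # hypothetical series, period 4
print(seasonal_profile(quarterly, 4))
print(seasonal_naive_forecast(quarterly, 4, horizon=6))
```

A model that cannot beat the seasonal-naive forecast on held-out data is not capturing more than the periodicity itself.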
The Mock LISA Data Challenges: History, Status, Prospects
NASA Technical Reports Server (NTRS)
Vallisneri, Michele; Babak, Stas; Baker, John; Benacquista, Matt; Cornish, Neil; Crowder, Jeff; Cutler, Curt; Larson, Shane; Littenberg, Tyson; Porter, Edward;
2007-01-01
This slide presentation reviews the importance of the Mock LISA Data Challenges (MLDCs). The Laser Interferometer Space Antenna (LISA) is a gravitational wave (GW) observatory that will return data such that data analysis is integral to the measurement concept. Further rationales for the MLDCs are to kick-start the development of a LISA data-analysis computational infrastructure, and to encourage, track, and compare progress in LISA data-analysis development in the open community. The MLDCs are a coordinated, voluntary effort in the GW community that will periodically issue datasets, of increasing difficulty, containing synthetic noise and GW signals from sources with undisclosed parameters. The challenge participants return parameter estimates and descriptions of their search methods. Some of the challenges and the resulting entries are reviewed. The aim is to show that LISA data analysis is possible, and to develop new techniques, using multiple international teams, for the development of LISA core analysis tools.
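The challenge format (synthetic noise plus a signal with undisclosed parameters, with participants returning estimates) can be mimicked in miniature. The sketch injects a sinusoid of "hidden" frequency into seeded Gaussian noise. It then recovers the frequency by searching a grid of candidates for the template with the largest phase-independent projected power. This is a toy matched-filter illustration with invented parameters, far simpler than real LISA waveforms or the search methods submitted to the MLDCs.

```python
import math
import random
from typing import Iterable, List


def make_challenge(n: int, freq: int, amp: float, noise_sigma: float, seed: int) -> List[float]:
    """Synthetic data: a sinusoid with a hidden frequency buried in Gaussian noise."""
    rng = random.Random(seed)
    return [amp * math.sin(2 * math.pi * freq * t / n) + rng.gauss(0.0, noise_sigma)
            for t in range(n)]


def estimate_frequency(data: List[float], candidates: Iterable[int]) -> int:
    """Return the candidate whose sine/cosine template pair best matches the data."""
    n = len(data)
    best_f, best_power = None, -1.0
    for f in candidates:
        c = sum(x * math.cos(2 * math.pi * f * t / n) for t, x in enumerate(data))
        s = sum(x * math.sin(2 * math.pi * f * t / n) for t, x in enumerate(data))
        power = c * c + s * s  # independent of the unknown phase
        if power > best_power:
            best_f, best_power = f, power
    return best_f


data = make_challenge(n=256, freq=17, amp=1.0, noise_sigma=1.0, seed=42)
estimate = estimate_frequency(data, range(1, 128))
print(f"estimated frequency: {estimate} cycles per record")
```

Even though the per-sample signal is weaker than the noise, coherent summation over 256 samples makes the true frequency stand out. The same principle underlies template-based GW searches.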
Analog computation of auto and cross-correlation functions
NASA Technical Reports Server (NTRS)
1974-01-01
For analysis of the data obtained from the cross beam systems, it was deemed desirable to compute the auto- and cross-correlation functions by both digital and analog methods, to provide a cross-check of the analysis methods and an indication as to which of the two methods would be more suitable for routine use in the analysis of such data. The purpose of this appendix is to provide a concise description of the equipment and procedures used for the electronic analog analysis of the cross beam data. A block diagram showing the signal processing and computation set-up used for most of the analog data analysis is provided. The data obtained at the field test sites were recorded on magnetic tape using wide-band FM recording techniques. The data as recorded were band-pass filtered by electronic signal processing in the data acquisition systems.
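For comparison with the analog set-up, a digital estimate of the auto- and cross-correlation functions takes only a few lines. This sketch uses a common biased estimator: mean-removed samples, summed over the overlap and normalized by N and the two standard deviations. The report does not say which digital estimator was used. The two "beam" signals are hypothetical: the second is the first delayed by three samples, and the correlation peak recovers that delay.

```python
import math
import random
from typing import List, Sequence


def cross_correlation(x: Sequence[float], y: Sequence[float], max_lag: int) -> List[float]:
    """Normalized cross-correlation r_xy(k), k = 0..max_lag, comparing x[t] with y[t + k].
    Biased estimator: the overlap sum is divided by N, not by N - k."""
    n = len(x)
    if len(y) != n:
        raise ValueError("x and y must have the same length")
    mx, my = sum(x) / n, sum(y) / n
    dx = [v - mx for v in x]
    dy = [v - my for v in y]
    sx = math.sqrt(sum(v * v for v in dx) / n)
    sy = math.sqrt(sum(v * v for v in dy) / n)
    return [sum(dx[t] * dy[t + k] for t in range(n - k)) / (n * sx * sy)
            for k in range(max_lag + 1)]


def autocorrelation(x: Sequence[float], max_lag: int) -> List[float]:
    """Autocorrelation is the cross-correlation of a signal with itself."""
    return cross_correlation(x, x, max_lag)


rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(200)]  # hypothetical upstream-beam signal
delay = 3
y = x[-delay:] + x[:-delay]  # downstream signal: x delayed by 3 samples (circularly)
r_xy = cross_correlation(x, y, max_lag=10)
best_lag = max(range(len(r_xy)), key=r_xy.__getitem__)
print("estimated delay (samples):", best_lag)
```

The biased estimator is often preferred for routine use because its variance at large lags is smaller than that of the unbiased (N - k) version.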
Satellite image analysis using neural networks
NASA Technical Reports Server (NTRS)
Sheldon, Roger A.
1990-01-01
The tremendous backlog of unanalyzed satellite data necessitates the development of improved methods for data cataloging and analysis. Ford Aerospace has developed an image analysis system, SIANN (Satellite Image Analysis using Neural Networks), that integrates the technologies necessary to satisfy NASA's science data analysis requirements for the next generation of satellites. SIANN will enable scientists to train a neural network to recognize image data containing scenes of interest and then rapidly search data archives for all such images. The approach combines conventional image processing technology with recent advances in neural networks to provide improved classification capabilities. SIANN allows users to proceed through a four-step process of image classification: filtering and enhancement, creation of neural network training data via application of feature extraction algorithms, configuring and training a neural network model, and classification of images by application of the trained neural network. A prototype experimentation testbed was completed and applied to climatological data.
Plamondon, Katrina M; Bottorff, Joan L; Cole, Donald C
2015-11-01
Deliberative dialogue (DD) is a knowledge translation strategy that can serve to generate rich data and bridge health research with action. An intriguing alternative to other modes of generating data, the purposeful and evidence-informed conversations characteristic of DD generate data inclusive of collective interpretations. These data are thus dialogic, presenting complex challenges for qualitative analysis. In this article, we discuss the nature of data generated through DD, orienting ourselves toward a theoretically grounded approach to analysis. We offer an integrated framework for analysis, balancing analytical strategies of categorizing and connecting with the use of empathetic and suspicious interpretive lenses. In this framework, data generation and analysis occur in concert, alongside engaging participants and synthesizing evidence. An example of application is provided, demonstrating nuances of the framework. We conclude with reflections on the strengths and limitations of the framework, suggesting how it may be relevant in other qualitative health approaches. © The Author(s) 2015.
Analyzing How We Do Analysis and Consume Data, Results from the SciDAC-Data Project
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ding, P.; Aliaga, L.; Mubarak, M.
One of the main goals of the Department of Energy-funded SciDAC-Data project is to analyze the more than 410,000 high energy physics datasets that have been collected, generated and defined over the past two decades by experiments using the Fermilab storage facilities. These datasets have been used as the input to over 5.6 million recorded analysis projects, for which detailed analytics have been gathered. The analytics and meta information for these datasets and analysis projects are being combined with knowledge of their part of the HEP analysis chains for major experiments to understand how modern computing and data delivery is being used. We present the first results of this project, which examine in detail how the CDF, D0, NOvA, MINERvA and MicroBooNE experiments have organized, classified and consumed petascale datasets to produce their physics results. The results include analysis of the correlations in dataset/file overlap, data usage patterns, data popularity, dataset dependency and temporary dataset consumption. The results provide critical insight into how workflows and data delivery schemes can be combined with different caching strategies to more efficiently perform the work required to mine these large HEP data volumes and to understand the physics analysis requirements for the next generation of HEP computing facilities. In particular we present a detailed analysis of the NOvA data organization and consumption model corresponding to their first and second oscillation results (2014-2016) and the first look at the analysis of the Tevatron Run II experiments. We present statistical distributions for the characterization of these data and data-driven models describing their consumption.
Analyzing how we do Analysis and Consume Data, Results from the SciDAC-Data Project
NASA Astrophysics Data System (ADS)
Ding, P.; Aliaga, L.; Mubarak, M.; Tsaris, A.; Norman, A.; Lyon, A.; Ross, R.
2017-10-01
One of the main goals of the Department of Energy-funded SciDAC-Data project is to analyze the more than 410,000 high energy physics datasets that have been collected, generated and defined over the past two decades by experiments using the Fermilab storage facilities. These datasets have been used as the input to over 5.6 million recorded analysis projects, for which detailed analytics have been gathered. The analytics and meta information for these datasets and analysis projects are being combined with knowledge of their part of the HEP analysis chains for major experiments to understand how modern computing and data delivery is being used. We present the first results of this project, which examine in detail how the CDF, D0, NOvA, MINERvA and MicroBooNE experiments have organized, classified and consumed petascale datasets to produce their physics results. The results include analysis of the correlations in dataset/file overlap, data usage patterns, data popularity, dataset dependency and temporary dataset consumption. The results provide critical insight into how workflows and data delivery schemes can be combined with different caching strategies to more efficiently perform the work required to mine these large HEP data volumes and to understand the physics analysis requirements for the next generation of HEP computing facilities. In particular we present a detailed analysis of the NOvA data organization and consumption model corresponding to their first and second oscillation results (2014-2016) and the first look at the analysis of the Tevatron Run II experiments. We present statistical distributions for the characterization of these data and data-driven models describing their consumption.
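The abstract does not state which overlap or popularity measures were used. As one plausible, hypothetical choice, the Python sketch below scores pairwise dataset overlap with the Jaccard index of their file sets. It counts dataset popularity as the number of analysis projects that consumed each dataset. The dataset and file names are invented.

```python
from collections import Counter
from itertools import combinations

def jaccard(a, b):
    """Share of files common to two datasets: |A & B| / |A | B| (0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def pairwise_overlap(datasets):
    """Jaccard overlap for every unordered pair of dataset names."""
    return {(x, y): jaccard(datasets[x], datasets[y])
            for x, y in combinations(sorted(datasets), 2)}

def popularity(projects):
    """Number of analysis projects that read each dataset (each project counted once)."""
    return Counter(name for used in projects for name in set(used))

# Hypothetical catalogue: dataset name -> set of file names.
datasets = {
    "raw_run1": {"f1", "f2", "f3"},
    "skim_a": {"f2", "f3"},
    "skim_b": {"f4"},
}
# Hypothetical analysis projects, each listing the datasets it consumed.
projects = [["skim_a"], ["skim_a", "raw_run1"], ["skim_b"], ["skim_a", "skim_a"]]

overlap = pairwise_overlap(datasets)
counts = popularity(projects)
```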
1988-03-31
radar operation and data-collection activities, a large data-analysis effort has been under way in support of automatic wind-shear detection algorithm ... REDUCTION AND ALGORITHM DEVELOPMENT: A. General-Purpose Software; B. Concurrent Computer Systems; C. Sun Workstations; D. Radar Data Analysis: 1. Algorithm Verification, 2. Other Studies, 3. Translations, 4. Outside Distributions; E. Mesonet/LLWAS Data Analysis: 1. 1985 Data, 2 ...
Flexible data-management system
NASA Technical Reports Server (NTRS)
Pelouch, J. J., Jr.
1977-01-01
Combined ASRDI Data-Management and Analysis Technique (CADMAT) is a system of computer programs and procedures that can be used to conduct data-management tasks. The system was developed specifically for scientists and engineers who must manage and analyze large quantities of data organized into records of events and parametric fields. CADMAT is particularly useful when data are continually accumulated, so that the need for retrieval and analysis is ongoing.
NASA Astrophysics Data System (ADS)
Petry, Dirk
2018-03-01
CASA is the standard science data analysis package for ALMA and VLA but it can also be used for the analysis of data from other observatories. In this talk, I will give an overview of the structure and features of CASA, who develops it, and the present status and plans, and then show typical analysis workflows for ALMA data with special emphasis on the handling of single dish data and its combination with interferometric data.
ERIC Educational Resources Information Center
Yamaguchi, Yusuke; Sakamoto, Wataru; Goto, Masashi; Staessen, Jan A.; Wang, Jiguang; Gueyffier, Francois; Riley, Richard D.
2014-01-01
When some trials provide individual patient data (IPD) and the others provide only aggregate data (AD), meta-analysis methods for combining IPD and AD are required. We propose a method that reconstructs the missing IPD for AD trials by a Bayesian sampling procedure and then applies an IPD meta-analysis model to the mixture of simulated IPD and…
Knowledge-Based Decision Support in Department of Defense Acquisitions
2010-09-01
from the analysis framework developed by Miles and Huberman (1994). The framework describes the major phases of data analysis as data reduction, data... Miles and Huberman, 1994) Survey Effort For this research effort, the survey data was obtained from SAF/ACPO (Air Force Acquisition Chief...rank O-6/GS-15 or above. Data Reduction and Content Analysis Within the Miles and Huberman (1994) framework, the researcher used Microsoft
NASA Technical Reports Server (NTRS)
Hoffer, R. M. (Principal Investigator)
1980-01-01
The column normalizing technique was used to adjust the data for variations in the amplitude of the signal due to look angle effects with respect to solar zenith angle along the scan lines (i.e., across columns). Evaluation of the data set containing the geometric and radiometric adjustments indicates that the data set should be satisfactory for further processing and analysis. Software was developed for degrading the spatial resolution of the aircraft data to produce a total of four data sets for further analysis. The quality of LANDSAT 2 CCT data for the test site is good for channels four, five, and six. Channel seven was not present on the tape. The data received were reformatted and analysis of the test site area was initiated.
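The report does not give the formula behind its column normalizing technique. A common form, assumed here, treats the look-angle effect as a multiplicative gain per column (across-track position). It then rescales each column so that its mean matches the overall image mean. The Python sketch below shows that assumed form on a tiny synthetic image; the gain values are hypothetical.

```python
def column_normalize(image):
    """Rescale each column so its mean equals the mean of the whole image.

    image: list of rows (scan lines); column c is one position across the scan.
    Assumes a multiplicative per-column gain; zero-mean columns are left unchanged.
    """
    rows, cols = len(image), len(image[0])
    col_means = [sum(image[r][c] for r in range(rows)) / rows for c in range(cols)]
    overall = sum(col_means) / cols
    return [[image[r][c] * overall / col_means[c] if col_means[c] else image[r][c]
             for c in range(cols)] for r in range(rows)]

scene = [[10.0, 10.0, 10.0], [20.0, 20.0, 20.0]]
gain = [1.2, 1.0, 0.8]  # hypothetical look-angle gain per column
observed = [[v * g for v, g in zip(row, gain)] for row in scene]
corrected = column_normalize(observed)
```

The correction recovers the scene exactly here only because every column has the same underlying content. With real imagery, that assumption holds at best statistically, averaged over many scan lines.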
msBiodat analysis tool, big data analysis for high-throughput experiments.
Muñoz-Torres, Pau M; Rokć, Filip; Belužic, Robert; Grbeša, Ivana; Vugrek, Oliver
2016-01-01
Mass spectrometry (MS) comprises a group of high-throughput techniques used to increase knowledge about biomolecules. These techniques produce large amounts of data, presented as lists of hundreds or thousands of proteins. Filtering those data efficiently is the first step in extracting biologically relevant information. Filtering becomes more informative when previous data are merged with data obtained from public databases, resulting in an accurate list of proteins that meet predetermined conditions. In this article we present msBiodat Analysis Tool, a web-based application designed to bring big data analysis to proteomics. With this tool, researchers can easily select the most relevant information from their MS experiments using an easy-to-use web interface. An interesting feature of msBiodat Analysis Tool is the ability to select proteins by their Gene Ontology annotation using their Gene ID, Ensembl, or UniProt codes. The msBiodat Analysis Tool is a web-based application that lets researchers, regardless of programming experience, take advantage of efficient database querying. Its versatility and user-friendly interface make it easy to perform fast and accurate data screening using complex queries. Once the analysis is finished, the result is delivered by e-mail. msBiodat Analysis Tool is freely available at http://msbiodata.irb.hr.
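The kind of annotation-based selection described above can be sketched in a few lines of Python. This is not msBiodat's implementation. The record layout, the `filter_by_go` helper, the score threshold, and the protein identifiers are all hypothetical. The annotation table stands in for data that would be retrieved from a public database.

```python
def filter_by_go(ms_hits, annotations, go_term, min_score=0.0):
    """Return MS hits annotated with go_term and scoring at least min_score.

    ms_hits: list of dicts with hypothetical keys "protein_id" and "score".
    annotations: protein_id -> set of GO term IDs (e.g. merged from a public database).
    """
    selected = []
    for hit in ms_hits:
        terms = annotations.get(hit["protein_id"], set())
        if go_term in terms and hit["score"] >= min_score:
            selected.append({**hit, "go_terms": sorted(terms)})
    return selected

ms_hits = [
    {"protein_id": "PROT_A", "score": 52.1},
    {"protein_id": "PROT_B", "score": 12.0},
    {"protein_id": "PROT_C", "score": 40.3},
    {"protein_id": "PROT_D", "score": 75.0},  # no annotation available
]
annotations = {
    "PROT_A": {"GO:0005739", "GO:0005634"},  # mitochondrion, nucleus
    "PROT_B": {"GO:0005739"},
    "PROT_C": {"GO:0005634"},
}
mito = filter_by_go(ms_hits, annotations, "GO:0005739", min_score=20.0)
```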
Challenges in combining different data sets during analysis when using grounded theory.
Rintala, Tuula-Maria; Paavilainen, Eija; Astedt-Kurki, Päivi
2014-05-01
To describe the challenges in combining two data sets during grounded theory analysis. The use of grounded theory in nursing research is common. It is a suitable method for studying human action and interaction. It is recommended that many alternative sources of data are collected to create as rich a dataset as possible. Data from interviews with people with diabetes (n=19) and their family members (n=19). Combining two data sets. When using grounded theory, there are numerous challenges in collecting and managing data, especially for the novice researcher. One challenge is to combine different data sets during the analysis. There are many methodological textbooks about grounded theory but there is little written in the literature about combining different data sets. Discussion is needed on the management of data and the challenges of grounded theory. This article provides a means for combining different data sets in the grounded theory analysis process.
Visual interface for space and terrestrial analysis
NASA Technical Reports Server (NTRS)
Dombrowski, Edmund G.; Williams, Jason R.; George, Arthur A.; Heckathorn, Harry M.; Snyder, William A.
1995-01-01
The management of large geophysical and celestial data bases is now, more than ever, the most critical path to timely data analysis. With today's large volume data sets from multiple satellite missions, analysts face the task of defining useful data bases from which data and metadata (information about data) can be extracted readily in a meaningful way. Visualization, following an object-oriented design, is a fundamental method of organizing and handling data. Humans, by nature, easily accept pictorial representations of data. Therefore graphically oriented user interfaces are appealing, as long as they remain simple to produce and use. The Visual Interface for Space and Terrestrial Analysis (VISTA) system, currently under development at the Naval Research Laboratory's Backgrounds Data Center (BDC), has been designed with these goals in mind. Its graphical user interface (GUI) allows the user to perform queries, visualization, and analysis of atmospheric and celestial backgrounds data.
Data Streaming for Metabolomics: Accelerating Data Processing and Analysis from Days to Minutes
2016-01-01
The speed and throughput of analytical platforms have been a driving force in recent years in the “omics” technologies, and while great strides have been accomplished in both chromatography and mass spectrometry, data analysis times have not benefited at the same pace. Even though personal computers have become more powerful, data transfer times still represent a bottleneck in data processing because of the increasingly complex data files and studies with a greater number of samples. To meet the demand of analyzing hundreds to thousands of samples within a given experiment, we have developed a data streaming platform, XCMS Stream, which capitalizes on the acquisition time to compress and stream recently acquired data files to data processing servers, mimicking just-in-time production strategies from the manufacturing industry. The utility of this XCMS Online-based technology is demonstrated here in the analysis of T cell metabolism and other large-scale metabolomic studies. A large-scale example on a 1000-sample data set demonstrated a 10 000-fold time savings, reducing data analysis time from days to minutes. Further, XCMS Stream has the capability to increase the efficiency of downstream biochemical dependent data acquisition (BDDA) analysis by initiating data conversion and data processing on subsets of data acquired, expanding its application beyond data transfer to smart preliminary data decision-making prior to full acquisition. PMID:27983788
Data streaming for metabolomics: Accelerating data processing and analysis from days to minutes
Montenegro-Burke, J. Rafael; Aisporna, Aries E.; Benton, H. Paul; ...
2016-12-16
The speed and throughput of analytical platforms have been a driving force in recent years in the “omics” technologies, and while great strides have been accomplished in both chromatography and mass spectrometry, data analysis times have not benefited at the same pace. Even though personal computers have become more powerful, data transfer times still represent a bottleneck in data processing because of the increasingly complex data files and studies with a greater number of samples. To meet the demand of analyzing hundreds to thousands of samples within a given experiment, we have developed a data streaming platform, XCMS Stream, which capitalizes on the acquisition time to compress and stream recently acquired data files to data processing servers, mimicking just-in-time production strategies from the manufacturing industry. The utility of this XCMS Online-based technology is demonstrated here in the analysis of T cell metabolism and other large-scale metabolomic studies. A large-scale example on a 1000-sample data set demonstrated a 10 000-fold time savings, reducing data analysis time from days to minutes. Here, XCMS Stream has the capability to increase the efficiency of downstream biochemical dependent data acquisition (BDDA) analysis by initiating data conversion and data processing on subsets of data acquired, expanding its application beyond data transfer to smart preliminary data decision-making prior to full acquisition.
Data Streaming for Metabolomics: Accelerating Data Processing and Analysis from Days to Minutes.
Montenegro-Burke, J Rafael; Aisporna, Aries E; Benton, H Paul; Rinehart, Duane; Fang, Mingliang; Huan, Tao; Warth, Benedikt; Forsberg, Erica; Abe, Brian T; Ivanisevic, Julijana; Wolan, Dennis W; Teyton, Luc; Lairson, Luke; Siuzdak, Gary
2017-01-17
The speed and throughput of analytical platforms have been a driving force in recent years in the "omics" technologies, and while great strides have been accomplished in both chromatography and mass spectrometry, data analysis times have not benefited at the same pace. Even though personal computers have become more powerful, data transfer times still represent a bottleneck in data processing because of the increasingly complex data files and studies with a greater number of samples. To meet the demand of analyzing hundreds to thousands of samples within a given experiment, we have developed a data streaming platform, XCMS Stream, which capitalizes on the acquisition time to compress and stream recently acquired data files to data processing servers, mimicking just-in-time production strategies from the manufacturing industry. The utility of this XCMS Online-based technology is demonstrated here in the analysis of T cell metabolism and other large-scale metabolomic studies. A large-scale example on a 1000-sample data set demonstrated a 10 000-fold time savings, reducing data analysis time from days to minutes. Further, XCMS Stream has the capability to increase the efficiency of downstream biochemical dependent data acquisition (BDDA) analysis by initiating data conversion and data processing on subsets of data acquired, expanding its application beyond data transfer to smart preliminary data decision-making prior to full acquisition.
Organization and Visualization for Initial Analysis of Forced-Choice Ipsative Data
ERIC Educational Resources Information Center
Cochran, Jill A.
2015-01-01
Forced-choice ipsative data are common in personality, philosophy and other preference-based studies. However, this type of data inherently contains dependencies that are challenging for usual statistical analysis. In order to utilize the structure of the data as a guide for analysis rather than as a challenge to manage, a visualisation tool was…
40 CFR 86.884-13 - Data analysis.
Code of Federal Regulations, 2010 CFR
2010-07-01
... 40 Protection of Environment 19 2010-07-01 Data analysis. 86.884-13 Section 86... New Diesel Heavy-Duty Engines; Smoke Exhaust Test Procedure § 86.884-13 Data analysis. The following procedure shall be used to analyze the test data: (a) Locate the modes specified in § 86.884-7(a)(1) through...
40 CFR 86.884-13 - Data analysis.
Code of Federal Regulations, 2011 CFR
2011-07-01
... 40 Protection of Environment 19 2011-07-01 Data analysis. 86.884-13 Section 86... New Diesel Heavy-Duty Engines; Smoke Exhaust Test Procedure § 86.884-13 Data analysis. The following procedure shall be used to analyze the test data: (a) Locate the modes specified in § 86.884-7(a)(1) through...
Environmental Fluctuations and Acoustic Data Communications
2015-09-30
July 2011 along with subsequent analysis of the experiment data. KAM11 Experiment (2011) A shallow water acoustic communications experiment...packet and packet-to-packet variability. Algorithm Design and Experiment Data Analysis Communication receiver algorithm design for shallow water is...exhibited substantial daily oceanographic variability. Analysis of the KAM11 experiment data this past year has focused on fixed source transmissions
40 CFR 86.884-13 - Data analysis.
Code of Federal Regulations, 2014 CFR
2014-07-01
... 40 Protection of Environment 19 2014-07-01 Data analysis. 86.884-13 Section 86... Heavy-Duty Engines; Smoke Exhaust Test Procedure § 86.884-13 Data analysis. The following procedure shall be used to analyze the test data: (a) Locate the modes specified in § 86.884-7(a)(1) through (a)(4...
40 CFR 86.884-13 - Data analysis.
Code of Federal Regulations, 2013 CFR
2013-07-01
... 40 Protection of Environment 20 2013-07-01 Data analysis. 86.884-13 Section 86... New Diesel Heavy-Duty Engines; Smoke Exhaust Test Procedure § 86.884-13 Data analysis. The following procedure shall be used to analyze the test data: (a) Locate the modes specified in § 86.884-7(a)(1) through...
40 CFR 86.884-13 - Data analysis.
Code of Federal Regulations, 2012 CFR
2012-07-01
... 40 Protection of Environment 20 2012-07-01 Data analysis. 86.884-13 Section 86... New Diesel Heavy-Duty Engines; Smoke Exhaust Test Procedure § 86.884-13 Data analysis. The following procedure shall be used to analyze the test data: (a) Locate the modes specified in § 86.884-7(a)(1) through...
Teaching Data Analysis with Interactive Visual Narratives
ERIC Educational Resources Information Center
Saundage, Dilal; Cybulski, Jacob L.; Keller, Susan; Dharmasena, Lasitha
2016-01-01
Data analysis is a major part of business analytics (BA), which refers to the skills, methods, and technologies that enable managers to make swift, quality decisions based on large amounts of data. BA has become a major component of Information Systems (IS) courses all over the world. The challenge for IS educators is to teach data analysis--the…
Chandra Interactive Analysis of Observations (CIAO)
NASA Technical Reports Server (NTRS)
Dobrzycki, Adam
2000-01-01
The Chandra (formerly AXAF) telescope, launched on July 23, 1999, provides X-ray data with unprecedented spatial and spectral resolution. As part of the Chandra scientific support, the Chandra X-ray Observatory Center provides a new data analysis system, CIAO ("Chandra Interactive Analysis of Observations"). We will present the main components of the system: "First Look" analysis; SHERPA, a multi-dimensional, multi-mission modeling and fitting application; the Chandra Imaging and Plotting System; the Detect package of source detection algorithms; and the DM package of generic data manipulation tools. We will set up a demonstration of the portable version of the system and show examples of Chandra data analysis.
Simulation-based sensitivity analysis for non-ignorably missing data.
Yin, Peng; Shi, Jian Q
2017-01-01
Sensitivity analysis is popular for dealing with missing data problems, particularly for non-ignorable missingness, where the full-likelihood method cannot be adopted. It analyses how sensitively the conclusions (output) may depend on assumptions or parameters (input) about the missing data, i.e. the missing data mechanism. We call models subject to this uncertainty sensitivity models. To make conventional sensitivity analysis more useful in practice, we need to define simple and interpretable statistical quantities to assess the sensitivity models and make evidence-based analysis. In this paper we propose a novel approach that investigates the plausibility of each missing data mechanism model assumption by comparing the simulated datasets from various MNAR models with the observed data non-parametrically, using the K-nearest-neighbour distances. Some asymptotic theory is also provided. A key step of this method is to plug in a plausibility evaluation system for each sensitivity parameter, to select plausible values and reject unlikely values, instead of considering all proposed values of sensitivity parameters as in the conventional sensitivity analysis method. The method is generic and has been applied successfully to several specific models in this paper, including a meta-analysis model with publication bias, analysis of incomplete longitudinal data, and mean estimation with non-ignorable missing data.
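The core idea is to compare data simulated under each candidate MNAR mechanism with the observed data through nearest-neighbour distances. A deliberately simplified Python sketch can illustrate it. It rests on several assumptions that do not come from the paper:

- the full data are N(mu, 1) with mu known;
- the probability of being observed is logistic(alpha + delta * y), with delta as the sensitivity parameter;
- plausibility is scored by a symmetrized mean k-nearest-neighbour distance.

The paper's actual statistic, its estimation of the other parameters, and its asymptotic theory are not reproduced.

```python
import heapq
import math
import random

def simulate_observed(delta, n, rng, mu=0.0, alpha=0.0):
    """Draw n full-data values Y ~ N(mu, 1); keep those observed under the
    assumed MNAR model P(observed | y) = logistic(alpha + delta * y)."""
    kept = []
    for _ in range(n):
        y = rng.gauss(mu, 1.0)
        if rng.random() < 1.0 / (1.0 + math.exp(-(alpha + delta * y))):
            kept.append(y)
    return kept

def knn_discrepancy(a, b, k=5):
    """Symmetrized mean distance from each point to its k nearest points in the other sample."""
    def one_way(src, ref):
        return sum(sum(heapq.nsmallest(k, (abs(x - r) for r in ref))) / k
                   for x in src) / len(src)
    return 0.5 * (one_way(a, b) + one_way(b, a))

def plausibility_scores(observed, deltas, n_sim, seed=1):
    """Lower score = data simulated under that delta look more like the observed data."""
    rng = random.Random(seed)
    return {d: knn_discrepancy(observed, simulate_observed(d, n_sim, rng)) for d in deltas}

# Hypothetical observed data, generated with delta = 2 (large values observed more often).
observed = simulate_observed(2.0, 500, random.Random(0))
scores = plausibility_scores(observed, [-2.0, 0.0, 2.0], n_sim=500)
```

The distance is symmetrized because a one-way mean distance can favour reference samples that are more spread out than the observed data. A threshold on the score, itself a further assumption, would then separate plausible from implausible values of delta.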
MEA-Tools: an open source toolbox for the analysis of multi-electrode data with MATLAB.
Egert, U; Knott, Th; Schwarz, C; Nawrot, M; Brandt, A; Rotter, S; Diesmann, M
2002-05-30
Recent advances in electrophysiological techniques have created new tools for the acquisition and storage of neuronal activity recorded simultaneously with numerous electrodes. These techniques support the analysis of the function as well as the structure of individual electrogenic cells in the context of surrounding neuronal or cardiac networks. Commercially available tools for the analysis of such data, however, cannot be easily adapted to newly emerging requirements for data analysis and visualization, and cross compatibility between them is limited. In this report we introduce a free open source toolbox called microelectrode array tools (MEA-Tools) for the analysis of multi-electrode data based on the common data analysis environment MATLAB (version 5.3-6.1, The MathWorks, Natick, MA). The toolbox itself is platform independent. The file interface currently supports files recorded with MCRack (Multi Channel Systems, Reutlingen, Germany) under Microsoft Windows 95, 98, NT, and 2000, but can be adapted to other data acquisition systems. Functions are controlled via command line input and graphical user interfaces, and support common requirements for the analysis of local field potentials, extracellular spike activity, and continuous recordings, in addition to supplementary data acquired by additional instruments, e.g. intracellular amplifiers. Data may be processed as continuous recordings or as time windows triggered by some event.
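Processing time windows triggered by an event amounts to cutting fixed-length segments around each event index and, often, averaging them. MEA-Tools itself is written in MATLAB. The Python sketch below only illustrates the idea. The function names, the sample-index event markers, and the synthetic trace are assumptions.

```python
def event_triggered_windows(signal, events, pre, post):
    """Segments signal[e - pre : e + post] for each event index e.
    Events whose window would extend past either end of the recording are skipped."""
    windows = []
    for e in events:
        start, stop = e - pre, e + post
        if start >= 0 and stop <= len(signal):
            windows.append(signal[start:stop])
    return windows

def event_triggered_average(windows):
    """Point-by-point mean across equal-length windows."""
    return [sum(w[i] for w in windows) / len(windows) for i in range(len(windows[0]))]

# Synthetic trace (assumption): a stereotyped response 1, 2, 1 starting at each event.
signal = [0.0] * 100
for e in (20, 50, 80):
    signal[e:e + 3] = [1.0, 2.0, 1.0]
events = [1, 20, 50, 80, 98]  # events at 1 and 98 are too close to the edges
windows = event_triggered_windows(signal, events, pre=2, post=5)
average = event_triggered_average(windows)
```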
Rubel, Oliver; Bowen, Benjamin P
2018-01-01
Mass spectrometry imaging (MSI) is a transformative imaging method that supports the untargeted, quantitative measurement of the chemical composition and spatial heterogeneity of complex samples with broad applications in life sciences, bioenergy, and health. While MSI data can be routinely collected, its broad application is currently limited by the lack of easily accessible analysis methods that can process data of the size, volume, diversity, and complexity generated by MSI experiments. The development and application of cutting-edge analytical methods is a core driver in MSI research for new scientific discoveries, medical diagnostics, and commercial innovation. However, the lack of means to share, apply, and reproduce analyses hinders the broad application, validation, and use of novel MSI analysis methods. To address this central challenge, we introduce the Berkeley Analysis and Storage Toolkit (BASTet), a novel framework for shareable and reproducible data analysis that supports standardized data and analysis interfaces, integrated data storage, data provenance, workflow management, and a broad set of integrated tools. Based on BASTet, we describe the extension of the OpenMSI mass spectrometry imaging science gateway to enable web-based sharing, reuse, analysis, and visualization of data analyses and derived data products. We demonstrate the application of BASTet and OpenMSI in practice to identify and compare characteristic substructures in the mouse brain based on their chemical composition measured via MSI.
The Comparison of VLBI Data Analysis Using Software Globl and Globk
NASA Astrophysics Data System (ADS)
Guangli, W.; Xiaoya, W.; Jinling, L.; Wenyao, Z.
The comparison of different geodetic data analysis software packages is a frequently discussed topic. In this paper we try to find out the differences between the GLOBL and GLOBK software when they are used to process the same set of VLBI data. GLOBL is software developed by the VLBI team of the geodesy branch at GSFC/NASA to process geodetic VLBI data using an arc-parameter-elimination algorithm, while GLOBK, which uses a Kalman filtering algorithm, is mainly used in GPS data analysis but is also used in VLBI data analysis. Our work focuses on whether there are significant differences when the two software packages are used to analyze the same VLBI data set, and investigates the reasons for the differences.
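Consider the scalar case of estimating a single constant from repeated measurements. It shows one way a Kalman-filter solution and a batch solution can agree or differ. The Python sketch below is a toy, not GLOBL or GLOBK code, and it uses plain batch least squares as a stand-in for arc-parameter elimination. With no process noise, the Kalman filter reproduces the batch estimate (up to rounding). With random-walk process noise it weights recent data more, so the two estimates diverge. The measurement values and noise variances are hypothetical.

```python
def batch_least_squares(z):
    """Batch LS estimate of a constant from equally weighted measurements: the mean."""
    return sum(z) / len(z)

def kalman_constant(z, r, q=0.0):
    """Scalar Kalman filter for x_k = x_{k-1} + w (var q), z_k = x_k + v (var r).

    Initialised from the first measurement (x = z_0, P = r), equivalent to a
    diffuse prior, so that it can be compared directly with batch least squares.
    """
    x, p = z[0], r
    for zk in z[1:]:
        p = p + q               # predict: random-walk process noise inflates variance
        k = p / (p + r)         # Kalman gain
        x = x + k * (zk - x)    # update with the innovation
        p = (1.0 - k) * p
    return x

z = [1.0, 2.0, 3.0, 4.0]  # hypothetical repeated measurements of one parameter
batch = batch_least_squares(z)
kf_static = kalman_constant(z, r=1.0, q=0.0)
kf_random_walk = kalman_constant(z, r=1.0, q=0.5)
```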
Data management, archiving, visualization and analysis of space physics data
NASA Technical Reports Server (NTRS)
Russell, C. T.
1995-01-01
A series of programs for the visualization and analysis of space physics data has been developed at UCLA. In the course of those developments, a number of lessons have been learned regarding data management and data archiving, as well as data analysis. The issues now facing those wishing to develop such software, as well as the lessons learned, are reviewed. Modern media have eased many of the earlier problems of the physical volume required to store data, the speed of access, and the permanence of the records. However, the ultimate longevity of these media is still a question of debate. Finally, while software development has become easier, cost is still a limiting factor in developing visualization and analysis software.
Environmental science applications with Rapid Integrated Mapping and analysis System (RIMS)
NASA Astrophysics Data System (ADS)
Shiklomanov, A.; Prusevich, A.; Gordov, E.; Okladnikov, I.; Titov, A.
2016-11-01
The Rapid Integrated Mapping and analysis System (RIMS) has been developed at the University of New Hampshire as an online instrument for multidisciplinary data visualization, analysis and manipulation with a focus on hydrological applications. Recently it was enriched with data and tools to allow more sophisticated analysis of interdisciplinary data. Three different examples of specific scientific applications with RIMS are demonstrated and discussed. Analysis of historical changes in major components of the Eurasian pan-Arctic water budget is based on historical discharge data, gridded observational meteorological fields, and remote sensing data for sea ice area. Express analysis of the extremely hot and dry summer of 2010 across European Russia is performed using a combination of near-real time and historical data to evaluate the intensity and spatial distribution of this event and its socioeconomic impacts. Integrative analysis of hydrological, water management, and population data for Central Asia over the last 30 years provides an assessment of regional water security due to changes in climate, water use and demography. The presented case studies demonstrate the capabilities of RIMS as a powerful instrument for hydrological and coupled human-natural systems research.
Analysis of longitudinal data from animals where some data are missing in SPSS
Duricki, DA; Soleman, S; Moon, LDF
2017-01-01
Testing of therapies for disease or injury often involves analysis of longitudinal data from animals. Modern analytical methods have advantages over conventional methods (particularly where some data are missing) yet are not used widely by pre-clinical researchers. We provide here an easy-to-use protocol for analysing longitudinal data from animals and present a click-by-click guide for performing suitable analyses using the statistical package SPSS. We guide readers through analysis of a real-life data set obtained when testing a therapy for brain injury (stroke) in elderly rats. We show that repeated measures analysis of covariance failed to detect a treatment effect when a few data points were missing (due to animal drop-out) whereas analysis using an alternative method detected a beneficial effect of treatment; specifically, we demonstrate the superiority of linear models (with various covariance structures) analysed using Restricted Maximum Likelihood estimation (to include all available data). This protocol takes two hours to follow. PMID:27196723
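Counting what each approach keeps shows why a few missing points hurt repeated-measures ANCOVA. The Python sketch below uses hypothetical scores; it does not fit any model or reproduce the SPSS steps. It compares listwise deletion with the long-format "all available data" layout that a REML-fitted linear mixed model can use. Listwise deletion drops every animal with any missing time point.

```python
# Hypothetical behavioural scores for four rats at four time points (None = missing).
scores = {
    "rat1": [10.0, 12.0, 14.0, 15.0],
    "rat2": [11.0, 13.0, None, 16.0],
    "rat3": [9.0, 11.0, 13.0, None],
    "rat4": [10.0, 11.0, 12.0, 14.0],
}

def listwise_complete(data):
    """Keep only animals with a value at every time point (listwise deletion)."""
    return {a: v for a, v in data.items() if all(x is not None for x in v)}

def long_format(data):
    """(animal, time index, value) rows for every observed measurement."""
    return [(a, t, x) for a, v in data.items() for t, x in enumerate(v) if x is not None]

complete = listwise_complete(scores)
n_listwise = sum(len(v) for v in complete.values())
rows = long_format(scores)
n_available = len(rows)
```

Here listwise deletion keeps 8 of the 14 observed values, because two drop-out points remove two whole animals.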
Software for the Integration of Multiomics Experiments in Bioconductor.
Ramos, Marcel; Schiffer, Lucas; Re, Angela; Azhar, Rimsha; Basunia, Azfar; Rodriguez, Carmen; Chan, Tiffany; Chapman, Phil; Davis, Sean R; Gomez-Cabrero, David; Culhane, Aedin C; Haibe-Kains, Benjamin; Hansen, Kasper D; Kodali, Hanish; Louis, Marie S; Mer, Arvind S; Riester, Markus; Morgan, Martin; Carey, Vince; Waldron, Levi
2017-11-01
Multiomics experiments are increasingly commonplace in biomedical research and add layers of complexity to experimental design, data integration, and analysis. R and Bioconductor provide a generic framework for statistical analysis and visualization, as well as specialized data classes for a variety of high-throughput data types, but methods are lacking for integrative analysis of multiomics experiments. The MultiAssayExperiment software package, implemented in R and leveraging Bioconductor software and design principles, provides for the coordinated representation of, storage of, and operation on multiple diverse genomics data. We provide the unrestricted multiple 'omics data for each cancer tissue in The Cancer Genome Atlas as ready-to-analyze MultiAssayExperiment objects and demonstrate in these and other datasets how the software simplifies data representation, statistical analysis, and visualization. The MultiAssayExperiment Bioconductor package reduces major obstacles to efficient, scalable, and reproducible statistical analysis of multiomics data and enhances data science applications of multiple omics datasets. Cancer Res; 77(21); e39-42. ©2017 AACR . ©2017 American Association for Cancer Research.
Klukas, Christian; Chen, Dijun; Pape, Jean-Michel
2014-01-01
High-throughput phenotyping is emerging as an important technology to dissect phenotypic components in plants. Efficient image processing and feature extraction are prerequisites to quantify plant growth and performance based on phenotypic traits. Issues include data management, image analysis, and result visualization of large-scale phenotypic data sets. Here, we present Integrated Analysis Platform (IAP), an open-source framework for high-throughput plant phenotyping. IAP provides user-friendly interfaces, and its core functions are highly adaptable. Our system supports image data transfer from different acquisition environments and large-scale image analysis for different plant species based on real-time imaging data obtained from different spectra. Because of the huge amount of data to manage, we used a common data structure for efficient storage and organization of both input data and result data. We implemented a block-based method for automated image processing to extract a representative list of plant phenotypic traits. We also provide tools for built-in data plotting and result export. For validation of IAP, we performed an example experiment that included 33 maize (Zea mays ‘Fernandez’) plants, which were grown for 9 weeks in an automated greenhouse with nondestructive imaging. Subsequently, the image data were subjected to automated analysis with the maize pipeline implemented in our system. We found that the computed digital volume and number of leaves correlate closely with our manual measurements, with correlations of up to 0.98 and 0.95, respectively. In summary, IAP provides a broad set of functions for import/export, management, and automated analysis of high-throughput plant phenotyping data, and its analysis results are highly reliable. PMID:24760818
Iterative categorization (IC): a systematic technique for analysing qualitative data
2016-01-01
The processes of analysing qualitative data, particularly the stage between coding and publication, are often vague and/or poorly explained within addiction science and research more broadly. A simple but rigorous and transparent technique for analysing qualitative textual data, developed within the field of addiction, is described. The technique, iterative categorization (IC), is suitable for use with inductive and deductive codes and can support a range of common analytical approaches, e.g. thematic analysis, Framework, constant comparison, analytical induction, content analysis, conversational analysis, discourse analysis, interpretative phenomenological analysis and narrative analysis. Once the data have been coded, the only software required is a standard word processing package. Worked examples are provided. PMID:26806155
Columbia River Component Data Gap Analysis
DOE Office of Scientific and Technical Information (OSTI.GOV)
L. C. Hulstrom
2007-10-23
This Data Gap Analysis report documents the results of a study conducted by Washington Closure Hanford (WCH) to compile and review the currently available surface water and sediment data for the Columbia River near and downstream of the Hanford Site. This Data Gap Analysis study was conducted to review the adequacy of the existing surface water and sediment data set from the Columbia River, with specific reference to the use of the data in future site characterization and screening-level risk assessments.
PyMICE: A Python library for analysis of IntelliCage data.
Dzik, Jakub M; Puścian, Alicja; Mijakowska, Zofia; Radwanska, Kasia; Łęski, Szymon
2018-04-01
IntelliCage is an automated system for recording the behavior of a group of mice housed together. It produces rich, detailed behavioral data that call for new methods and software for their analysis. Here we present PyMICE, a free and open-source library for analysis of IntelliCage data in the Python programming language. We describe the design and demonstrate the use of the library through a series of examples. PyMICE provides easy and intuitive access to IntelliCage data, and thus makes it possible to combine the data with numerous other Python scientific libraries into a complete data analysis workflow.
NASA Astrophysics Data System (ADS)
Otake, H.; Ohtake, M.; Ishihara, Y.; Masuda, K.; Sato, H.; Inoue, H.; Yamamoto, M.; Hoshino, T.; Wakabayashi, S.; Hashimoto, T.
2018-04-01
JAXA established the JAXA Lunar and Planetary Exploration Data Analysis Group (JLPEDA) in 2016. Our group has been analyzing lunar and planetary data for various missions. Here, we introduce one of our activities.
Collection and analysis of 2013-2014 travel time data.
DOT National Transportation Integrated Search
2017-07-04
This report documents the findings of Planning Study 27, Collection and Analysis of 2013-2014 Travel Time Data, which is a continuation of Planning Study 24, Analysis of Historical Travel Time Data. The main scope is to analyze newly acquired link-re...
Space-based surface wind vectors to aid understanding of air-sea interactions
NASA Technical Reports Server (NTRS)
Atlas, R.; Bloom, S. C.; Hoffman, R. N.; Ardizzone, J. V.; Brin, G.
1991-01-01
A novel ocean-surface wind dataset has been derived by combining Defense Meteorological Satellite Program Special Sensor Microwave Imager data with additional conventional data. The variational analysis used generates a gridded surface wind analysis that minimizes an objective function measuring the misfit of the analysis to the background, the data, and certain a priori constraints. In the present case, the European Center for Medium-Range Weather Forecasts surface-wind analysis is used as the background.
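The objective function described above has three kinds of terms. As a rough illustration only, the Python sketch below minimizes such a cost for a one-dimensional scalar field (not two-component winds) by plain gradient descent. The error scales `sigma_b` and `sigma_o`, the neighbour-difference smoothness penalty standing in for "a priori constraints", and the step size are all hypothetical choices, not the scheme used with the SSM/I data.

```python
from typing import Dict, List, Tuple

def cost_and_gradient(
    x: List[float],
    background: List[float],
    obs: Dict[int, float],
    sigma_b: float,
    sigma_o: float,
    smooth: float,
) -> Tuple[float, List[float]]:
    """J(x) = background misfit + observation misfit + smoothness penalty.

    obs maps a grid index to an observed value at that grid point; the
    smoothness term is a hypothetical stand-in for 'a priori constraints'."""
    n = len(x)
    cost = 0.0
    grad = [0.0] * n
    for i in range(n):  # misfit to the background (first guess)
        d = x[i] - background[i]
        cost += d * d / sigma_b**2
        grad[i] += 2.0 * d / sigma_b**2
    for i, y in obs.items():  # misfit to the observations
        d = x[i] - y
        cost += d * d / sigma_o**2
        grad[i] += 2.0 * d / sigma_o**2
    for i in range(n - 1):  # penalty on differences between neighbours
        d = x[i + 1] - x[i]
        cost += smooth * d * d
        grad[i + 1] += 2.0 * smooth * d
        grad[i] -= 2.0 * smooth * d
    return cost, grad

def analyse(
    background: List[float],
    obs: Dict[int, float],
    sigma_b: float = 1.0,
    sigma_o: float = 1.0,
    smooth: float = 0.0,
    step: float = 0.05,
    iters: int = 5000,
) -> List[float]:
    """Minimise J by plain gradient descent, starting from the background.
    The fixed step assumes modest sigma and smooth values (stable descent)."""
    x = list(background)
    for _ in range(iters):
        _, grad = cost_and_gradient(x, background, obs, sigma_b, sigma_o, smooth)
        x = [xi - step * gi for xi, gi in zip(x, grad)]
    return x

# One observation of 3.0 at the middle of a three-point grid whose background is 0.
print([round(v, 3) for v in analyse([0.0, 0.0, 0.0], {1: 3.0}, smooth=1.0)])  # [0.5, 1.0, 0.5]
```

With equal error scales the analysis lands halfway between background and observation at the observed point; the smoothness term then spreads part of that increment to the neighbours.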
2015-06-01
occasioned by numerous, rhythmic high-pressure non-voiding contractions (NVC) during normal bladder filling. These NVC are responsible for incontinence ... month experiment period) 2c. Final data analysis (data analysis will be ongoing throughout, this will represent the finalization of data period, 0.25 ... Sub-Tasks 2a and 2b. As of 9/2/14, we have completed much of the data analysis (Sub-Task 2c; see below for results). Analysis of abdominal
NASA Technical Reports Server (NTRS)
Rummler, D. R.
1976-01-01
Results are presented from investigations applying regression techniques to develop a methodology for creep-rupture data analysis. Regression analysis techniques are applied to the explicit description of the creep behavior of materials for space shuttle thermal protection systems. A regression analysis technique is compared with five parametric methods for analyzing three simulated and twenty real data sets, and a computer program for the evaluation of creep-rupture data is presented.
Gupta, Surya; De Puysseleyr, Veronic; Van der Heyden, José; Maddelein, Davy; Lemmens, Irma; Lievens, Sam; Degroeve, Sven; Tavernier, Jan; Martens, Lennart
2017-05-01
Protein-protein interaction (PPI) studies have dramatically expanded our knowledge about cellular behaviour and development in different conditions. A multitude of high-throughput PPI techniques have been developed to achieve proteome-scale coverage for PPI studies, including the microarray-based Mammalian Protein-Protein Interaction Trap (MAPPIT) system. Because such high-throughput techniques typically report thousands of interactions, managing and analysing the large amounts of acquired data is a challenge. We have therefore built the MAPPIT cell microArray Protein Protein Interaction-Data management & Analysis Tool (MAPPI-DAT) as an automated data management and analysis tool for MAPPIT cell microarray experiments. MAPPI-DAT stores the experimental data and metadata in a systematic and structured way, automates data analysis and interpretation, and enables the meta-analysis of MAPPIT cell microarray data across all stored experiments. MAPPI-DAT is developed in Python, using R for data analysis and MySQL as its data management system. MAPPI-DAT is cross-platform and can be run on Microsoft Windows, Linux and OS X/macOS. The source code and a Microsoft Windows executable are freely available under the permissive Apache2 open source license at https://github.com/compomics/MAPPI-DAT. Contact: jan.tavernier@vib-ugent.be or lennart.martens@vib-ugent.be. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press.
Tebel, Katrin; Boldt, Vivien; Steininger, Anne; Port, Matthias; Ebert, Grit; Ullmann, Reinhard
2017-01-06
The analysis of DNA copy number variants (CNV) has increasing impact in the field of genetic diagnostics and research. However, the interpretation of CNV data derived from high-resolution array CGH or NGS platforms is complicated by the considerable variability of the human genome. Therefore, tools for multidimensional data analysis and comparison of patient cohorts are needed to assist in discriminating clinically relevant CNVs from others. We developed GenomeCAT, a standalone Java application for the analysis and integrative visualization of CNVs. GenomeCAT is composed of three modules dedicated, respectively, to the inspection of single cases, comparative analysis of multidimensional data, and group comparisons aimed at identifying recurrent aberrations in patients sharing the same phenotype. Its flexible import options ease the comparative analysis of users' own results derived from microarray or NGS platforms with data from the literature or public repositories. Multidimensional data obtained from different experiment types can be merged into a common data matrix to enable common visualization and analysis. All results are stored in the integrated MySQL database, but can also be exported as tab-delimited files for further statistical calculations in external programs. GenomeCAT offers a broad spectrum of visualization and analysis tools that assist in the evaluation of CNVs in the context of other experiment data and annotations. The use of GenomeCAT does not require any specialized computer skills. The various R packages implemented for data analysis are fully integrated into GenomeCAT's graphical user interface, and the installation process is supported by a wizard. The flexibility in terms of data import and export, in combination with the ability to create a common data matrix, also makes the program well suited as an interface between genomic data from heterogeneous sources and external software tools.
Owing to its modular architecture, the functionality of GenomeCAT can easily be extended by further R packages or customized plug-ins to meet future requirements.
Multivariate time series analysis of neuroscience data: some challenges and opportunities.
Pourahmadi, Mohsen; Noorbaloochi, Siamak
2016-04-01
Neuroimaging data may be viewed as high-dimensional multivariate time series and analyzed using techniques from regression analysis, time series analysis and spatiotemporal analysis. We discuss issues related to data quality, model specification, estimation, interpretation, dimensionality and causality. Recent research areas that address some of these recurring challenges are introduced. Copyright © 2015 Elsevier Ltd. All rights reserved.
COMAN: a web server for comprehensive metatranscriptomics analysis.
Ni, Yueqiong; Li, Jun; Panagiotou, Gianni
2016-08-11
Microbiota-oriented studies based on metagenomic or metatranscriptomic sequencing have revolutionised our understanding of microbial ecology and the roles of both clinical and environmental microbes. The analysis of massive metatranscriptomic data requires extensive computational resources, a collection of bioinformatics tools and expertise in programming. We developed COMAN (Comprehensive Metatranscriptomics Analysis), a web-based tool dedicated to automatically and comprehensively analysing metatranscriptomic data. The COMAN pipeline includes quality control of raw reads and removal of reads derived from non-coding RNA, followed by functional annotation, comparative statistical analysis, pathway enrichment analysis, co-expression network analysis and high-quality visualisation. The essential data generated by COMAN are also provided in tabular format for additional analysis and integration with other software. The web server has an easy-to-use interface and detailed instructions, and is freely available at http://sbb.hku.hk/COMAN/. COMAN is an integrated web server dedicated to comprehensive functional analysis of metatranscriptomic data, translating massive numbers of reads into data tables and high-standard figures. It is expected to help researchers with less expertise in bioinformatics answer microbiota-related biological questions and to increase the accessibility and interpretation of microbiota RNA-Seq data.
Advances in Risk Analysis with Big Data.
Choi, Tsan-Ming; Lambert, James H
2017-08-01
With cloud computing, the Internet of Things, wireless sensors, social media, fast storage and retrieval, etc., organizations and enterprises have access to unprecedented amounts and varieties of data. Current risk analysis methodology and applications are experiencing related advances and breakthroughs. For example, highway operations data are readily available, and making use of them reduces risks of traffic crashes and travel delays. Massive data from financial and enterprise systems support decision making under risk by individuals, industries, regulators, etc. In this introductory article, we first discuss the meaning of big data for risk analysis. We then examine recent advances in risk analysis with big data in several topic areas. For each area, we identify and introduce the relevant articles that are featured in the special issue. We conclude with a discussion of future research opportunities. © 2017 Society for Risk Analysis.
Huang, Zhenzhen; Duan, Huilong; Li, Haomin
2015-01-01
Large-scale human cancer genomics projects, such as TCGA, have generated large amounts of genomic data for further study. Exploring and mining these data to obtain meaningful analysis results can help researchers find potential genomic alterations that intervene in the development and metastasis of tumors. We developed a web-based gene analysis platform, named TCGA4U, which uses statistical methods and models to help translational investigators explore, mine and visualize human cancer genomic characteristic information from the TCGA datasets. Furthermore, through Gene Ontology (GO) annotation and clinical data integration, the genomic data were transformed into biological process, molecular function and cellular component annotations and into survival curves to help researchers identify potential driver genes. Clinical researchers without expertise in data analysis will benefit from such a user-friendly genomic analysis platform.
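Survival curves of the kind mentioned above are commonly estimated with the Kaplan-Meier product-limit estimator; the abstract does not say which estimator TCGA4U uses, so the Python sketch below is a generic illustration, not the platform's code. The follow-up times and event flags in the example are hypothetical.

```python
from typing import List, Tuple

def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Return (time, survival) steps of the Kaplan-Meier estimator.

    events[i] is 1 if the event (e.g. death) was observed at times[i],
    0 if the patient was censored (still alive when follow-up ended)."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    steps: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all patients sharing this time point.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / n_at_risk
            steps.append((t, survival))
        n_at_risk -= removed  # both deaths and censored cases leave the risk set
    return steps

# Hypothetical follow-up (months) for one patient group; 0 = censored.
print(kaplan_meier([5, 8, 8, 12, 20], [1, 0, 1, 1, 0]))
```

Comparing two such curves (for example, patients with and without a given alteration) would usually be followed by a log-rank test, which is not shown here.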
Carroll, Adam J; Badger, Murray R; Harvey Millar, A
2010-07-14
Standardization of analytical approaches and reporting methods via community-wide collaboration can work synergistically with web-tool development to result in rapid community-driven expansion of online data repositories suitable for data mining and meta-analysis. In metabolomics, the inter-laboratory reproducibility of gas-chromatography/mass-spectrometry (GC/MS) makes it an obvious target for such development. While a number of web-tools offer access to datasets and/or tools for raw data processing and statistical analysis, none of these systems are currently set up to act as a public repository by easily accepting, processing and presenting publicly submitted GC/MS metabolomics datasets for public re-analysis. Here, we present MetabolomeExpress, a new File Transfer Protocol (FTP) server and web-tool for the online storage, processing, visualisation and statistical re-analysis of publicly submitted GC/MS metabolomics datasets. Users may search a quality-controlled database of metabolite response statistics from publicly submitted datasets by a number of parameters (e.g., metabolite, species, organ/biofluid, etc.). Users may also perform meta-analysis comparisons of multiple independent experiments or re-analyse public primary datasets via user-friendly tools for t-tests, principal components analysis, hierarchical cluster analysis and correlation analysis. They may interact with chromatograms, mass spectra and peak detection results via an integrated raw data viewer. Researchers who register for a free account may upload (via FTP) their own data to the server for online processing via a novel raw data processing pipeline. MetabolomeExpress (https://www.metabolome-express.org) provides a new opportunity for the general metabolomics community to transparently present online the raw and processed GC/MS data underlying their metabolomics publications.
Transparent sharing of these data will allow researchers to assess data quality and draw their own insights from published metabolomics datasets.
Big Data in HEP: A comprehensive use case study
NASA Astrophysics Data System (ADS)
Gutsche, Oliver; Cremonesi, Matteo; Elmer, Peter; Jayatilaka, Bo; Kowalkowski, Jim; Pivarski, Jim; Sehrish, Saba; Mantilla Suárez, Cristina; Svyatkovskiy, Alexey; Tran, Nhan
2017-10-01
Experimental Particle Physics has been at the forefront of analyzing the world's largest datasets for decades. The HEP community was the first to develop suitable software and computing tools for this task. In recent times, new toolkits and systems collectively called Big Data technologies have emerged to support the analysis of Petabyte and Exabyte datasets in industry. While the principles of data analysis in HEP have not changed (filtering and transforming experiment-specific data formats), these new technologies use different approaches and promise a fresh look at analysis of very large datasets, and could potentially reduce the time-to-physics with increased interactivity. In this talk, we present an active LHC Run 2 analysis, searching for dark matter with the CMS detector, as a testbed for Big Data technologies. We directly compare the traditional NTuple-based analysis with an equivalent analysis using Apache Spark on the Hadoop ecosystem and beyond. In both cases, we start the analysis with the official experiment data formats and produce publication-quality physics plots. We will discuss advantages and disadvantages of each approach and give an outlook on further studies needed.
Preprocessing Raw Data in Clinical Medicine for a Data Mining Purpose
NASA Astrophysics Data System (ADS)
Peterková, Andrea; Michaľčonok, German
2016-12-01
Handling medical data is currently both a topical and a difficult problem. On a global scale, a large amount of medical data is produced every day. For the purpose of our research, we understand medical data as data about patients, such as results from laboratory analysis, results from screening examinations (CT, ECHO) and clinical parameters. These data are usually in a raw format, difficult to understand, non-standard and not suitable for further processing or analysis. This paper aims to describe a possible method for preparing and preprocessing such raw medical data into a form to which further analysis algorithms can be applied.
Collection, processing and dissemination of data for the national solar demonstration program
NASA Technical Reports Server (NTRS)
Day, R. E.; Murphy, L. J.; Smok, J. T.
1978-01-01
A national solar data system developed for the DOE by IBM provides for automatic gathering, conversion, transfer, and analysis of demonstration site data. NASA requirements for this system include providing solar site hardware, engineering, data collection, and analysis. The specific tasks include: (1) solar energy system design/integration; (2) developing a site data acquisition subsystem; (3) developing a central data processing system; (4) operating the test facility at Marshall Space Flight Center; (5) collecting and analyzing data. The systematic analysis and evaluation of the data from the National Solar Data System is reflected in a monthly performance report and a solar energy system performance evaluation report.
The impact of mother's literacy on child dental caries: Individual data or aggregate data analysis?
Haghdoost, Ali-Akbar; Hessari, Hossein; Baneshi, Mohammad Reza; Rad, Maryam; Shahravan, Arash
2017-01-01
To evaluate the impact of mother's literacy on child dental caries based on a national oral health survey in Iran and to investigate the possibility of ecological fallacy in aggregate data analysis. The data came from the second national oral health survey, carried out in 2004, which included 8725 six-year-old participants. The association of mother's literacy with caries occurrence in her child (DMF (Decayed, Missing, Filled) total score >0) was assessed on individual data using a logistic regression model. Then the association between the percentage of literate mothers and the percentage of decayed teeth in each of the 30 provinces of Iran was assessed with a linear regression model, using aggregated data retrieved from the second national oral health survey of Iran and, alternatively, from the census of the "Statistical Center of Iran". The significance level was set at 0.05 for all analyses. Individual data analysis showed a statistically significant association between mother's literacy and decayed teeth of children (P = 0.02, odds ratio = 0.83). There was no statistically significant association between mother's literacy and child dental caries in the aggregate data analysis of the oral health survey (P = 0.79, B = 0.03) or of the census of the "Statistical Center of Iran" (P = 0.60, B = 0.14). Literate mothers have a preventive effect on the occurrence of dental caries in their children. Given the high percentage of illiterate parents in Iran, it is logical to consider suitable methods of oral health education that do not require reading or writing. Aggregate data analysis and individual data analysis gave completely different results in this study.
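The individual-level result above is reported as an odds ratio. For a single binary predictor with no covariates, the exponentiated logistic-regression coefficient equals the cross-product ratio of the 2x2 table, so the Python sketch below computes that ratio and a Woolf (log-based) 95% interval. The counts are hypothetical, not the survey's, and the study's own model may have included adjustments not shown here.

```python
import math
from typing import Tuple

def odds_ratio(a: int, b: int, c: int, d: int) -> Tuple[float, float, float]:
    """Odds ratio with a Woolf 95% confidence interval for the 2x2 table

                        caries   no caries
        literate           a         b
        illiterate         c         d

    All four cells must be positive."""
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of ln(OR)
    lower = math.exp(math.log(or_) - 1.96 * se_log)
    upper = math.exp(math.log(or_) + 1.96 * se_log)
    return or_, lower, upper

# Hypothetical counts, chosen only to illustrate an OR below 1.
print(odds_ratio(400, 600, 450, 560))
```

An odds ratio below 1 means lower odds of caries among children of literate mothers. Aggregating the same children into provincial percentages discards exactly this within-person pairing, which is why the ecological analysis can disagree with it.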
A new method for correlation analysis of compositional (environmental) data - a worked example.
Reimann, C; Filzmoser, P; Hron, K; Kynčlová, P; Garrett, R G
2017-12-31
Most data in environmental sciences and geochemistry are compositional. The very unit used to report the data (e.g., μg/l, mg/kg, wt%) implies that the analytical results for each element are not free to vary independently of the other measured variables. This is often neglected in statistical analysis, where a simple log-transformation of the single variables is insufficient to put the data into an acceptable geometry. This is also important for bivariate data analysis and for correlation analysis, for which the data need to be appropriately log-ratio transformed. A new approach based on the isometric log-ratio (ilr) transformation, leading to so-called symmetric coordinates, is presented here. Summarizing the correlations in a heat-map gives a powerful tool for bivariate data analysis. Here an application of the new method is demonstrated using a data set from a regional geochemical mapping project based on soil O and C horizon samples. Differences from 'classical' correlation analysis based on log-transformed data are highlighted. The fact that some expected strong positive correlations appear and remain unchanged even following a log-ratio transformation has probably led to the misconception that the special nature of compositional data can be ignored when working with trace elements. The example dataset is used to demonstrate that 'classical' correlation analysis and XY diagrams (scatterplots) based on the original or simply log-transformed data can easily lead to severe misinterpretations of the relationships between elements. Copyright © 2017 Elsevier B.V. All rights reserved.
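To make "log-ratio transformed" concrete, the Python sketch below computes a standard ilr pivot coordinate, which contrasts one part with the geometric mean of all other parts. It shows the property that matters for the unit argument above: rescaling a sample (for example, changing mg/kg to another unit) leaves the coordinate unchanged. This is only the basic building block; the paper's symmetric coordinates are a refinement that is not reproduced here, and the sample values are hypothetical.

```python
import math
from typing import List

def pivot_coordinate(parts: List[float], j: int = 0) -> float:
    """First ilr pivot coordinate with part j as the pivot:

        sqrt((D - 1) / D) * ln(x_j / geometric_mean(other parts))

    All parts must be strictly positive."""
    D = len(parts)
    if D < 2 or any(p <= 0 for p in parts):
        raise ValueError("need at least two strictly positive parts")
    others = [p for i, p in enumerate(parts) if i != j]
    log_gmean = sum(math.log(p) for p in others) / (D - 1)
    return math.sqrt((D - 1) / D) * (math.log(parts[j]) - log_gmean)

# Hypothetical three-element sample and the same sample in a unit 10x larger.
sample = [1.0, 2.0, 4.0]
rescaled = [10.0 * p for p in sample]
print(pivot_coordinate(sample), pivot_coordinate(rescaled))  # equal up to rounding
```

Because only ratios between parts enter the formula, the coordinate carries the relative information that compositional data actually contain, which is the geometry that correlations are then computed in.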
Clustering analysis for muon tomography data elaboration in the Muon Portal project
NASA Astrophysics Data System (ADS)
Bandieramonte, M.; Antonuccio-Delogu, V.; Becciani, U.; Costa, A.; La Rocca, P.; Massimino, P.; Petta, C.; Pistagna, C.; Riggi, F.; Riggi, S.; Sciacca, E.; Vitello, F.
2015-05-01
Clustering analysis is a multivariate data analysis technique that gathers statistical data units into groups so as to minimize the logical distance within each group and maximize the distance between different groups. In these proceedings, the authors present a novel approach to muon tomography data analysis based on clustering algorithms. As a case study we present the Muon Portal project, which aims to build and operate a dedicated particle detector for the inspection of harbor containers to hinder the smuggling of nuclear materials. Clustering techniques, working directly on scattering points, help to detect the presence of suspicious items inside the container, acting, as will be shown, as a filter for a preliminary analysis of the data.
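The abstract does not name the clustering algorithm used for the Muon Portal data, so the Python sketch below uses plain k-means as a generic stand-in to show the idea of grouping 3D scattering points by minimizing within-group distance. The point coordinates are hypothetical, and a real analysis would need to choose the number of clusters and handle noise points, which k-means does not do on its own.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float, float]

def kmeans(points: List[Point], k: int, iters: int = 50, seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Plain k-means: alternate nearest-centre assignment and centre updates."""
    rng = random.Random(seed)
    centers: List[Point] = rng.sample(points, k)  # initial centres drawn from the data
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centre (Euclidean distance).
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Update step: each centre moves to the mean of its members.
        new_centers: List[Point] = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's centre
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return centers, labels

# Hypothetical scattering points: a diffuse background patch and a dense blob.
points: List[Point] = [
    (0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 0.0, 0.0),
    (10.0, 10.0, 10.0), (10.0, 11.0, 10.0), (11.0, 10.0, 10.0),
]
centers, labels = kmeans(points, k=2)
print(labels)
```

In a screening setting, a compact, dense cluster of high-angle scattering points is the kind of feature that would flag a region for closer inspection.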
Use of Controller Area Network (CAN) Data to Support Performance Testing
2015-07-16
examples below highlight some common CAN data that have been recorded and utilized for vehicle analysis. This is not an exhaustive list. 3.1 Vehicle ... sensor integrated into the data acquisition system. The acceptable error for engine speed data used in a system performance analysis is typically ... data the test engineer was able to determine that the system was not functioning properly, and which test runs were invalid for analysis purposes
Understanding Democracy and Violence in Africa: An Analysis of the Data
2016-06-10
A thesis presented to the Faculty of the U.S. Army Command... (AUG 2015 – JUNE 2016)
NASA Technical Reports Server (NTRS)
Alston, D. W.
1981-01-01
The objective of this research was to design a statistical model that could perform an error analysis of curve fits of wind tunnel test data using analysis of variance and regression analysis techniques. Four related subproblems were defined, and by solving each of these a solution to the general research problem was obtained. The capabilities of the resulting statistical model are considered. The least squares fit is used to determine the nature of the force, moment, and pressure data. The order of the curve fit is increased in order to remove the quadratic effect in the residuals. The analysis of variance is used to determine the magnitude and effect of the error factor associated with the experimental data.
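The step of raising the curve-fit order to remove a quadratic pattern in the residuals can be illustrated with a small least-squares fit. The Python sketch below solves the normal equations directly and compares residual sums of squares for first- and second-order fits on synthetic, noise-free data; the "angle" and "coefficient" values are hypothetical, not wind-tunnel measurements, and no analysis-of-variance table is computed.

```python
from typing import List

def polyfit(x: List[float], y: List[float], order: int) -> List[float]:
    """Least-squares polynomial coefficients c0..c_order via the normal equations."""
    m = order + 1
    # Normal equations A c = b with A[i][j] = sum x^(i+j) and b[i] = sum y * x^i.
    A = [[sum(xi ** (i + j) for xi in x) for j in range(m)] for i in range(m)]
    b = [sum(yi * xi ** i for xi, yi in zip(x, y)) for i in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, m):
            f = A[r][col] / A[col][col]
            for c in range(col, m):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    coef = [0.0] * m
    for i in range(m - 1, -1, -1):
        coef[i] = (b[i] - sum(A[i][j] * coef[j] for j in range(i + 1, m))) / A[i][i]
    return coef

def residual_ss(x: List[float], y: List[float], coef: List[float]) -> float:
    """Residual sum of squares of the fitted polynomial."""
    return sum((yi - sum(c * xi ** k for k, c in enumerate(coef))) ** 2 for xi, yi in zip(x, y))

angles = [-2.0, -1.0, 0.0, 1.0, 2.0]                   # hypothetical angle-of-attack values
coeff = [1.0 + 0.5 * a + 0.3 * a * a for a in angles]  # synthetic, noise-free response
for order in (1, 2):
    c = polyfit(angles, coeff, order)
    print(order, round(residual_ss(angles, coeff, c), 6))
# prints: 1 1.26
#         2 0.0
```

The linear fit leaves residuals of 0.6, -0.3, -0.6, -0.3, 0.6, a U-shaped pattern that signals an unmodelled quadratic term; the second-order fit removes it. With real, noisy data the comparison would be made with an F-test on the reduction in residual sum of squares.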
Oasis: online analysis of small RNA deep sequencing data.
Capece, Vincenzo; Garcia Vizcaino, Julio C; Vidal, Ramon; Rahman, Raza-Ur; Pena Centeno, Tonatiuh; Shomroni, Orr; Suberviola, Irantzu; Fischer, Andre; Bonn, Stefan
2015-07-01
Oasis is a web application that allows for the fast and flexible online analysis of small-RNA-seq (sRNA-seq) data. It was designed for the end user in the lab, providing an easy-to-use web frontend including video tutorials, demo data and best-practice step-by-step guidelines on how to analyze sRNA-seq data. Oasis' distinguishing features are a differential expression module that allows for the multivariate analysis of samples, a classification module for robust biomarker detection and an advanced programming interface that supports the batch submission of jobs. Both modules include the analysis of novel miRNAs, miRNA targets and functional analyses including GO and pathway enrichment. Oasis generates downloadable interactive web reports for easy visualization, exploration and analysis of data on a local system. Finally, Oasis' modular workflow enables the rapid (re-)analysis of data. Oasis is implemented in Python, R, Java, PHP, C++ and JavaScript. It is freely available at http://oasis.dzne.de. Contact: stefan.bonn@dzne.de. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.
PuffinPlot: A versatile, user-friendly program for paleomagnetic analysis
NASA Astrophysics Data System (ADS)
Lurcock, P. C.; Wilson, G. S.
2012-06-01
PuffinPlot is a user-friendly desktop application for analysis of paleomagnetic data, offering a unique combination of features. It runs on several operating systems, including Windows, Mac OS X, and Linux; supports both discrete and long core data; and facilitates analysis of very weakly magnetic samples. As well as interactive graphical operation, PuffinPlot offers batch analysis for large volumes of data, and a Python scripting interface for programmatic control of its features. Available data displays include demagnetization/intensity, Zijderveld, equal-area (for sample, site, and suite level demagnetization data, and for magnetic susceptibility anisotropy data), a demagnetization data table, and a natural remanent magnetization intensity histogram. Analysis types include principal component analysis, Fisherian statistics, and great-circle path intersections. The results of calculations can be exported as CSV (comma-separated value) files; graphs can be printed, and can also be saved as publication-quality vector files in SVG or PDF format. PuffinPlot is free, and the program, user manual, and fully documented source code may be downloaded from http://code.google.com/p/puffinplot/.
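Among the analysis types listed above, Fisherian statistics summarize a set of paleomagnetic directions by their mean direction, precision parameter k, and 95% cone of confidence (alpha95). The Python sketch below implements the standard Fisher (1953) formulas for illustration; it is not PuffinPlot's code (PuffinPlot's own scripting interface is not used here), and the input directions are hypothetical.

```python
import math
from typing import List, Tuple

def fisher_mean(dirs: List[Tuple[float, float]]) -> Tuple[float, float, float, float, float]:
    """Fisher (1953) mean of (declination, inclination) pairs in degrees.

    Returns (mean_dec, mean_inc, R, k, alpha95): R is the length of the
    resultant vector, k the precision estimate and alpha95 the 95% cone
    of confidence in degrees."""
    n = len(dirs)
    if n == 0:
        raise ValueError("need at least one direction")
    x = y = z = 0.0
    for dec, inc in dirs:
        d, i = math.radians(dec), math.radians(inc)
        x += math.cos(i) * math.cos(d)  # north component
        y += math.cos(i) * math.sin(d)  # east component
        z += math.sin(i)                # down component
    r = math.sqrt(x * x + y * y + z * z)
    if r == 0:
        raise ValueError("directions cancel; mean direction undefined")
    mean_dec = math.degrees(math.atan2(y, x)) % 360.0
    mean_inc = math.degrees(math.asin(max(-1.0, min(1.0, z / r))))
    if n < 2:
        return mean_dec, mean_inc, r, math.nan, math.nan
    if r >= n:  # all directions identical (within rounding)
        return mean_dec, mean_inc, r, math.inf, 0.0
    k = (n - 1) / (n - r)
    cos_a95 = 1.0 - (n - r) / r * ((1.0 / 0.05) ** (1.0 / (n - 1)) - 1.0)
    alpha95 = math.degrees(math.acos(max(-1.0, min(1.0, cos_a95))))
    return mean_dec, mean_inc, r, k, alpha95

# Hypothetical sample directions (declination, inclination) in degrees.
print(fisher_mean([(10.0, 55.0), (15.0, 60.0), (5.0, 58.0), (12.0, 52.0)]))
```

Directions are averaged as unit vectors rather than as raw angles, which avoids artefacts such as the mean of 350° and 10° coming out as 180°.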
Limitations in Using Multiple Imputation to Harmonize Individual Participant Data for Meta-Analysis.
Siddique, Juned; de Chavez, Peter J; Howe, George; Cruden, Gracelyn; Brown, C Hendricks
2018-02-01
Individual participant data (IPD) meta-analysis is a meta-analysis in which the individual-level data for each study are obtained and used for synthesis. A common challenge in IPD meta-analysis arises when variables of interest are measured differently in different studies. The term harmonization has been coined to describe the procedure of placing variables on the same scale in order to permit pooling of data from a large number of studies. Using data from an IPD meta-analysis of 19 adolescent depression trials, we describe a multiple imputation approach for harmonizing 10 depression measures across the 19 trials by treating those depression measures that were not used in a study as missing data. We then apply diagnostics to assess the fit of our imputation model. Even after reducing the scale of our application, we were still unable to produce accurate imputations of the missing values. We describe those features of the data that made it difficult to harmonize the depression measures and provide some guidelines for using multiple imputation for harmonization in IPD meta-analysis.
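Whatever imputation model is used, results from the m imputed data sets are usually combined with Rubin's rules. The Python sketch below shows that pooling step only; it does not reproduce the authors' harmonization model, and the estimates and variances in the example are hypothetical.

```python
import statistics
from typing import List, Tuple

def rubin_pool(estimates: List[float], variances: List[float]) -> Tuple[float, float]:
    """Pool m completed-data analyses with Rubin's rules.

    estimates[i] and variances[i] are the point estimate and its squared
    standard error from the analysis of imputed data set i.
    Returns (pooled estimate, total variance)."""
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations with matching variances")
    q_bar = statistics.fmean(estimates)   # pooled point estimate
    w = statistics.fmean(variances)       # within-imputation variance
    b = statistics.variance(estimates)    # between-imputation variance (n - 1 denominator)
    t = w + (1 + 1 / m) * b               # total variance
    return q_bar, t

# Hypothetical treatment-effect estimates from five imputed data sets.
est, total_var = rubin_pool([0.42, 0.38, 0.45, 0.40, 0.47], [0.010, 0.011, 0.009, 0.010, 0.012])
print(est, total_var ** 0.5)  # pooled estimate and its standard error
```

A large between-imputation component relative to the within-imputation component is one warning sign that the imputations, and hence the harmonized measures, are poorly determined by the observed data.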
Analysis of Big Data in Gait Biomechanics: Current Trends and Future Directions.
Phinyomark, Angkoon; Petri, Giovanni; Ibáñez-Marcelo, Esther; Osis, Sean T; Ferber, Reed
2018-01-01
The increasing amount of data in biomechanics research has greatly increased the importance of developing advanced multivariate analysis and machine learning techniques, which are better able to handle "big data". Consequently, advances in data science methods will expand our ability to test new hypotheses about biomechanical risk factors associated with walking and running gait-related musculoskeletal injury. This paper begins with a brief introduction to an automated three-dimensional (3D) biomechanical gait data collection system, 3D GAIT, followed by a discussion of how studies in the field of gait biomechanics fit the 5 V's definition of big data: volume, velocity, variety, veracity, and value. Next, we review recent research and development in gait analysis based on multivariate and machine learning methods that can be applied to big data analytics. These modern biomechanical gait analysis methods include several main modules, such as initial input features, dimensionality reduction (feature selection and extraction), and learning algorithms (classification and clustering). Finally, a promising big data exploration tool called "topological data analysis" and directions for future research are outlined and discussed.
NASA Technical Reports Server (NTRS)
Handley, Thomas H., Jr.; Collins, Donald J.; Doyle, Richard J.; Jacobson, Allan S.
1991-01-01
Viewgraphs on DataHub knowledge based assistance for science visualization and analysis using large distributed databases. Topics covered include: DataHub functional architecture; data representation; logical access methods; preliminary software architecture; LinkWinds; data knowledge issues; expert systems; and data management.
Regional environmental analysis and management: New techniques for current problems
NASA Technical Reports Server (NTRS)
Honea, R. B.; Paludan, C. T. N.
1974-01-01
Advances in data acquisition and processing procedures for regional environmental analysis are discussed. Automated and semi-automated techniques employing Earth Resources Technology Satellite data and conventional data sources are presented. Experiences are summarized. The ERTS computer compatible tapes provide a very complete and flexible record of earth resources data and represent a viable medium to enhance regional environmental analysis research.
Katherine P. O'Neill; Michael C. Amacher; Charles H. Perry
2005-01-01
Documents the types of data collected as part of the Forest Inventory and Analysis soil indicator, the field and laboratory methods used, and the rationale behind these data collection procedures. Guides analysts and researchers on incorporating soil indicator data into reports and research studies.
DataToText: A Consumer-Oriented Approach to Data Analysis
ERIC Educational Resources Information Center
Kenny, David A.
2010-01-01
DataToText is a project in which the user communicates the information relevant to an analysis, and a DataToText computer routine produces text output that describes the results of the analyses in words, tables, and figures. Two extended examples are given, one an example of a moderator analysis and the other an example of a dyadic data…
Chemical Engineering Data Analysis Made Easy with DataFit
ERIC Educational Resources Information Center
Brenner, James R.
2006-01-01
The outline for half of a one-credit-hour course in analysis of chemical engineering data is presented, along with a range of typical problems encountered later on in the chemical engineering curriculum that can be used to reinforce the data analysis skills learned in the course. This mini course allows students to be exposed to a variety of ChE…
NASA Technical Reports Server (NTRS)
Dunn, A. R.
1975-01-01
Computer techniques for data analysis of sunspot observations are presented. Photographic spectra were converted to digital form and analyzed. Methods of determining magnetic field strengths, i.e., the Zeeman effect, are discussed. Errors originating with telescope equipment and the magnetograph are treated. Flow charts of test programs and procedures of the data analysis are shown.
ERIC Educational Resources Information Center
Abrams, Neal M.
2012-01-01
A cloud network system is combined with standard computing applications and a course management system to provide a robust method for sharing data among students. This system provides a unique method to improve data analysis by easily increasing the amount of sampled data available for analysis. The data can be shared within one course as well as…
Maione, Camila; Barbosa, Rommel Melgaço
2018-01-24
Rice is one of the most important staple foods around the world. Authentication of rice is one of the most frequently addressed concerns in the current literature; it includes recognition of geographical origin and variety, certification of organic rice, and many other issues. Good results have been achieved by combining multivariate data analysis and data mining techniques with specific parameters for ascertaining authenticity and many other useful characteristics of rice, such as quality and yield. This paper reviews recent research projects on the discrimination and authentication of rice using multivariate data analysis and data mining techniques. We found that data obtained from image processing, molecular and atomic spectroscopy, elemental fingerprinting, genetic markers, molecular content, and other sources are promising sources of information about the geographical origin, variety, and other aspects of rice, and that these data are widely used in combination with multivariate data analysis techniques. Principal component analysis and linear discriminant analysis are the preferred methods, but several other data classification techniques, such as support vector machines and artificial neural networks, also appear frequently and show high performance for discrimination of rice.
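Principal component analysis is named as a preferred method. The sketch below shows PCA as it is typically applied to a samples-by-features table, for example elemental concentrations per rice sample. It computes PCA via the singular value decomposition and assumes NumPy. Linear discriminant analysis, the supervised step that would then use class labels such as origin, is not shown. The function name `pca` is hypothetical.

```python
import numpy as np

def pca(X, n_components=2):
    """Project rows of X onto the leading principal components.

    X: (n_samples, n_features) array, e.g. element concentrations per sample.
    Returns (scores, explained_variance_ratio) for the first n_components axes.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                          # center each feature
    # Thin SVD: rows of Vt are the principal axes, ordered by singular value
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_components].T                # sample coordinates in PC space
    var = s ** 2 / (X.shape[0] - 1)                  # variance along each axis
    ratio = var[:n_components] / var.sum()
    return scores, ratio
```

In practice the features are usually standardized before PCA, because elemental concentrations can differ by orders of magnitude. That step is left out here for brevity.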
iTemplate: A template-based eye movement data analysis approach.
Xiao, Naiqi G; Lee, Kang
2018-02-08
Current eye movement data analysis methods rely on defining areas of interest (AOIs). Because AOIs are created and modified manually, variances in their size, shape, and location are unavoidable. These variances affect not only the consistency of the AOI definitions but also the validity of the eye movement analyses based on the AOIs. To reduce the variances in AOI creation and modification and to process eye movement data with high precision and efficiency, we propose a template-based eye movement data analysis method. Using a linear transformation algorithm, this method registers the eye movement data from each individual stimulus to a template. Thus, users need to create only one set of AOIs for the template in order to analyze eye movement data, rather than a unique set of AOIs for every individual stimulus. This change greatly reduces the error caused by the variance of manually created AOIs and boosts the efficiency of the data analysis. Furthermore, this method can help researchers prepare eye movement data for some advanced analysis approaches, such as iMap. We have developed software (iTemplate) with a graphical user interface to make this analysis method available to researchers.
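The abstract says only that a linear transformation registers each stimulus's eye movement data to a template. One common way to do this is sketched below. It assumes that corresponding landmark points, such as eye and mouth positions on face stimuli, are known for each stimulus and for the template. It fits a 2-D affine transform by least squares and then maps fixations into template space. iTemplate's actual algorithm may differ, and the function names are hypothetical. The sketch assumes NumPy.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares 2-D affine transform mapping src landmarks onto dst.

    src, dst: (n, 2) arrays of corresponding points, n >= 3 and not collinear.
    Returns a (3, 2) matrix A such that [x, y, 1] @ A ~= [x', y'].
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    design = np.hstack([src, np.ones((len(src), 1))])   # append homogeneous 1
    A, *_ = np.linalg.lstsq(design, dst, rcond=None)
    return A

def apply_affine(A, points):
    """Map (n, 2) fixation coordinates from stimulus space into template space."""
    points = np.asarray(points, dtype=float)
    return np.hstack([points, np.ones((len(points), 1))]) @ A
```

After registration, one set of template AOIs can be tested against the mapped fixations from every stimulus. This is the saving the abstract describes.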
Data Curation and Visualization for MuSIASEM Analysis of the Nexus
NASA Astrophysics Data System (ADS)
Renner, Ansel
2017-04-01
A novel software-based approach to relational analysis applying recent theoretical advancements of the Multi-Scale Integrated Analysis of Societal and Ecosystem Metabolism (MuSIASEM) accounting framework is presented. This research explores and explains underutilized ways in which software can assist complex system analysis across the stages of data collection, exploration, analysis, and dissemination, in a transparent and collaborative manner. This work is being conducted as part of, and in support of, the four-year European Commission H2020 project Moving Towards Adaptive Governance in Complexity: Informing Nexus Security (MAGIC). In MAGIC, theoretical advancements to MuSIASEM propose a powerful new approach to spatial-temporal WEFC relational analysis in accordance with a structural-functional scaling mechanism appropriate for biophysically relevant complex system analyses. The software is designed primarily in JavaScript using the Angular2 model-view-controller framework and the Data-Driven Documents (D3) library. These design choices clarify and modularize data flow, simplify research practitioners' work, allow for and assist stakeholder involvement, and advance collaboration at all stages. Data requirements and scalable, robust yet lightweight structuring will be explained first. Next, algorithms to process these data will be explored. Finally, data interfaces and data visualization approaches will be presented and described.
Meta-analysis of randomized clinical trials in the era of individual patient data sharing.
Kawahara, Takuya; Fukuda, Musashi; Oba, Koji; Sakamoto, Junichi; Buyse, Marc
2018-06-01
Individual patient data (IPD) meta-analysis is considered to be a gold standard when the results of several randomized trials are combined. Recent initiatives on sharing IPD from clinical trials offer unprecedented opportunities for using such data in IPD meta-analyses. First, we discuss the evidence generated and the benefits obtained by a long-established prospective IPD meta-analysis in early breast cancer. Next, we discuss a data-sharing system that has been adopted by several pharmaceutical sponsors. We review a number of retrospective IPD meta-analyses that have already been proposed using this data-sharing system. Finally, we discuss the role of data sharing in IPD meta-analysis in the future. Treatment effects can be more reliably estimated in both types of IPD meta-analyses than with summary statistics extracted from published papers. Specifically, with rich covariate information available on each patient, prognostic and predictive factors can be identified or confirmed. Also, when several endpoints are available, surrogate endpoints can be assessed statistically. Although there are difficulties in conducting, analyzing, and interpreting retrospective IPD meta-analysis utilizing the currently available data-sharing systems, data sharing will play an important role in IPD meta-analysis in the future.
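The abstract contrasts IPD meta-analysis with pooling summary statistics taken from published papers. A common IPD design is the two-stage approach, sketched minimally below. In the first stage, a treatment effect and its variance are computed from each trial's patient-level outcomes. In the second stage, those effects are combined by inverse-variance weighting. The sketch assumes a continuous outcome with a mean-difference effect and a fixed-effect model. Random effects and covariate adjustment, which the abstract's prognostic-factor analyses would need, are omitted. The function names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def trial_effect(treated, control):
    """Stage 1: mean difference and its variance from one trial's patient-level outcomes."""
    diff = mean(treated) - mean(control)
    var = variance(treated) / len(treated) + variance(control) / len(control)
    return diff, var

def pool_fixed(effects):
    """Stage 2: inverse-variance fixed-effect pooling of (effect, variance) pairs."""
    weights = [1 / v for _, v in effects]
    pooled = sum(w * e for w, (e, _) in zip(weights, effects)) / sum(weights)
    se = sqrt(1 / sum(weights))   # standard error of the pooled effect
    return pooled, se
```

Because stage 1 works on raw patient records, the same data could also be re-analyzed with a different effect measure or with adjustment for patient covariates. Published summary statistics do not allow this.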
DOT National Transportation Integrated Search
2006-11-01
This report discusses data acquisition and analysis for grade crossing risk analysis at the proposed San Joaquin High-Speed Rail Corridor in San Joaquin, California, and documents the data acquisition and analysis methodologies used to collect and an...
XRP -- SMM XRP Data Analysis & Reduction
NASA Astrophysics Data System (ADS)
McSherry, M.; Lawden, M. D.
This manual describes the various programs that are available for the reduction and analysis of XRP data. These programs have been developed under the VAX operating system. The original programs are resident on a VaxStation 3100 at the Solar Data Analysis Center (NASA/GSFC Greenbelt MD).
Use of an engineering data management system in the analysis of Space Shuttle Orbiter tiles
NASA Technical Reports Server (NTRS)
Giles, G. L.; Vallas, M.
1981-01-01
This paper demonstrates the use of an engineering data management system to facilitate the extensive stress analyses of the Space Shuttle Orbiter thermal protection system. Descriptions are given of the approach and methods used (1) to gather, organize, and store the data, (2) to query data interactively, (3) to generate graphic displays of the data, and (4) to access, transform, and prepare the data for input to a stress analysis program. The relational information management system was found to be well suited to the tile analysis problem because information related to many separate tiles could be accessed individually from a data base having a natural organization from an engineering viewpoint. The flexible user features of the system facilitated changes in data content and organization which occurred during the development and refinement of the tile analysis procedure. Additionally, the query language supported retrieval of data to satisfy a variety of user-specified conditions.
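The abstract credits the relational system's query language with retrieving tile data under user-specified conditions. The sketch below shows that kind of retrieval using Python's built-in `sqlite3` module. The `tile` table, its columns, and the rows are all made up for illustration; they are not the schema or data used in the Orbiter analysis.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE tile (
        tile_id   TEXT PRIMARY KEY,
        zone      TEXT,   -- hypothetical location label on the vehicle
        thickness REAL,   -- hypothetical thickness value
        max_temp  REAL    -- hypothetical peak temperature
    )""")
conn.executemany(
    "INSERT INTO tile VALUES (?, ?, ?, ?)",
    [("T001", "wing", 2.0, 1200.0),   # made-up illustrative rows
     ("T002", "wing", 1.5, 1450.0),
     ("T003", "body", 1.0, 900.0)],
)

def tiles_where(zone, min_temp):
    """Return ids of tiles in `zone` whose peak temperature exceeds `min_temp`."""
    rows = conn.execute(
        "SELECT tile_id FROM tile WHERE zone = ? AND max_temp > ? ORDER BY tile_id",
        (zone, min_temp),
    )
    return [r[0] for r in rows]
```

Each tile is a separate row, so individual tiles can be selected by any combination of attributes. This matches the abstract's point that tiles could be accessed individually from a naturally organized database.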
Privacy-preserving data cube for electronic medical records: An experimental evaluation.
Kim, Soohyung; Lee, Hyukki; Chung, Yon Dohn
2017-01-01
The aim of this study is to evaluate the effectiveness and efficiency of privacy-preserving data cubes of electronic medical records (EMRs). An EMR data cube is a complex of EMR statistics that are summarized or aggregated by all possible combinations of attributes. Data cubes are widely utilized for efficient big data analysis and also have great potential for EMR analysis. For safe data analysis without privacy breaches, we must consider the privacy preservation characteristics of the EMR data cube. In this paper, we introduce a design for a privacy-preserving EMR data cube and the anonymization methods needed to achieve data privacy. We further focus on changes in efficiency and effectiveness that are caused by the anonymization process for privacy preservation. Thus, we experimentally evaluate various types of privacy-preserving EMR data cubes using several practical metrics and discuss the applicability of each anonymization method with consideration for the EMR analysis environment. We construct privacy-preserving EMR data cubes from anonymized EMR datasets. A real EMR dataset and demographic dataset are used for the evaluation. There are a large number of anonymization methods to preserve EMR privacy, and the methods are classified into three categories (i.e., global generalization, local generalization, and bucketization) by anonymization rules. According to this classification, three types of privacy-preserving EMR data cubes were constructed for the evaluation. We perform a comparative analysis by measuring the data size, cell overlap, and information loss of the EMR data cubes. Global generalization considerably reduced the size of the EMR data cube and did not cause the data cube cells to overlap, but incurred a large amount of information loss. Local generalization maintained the data size and generated only moderate information loss, but there were cell overlaps that could decrease the search performance. 
Bucketization did not cause cells to overlap and generated little information loss; however, the method considerably inflated the size of the EMR data cubes. The utility of anonymized EMR data cubes varies widely according to the anonymization method, and the applicability of the anonymization method depends on the features of the EMR analysis environment. The findings help to adopt the optimal anonymization method considering the EMR analysis environment and goal of the EMR analysis. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
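The abstract defines an EMR data cube as statistics aggregated over all possible combinations of attributes. The sketch below builds such a cube with record counts as the statistic. It also shows a simple global-generalization step: every age is replaced by a fixed-width band. This illustrates why generalization can shrink the cube while losing detail. The attribute names and records are hypothetical, and the banding is a toy stand-in, not one of the anonymization methods evaluated in the paper.

```python
from collections import Counter
from itertools import combinations

def data_cube(records, attributes):
    """Count records for every combination of attributes (all 2^n group-bys).

    records: list of dicts; attributes: attribute names to cube over.
    Returns {grouping_tuple: Counter}, where Counter keys are tuples of values.
    """
    cube = {}
    for k in range(len(attributes) + 1):
        for group in combinations(attributes, k):
            cube[group] = Counter(tuple(r[a] for a in group) for r in records)
    return cube

def generalize_age(age, width=10):
    """Toy global generalization: replace an exact age with a fixed-width band."""
    lo = (age // width) * width
    return f"{lo}-{lo + width - 1}"
```

Applying `generalize_age` to every record before building the cube reduces the number of distinct age cells, and so the size of the cube. However, queries can no longer distinguish ages within a band, which is the information loss the abstract measures.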
Data Analysis and Data Mining: Current Issues in Biomedical Informatics
Bellazzi, Riccardo; Diomidous, Marianna; Sarkar, Indra Neil; Takabayashi, Katsuhiko; Ziegler, Andreas; McCray, Alexa T.
2011-01-01
Summary. Background: Medicine and biomedical sciences have become data-intensive fields, which, at the same time, enable the application of data-driven approaches and require sophisticated data analysis and data mining methods. Biomedical informatics provides a proper interdisciplinary context to integrate data and knowledge when processing available information, with the aim of giving effective decision-making support in clinics and translational research. Objectives: To reflect on different perspectives related to the role of data analysis and data mining in biomedical informatics. Methods: On the occasion of the 50th year of Methods of Information in Medicine, a symposium was organized that reflected on opportunities, challenges, and priorities of organizing, representing, and analysing data, information, and knowledge in biomedicine and health care. The contributions of experts with a variety of backgrounds in the area of biomedical data analysis have been collected as one outcome of this symposium, in order to provide a broad, though coherent, overview of some of the most interesting aspects of the field. Results: The paper presents sections on data accumulation and data-driven approaches in medical informatics, data and knowledge integration, statistical issues for the evaluation of data mining models, translational bioinformatics, and bioinformatics aspects of genetic epidemiology. Conclusions: Biomedical informatics represents a natural framework to properly and effectively apply data analysis and data mining methods in a decision-making context. In the future, it will be necessary to preserve the inclusive nature of the field and to foster increased sharing of data and methods between researchers. PMID:22146916
Sensitivity Analysis of Multiple Informant Models When Data are Not Missing at Random
Blozis, Shelley A.; Ge, Xiaojia; Xu, Shu; Natsuaki, Misaki N.; Shaw, Daniel S.; Neiderhiser, Jenae; Scaramella, Laura; Leve, Leslie; Reiss, David
2014-01-01
Missing data are common in studies that rely on multiple informant data to evaluate relationships among variables for distinguishable individuals clustered within groups. Estimation of structural equation models using raw data allows for incomplete data, and so all groups may be retained even if only one member of a group contributes data. Statistical inference is based on the assumption that data are missing completely at random or missing at random. Importantly, whether or not data are missing is assumed to be independent of the missing data. A saturated correlates model that incorporates correlates of the missingness or the missing data into an analysis and multiple imputation that may also use such correlates offer advantages over the standard implementation of SEM when data are not missing at random because these approaches may result in a data analysis problem for which the missingness is ignorable. This paper considers these approaches in an analysis of family data to assess the sensitivity of parameter estimates to assumptions about missing data, a strategy that may be easily implemented using SEM software. PMID:25221420
Status of MTP Data Analysis for TCSP
NASA Technical Reports Server (NTRS)
Mahoney, Michael J.
2006-01-01
Topics covered include: a) MTP temperature calibration and data analysis; b) Background for interpreting MTP data; c) Large amplitude temperature structure; d) Gravity waves (GWs) in MTP data; and e) Subsidence over hurricanes.
Scidac-Data: Enabling Data Driven Modeling of Exascale Computing
Mubarak, Misbah; Ding, Pengfei; Aliaga, Leo; ...
2017-11-23
The SciDAC-Data project is a DOE-funded initiative to analyze and exploit two decades of information and analytics that have been collected by the Fermilab data center on the organization, movement, and consumption of high energy physics (HEP) data. The project analyzes the analysis patterns and data organization that have been used by NOvA, MicroBooNE, MINERvA, CDF, D0, and other experiments to develop realistic models of HEP analysis workflows and data processing. The SciDAC-Data project aims to provide both realistic input vectors and corresponding output data that can be used to optimize and validate simulations of HEP analysis. These simulations are designed to address questions of data handling, cache optimization, and workflow structures that are the prerequisites for modern HEP analysis chains to be mapped and optimized to run on the next generation of leadership-class exascale computing facilities. We present the use of a subset of the SciDAC-Data distributions, acquired from analysis of approximately 71,000 HEP workflows run on the Fermilab data center and corresponding to over 9 million individual analysis jobs, as the input to detailed queuing simulations that model the expected data consumption and caching behaviors of the work running in high performance computing (HPC) and high throughput computing (HTC) environments. In particular, we describe how the Sequential Access via Metadata (SAM) data-handling system in combination with the dCache/Enstore-based data archive facilities has been used to develop radically different models for analyzing the HEP data. We also show how the simulations may be used to assess the impact of design choices in archive facilities.
NASA Astrophysics Data System (ADS)
McGuire, M. P.; Welty, C.; Gangopadhyay, A.; Karabatis, G.; Chen, Z.
2006-05-01
The urban environment is formed by complex interactions between natural and human-dominated systems, the study of which requires the collection and analysis of very large datasets that span many disciplines. Recent advances in sensor technology and automated data collection have improved the ability to monitor urban environmental systems and are making the idea of an urban environmental observatory a reality. This in turn has created a number of potential challenges in data management and analysis. We present the design of an end-to-end system to store, analyze, and visualize data from a prototype urban environmental observatory based at the Baltimore Ecosystem Study, a National Science Foundation Long Term Ecological Research site (BES LTER). We first present an object-relational design of an operational database to store high-resolution spatial datasets as well as data from sensor networks, archived data from the BES LTER, data from external sources such as USGS NWIS and EPA Storet, and metadata. The second component of the system design includes a spatiotemporal data warehouse consisting of a data staging plan and a multidimensional data model designed for the spatiotemporal analysis of monitoring data. The system design also includes applications for multi-resolution exploratory data analysis, multi-resolution data mining, and spatiotemporal visualization based on the spatiotemporal data warehouse. In addition, the system design includes interfaces with water quality models such as HSPF, SWMM, and SWAT, and applications for real-time sensor network visualization, data discovery, data download, QA/QC, and backup and recovery, all of which are based on the operational database. The system design includes both internet-based and workstation-based interfaces. Finally, we present the design of a laboratory for spatiotemporal analysis and visualization as well as real-time monitoring of the sensor network.
Scidac-Data: Enabling Data Driven Modeling of Exascale Computing
NASA Astrophysics Data System (ADS)
Mubarak, Misbah; Ding, Pengfei; Aliaga, Leo; Tsaris, Aristeidis; Norman, Andrew; Lyon, Adam; Ross, Robert
2017-10-01
The SciDAC-Data project is a DOE-funded initiative to analyze and exploit two decades of information and analytics that have been collected by the Fermilab data center on the organization, movement, and consumption of high energy physics (HEP) data. The project analyzes the analysis patterns and data organization that have been used by NOvA, MicroBooNE, MINERvA, CDF, D0, and other experiments to develop realistic models of HEP analysis workflows and data processing. The SciDAC-Data project aims to provide both realistic input vectors and corresponding output data that can be used to optimize and validate simulations of HEP analysis. These simulations are designed to address questions of data handling, cache optimization, and workflow structures that are the prerequisites for modern HEP analysis chains to be mapped and optimized to run on the next generation of leadership-class exascale computing facilities. We present the use of a subset of the SciDAC-Data distributions, acquired from analysis of approximately 71,000 HEP workflows run on the Fermilab data center and corresponding to over 9 million individual analysis jobs, as the input to detailed queuing simulations that model the expected data consumption and caching behaviors of the work running in high performance computing (HPC) and high throughput computing (HTC) environments. In particular we describe how the Sequential Access via Metadata (SAM) data-handling system in combination with the dCache/Enstore-based data archive facilities has been used to develop radically different models for analyzing the HEP data. We also show how the simulations may be used to assess the impact of design choices in archive facilities.
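The abstract describes queuing simulations that model data consumption and caching behavior driven by real workflow traces. Those simulations are far richer than anything shown here. The sketch below illustrates only the core idea of replaying a file-access trace through a cache and measuring the hit rate. Least-recently-used (LRU) eviction and whole-file caching are assumptions; the abstract does not say which policy dCache or the simulations use. The function name `lru_hit_rate` is hypothetical.

```python
from collections import OrderedDict

def lru_hit_rate(accesses, capacity):
    """Replay a file-access trace through an LRU cache holding `capacity` files.

    accesses: sequence of file identifiers in request order.
    Returns the fraction of requests served from the cache.
    """
    cache = OrderedDict()
    hits = 0
    for f in accesses:
        if f in cache:
            hits += 1
            cache.move_to_end(f)           # mark as most recently used
        else:
            cache[f] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used file
    return hits / len(accesses) if accesses else 0.0
```

Running the same trace at several cache capacities gives a simple capacity-versus-hit-rate curve. This is the kind of design question about archive facilities that the abstract says its simulations can address.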
Scidac-Data: Enabling Data Driven Modeling of Exascale Computing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mubarak, Misbah; Ding, Pengfei; Aliaga, Leo
The SciDAC-Data project is a DOE-funded initiative to analyze and exploit two decades of information and analytics that have been collected by the Fermilab data center on the organization, movement, and consumption of high energy physics (HEP) data. The project analyzes the analysis patterns and data organization that have been used by NOvA, MicroBooNE, MINERvA, CDF, D0, and other experiments to develop realistic models of HEP analysis workflows and data processing. The SciDAC-Data project aims to provide both realistic input vectors and corresponding output data that can be used to optimize and validate simulations of HEP analysis. These simulations are designed to address questions of data handling, cache optimization, and workflow structures that are the prerequisites for modern HEP analysis chains to be mapped and optimized to run on the next generation of leadership-class exascale computing facilities. We present the use of a subset of the SciDAC-Data distributions, acquired from analysis of approximately 71,000 HEP workflows run on the Fermilab data center and corresponding to over 9 million individual analysis jobs, as the input to detailed queuing simulations that model the expected data consumption and caching behaviors of the work running in high performance computing (HPC) and high throughput computing (HTC) environments. In particular, we describe how the Sequential Access via Metadata (SAM) data-handling system in combination with the dCache/Enstore-based data archive facilities has been used to develop radically different models for analyzing the HEP data. We also show how the simulations may be used to assess the impact of design choices in archive facilities.
Scotch, Matthew; Parmanto, Bambang; Monaco, Valerie
2008-06-09
Data analysis in community health assessment (CHA) involves the collection, integration, and analysis of large numerical and spatial data sets in order to identify health priorities. Geographic Information Systems (GIS) enable the management and analysis of spatial data, but have limitations in analyzing numerical data because of their traditional database architecture. On-Line Analytical Processing (OLAP) is a multidimensional data warehouse designed to facilitate querying of large numerical data. Coupling the spatial capabilities of GIS with the numerical analysis of OLAP might enhance CHA data analysis. OLAP-GIS systems have been developed by university researchers and corporations, yet their potential for CHA data analysis is not well understood. To evaluate the potential of an OLAP-GIS decision support system for CHA problem solving, we compared OLAP-GIS to the standard information technology (IT) currently used by many public health professionals. SOVAT, an OLAP-GIS decision support system developed at the University of Pittsburgh, was compared against current IT for data analysis for CHA. For this study, current IT was considered to be the combined use of SPSS and GIS ("SPSS-GIS"). Graduate students, researchers, and faculty in the health sciences at the University of Pittsburgh were recruited. Each round consisted of an instructional video of the system being evaluated, two practice tasks, five assessment tasks, and one post-study questionnaire. Objective and subjective measurements included task completion time, success in answering the tasks, and system satisfaction. Thirteen individuals participated. Inferential statistics were analyzed using linear mixed model analysis. SOVAT differed significantly (alpha = .01) from SPSS-GIS in satisfaction and time (p < .002). Descriptive results indicated that participants had greater success in answering the tasks when using SOVAT than when using SPSS-GIS.
Using SOVAT, tasks were completed more efficiently, with a higher rate of success, and with greater satisfaction, than the combined use of SPSS and GIS. The results from this study indicate a potential for OLAP-GIS decision support systems as a valuable tool for CHA data analysis.
Scotch, Matthew; Parmanto, Bambang; Monaco, Valerie
2008-01-01
Background: Data analysis in community health assessment (CHA) involves the collection, integration, and analysis of large numerical and spatial data sets in order to identify health priorities. Geographic Information Systems (GIS) enable the management and analysis of spatial data, but have limitations in analyzing numerical data because of their traditional database architecture. On-Line Analytical Processing (OLAP) is a multidimensional data warehouse designed to facilitate querying of large numerical data. Coupling the spatial capabilities of GIS with the numerical analysis of OLAP might enhance CHA data analysis. OLAP-GIS systems have been developed by university researchers and corporations, yet their potential for CHA data analysis is not well understood. To evaluate the potential of an OLAP-GIS decision support system for CHA problem solving, we compared OLAP-GIS to the standard information technology (IT) currently used by many public health professionals. Methods: SOVAT, an OLAP-GIS decision support system developed at the University of Pittsburgh, was compared against current IT for data analysis for CHA. For this study, current IT was considered to be the combined use of SPSS and GIS ("SPSS-GIS"). Graduate students, researchers, and faculty in the health sciences at the University of Pittsburgh were recruited. Each round consisted of an instructional video of the system being evaluated, two practice tasks, five assessment tasks, and one post-study questionnaire. Objective and subjective measurements included task completion time, success in answering the tasks, and system satisfaction. Results: Thirteen individuals participated. Inferential statistics were analyzed using linear mixed model analysis. SOVAT differed significantly (α = .01) from SPSS-GIS in satisfaction and time (p < .002). Descriptive results indicated that participants had greater success in answering the tasks when using SOVAT than when using SPSS-GIS.
Conclusion: Using SOVAT, tasks were completed more efficiently, with a higher rate of success, and with greater satisfaction than with the combined use of SPSS and GIS. The results from this study indicate a potential for OLAP-GIS decision support systems as a valuable tool for CHA data analysis. PMID:18541037
methylPipe and compEpiTools: a suite of R packages for the integrative analysis of epigenomics data.
Kishore, Kamal; de Pretis, Stefano; Lister, Ryan; Morelli, Marco J; Bianchi, Valerio; Amati, Bruno; Ecker, Joseph R; Pelizzola, Mattia
2015-09-29
Numerous methods are available to profile several epigenetic marks, providing data with different genome coverage and resolution. Large epigenomic datasets are then generated, and these are often combined with other high-throughput data, including RNA-seq, ChIP-seq for transcription factor (TF) binding, and DNase-seq experiments. Despite the numerous computational tools covering specific steps in the analysis of large-scale epigenomics data, comprehensive software solutions for their integrative analysis are still missing. Multiple tools must be identified and combined to jointly analyze histone marks, TF binding, and other -omics data together with DNA methylation data, complicating the analysis of these data and their integration with publicly available datasets. To overcome the burden of integrating various data types with multiple tools, we developed two companion R/Bioconductor packages. The former, methylPipe, is tailored to the analysis of high- or low-resolution DNA methylomes in several species, accommodating (hydroxy-)methyl-cytosines in both CpG and non-CpG sequence contexts. The analysis of multiple whole-genome bisulfite sequencing experiments is supported, while maintaining the ability to integrate targeted genomic data. The latter, compEpiTools, seamlessly incorporates the results obtained with methylPipe and supports their integration with other epigenomics data. It provides a number of methods to score these data in regions of interest, leading to the identification of enhancers, lncRNAs, and RNAPII stalling/elongation dynamics. Moreover, it allows fast and comprehensive annotation of the resulting genomic regions, and the association of the corresponding genes with non-redundant GeneOntology terms. Finally, the package includes a flexible heatmap-based method for the integration of various data types, combining annotation tracks with continuous or categorical data tracks.
methylPipe and compEpiTools provide a comprehensive Bioconductor-compliant solution for the integrative analysis of heterogeneous epigenomics data. These packages are instrumental in providing biologists with minimal R skills a complete toolkit facilitating the analysis of their own data, or in accelerating the analyses performed by more experienced bioinformaticians.
Analysis Resistant Cipher Method and Apparatus
NASA Technical Reports Server (NTRS)
Oakley, Ernest C. (Inventor)
2009-01-01
A system for encoding and decoding data words, including an anti-analysis encoder unit that receives an original plaintext and produces recoded data, a data compression unit that receives the recoded data and produces compressed recoded data, and an encryption unit that receives the compressed recoded data and produces encrypted data. The recoded data has increased non-correlatable redundancy compared with the original plaintext, which masks the statistical distribution of characters in the plaintext. The system further includes a decryption unit that receives the encrypted data and produces decrypted data, a data decompression unit that receives the decrypted data and produces uncompressed recoded data, and an anti-analysis decoder unit that receives the uncompressed recoded data and produces a recovered plaintext corresponding to the original plaintext.
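The encode and decode chains described in the patent abstract can be sketched in Python with only the standard library. The recoding step here is an assumption chosen for illustration: each byte b becomes a pair (r, (b + r) mod 256) with a fresh random r, so every output byte is uniformly distributed and the plaintext's character frequencies are hidden. `zlib` stands in for the compression unit, and a SHA-256 counter keystream stands in for the encryption unit; that toy cipher is not secure and must not be used to protect real data. All function names are hypothetical.

```python
import hashlib
import secrets
import zlib


def recode(plaintext: bytes) -> bytes:
    """Illustrative anti-analysis recoding: byte b -> (r, (b + r) % 256), r random."""
    out = bytearray()
    for b in plaintext:
        r = secrets.randbelow(256)
        out += bytes((r, (b + r) % 256))
    return bytes(out)


def decode(recoded: bytes) -> bytes:
    """Invert recode(): recover b = (c - r) % 256 from each (r, c) pair."""
    return bytes((recoded[i + 1] - recoded[i]) % 256 for i in range(0, len(recoded), 2))


def keystream(key: bytes, n: int) -> bytes:
    """Toy keystream from SHA-256(key || counter). NOT a vetted cipher."""
    out = bytearray()
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:n])


def xor_cipher(data: bytes, key: bytes) -> bytes:
    # XOR with the keystream is its own inverse: the same call encrypts and decrypts.
    return bytes(d ^ k for d, k in zip(data, keystream(key, len(data))))


def encode_pipeline(plaintext: bytes, key: bytes) -> bytes:
    # Encoder unit -> compression unit -> encryption unit.
    return xor_cipher(zlib.compress(recode(plaintext)), key)


def decode_pipeline(ciphertext: bytes, key: bytes) -> bytes:
    # Decryption unit -> decompression unit -> decoder unit.
    return decode(zlib.decompress(xor_cipher(ciphertext, key)))


if __name__ == "__main__":
    ct = encode_pipeline(b"attack at dawn", b"demo key")
    print(decode_pipeline(ct, b"demo key"))
```

Because the recoded stream is close to random, the compression step gains little in this toy version; the abstract does not say how the patented recoding interacts with compression.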
Data Model Performance in Data Warehousing
NASA Astrophysics Data System (ADS)
Rorimpandey, G. C.; Sangkop, F. I.; Rantung, V. P.; Zwart, J. P.; Liando, O. E. S.; Mewengkang, A.
2018-02-01
Data warehouses have become increasingly important in organizations that hold large amounts of data. A data warehouse is not a product but part of the decision support solution in those organizations. The data model is the starting point for designing and developing data warehouse architectures, so it needs stable, consistent interfaces over a long period of time. The aim of this research is to determine which data model in data warehousing has the best performance. The research method is descriptive analysis, which comprises three main tasks: data collection and organization, data analysis, and data interpretation. The results, evaluated with statistical analysis, show no statistically significant difference among the data models used in data warehousing. Organizations can therefore use any of the four proposed data models when designing and developing a data warehouse.
Preparing Laboratory and Real-World EEG Data for Large-Scale Analysis: A Containerized Approach
Bigdely-Shamlo, Nima; Makeig, Scott; Robbins, Kay A.
2016-01-01
Large-scale analysis of EEG and other physiological measures promises new insights into brain processes and more accurate and robust brain–computer interface models. However, the absence of standardized vocabularies for annotating events in a machine understandable manner, the welter of collection-specific data organizations, the difficulty in moving data across processing platforms, and the unavailability of agreed-upon standards for preprocessing have prevented large-scale analyses of EEG. Here we describe a “containerized” approach and freely available tools we have developed to facilitate the process of annotating, packaging, and preprocessing EEG data collections to enable data sharing, archiving, large-scale machine learning/data mining and (meta-)analysis. The EEG Study Schema (ESS) comprises three data “Levels,” each with its own XML-document schema and file/folder convention, plus a standardized (PREP) pipeline to move raw (Data Level 1) data to a basic preprocessed state (Data Level 2) suitable for application of a large class of EEG analysis methods. Researchers can ship a study as a single unit and operate on its data using a standardized interface. ESS does not require a central database and provides all the metadata necessary to execute a wide variety of EEG processing pipelines. The primary focus of ESS is automated in-depth analysis and meta-analysis of EEG studies. However, ESS can also encapsulate meta-information for other modalities, such as eye tracking, that are increasingly used in both laboratory and real-world neuroimaging. ESS schema and tools are freely available at www.eegstudy.org and a central catalog of over 850 GB of existing data in ESS format is available at studycatalog.org. These tools and resources are part of a larger effort to enable data sharing at sufficient scale for researchers to engage in truly large-scale EEG analysis and data mining (BigEEG.org). PMID:27014048
A Seat Around the Table: Participatory Data Analysis With People Living With Dementia.
Clarke, Charlotte L; Wilkinson, Heather; Watson, Julie; Wilcockson, Jane; Kinnaird, Lindsay; Williamson, Toby
2018-05-01
The involvement of "people with experience" in research has developed considerably in the last decade. However, involvement as co-analysts at the point of data analysis and synthesis has received very little attention; in particular, there is very little work that involves people living with dementia as co-analysts. In this qualitative secondary data analysis project, we (a) analyzed data through two theoretical lenses: Douglas's cultural theory of risk and Tronto's Ethic of Care, and (b) analyzed data in workshops with people living with dementia. The design involved cycles of presenting, interpreting, representing and reinterpreting the data and findings among multiple stakeholders. We explore ways of involving people with experience as co-analysts and explore the role of reflexivity, multiple voicing, literary styling, and performance in participatory data analysis.
Application of Open Source Technologies for Oceanographic Data Analysis
NASA Astrophysics Data System (ADS)
Huang, T.; Gangl, M.; Quach, N. T.; Wilson, B. D.; Chang, G.; Armstrong, E. M.; Chin, T. M.; Greguska, F.
2015-12-01
NEXUS is a data-intensive analysis solution built on a new approach to handling science data that enables large-scale data analysis by leveraging open source technologies such as Apache Cassandra, Apache Spark, Apache Solr, and Webification. NEXUS has been selected to provide on-the-fly time-series and histogram generation for the Soil Moisture Active Passive (SMAP) mission for Level 2 and Level 3 Active, Passive, and Active Passive products. It also provides an on-the-fly data subsetting capability. NEXUS is designed to scale horizontally, enabling it to handle massive amounts of data in parallel. It takes a new approach to managing time- and geo-referenced array data by dividing data artifacts into chunks and storing them in an industry-standard, horizontally scaled NoSQL database. This approach enables the development of scalable data analysis services that can infuse and leverage the elastic computing infrastructure of the Cloud. It is equipped with a high-performance geospatial and indexed data search solution, coupled with a high-performance data Webification solution free from file I/O bottlenecks, as well as a high-performance, in-memory data analysis engine. In this talk, we focus on the recently funded AIST 2014 project, which uses NEXUS as the core of an oceanographic anomaly detection service and web portal called OceanXtremes.
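The chunking idea can be illustrated without Cassandra or Spark. Split each time step of a gridded field into fixed-size tiles keyed by (time, tile row, tile column), like rows in a key-value store, then compute an area-averaged time series by reading only the tiles that intersect the requested box. This is a minimal in-memory sketch; the tile layout, key format, and function names are assumptions, not the actual NEXUS schema.

```python
def chunk_grid(cube, tile):
    """Split a time x lat x lon cube (nested lists) into square tiles keyed by
    (t, tile_row, tile_col), mimicking rows in a key-value store."""
    store = {}
    for t, grid in enumerate(cube):
        n_lat, n_lon = len(grid), len(grid[0])
        for i0 in range(0, n_lat, tile):
            for j0 in range(0, n_lon, tile):
                store[(t, i0 // tile, j0 // tile)] = {
                    "origin": (i0, j0),  # grid index of the tile's first cell
                    "values": [row[j0:j0 + tile] for row in grid[i0:i0 + tile]],
                }
    return store


def area_mean_series(store, n_times, box, tile):
    """Mean over the inclusive index box (i_min, i_max, j_min, j_max) at every
    time step, reading only the tiles that intersect the box."""
    i_min, i_max, j_min, j_max = box
    series = []
    for t in range(n_times):
        total, count = 0.0, 0
        for ti in range(i_min // tile, i_max // tile + 1):
            for tj in range(j_min // tile, j_max // tile + 1):
                chunk = store.get((t, ti, tj))
                if chunk is None:
                    continue
                i0, j0 = chunk["origin"]
                for di, row in enumerate(chunk["values"]):
                    for dj, v in enumerate(row):
                        # Keep only cells that fall inside the requested box.
                        if i_min <= i0 + di <= i_max and j_min <= j0 + dj <= j_max:
                            total += v
                            count += 1
        series.append(total / count if count else float("nan"))
    return series
```

In a distributed setting, each tile lookup would be an independent read, which is what allows the per-time-step work to run in parallel.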
Genetic data analysis for plant and animal breeding
USDA-ARS's Scientific Manuscript database
This book is an advanced textbook covering the application of quantitative genetics theory to analysis of actual data (both trait and DNA marker information) for breeding populations of crops, trees, and animals. Chapter 1 is an introduction to basic software used for trait data analysis. Chapter 2 ...
Power Grid Data Analysis with R and Hadoop
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hafen, Ryan P.; Gibson, Tara D.; Kleese van Dam, Kerstin
This book chapter presents an approach to analysis of large-scale time-series sensor information based on our experience with power grid data. We use the R-Hadoop Integrated Programming Environment (RHIPE) to analyze a 2TB data set and present code and results for this analysis.
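RHIPE lets R users express analyses as MapReduce jobs on Hadoop. The same divide-and-recombine pattern can be sketched in plain Python without Hadoop: a map step keys each sensor record by (sensor, day), a shuffle groups values by key, and a reduce step summarizes each group independently. The record layout (sensor id, Unix seconds, value) and all function names are hypothetical; this is not the RHIPE API.

```python
from collections import defaultdict
from statistics import mean


def map_phase(records):
    # Emit ((sensor, day), value) for each (sensor, unix_seconds, value) record.
    for sensor, ts, value in records:
        yield (sensor, ts // 86400), value


def shuffle(pairs):
    # Group values by key, as the framework does between the map and reduce steps.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    # Summarize each (sensor, day) block independently of all the others.
    return {key: {"n": len(v), "mean": mean(v), "min": min(v), "max": max(v)}
            for key, v in groups.items()}
```

Because each reduce call sees only one key's values, the blocks can be processed on different nodes, which is what makes the approach scale to multi-terabyte inputs.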
Hollunder, Jens; Friedel, Maik; Kuiper, Martin; Wilhelm, Thomas
2010-04-01
Many large 'omics' datasets have been published and many more are expected in the near future. New analysis methods are needed for best exploitation. We have developed a graphical user interface (GUI) for easy data analysis. Our discovery of all significant substructures (DASS) approach elucidates the underlying modularity, a typical feature of complex biological data. It is related to biclustering and other data mining approaches. Importantly, DASS-GUI also allows handling of multi-sets and calculation of statistical significances. DASS-GUI contains tools for further analysis of the identified patterns: analysis of the pattern hierarchy, enrichment analysis, module validation, analysis of additional numerical data, easy handling of synonymous names, clustering, filtering and merging. Different export options allow easy usage of additional tools such as Cytoscape. Source code, pre-compiled binaries for different systems, a comprehensive tutorial, case studies and many additional datasets are freely available at http://www.ifr.ac.uk/dass/gui/. DASS-GUI is implemented in Qt.
Integrated Structural Analysis and Test Program
NASA Technical Reports Server (NTRS)
Kaufman, Daniel
2005-01-01
An integrated structural-analysis and structure-testing computer program is being developed in order to: automate repetitive processes in testing and analysis; accelerate pre-test analysis; accelerate reporting of tests; facilitate planning of tests; improve execution of tests; create a vibration, acoustics, and shock test database; and integrate analysis and test data. The software package includes modules pertaining to sinusoidal and random vibration, shock and time replication, acoustics, base-driven modal survey, and mass properties and static/dynamic balance. The program is operated through ActiveX controls, and there is minimal need to generate command lines. Analysis or test files are selected by opening a Windows Explorer display. After the desired input file is selected, the program enters either an analysis data process or a test data process, depending on the type of input data. The status of the process is shown in a Windows status bar, and when processing is complete, the data are reported in graphical, tabular, and matrix form.
Visualizing nD Point Clouds as Topological Landscape Profiles to Guide Local Data Analysis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Oesterling, Patrick; Heine, Christian; Weber, Gunther H.
2012-05-04
Analyzing high-dimensional point clouds is a classical challenge in visual analytics. Traditional techniques, such as projections or axis-based techniques, suffer from projection artifacts, occlusion, and visual complexity. We propose to split data analysis into two parts to address these shortcomings. First, a structural overview phase abstracts data by its density distribution. This phase performs topological analysis to support accurate and non-overlapping presentation of the high-dimensional cluster structure as a topological landscape profile. Utilizing a landscape metaphor, it presents clusters and their nesting as hills whose height, width, and shape reflect cluster coherence, size, and stability, respectively. A second local analysis phase utilizes this global structural knowledge to select individual clusters or point sets for further, localized data analysis. Focusing on structural entities significantly reduces visual clutter in established geometric visualizations and permits a clearer, more thorough data analysis. In conclusion, this analysis complements the global topological perspective and enables the user to study subspaces or geometric properties, such as shape.
Regularized Generalized Canonical Correlation Analysis
ERIC Educational Resources Information Center
Tenenhaus, Arthur; Tenenhaus, Michel
2011-01-01
Regularized generalized canonical correlation analysis (RGCCA) is a generalization of regularized canonical correlation analysis to three or more sets of variables. It constitutes a general framework for many multi-block data analysis methods. It combines the power of multi-block data analysis methods (maximization of well identified criteria) and…
Analysis of space shuttle main engine data using Beacon-based exception analysis for multi-missions
NASA Technical Reports Server (NTRS)
Park, H.; Mackey, R.; James, M.; Zak, M.; Kynard, M.; Sebghati, J.; Greene, W.
2002-01-01
This paper describes analysis of the Space Shuttle Main Engine (SSME) sensor data using Beacon-based exception analysis for multimissions (BEAM), a new technology developed for sensor analysis and diagnostics in autonomous space systems by the Jet Propulsion Laboratory (JPL).
Structured Analysis and the Data Flow Diagram: Tools for Library Analysis.
ERIC Educational Resources Information Center
Carlson, David H.
1986-01-01
This article discusses tools developed to aid the systems analysis process (program evaluation and review technique, Gantt charts, organizational charts, decision tables, flowcharts, hierarchy plus input-process-output). Similarities and differences among techniques, library applications of analysis, structured systems analysis, and the data flow…
Webinar: Airborne Data Discovery and Analysis with Toolsets for Airborne Data (TAD)
Atmospheric Science Data Center
2016-10-18
Webinar: Airborne Data Discovery and Analysis with Toolsets for Airborne Data (TAD) Wednesday, October 26, 2016 Join us on ... and flight data ranges are available. Registration is now open. Access the full announcement For TAD Information, ...
Extreme Ultraviolet Imaging Telescope (EIT)
NASA Technical Reports Server (NTRS)
Lemen, J. R.; Freeland, S. L.
1997-01-01
Efforts concentrated on development and implementation of the SolarSoft (SSW) data analysis system. From an EIT analysis perspective, this system was designed to facilitate efficient reuse and conversion of software developed for Yohkoh/SXT and to take advantage of a large existing body of software developed by the SDAC, Yohkoh, and SOHO instrument teams. Another strong motivation for this system was to provide an EIT analysis environment that permits coordinated analysis of EIT data in conjunction with data from important supporting instruments, including Yohkoh/SXT and the other SOHO coronal instruments: CDS, SUMER, and LASCO. In addition, the SSW system will support coordinated EIT/TRACE analysis (by design) when TRACE data become available; TRACE launch is currently planned for March 1998. Working with Jeff Newmark, the Chianti software package (K.P. Dere et al.) and UV/EUV database were fully integrated into the SSW system to facilitate EIT temperature and emission analysis.
A streamlined Python framework for AT-TPC data analysis
NASA Astrophysics Data System (ADS)
Taylor, J. Z.; Bradt, J.; Bazin, D.; Kuchera, M. P.
2017-09-01
User-friendly data analysis software has been developed for the Active-Target Time Projection Chamber (AT-TPC) experiment at the National Superconducting Cyclotron Laboratory at Michigan State University. The AT-TPC, commissioned in 2014, is a gas-filled detector that acts as both the detector and target for high-efficiency detection of low-intensity, exotic nuclear reactions. The pytpc framework is a Python package for analyzing AT-TPC data. The package was developed for the analysis of ⁴⁶Ar(p,p) data. The existing software was used to analyze data produced by the ⁴⁰Ar(p,p) experiment that ran in August 2015. Usage of the package was documented in an analysis manual both to improve analysis steps and to aid the work of future AT-TPC users. Software features and analysis methods in the pytpc framework will be presented along with the ⁴⁰Ar results.
Metabolomic Analysis and Visualization Engine for LC–MS Data
Melamud, Eugene; Vastag, Livia; Rabinowitz, Joshua D.
2017-01-01
Metabolomic analysis by liquid chromatography–high-resolution mass spectrometry results in data sets with thousands of features arising from metabolites, fragments, isotopes, and adducts. Here we describe a software package, Metabolomic Analysis and Visualization ENgine (MAVEN), designed for efficient interactive analysis of LC–MS data, including in the presence of isotope labeling. The software contains tools for all aspects of the data analysis process, from feature extraction to pathway-based graphical data display. To facilitate data validation, a machine learning algorithm automatically assesses peak quality. Users interact with raw data primarily in the form of extracted ion chromatograms, which are displayed with overlaid circles indicating peak quality, and bar graphs of peak intensities for both unlabeled and isotope-labeled metabolite forms. Click-based navigation leads to additional information, such as raw data for specific isotopic forms or for metabolites changing significantly between conditions. Fast data processing algorithms result in nearly delay-free browsing. Drop-down menus provide tools for the overlay of data onto pathway maps. These tools enable animating series of pathway graphs, e.g., to show propagation of labeled forms through a metabolic network. MAVEN is released under an open source license at http://maven.princeton.edu. PMID:21049934
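Because MAVEN presents raw data mainly as extracted ion chromatograms (EICs), it may help to see what an EIC is: for each scan, the summed intensity of the centroided peaks within a mass tolerance of a target m/z. The sketch below assumes centroided scans given as (retention time, [(m/z, intensity), ...]) and a tolerance in parts per million; both the data layout and the function name are hypothetical and do not reflect MAVEN's implementation.

```python
def extract_ion_chromatogram(scans, target_mz, ppm=10.0):
    """Return [(retention_time, summed intensity within +/- ppm of target_mz)].

    scans: list of (retention_time, [(mz, intensity), ...]) for centroided spectra.
    """
    tol = target_mz * ppm * 1e-6  # absolute m/z tolerance derived from ppm
    return [(rt, sum(inten for mz, inten in peaks if abs(mz - target_mz) <= tol))
            for rt, peaks in scans]
```

A relative (ppm) tolerance is the usual choice for high-resolution instruments because mass accuracy scales with m/z.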
Management and Analysis of Radiation Portal Monitor Data
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rowe, Nathan C; Alcala, Scott; Crye, Jason Michael
2014-01-01
Oak Ridge National Laboratory (ORNL) receives, archives, and analyzes data from radiation portal monitors (RPMs). Over time the amount of data submitted for analysis has grown significantly, and in fiscal year 2013, ORNL received 545 gigabytes of data representing more than 230,000 RPM operating days. This data comes from more than 900 RPMs. ORNL extracts this data into a relational database, which is accessed through a custom software solution called the Desktop Analysis and Reporting Tool (DART). DART is used by data analysts to complete a monthly lane-by-lane review of RPM status. Recently ORNL has begun to extend its data analysis based on program-wide data processing in addition to the lane-by-lane review. Program-wide data processing includes the use of classification algorithms designed to identify RPMs with specific known issues and clustering algorithms intended to identify as-yet-unknown issues or new methods and measures for use in future classification algorithms. This paper provides an overview of the architecture used in the management of this data, performance aspects of the system, and additional requirements and methods used in moving toward an increased program-wide analysis paradigm.
Sequential Dictionary Learning From Correlated Data: Application to fMRI Data Analysis.
Seghouane, Abd-Krim; Iqbal, Asif
2017-03-22
Sequential dictionary learning via the K-SVD algorithm has emerged as a successful alternative to conventional data-driven methods such as independent component analysis (ICA) for functional magnetic resonance imaging (fMRI) data analysis. fMRI datasets, however, are structured data matrices with notions of spatio-temporal correlation and temporal smoothness. This prior information has not been included in the K-SVD algorithm when applied to fMRI data analysis. In this paper we propose three variants of the K-SVD algorithm dedicated to fMRI data analysis that account for this prior information. The proposed algorithms differ from K-SVD in their sparse coding and dictionary update stages. The first two algorithms account for the known correlation structure in the fMRI data by using the squared Q, R-norm instead of the Frobenius norm for matrix approximation. The third algorithm accounts for both the known correlation structure in the fMRI data and the temporal smoothness. The temporal smoothness is incorporated in the dictionary update stage via regularization of the dictionary atoms obtained with penalization. The performance of the proposed dictionary learning algorithms is illustrated through simulations and applications on real fMRI data.
1990-06-01
design and component technologies are reviewed against a background of accident data analysis, resulting in grounds for confidence in higher safety levels...constructors or operators taking voluntary actions based on accident investigations and their own data. Analysis of the CAA Summary data (Appendix 3...of engines. In the accident data analysis in Appendix 3, insufficient data was available to determine whether rotor configuration or associated
Power Spectrum Analysis of BNL Decay-Rate Data
2010-01-01
P.A. Sturrock, J.B. Buncher, E. Fischbach, J.T. Gruenwald, D. Javorsek... spectra in the rotational search band formed from BNL data and from ACRIM total solar irradiance data. Since rotation rate estimates derived from
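The snippet above does not say which spectral estimator the authors used. For unevenly sampled measurements such as decay-rate series, a common choice is the classical normalized Lomb-Scargle periodogram, sketched below in pure Python for illustration. The function name and the synthetic test signal are assumptions, and this is not presented as the authors' procedure.

```python
import math


def lomb_scargle(t, y, freqs):
    """Classical normalized Lomb-Scargle power at each frequency in freqs
    (same units as 1/t) for samples y taken at possibly uneven times t."""
    n = len(y)
    ybar = sum(y) / n
    var = sum((v - ybar) ** 2 for v in y) / (n - 1)
    powers = []
    for f in freqs:
        w = 2.0 * math.pi * f
        # Time offset tau makes the sine and cosine terms orthogonal.
        tau = math.atan2(sum(math.sin(2 * w * tj) for tj in t),
                         sum(math.cos(2 * w * tj) for tj in t)) / (2 * w)
        c = [math.cos(w * (tj - tau)) for tj in t]
        s = [math.sin(w * (tj - tau)) for tj in t]
        yc = sum((yj - ybar) * cj for yj, cj in zip(y, c))
        ys = sum((yj - ybar) * sj for yj, sj in zip(y, s))
        cc = sum(cj * cj for cj in c)
        ss = sum(sj * sj for sj in s)
        powers.append((yc * yc / cc + ys * ys / ss) / (2.0 * var))
    return powers
```

Unlike an FFT, this estimator needs no resampling onto a regular grid, which is why it is often preferred for irregular measurement schedules.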
Bohler, Anwesha; Eijssen, Lars M T; van Iersel, Martijn P; Leemans, Christ; Willighagen, Egon L; Kutmon, Martina; Jaillard, Magali; Evelo, Chris T
2015-08-23
Biological pathways are descriptive diagrams of biological processes widely used for functional analysis of differentially expressed genes or proteins. Primary data analysis, such as quality control, normalisation, and statistical analysis, is often performed in scripting languages like R, Perl, and Python. Subsequent pathway analysis is usually performed using dedicated external applications. Workflows involving manual use of multiple environments are time-consuming and error-prone. Therefore, tools are needed that enable pathway analysis directly within the same scripting languages used for primary data analyses. Existing tools have limited capability in terms of available pathway content, pathway editing and visualisation options, and export file formats. Consequently, making the full-fledged pathway analysis tool PathVisio available from various scripting languages will benefit researchers. We developed PathVisioRPC, an XMLRPC interface for the pathway analysis software PathVisio. PathVisioRPC enables creating and editing biological pathways, visualising data on pathways, performing pathway statistics, and exporting results in several image formats in multiple programming environments. We demonstrate PathVisioRPC functionalities using examples in Python. Subsequently, we use R to analyse a publicly available NCBI GEO gene expression dataset from a study of tumour-bearing mice treated with cyclophosphamide. The R scripts demonstrate how calls to existing R packages for data processing and calls to PathVisioRPC can directly work together. To further support R users, we have created RPathVisio, which simplifies the use of PathVisioRPC in this environment. We have also created a pathway module for the microarray data analysis portal ArrayAnalysis.org that calls the PathVisioRPC interface to perform pathway analysis.
This module allows users to use PathVisio functionality online without having to download and install the software and exemplifies how the PathVisioRPC interface can be used by data analysis pipelines for functional analysis of processed genomics data. PathVisioRPC enables data visualisation and pathway analysis directly from within various analytical environments used for preliminary analyses. It supports the use of existing pathways from WikiPathways or pathways created using the RPC itself. It also enables automation of tasks performed using PathVisio, making it useful to PathVisio users performing repeated visualisation and analysis tasks. PathVisioRPC is freely available for academic and commercial use at http://projects.bigcat.unimaas.nl/pathvisiorpc.
Data Prospecting Framework - a new approach to explore "big data" in Earth Science
NASA Astrophysics Data System (ADS)
Ramachandran, R.; Rushing, J.; Lin, A.; Kuo, K.
2012-12-01
Due to advances in sensors, computation and storage, the cost and effort required to produce large datasets have been significantly reduced. As a result, we are seeing a proliferation of large-scale data sets being assembled in almost every science field, especially in geosciences. Opportunities to exploit the "big data" are enormous, as new hypotheses can be generated by combining and analyzing large amounts of data. However, such a data-driven approach to science discovery assumes that scientists can find and isolate relevant subsets from vast amounts of available data. Current Earth Science data systems only provide data discovery through simple metadata and keyword-based searches and are not designed to support data exploration capabilities based on the actual content. Consequently, scientists often find themselves downloading large volumes of data, struggling with large amounts of storage and learning new analysis technologies that will help them separate the wheat from the chaff. New mechanisms of data exploration are needed to help scientists discover the relevant subsets. We present data prospecting, a new content-based data analysis paradigm to support data-intensive science. Data prospecting allows researchers to explore big data in determining and isolating data subsets for further analysis. This is akin to geo-prospecting, in which mineral sites of interest are determined over the landscape through screening methods. The resulting "data prospects" only provide an interaction with and feel for the data through first-look analytics; the researchers would still have to download the relevant datasets and analyze them deeply using their favorite analytical tools to determine whether the datasets will yield new hypotheses. Data prospecting combines two traditional categories of data analysis, data exploration and data mining, within the discovery step.
Data exploration utilizes manual/interactive methods for data analysis such as standard statistical analysis and visualization, usually on small datasets. Data mining, on the other hand, utilizes automated algorithms to extract useful information. Humans guide these automated algorithms and specify algorithm parameters (training samples, clustering size, etc.). Data prospecting combines these two approaches using high performance computing and new techniques for efficient distributed file access.
Freud: a software suite for high-throughput simulation analysis
NASA Astrophysics Data System (ADS)
Harper, Eric; Spellings, Matthew; Anderson, Joshua; Glotzer, Sharon
Computer simulation is an indispensable tool for the study of a wide variety of systems. As simulations scale to fill petascale and exascale supercomputing clusters, so too does the size of the data produced, as well as the difficulty in analyzing these data. We present Freud, an analysis software suite for efficient analysis of simulation data. Freud makes no assumptions about the system being analyzed, allowing for general analysis methods to be applied to nearly any type of simulation. Freud includes standard analysis methods such as the radial distribution function, as well as new methods including the potential of mean force and torque and local crystal environment analysis. Freud combines a Python interface with fast, parallel C++ analysis routines to run efficiently on laptops, workstations, and supercomputing clusters. Data analysis on clusters reduces data transfer requirements, a prohibitive cost for petascale computing. Used in conjunction with simulation software, Freud allows for smart simulations that adapt to the current state of the system, enabling the study of phenomena such as nucleation and growth, intelligent investigation of phases and phase transitions, and determination of effective pair potentials.
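The radial distribution function g(r) mentioned above can be computed directly from its definition for small systems. The O(N²) sketch below assumes a periodic cubic box of side `box_length`, uses the minimum-image convention, and requires `r_max <= box_length / 2`. It is only a reference implementation of the definition with a hypothetical function name, not the Freud API, which uses neighbor lists and parallel C++ routines.

```python
import math


def rdf(points, box_length, r_max, n_bins):
    """Return (bin_centers, g) for 3D points in a periodic cubic box."""
    n = len(points)
    dr = r_max / n_bins
    counts = [0] * n_bins
    for a in range(n):
        for b in range(a + 1, n):
            d2 = 0.0
            for k in range(3):
                delta = points[a][k] - points[b][k]
                delta -= box_length * round(delta / box_length)  # minimum image
                d2 += delta * delta
            r = math.sqrt(d2)
            if r < r_max:
                counts[min(int(r / dr), n_bins - 1)] += 1
    density = n / box_length ** 3
    g = []
    for i, c in enumerate(counts):
        shell = 4.0 / 3.0 * math.pi * (((i + 1) * dr) ** 3 - (i * dr) ** 3)
        # Each pair is counted once, so the ideal-gas expectation is n/2 * density * shell.
        g.append(c / (0.5 * n * density * shell))
    centers = [(i + 0.5) * dr for i in range(n_bins)]
    return centers, g
```

For a simple cubic lattice of unit spacing, g(r) is nonzero only in the bins containing the neighbor distances 1, √2 and √3, which gives a quick sanity check.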
The integrated analysis capability (IAC Level 2.0)
NASA Technical Reports Server (NTRS)
Frisch, Harold P.; Vos, Robert G.
1988-01-01
The critical data management issues involved in the development of the integrated analysis capability (IAC), Level 2, to support the design analysis and performance evaluation of large space structures, are examined. In particular, attention is given to the advantages and disadvantages of the formalized data base; merging of the matrix and relational data concepts; data types, query operators, and data handling; sequential versus direct-access files; local versus global data access; programming languages and host machines; and data flow techniques. The discussion also covers system architecture, recent system level enhancements, executive/user interface capabilities, and technology applications.
Offroy, Marc; Duponchel, Ludovic
2016-03-03
An important feature of experimental science is that data of various kinds are being produced at an unprecedented rate. This is mainly due to the development of new instrumental concepts and experimental methodologies. It is also clear that the nature of the acquired data is significantly different. Indeed, in every area of science, data take the form of ever bigger tables, where all but a few of the columns (i.e. variables) turn out to be irrelevant to the questions of interest, and further, we do not necessarily know which coordinates are the interesting ones. Big data in our biology, analytical chemistry, or physical chemistry labs is a future that might be closer than any of us suppose. It is in this sense that new tools have to be developed in order to explore and valorize such data sets. Topological data analysis (TDA) is one of these. It was developed recently by topologists who discovered that topological concepts could be useful for data analysis. The main objective of this paper is to answer the question of why topology is well suited for the analysis of big data sets in many areas, and even more efficient than conventional data analysis methods. Raman analysis of single bacteria provides a good opportunity to demonstrate the potential of TDA for the exploration of various spectroscopic data sets under different experimental conditions (with high noise level, with/without spectral preprocessing, with wavelength shift, with different spectral resolution, with missing data). Copyright © 2016 Elsevier B.V. All rights reserved.
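The abstract does not specify which TDA construction the authors apply to the Raman spectra. One of the simplest is 0-dimensional persistent homology of a Vietoris-Rips filtration: each spectrum is a point in a high-dimensional space, every point starts as its own component, and a component's bar ends at the distance at which it merges with another. The sketch below computes these bars with a union-find structure, which is equivalent to single-linkage clustering. The function name is hypothetical and the example points are synthetic.

```python
import math


def h0_persistence(points):
    """0-dimensional persistence bars (birth, death) of a Vietoris-Rips filtration."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Process all pairwise edges in order of increasing length.
    edges = sorted((math.dist(points[i], points[j]), i, j)
                   for i in range(n) for j in range(i + 1, n))
    deaths = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # this edge merges two components, so one bar dies here
            parent[ri] = rj
            deaths.append(length)
    # n - 1 finite bars plus one component that never dies.
    return [(0.0, d) for d in deaths] + [(0.0, math.inf)]
```

Long finite bars indicate well-separated groups of spectra. For example, two tight clusters five units apart yield one bar lasting about five units, while every other finite bar is short.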
NASA Technical Reports Server (NTRS)
1988-01-01
A flight program was completed in June of 1985 using the Boeing 757 flight research aircraft with an NLF glove installed on the right wing just outboard of the engine. The objectives of this program were to measure noise levels on the wing and to investigate the effect of engine noise on the extent of laminar flow on the glove. Details of the flight test program and results are contained in Volume 1 of this document. Tabulations and plots of the measured data are contained in Volume 2. The present volume contains the results of additional engineering analysis of the data. The latter includes analysis of the measured noise data, a comparison of predicted and measured noise data, a boundary layer stability analysis of 21 flight data cases, and an analysis of the effect of noise on boundary layer transition.
Visual modeling in an analysis of multidimensional data
NASA Astrophysics Data System (ADS)
Zakharova, A. A.; Vekhter, E. V.; Shklyar, A. V.; Pak, A. J.
2018-01-01
The article proposes an approach to solving visualization problems and the subsequent analysis of multidimensional data. Requirements for the properties of visual models created to solve analysis problems are described. As a promising direction for the development of visual analysis tools for multidimensional and voluminous data, the authors suggest active use of factors of subjective perception and dynamic visualization. Practical results of solving the problem of multidimensional data analysis are shown using the example of a visual model of empirical data on the current state of research into obtaining silicon carbide by an electric arc method. The solution yields several results: first, an idea of possible strategies for the development of the domain; second, an assessment of the reliability of the published data on this subject; and third, the changes in researchers' areas of attention over time.
A case study for cloud based high throughput analysis of NGS data using the globus genomics system
Bhuvaneshwar, Krithika; Sulakhe, Dinanath; Gauba, Robinder; ...
2015-01-01
Next generation sequencing (NGS) technologies produce massive amounts of data requiring a powerful computational infrastructure, high quality bioinformatics software, and skilled personnel to operate the tools. We present a case study of a practical solution to this data management and analysis challenge that simplifies terabyte scale data handling and provides advanced tools for NGS data analysis. These capabilities are implemented using the “Globus Genomics” system, which is an enhanced Galaxy workflow system made available as a service that offers users the capability to process and transfer data easily, reliably and quickly to address end-to-end NGS analysis requirements. The Globus Genomics system is built on Amazon's cloud computing infrastructure. The system takes advantage of elastic scaling of compute resources to run multiple workflows in parallel and it also helps meet the scale-out analysis needs of modern translational genomics research.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhou, Ning; Huang, Zhenyu; Tuffner, Francis K.
2010-07-31
Small-signal stability problems are among the major threats to grid stability and reliability. Prony analysis has been successfully applied to ringdown data to monitor electromechanical modes of a power system using phasor measurement unit (PMU) data. To facilitate on-line application of mode estimation, this paper develops a recursive algorithm for implementing Prony analysis and proposes an oscillation detection method to detect ringdown data in real time. By automatically detecting ringdown data, the proposed method helps guarantee that Prony analysis is applied properly and promptly to the ringdown data, so that mode estimation can be performed reliably and in a timely manner. The proposed method is tested using Monte Carlo simulations based on a 17-machine model and is shown to properly identify the oscillation data for on-line application of Prony analysis.
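For readers unfamiliar with Prony analysis, the sketch below shows the classical batch, least-squares Prony fit for a ringdown signal; it is not the recursive algorithm or the oscillation detector proposed in the paper. It assumes uniformly sampled data with time step `dt` and a user-chosen model order `p`; the function name `prony_modes` and the synthetic 0.5 Hz test mode are hypothetical. Each root z of the fitted linear-prediction polynomial gives a mode with frequency angle(z)/(2π·dt) in Hz and damping ln|z|/dt in 1/s.

```python
import numpy as np

def prony_modes(y, dt, p):
    """Estimate modal frequencies (Hz) and damping (1/s) from a ringdown.

    Classical least-squares Prony: fit the linear prediction
    y[k+p] = a1*y[k+p-1] + ... + ap*y[k], then take the roots of
    z^p - a1*z^(p-1) - ... - ap as the discrete-time poles.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Column j holds y[k+p-1-j] for k = 0 .. n-p-1; the target is y[k+p].
    A = np.column_stack([y[p - 1 - j : n - 1 - j] for j in range(p)])
    b = y[p:]
    a, *_ = np.linalg.lstsq(A, b, rcond=None)
    poles = np.roots(np.concatenate(([1.0], -a)))
    freqs = np.angle(poles) / (2 * np.pi * dt)
    damping = np.log(np.abs(poles)) / dt
    return freqs, damping

# Synthetic, noise-free ringdown: one 0.5 Hz mode decaying at 0.1 1/s.
# A real damped cosine corresponds to a conjugate pole pair, so p = 2.
dt = 0.05
t = np.arange(0, 20, dt)
y = np.exp(-0.1 * t) * np.cos(2 * np.pi * 0.5 * t)
freqs, damping = prony_modes(y, dt, p=2)
k = int(np.argmax(freqs))   # pick the positive-frequency root
print(freqs[k], damping[k])
```

With measured PMU data the model order is usually chosen larger than the number of expected modes to absorb noise, and spurious roots are then screened out; that selection step is outside this sketch.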
Analysis of Mars Pathfinder Entry Data, Aerothermal Heating, and Heat Shield Material Response
NASA Technical Reports Server (NTRS)
Milos, Frank; Chen, Y. K.; Tran, H. K.; Rasky, Daniel J. (Technical Monitor)
1997-01-01
The Mars Pathfinder heatshield contained several thermocouples and resistance thermometers. A description of the experiment, the entry data, and an analysis of the entry environment and material response are presented. In particular, the analysis addresses uncertainties of the data and the fluid dynamics and material response models. The calculations use the latest trajectory and atmosphere reconstructions for the Pathfinder entry. A modified version of the GIANTS code is used for CFD (computational fluid dynamics) analyses, and FIAT is used for material response. The material response and flowfield are coupled appropriately. Three different material response models are considered. The analysis of Pathfinder entry data for validation of aerothermal heating and material response models is complicated by model uncertainties and unanticipated data-acquisition and processing problems. We will discuss these issues as well as ramifications of the data and analysis for future Mars missions.
IDIMS/GEOPAK: Users manual for a geophysical data display and analysis system
NASA Technical Reports Server (NTRS)
Libert, J. M.
1982-01-01
The application of an existing image analysis system to the display and analysis of geophysical data is described, and the potential for expanding the capabilities of such a system toward more advanced computer analytic and modeling functions is investigated. The major features of the IDIMS (Interactive Display and Image Manipulation System) and its applicability to image-type analysis of geophysical data are described. The development of a basic geophysical data processing system, intended to permit the image representation, coloring, interdisplay, and comparison of geophysical data sets using existing IDIMS functions and to provide for the production of hard copies of processed images, is described. An instruction manual and documentation for the GEOPAK subsystem were produced. A training course for personnel in the use of IDIMS/GEOPAK was conducted. The effectiveness of the current IDIMS/GEOPAK system for geophysical data analysis was evaluated.
NASA Technical Reports Server (NTRS)
Ulbrich, N.; Volden, T.
2018-01-01
Analysis and use of temperature-dependent wind tunnel strain-gage balance calibration data are discussed in the paper. First, three different methods that may be used to process temperature-dependent strain-gage balance data are presented and compared. The first method uses an extended set of independent variables to process the data and predict balance loads. The second method applies an extended load iteration equation during the analysis of balance calibration data. The third method uses temperature-dependent sensitivities for the data analysis. Physical interpretations of the most important temperature-dependent regression model terms are provided that relate temperature compensation imperfections and the temperature-dependent nature of the gage factor to sets of regression model terms. Finally, balance calibration recommendations are listed so that temperature-dependent calibration data can be obtained and successfully processed using the reviewed analysis methods.
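The idea behind the first method, extending the set of independent variables, can be illustrated with an ordinary least-squares fit. The sketch below is a simplified, hypothetical single-gage example and not the paper's regression model: it regresses a gage output on load, temperature, and a load × temperature interaction term, which is one simple way a temperature-dependent sensitivity (gage factor) shows up as a regression term. All names and the synthetic coefficients are assumptions for illustration.

```python
import numpy as np

def fit_gage_model(load, temp, output):
    """Least-squares fit of output = c0 + c1*load + c2*temp + c3*load*temp.

    c2 absorbs a temperature-induced zero shift; c3 captures a
    temperature-dependent sensitivity (the slope with respect to load
    changes linearly with temperature).
    """
    load = np.asarray(load, dtype=float)
    temp = np.asarray(temp, dtype=float)
    X = np.column_stack([np.ones_like(load), load, temp, load * temp])
    coeffs, *_ = np.linalg.lstsq(X, np.asarray(output, dtype=float), rcond=None)
    return coeffs

def predict_load(coeffs, output, temp):
    """Invert the fitted model for load at a known temperature."""
    c0, c1, c2, c3 = coeffs
    return (output - c0 - c2 * temp) / (c1 + c3 * temp)

# Hypothetical, noise-free calibration points.
rng = np.random.default_rng(0)
load = rng.uniform(-100, 100, 200)
temp = rng.uniform(10, 50, 200)
output = 0.5 + 2.0 * load + 0.03 * temp + 0.004 * load * temp
coeffs = fit_gage_model(load, temp, output)
print(coeffs)
```

A real multi-component balance couples several gages and loads, and the load prediction then requires an iteration rather than the one-line inversion shown here.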
A case study for cloud based high throughput analysis of NGS data using the globus genomics system
Bhuvaneshwar, Krithika; Sulakhe, Dinanath; Gauba, Robinder; Rodriguez, Alex; Madduri, Ravi; Dave, Utpal; Lacinski, Lukasz; Foster, Ian; Gusev, Yuriy; Madhavan, Subha
2014-01-01
Next generation sequencing (NGS) technologies produce massive amounts of data requiring a powerful computational infrastructure, high quality bioinformatics software, and skilled personnel to operate the tools. We present a case study of a practical solution to this data management and analysis challenge that simplifies terabyte scale data handling and provides advanced tools for NGS data analysis. These capabilities are implemented using the “Globus Genomics” system, which is an enhanced Galaxy workflow system made available as a service that offers users the capability to process and transfer data easily, reliably and quickly to address end-to-end NGS analysis requirements. The Globus Genomics system is built on Amazon's cloud computing infrastructure. The system takes advantage of elastic scaling of compute resources to run multiple workflows in parallel and it also helps meet the scale-out analysis needs of modern translational genomics research. PMID:26925205
DOE Office of Scientific and Technical Information (OSTI.GOV)
None
Volume VIII of the documentation for the Phase I Data Analysis Task performed in support of the current Regional Flow Model, Transport Model, and Risk Assessment for the Nevada Test Site Underground Test Area Subproject contains the risk assessment documentation. Because of the size and complexity of the model area, a considerable quantity of data was collected and analyzed in support of the modeling efforts. The data analysis task was consequently broken into eight subtasks, and descriptions of each subtask's activities are contained in one of the eight volumes that comprise the Phase I Data Analysis Documentation.
DOE Office of Scientific and Technical Information (OSTI.GOV)
None
Volume VII of the documentation for the Phase I Data Analysis Task performed in support of the current Regional Flow Model, Transport Model, and Risk Assessment for the Nevada Test Site Underground Test Area Subproject contains the tritium transport model documentation. Because of the size and complexity of the model area, a considerable quantity of data was collected and analyzed in support of the modeling efforts. The data analysis task was consequently broken into eight subtasks, and descriptions of each subtask's activities are contained in one of the eight volumes that comprise the Phase I Data Analysis Documentation.
Landsat analysis of tropical forest succession employing a terrain model
NASA Technical Reports Server (NTRS)
Barringer, T. H.; Robinson, V. B.; Coiner, J. C.; Bruce, R. C.
1980-01-01
Landsat multispectral scanner (MSS) data have yielded a dual classification of rain forest and shadow in an analysis of a semi-deciduous forest on Mindoro Island, Philippines. Both a spatial terrain model, using a fifth side polynomial trend surface analysis for quantitatively estimating the general spatial variation in the data set, and a spectral terrain model, based on the MSS data, were set up. A discriminant analysis using both sets of data suggested that shadowing effects may be due primarily to local variations in the spectral regions and can therefore be compensated for through the decomposition of the spatial variation in both elevation and MSS data.
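A polynomial trend surface models a mapped variable as a polynomial in the x and y coordinates, so that the fitted surface captures the general spatial variation and the residuals capture local variation. The sketch below fits a trend surface of configurable degree by least squares; it is a generic illustration on assumed synthetic inputs, not the study's terrain model, and the names `design_matrix` and `trend_surface` are hypothetical.

```python
import numpy as np

def design_matrix(x, y, degree):
    """Columns x**i * y**j for all i + j <= degree."""
    cols = []
    for total in range(degree + 1):
        for i in range(total + 1):
            cols.append(x**i * y**(total - i))
    return np.column_stack(cols)

def trend_surface(x, y, z, degree):
    """Fit z ~ polynomial(x, y); return coefficients, trend, and residuals."""
    x, y, z = (np.asarray(a, dtype=float) for a in (x, y, z))
    X = design_matrix(x, y, degree)
    coeffs, *_ = np.linalg.lstsq(X, z, rcond=None)
    trend = X @ coeffs
    return coeffs, trend, z - trend   # residuals = local variation

# Hypothetical 10 x 10 grid whose values are a purely regional trend.
gx, gy = np.meshgrid(np.linspace(0, 1, 10), np.linspace(0, 1, 10))
x, y = gx.ravel(), gy.ravel()
z = 1.0 + 2.0 * x - y + 0.5 * x * y
coeffs, trend, resid = trend_surface(x, y, z, degree=2)
print(np.max(np.abs(resid)))
```

A degree-d surface has (d + 1)(d + 2)/2 terms, so higher degrees need correspondingly more well-spread sample points to stay well determined.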
Tensorial extensions of independent component analysis for multisubject FMRI analysis.
Beckmann, C F; Smith, S M
2005-03-01
We discuss model-free analysis of multisubject or multisession FMRI data by extending the single-session probabilistic independent component analysis model (PICA; Beckmann and Smith, 2004. IEEE Trans. on Medical Imaging, 23 (2) 137-152) to higher dimensions. This results in a three-way decomposition that represents the different signals and artefacts present in the data in terms of their temporal, spatial, and subject-dependent variations. The technique is derived from and compared with parallel factor analysis (PARAFAC; Harshman and Lundy, 1984. In Research methods for multimode data analysis, chapter 5, pages 122-215. Praeger, New York). Using simulated data as well as data from multisession and multisubject FMRI studies we demonstrate that the tensor PICA approach is able to efficiently and accurately extract signals of interest in the spatial, temporal, and subject/session domain. The final decompositions improve upon PARAFAC results in terms of greater accuracy, reduced interference between the different estimated sources (reduced cross-talk), robustness (against deviations of the data from modeling assumptions and against overfitting), and computational speed. On real FMRI 'activation' data, the tensor PICA approach is able to extract plausible activation maps, time courses, and session/subject modes as well as provide a rich description of additional processes of interest such as image artefacts or secondary activation patterns. The resulting data decomposition gives simple and useful representations of multisubject/multisession FMRI data that can aid the interpretation and optimization of group FMRI studies beyond what can be achieved using model-based analysis techniques.
NASA Astrophysics Data System (ADS)
Shiklomanov, A. I.; Proussevitch, A. A.; Gordov, E. P.; Okladnikov, I.; Titov, A. G.
2016-12-01
The volume of georeferenced datasets used for hydrology and climate research is growing immensely due to recent advances in modeling, high performance computers, and sensor networks, as well as the initiation of a set of large-scale, complex global and regional monitoring experiments. To facilitate the management and analysis of these extensive data pools, we developed a Web-based data management, visualization, and analysis system, RIMS (Rapid Integrated Mapping and Analysis System; http://earthatlas.sr.unh.edu/), with a focus on hydrological applications. Recently, in collaboration with Russian colleagues from the Institute of Monitoring of Climatic and Ecological Systems SB RAS, Russia, we significantly redesigned RIMS to include the latest Web and GIS technologies in compliance with the Open Geospatial Consortium (OGC) standards. The upgraded RIMS can be successfully applied to multiple research problems using an extensive data archive and embedded tools for data computation, visualization, and distribution. We will demonstrate the current capabilities of the system by presenting several results of applied data analysis carried out for the territory of Northern Eurasia. These results will include the analysis of historical, contemporary, and future changes in climate and hydrology based on station and gridded data; investigations of recent extreme hydrological events, their anomalies, causes, and potential impacts; and the creation and analysis of new data sets through the integration of social and geophysical data.
A compilation and analysis of helicopter handling qualities data. Volume 2: Data analysis
NASA Technical Reports Server (NTRS)
Heffley, R. K.
1979-01-01
A compilation and an analysis of helicopter handling qualities data are presented. Multiloop manual control methods are used to analyze the descriptive data, stability derivatives, and transfer functions for a six-degree-of-freedom, quasi-static model. A compensatory loop structure is applied to coupled longitudinal, lateral, and directional equations in such a way that key handling qualities features are examined directly.
High volume data storage architecture analysis
NASA Technical Reports Server (NTRS)
Malik, James M.
1990-01-01
A High Volume Data Storage Architecture Analysis was conducted. The results, presented in this report, will be applied to problems of high volume data requirements such as those anticipated for the Space Station Control Center. High volume data storage systems at several different sites were analyzed for archive capacity, storage hierarchy and migration philosophy, and retrieval capabilities. Proposed architectures were solicited from the sites selected for in-depth analysis. Model architectures for a hypothetical data archiving system, for a high speed file server, and for high volume data storage are attached.
NASA Technical Reports Server (NTRS)
Grew, G. W.
1985-01-01
Characteristic vector analysis applied to inflection ratio spectra is a new approach to analyzing spectral data. Applied to remote data collected with the multichannel ocean color sensor (MOCS), a passive sensor, the technique simultaneously maps the distribution of two different phytopigments, chlorophyll alpha and phycoerythrin, in the ocean. The data set presented is from a series of warm core ring missions conducted during 1982. The data compare favorably with a theoretical model and with data collected on the same mission by an active sensor, the airborne oceanographic lidar (AOL).
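Characteristic vector analysis represents a set of spectra as a mean spectrum plus a few eigenvectors (characteristic vectors) of their covariance matrix, so that each spectrum is summarized by a handful of scores. The sketch below shows that generic decomposition via the singular value decomposition on synthetic spectra; it does not reproduce the inflection-ratio preprocessing or the MOCS data, and the function name and the two pigment-like basis shapes are invented for illustration.

```python
import numpy as np

def characteristic_vectors(spectra, k):
    """Return mean spectrum, first k characteristic vectors, scores,
    and the fraction of variance each vector explains.

    spectra: array of shape (n_samples, n_channels).
    """
    S = np.asarray(spectra, dtype=float)
    mean = S.mean(axis=0)
    centered = S - mean
    # Right singular vectors of the centered data are the eigenvectors
    # of its covariance matrix, ordered by explained variance.
    _, sing, vt = np.linalg.svd(centered, full_matrices=False)
    vectors = vt[:k]
    scores = centered @ vectors.T
    explained = sing**2 / np.sum(sing**2)
    return mean, vectors, scores, explained[:k]

# Hypothetical spectra built from two Gaussian "pigment" shapes.
channels = np.linspace(400, 700, 31)
b1 = np.exp(-((channels - 440) / 20) ** 2)
b2 = np.exp(-((channels - 560) / 25) ** 2)
rng = np.random.default_rng(1)
amps = rng.uniform(0, 1, size=(50, 2))
spectra = amps @ np.vstack([b1, b2])
mean, vecs, scores, frac = characteristic_vectors(spectra, k=2)
print(frac)
```

Because the synthetic spectra mix exactly two shapes, two characteristic vectors account for essentially all the variance; real spectra would also carry noise and additional components.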
NASA Technical Reports Server (NTRS)
Berrios, William M.
1990-01-01
A post flight mission thermal environment for the Long Duration Exposure Facility was created as part of the thermal analysis data reduction effort. The data included herein is the thermal parameter data used in the calculation of boundary temperatures. This boundary temperature data is to be released in the near future for use by the LDEF principal investigators in the final analysis of their particular experiment temperatures. Also included is the flight temperature data as recorded by the LDEF Thermal Measurements System (THERM) for the first 90 days of flight.
NASA Astrophysics Data System (ADS)
Khodachenko, Maxim; Miller, Steven; Stoeckler, Robert; Topf, Florian
2010-05-01
Computational modeling and observational data analysis are two major aspects of modern scientific research. Both are currently undergoing extensive development and application. Many of the scientific goals of planetary space missions require robust models of planetary objects and environments as well as efficient data analysis algorithms, to predict conditions for mission planning and to interpret the experimental data. Europe has great strength in these areas, but it is insufficiently coordinated; individual groups, models, techniques, and algorithms need to be coupled and integrated. The existing level of scientific cooperation and the technical capabilities for operative communication allow considerable progress in the development of a distributed international Research Infrastructure (RI) based on Europe's existing computational modelling and data analysis centers, providing the scientific community with dedicated services in the fields of their computational and data analysis expertise. These services will appear as a product of the collaborative communication and joint research efforts of numerical and data analysis experts together with planetary scientists. The major goal of the EUROPLANET-RI / EMDAF is to make computational models and data analysis algorithms associated with particular national RIs and teams, as well as their outputs, more readily available to their potential user community and more tailored to scientific user requirements, without compromising front-line specialized research on model and data analysis algorithm development and software implementation. This objective will be met through four key subdivisions/tasks of EMDAF: 1) an Interactive Catalogue of Planetary Models; 2) a Distributed Planetary Modelling Laboratory; 3) a Distributed Data Analysis Laboratory; and 4) enabling Models and Routines for High Performance Computing Grids.
Using the advantages of coordinated operation and efficient communication among the involved computational modelling, research, and data analysis expert teams and their related research infrastructures, EMDAF will provide a 1) flexible, 2) scientific-user-oriented, 3) continuously developing and fast-upgrading computational and data analysis service to support and intensify European planetary scientific research. Initially, EMDAF will create a set of demonstrators and operational tests of this service in key areas of European planetary science. This work will aim at the following objectives: (a) development and implementation of tools for distant interactive communication between planetary scientists and computing experts (including related RIs); (b) development of standard routine packages and user-friendly interfaces for operation of the existing numerical codes and data analysis algorithms by specialized planetary scientists; (c) development of a prototype of numerical modelling services "on demand" for space missions and planetary researchers; (d) development of a prototype of data analysis services "on demand" for space missions and planetary researchers; (e) development of a prototype of coordinated interconnected simulations of planetary phenomena and objects (global multi-model simulators); (f) provision of demonstrators of the coordinated use of high performance computing facilities (super-computer networks), done in cooperation with the European HPC Grid DEISA.
Stanzel, Sven; Weimer, Marc; Kopp-Schneider, Annette
2013-06-01
High-throughput screening approaches are carried out for the toxicity assessment of a large number of chemical compounds. In such large-scale in vitro toxicity studies, several hundred or thousand concentration-response experiments are conducted. The automated evaluation of concentration-response data using statistical analysis scripts saves time and yields more consistent results than data analysis performed with menu-driven statistical software. Automated statistical analysis requires that concentration-response data be available in a standardised data format across all compounds. To obtain consistent data formats, a standardised data management workflow must be established, including guidelines for data storage, data handling, and data extraction. In this paper, two procedures for data management within large-scale toxicological projects are proposed. Both procedures are based on Microsoft Excel files as the researcher's primary data format and use a computer programme to automate the handling of data files. The first procedure assumes that data collection has not yet started, whereas the second procedure can be used when data files already exist. Successful implementation of the two approaches in the European project ACuteTox is illustrated. Copyright © 2012 Elsevier Ltd. All rights reserved.
Automatic analysis of nuclear-magnetic-resonance-spectroscopy clinical research data
NASA Astrophysics Data System (ADS)
Scott, Katherine N.; Wilson, David C.; Bruner, Angela P.; Lyles, Teresa A.; Underhill, Brandon; Geiser, Edward A.; Ballinger, J. Ray; Scott, James D.; Stopka, Christine B.
1998-03-01
A major problem of P-31 nuclear magnetic resonance spectroscopy (MRS) in vivo applications is that when large data sets are acquired, the time invested in data reduction and analysis with currently available technologies may totally overshadow the time required for data acquisition. An example is our MRS monitoring of exercise therapy for patients with peripheral vascular disease. In these studies, spectral acquisition requires 90 minutes per patient study, whereas data analysis and reduction require 6-8 hours. Our laboratory currently uses the proprietary software SA/GE developed by General Electric; however, other software packages have similar limitations. When data analysis takes this long, the researcher does not have the rapid feedback required to ascertain the quality of the data acquired or the result of the study. This is highly undesirable even in a research environment, but becomes intolerable in the clinical setting. The purpose of this report is to outline progress towards the development of an automated method for eliminating the spectral analysis burden on the researcher working in the clinical setting.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhou, Ning; Huang, Zhenyu; Tuffner, Francis K.
2010-02-28
Small-signal stability problems are among the major threats to grid stability and reliability. Prony analysis has been successfully applied to ringdown data to monitor electromechanical modes of a power system using phasor measurement unit (PMU) data. To facilitate on-line application of mode estimation, this paper develops a recursive algorithm for implementing Prony analysis and proposes an oscillation detection method to detect ringdown data in real time. By automatically detecting ringdown data, the proposed method helps guarantee that Prony analysis is applied properly and promptly to the ringdown data, so that mode estimation can be performed reliably and in a timely manner. The proposed method is tested using Monte Carlo simulations based on a 17-machine model and is shown to properly identify the oscillation data for on-line application of Prony analysis. In addition, the proposed method is applied to field measurement data from WECC to show the performance of the proposed algorithm.
In situ visualization and data analysis for turbidity currents simulation
NASA Astrophysics Data System (ADS)
Camata, Jose J.; Silva, Vítor; Valduriez, Patrick; Mattoso, Marta; Coutinho, Alvaro L. G. A.
2018-01-01
Turbidity currents are underflows responsible for sediment deposits that generate geological formations of interest for the oil and gas industry. LibMesh-sedimentation is an application built upon the libMesh library to simulate turbidity currents. In this work, we present the integration of libMesh-sedimentation with in situ visualization and in transit data analysis tools. DfAnalyzer is a solution based on provenance data to extract and relate strategic simulation data in transit from multiple data for online queries. We integrate libMesh-sedimentation and ParaView Catalyst to perform in situ data analysis and visualization. We present a parallel performance analysis for two turbidity current simulations showing that the overhead for both in situ visualization and in transit data analysis is negligible. We show that our tools enable monitoring of sediment appearance at runtime and steering of the simulation based on solver convergence and visual information on the sediment deposits, thus enhancing the analytical power of turbidity current simulations.
Flow Cytometry Data Preparation Guidelines for Improved Automated Phenotypic Analysis.
Jimenez-Carretero, Daniel; Ligos, José M; Martínez-López, María; Sancho, David; Montoya, María C
2018-05-15
Advances in flow cytometry (FCM) increasingly demand adoption of computational analysis tools to tackle the ever-growing data dimensionality. In this study, we tested different data input modes to evaluate how cytometry acquisition configuration and data compensation procedures affect the performance of unsupervised phenotyping tools. An analysis workflow was set up and tested for the detection of changes in reference bead subsets and in a rare subpopulation of murine lymph node CD103+ dendritic cells acquired by conventional or spectral cytometry. Raw spectral data or pseudospectral data acquired with the full set of available detectors by conventional cytometry consistently outperformed datasets acquired and compensated according to FCM standards. Our results thus challenge the paradigm of one-fluorochrome/one-parameter acquisition in FCM for unsupervised cluster-based analysis. Instead, we propose to configure instrument acquisition to use all available fluorescence detectors and to avoid integration and compensation procedures, thereby using raw spectral or pseudospectral data for improved automated phenotypic analysis. Copyright © 2018 by The American Association of Immunologists, Inc.
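For context on the compensation step that the study recommends avoiding: conventional compensation treats spillover as linear, with observed intensities equal to true intensities multiplied by a square spillover matrix, and recovers the true values by solving that linear system. The sketch below shows this generic step with an invented 2 × 2 spillover matrix; it is not the study's workflow, and the function name and all values are hypothetical.

```python
import numpy as np

def compensate(observed, spillover):
    """Undo linear spillover, assuming observed = true @ spillover.

    observed:  (n_events, n_detectors) measured intensities
    spillover: square (n_fluorochromes, n_detectors) matrix; row i gives
               the fraction of fluorochrome i's signal seen in each detector.
    Solves spillover.T @ true.T = observed.T instead of forming an inverse.
    """
    S = np.asarray(spillover, dtype=float)
    O = np.asarray(observed, dtype=float)
    return np.linalg.solve(S.T, O.T).T

spillover = np.array([[1.00, 0.15],   # hypothetical: dye 1 spills 15% into detector 2
                      [0.05, 1.00]])  # hypothetical: dye 2 spills 5% into detector 1
true = np.array([[1000.0, 0.0], [0.0, 500.0], [200.0, 300.0]])
observed = true @ spillover
recovered = compensate(observed, spillover)
print(recovered)
```

This one-fluorochrome-per-detector model requires as many detectors as dyes; the raw spectral data favored in the study instead keep every detector's signal rather than collapsing it to one value per fluorochrome.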
Short-Arc Analysis of Intersatellite Tracking Data in a Gravity Mapping Mission
NASA Technical Reports Server (NTRS)
Rowlands, David D.; Ray, Richard D.; Chinn, Douglas S.; Lemoine, Frank G.; Smith, David E. (Technical Monitor)
2001-01-01
A technique for the analysis of low-low intersatellite range-rate data in a gravity mapping mission is explored. The technique is based on standard tracking data analysis for orbit determination but uses a spherical coordinate representation of the 12 epoch state parameters describing the baseline between the two satellites. This representation of the state parameters is exploited to allow the intersatellite range-rate analysis to benefit from information provided by other tracking data types without large simultaneous multiple data type solutions. The technique appears especially valuable for estimating gravity from short arcs (e.g., less than 15 minutes) of data. Gravity recovery simulations which use short arcs are compared with those using arcs a day in length. For a high-inclination orbit, the short-arc analysis recovers low-order gravity coefficients remarkably well, although higher order terms, especially sectorial terms, are less accurate. Simulations suggest that either long or short arcs of GRACE data are likely to improve parts of the geopotential spectrum by orders of magnitude.
BESIII Physical Analysis on Hadoop Platform
NASA Astrophysics Data System (ADS)
Huo, Jing; Zang, Dongsong; Lei, Xiaofeng; Li, Qiang; Sun, Gongxing
2014-06-01
In the past 20 years, computing clusters have been widely used for High Energy Physics data processing. Jobs running on a traditional cluster with a Data-to-Computing structure have to read large volumes of data over the network to the computing nodes for analysis, which makes I/O latency a bottleneck of the whole system. The new distributed computing technology based on the MapReduce programming model has many advantages, such as high concurrency, high scalability, and high fault tolerance, and it can benefit us in dealing with Big Data. This paper proposes using the MapReduce model for BESIII physics analysis and presents a new data analysis system structure based on the Hadoop platform, which not only greatly improves the efficiency of data analysis but also reduces the cost of building the system. Moreover, this paper establishes an event pre-selection system based on an event-level metadata (TAGs) database to optimize the data analysis procedure.
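The MapReduce model referred to above can be sketched in plain Python: a map step emits key-value pairs from each input record independently, a shuffle groups the emitted values by key, and a reduce step aggregates each group. The toy example below applies a hypothetical TAG-style cut (events with at least two charged tracks) in the map step and counts selected events per run in the reduce step. The record fields and the cut are invented for illustration; on Hadoop the same map and reduce roles would be distributed across nodes that hold the data locally.

```python
from collections import defaultdict
from itertools import chain

def map_event(event):
    """Map: emit (run, 1) for events passing a hypothetical pre-selection cut."""
    if event["n_charged_tracks"] >= 2:
        yield event["run"], 1

def reduce_counts(key, values):
    """Reduce: sum the values emitted for one key."""
    return key, sum(values)

def map_reduce(records, mapper, reducer):
    # Shuffle phase: group intermediate values by key.
    groups = defaultdict(list)
    for key, value in chain.from_iterable(mapper(r) for r in records):
        groups[key].append(value)
    return dict(reducer(k, vs) for k, vs in sorted(groups.items()))

events = [
    {"run": 101, "n_charged_tracks": 2},
    {"run": 101, "n_charged_tracks": 0},
    {"run": 102, "n_charged_tracks": 4},
    {"run": 101, "n_charged_tracks": 3},
]
print(map_reduce(events, map_event, reduce_counts))  # {101: 2, 102: 1}
```

Because each map call sees one event and each reduce call sees one key, both steps parallelize without shared state, which is what lets the computation move to the data instead of the data moving to the computation.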
An Integrated Analysis of the Physiological Effects of Space Flight: Executive Summary
NASA Technical Reports Server (NTRS)
Leonard, J. I.
1985-01-01
A large array of models was applied in a unified manner to solve problems in space flight physiology. Mathematical simulation was used as an alternative way of looking at physiological systems and maximizing the yield from previous space flight experiments. A medical data analysis system was created that consists of an automated database, a computerized biostatistical and data analysis system, and a set of simulation models of physiological systems. Five basic models were employed: (1) a pulsatile cardiovascular model; (2) a respiratory model; (3) a thermoregulatory model; (4) a circulatory, fluid, and electrolyte balance model; and (5) an erythropoiesis regulatory model. Algorithms were provided to perform routine statistical tests, multivariate analysis, nonlinear regression analysis, and autocorrelation analysis. Special purpose programs were prepared for rank correlation, factor analysis, and the integration of the metabolic balance data.
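Two of the routine statistics listed above, rank correlation and autocorrelation, are short enough to sketch directly. The functions below compute Spearman's rank correlation (the Pearson correlation of average ranks, which handles ties) and the lag-k sample autocorrelation; they are generic textbook versions written for illustration, not the programs described in the report, and the function names are hypothetical.

```python
import math

def ranks(values):
    """Average ranks (1-based); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    return pearson(ranks(x), ranks(y))

def autocorrelation(series, lag):
    """Sample autocorrelation at the given lag (biased estimator)."""
    n = len(series)
    m = sum(series) / n
    denom = sum((v - m) ** 2 for v in series)
    num = sum((series[t] - m) * (series[t + lag] - m) for t in range(n - lag))
    return num / denom

print(spearman([1, 2, 3, 4, 5], [2, 4, 9, 16, 100]))  # 1.0 (monotone relation)
print(ranks([10, 20, 20, 30]))                       # [1.0, 2.5, 2.5, 4.0]
print(autocorrelation([1, -1, 1, -1], 1))            # -0.75
```

Rank correlation detects any monotone association, not just linear ones, which is why the strongly nonlinear second series still scores 1.0.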
NASA standard: Trend analysis techniques
NASA Technical Reports Server (NTRS)
1990-01-01
Descriptive and analytical techniques for NASA trend analysis applications are presented in this standard. Trend analysis is applicable in all organizational elements of NASA connected with, or supporting, developmental/operational programs. This document should be consulted for any data analysis activity requiring the identification or interpretation of trends. Trend analysis is neither a precise term nor a circumscribed methodology: it generally connotes quantitative analysis of time-series data. For NASA activities, the appropriate and applicable techniques include descriptive and graphical statistics, and the fitting or modeling of data by linear, quadratic, and exponential models. Usually, but not always, the data is time-series in nature. Concepts such as autocorrelation and techniques such as Box-Jenkins time-series analysis would only rarely apply and are not included in this document. The basic ideas needed for qualitative and quantitative assessment of trends along with relevant examples are presented.
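The linear and exponential models mentioned in the standard can both be fitted with ordinary least squares, the exponential one after taking logarithms; a quadratic fit follows the same least-squares idea and is omitted here. The sketch below is a generic illustration with made-up data and hypothetical function names, not a procedure taken from the standard. Note that the log-transform fit minimizes relative rather than absolute error and requires strictly positive data.

```python
import math

def linear_fit(t, y):
    """Least-squares line y = a + b*t; returns (a, b)."""
    n = len(t)
    mt, my = sum(t) / n, sum(y) / n
    stt = sum((ti - mt) ** 2 for ti in t)
    sty = sum((ti - mt) * (yi - my) for ti, yi in zip(t, y))
    b = sty / stt
    return my - b * mt, b

def exponential_fit(t, y):
    """Fit y = A * exp(k*t) via a linear fit to log(y); requires y > 0."""
    if any(v <= 0 for v in y):
        raise ValueError("exponential fit needs strictly positive data")
    log_a, k = linear_fit(t, [math.log(v) for v in y])
    return math.exp(log_a), k

months = [0, 1, 2, 3, 4, 5]
counts = [3.0, 3.5, 4.1, 4.4, 5.0, 5.6]   # made-up monthly values
a, b = linear_fit(months, counts)
A, k = exponential_fit(months, counts)
print(f"linear trend: {b:.3f} per month; exponential rate: {k:.3f} per month")
```

Comparing residuals from both fits on the same series is a simple way to judge which trend shape describes the data better.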
Planning, Conducting, and Documenting Data Analysis for Program Improvement
ERIC Educational Resources Information Center
Winer, Abby; Taylor, Cornelia; Derrington, Taletha; Lucas, Anne
2015-01-01
This 2015 document was developed to help technical assistance (TA) providers and state staff define and limit the scope of data analysis for program improvement efforts, including the State Systemic Improvement Plan (SSIP); develop a plan for data analysis; document alternative hypotheses and additional analyses as they are generated; and…
Textbooks for Responsible Data Analysis in Excel
ERIC Educational Resources Information Center
Garrett, Nathan
2015-01-01
With 27 million users, Excel (Microsoft Corporation, Seattle, WA) is the most common business data analysis software. However, audits show that almost all complex spreadsheets have errors. The author examined textbooks to understand why responsible data analysis is taught. A purposeful sample of 10 textbooks was coded, and then compared against…
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vixie, Kevin R.
This is the final report for the project "Geometric Analysis for Data Reduction and Structure Discovery," in which insights and tools from geometric analysis were developed and exploited for their potential to address large-scale data challenges.
Examining Benefits of Dedicated Funding and Process Improvement for Depot Level Technology Insertion
2010-06-17
analysis of tabulated data. This method is also supported by Miles and Huberman (1994) as they list six approaches to case study data analysis, two... Miles, Matthew B., Huberman, Michael. (1994). Qualitative Data Analysis: An Expanded Sourcebook. Thousand Oaks: Sage Publications. 12. Mohr, Jakki
Cycles till failure of silver-zinc cells with competing failure modes - Preliminary data analysis
NASA Technical Reports Server (NTRS)
Sidik, S. M.; Leibecki, H. F.; Bozek, J. M.
1980-01-01
The data analysis of cycles to failure of silver-zinc electrochemical cells with competing failure modes is presented. The test ran 129 cells through charge-discharge cycles until failure; preliminary data analysis consisted of a response surface estimate of life. Batteries fail through a low-voltage condition and an internal shorting condition; a competing failure modes analysis was made using maximum likelihood estimation for the extreme value life distribution. Extensive residual plotting and probability plotting were used to verify data quality and the selection of the model.
Data analysis using a combination of independent component analysis and empirical mode decomposition
NASA Astrophysics Data System (ADS)
Lin, Shih-Lin; Tung, Pi-Cheng; Huang, Norden E.
2009-06-01
A combination of independent component analysis and empirical mode decomposition (ICA-EMD) is proposed in this paper to analyze low signal-to-noise ratio data. The advantages of the ICA-EMD combination are as follows: ICA needs few sensory clues to separate the original source from unwanted noise, and EMD can effectively separate the data into its constituent parts. The case studies reported here involve original sources contaminated by white Gaussian noise. The simulation results show that the ICA-EMD combination is an effective data analysis tool.
A design for a ground-based data management system
NASA Technical Reports Server (NTRS)
Lambird, Barbara A.; Lavine, David
1988-01-01
An initial design for a ground-based data management system which includes intelligent data abstraction and cataloging is described. The large quantity of data on some current and future NASA missions leads to significant problems in providing scientists with quick access to relevant data. Human screening of data for potential relevance to a particular study is time-consuming and costly. Intelligent databases can provide automatic screening when given relevant scientific parameters and constraints. The data management system would provide, at a minimum, information on the availability and range of data, the type available, specific time periods covered together with data quality information, and related sources of data. The system would inform the user about the primary types of screening, analysis, and methods of presentation available to the user. The system would then aid the user with performing the desired tasks, in such a way that the user need only specify the scientific parameters and objectives, and not worry about specific details for running a particular program. The design contains modules for data abstraction, catalog plan abstraction, a user-friendly interface, and expert systems for data handling, data evaluation, and application analysis. The emphasis is on developing general facilities for data representation, description, analysis, and presentation that will be easily used by scientists directly, thus bypassing the knowledge acquisition bottleneck. Expert system technology is used for many different aspects of the data management system, including the direct user interface, the interface to the data analysis routines, and the analysis of instrument status.
Analysis of Developmental Data: Comparison Among Alternative Methods
ERIC Educational Resources Information Center
Wilson, Ronald S.
1975-01-01
To examine the ability of the correction factor epsilon to counteract statistical bias in univariate analysis, an analysis of variance (adjusted by epsilon) and a multivariate analysis of variance were performed on the same data. The results indicated that univariate analysis is a fully protected design when used with epsilon. (JMB)
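The abstract does not name the estimator, so the sketch below assumes the widely used Greenhouse-Geisser (Box) estimate of epsilon, computed from the k x k sample covariance matrix of the repeated measures. Epsilon equals 1 when sphericity holds and falls toward its lower bound 1/(k - 1) as sphericity is violated; the adjusted univariate F test multiplies both of its degrees of freedom by epsilon. The function names are illustrative, not taken from the study.

```python
def greenhouse_geisser_epsilon(cov):
    """Greenhouse-Geisser (Box) epsilon from a k x k covariance matrix.

    cov is a list of k lists of k floats: the sample covariance matrix of the
    k repeated measurements taken on each subject.
    """
    k = len(cov)
    if k < 2 or any(len(row) != k for row in cov):
        raise ValueError("cov must be a square matrix with k >= 2")
    row_means = [sum(row) / k for row in cov]
    col_means = [sum(cov[i][j] for i in range(k)) / k for j in range(k)]
    grand_mean = sum(row_means) / k
    # Double-centre the matrix: subtract row and column means, add back the grand mean.
    dc = [[cov[i][j] - row_means[i] - col_means[j] + grand_mean
           for j in range(k)] for i in range(k)]
    trace = sum(dc[i][i] for i in range(k))
    sum_sq = sum(x * x for row in dc for x in row)
    return trace ** 2 / ((k - 1) * sum_sq)


def adjusted_df(epsilon, k, n):
    """Epsilon-adjusted degrees of freedom for a one-way repeated-measures
    F test with k conditions and n subjects."""
    return epsilon * (k - 1), epsilon * (k - 1) * (n - 1)


# Compound symmetry (equal variances, equal covariances) satisfies sphericity.
cs = [[4.0, 1.0, 1.0], [1.0, 4.0, 1.0], [1.0, 1.0, 4.0]]
print(greenhouse_geisser_epsilon(cs))  # 1.0
```

Using epsilon this way keeps the familiar univariate F statistic and only shrinks its degrees of freedom, which is the trade-off against a full multivariate analysis that the study examined.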
NASA Technical Reports Server (NTRS)
Smith, Peter L. (Editor); Wiese, Wolfgang L. (Editor)
1992-01-01
The present volume on atomic and molecular spectroscopic data for space astrophysics discusses scientific problems and laboratory data needs associated with the Hubble Space Telescope, atomic data needed for far ultraviolet astronomy with HUT and FUSE and for analysis of EUV and X-ray spectra, and data for observations of interstellar medium with the Hubble Space Telescope. Attention is also given to atomic and molecular data for analysis of IR spectra from ISO and SIRTF, atomic data from the opacity project, sources of atomic spectroscopic data for astrophysics, and summary of current molecular data bases.
Description of a user-oriented geographic information system - The resource analysis program
NASA Technical Reports Server (NTRS)
Tilmann, S. E.; Mokma, D. L.
1980-01-01
This paper describes the Resource Analysis Program, an applied geographic information system. Several applications are presented which used soil and other natural resource data to develop integrated maps and data analyses. These applications demonstrate the methods of analysis and the philosophy of approach used in the mapping system. The applications are evaluated in reference to four major needs of a functional mapping system: data capture, data libraries, data analysis, and mapping and data display. These four criteria are then used to describe an effort to develop the next generation of applied mapping systems. This approach uses inexpensive microcomputers for field applications and should prove to be a viable entry point for users heretofore unable or unwilling to venture into applied computer mapping.
Magnetic Field Experiment Data Analysis System
NASA Technical Reports Server (NTRS)
Holland, D. B.; Zanetti, L. J.; Suther, L. L.; Potemra, T. A.; Anderson, B. J.
1995-01-01
The Johns Hopkins University Applied Physics Laboratory (JHU/APL) Magnetic Field Experiment Data Analysis System (MFEDAS) has been developed to process and analyze satellite magnetic field experiment data from the TRIAD, MAGSAT, AMPTE/CCE, Viking, Polar BEAR, DMSP, HILAT, UARS, and Freja satellites. The MFEDAS provides extensive data management and analysis capabilities. The system is based on standard data structures and a standard user interface. The MFEDAS has two major elements: (1) a set of satellite unique telemetry processing programs for uniform and rapid conversion of the raw data to a standard format and (2) the program Magplot which has file handling, data analysis, and data display sections. This system is an example of software reuse, allowing new data sets and software extensions to be added in a cost-effective and timely manner. Future additions to the system will include the addition of standard format file import routines, modification of the display routines to use a commercial graphics package based on X-Window protocols, and a generic utility for telemetry data access and conversion.
Depth data research of GIS based on clustering analysis algorithm
NASA Astrophysics Data System (ADS)
Xiong, Yan; Xu, Wenli
2018-03-01
GIS data are spatially distributed. Geographic data have both spatial and attribute characteristics and also change over time; therefore, the amount of data is very large. Nowadays, many industries and departments in society use GIS. However, without a proper data analysis and mining scheme, GIS will not reach its full effectiveness and much of the data will go to waste. In this paper, we use the geographic information demand of a national security department as the experimental object, combine the characteristics of GIS data (taking into account time, space, attributes and so on), and apply a cluster analysis algorithm. We further study the mining scheme for depth data and derive the algorithm model. This algorithm can automatically classify sample data and then carry out exploratory analysis. The research shows that the algorithm model and the information mining scheme can quickly find hidden depth information in the surface data of GIS, thus improving the efficiency of the security department. This algorithm can also be extended to other fields.
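The abstract does not say which clustering algorithm was used. As an assumption for illustration only, the sketch below applies standard k-means (Lloyd's algorithm) to feature vectors that mix spatial, temporal, and attribute values, standardizing each column first so that no single unit of measure dominates the Euclidean distance. The feature layout and sample records are hypothetical.

```python
import math


def standardize(rows):
    """Z-score each column so coordinates, timestamps and attributes are comparable."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # A constant column has zero spread; divide by 1.0 instead to avoid 0/0.
    sds = [(sum((x - m) ** 2 for x in c) / len(c)) ** 0.5 or 1.0
           for c, m in zip(cols, means)]
    return [tuple((x - m) / s for x, m, s in zip(r, means, sds)) for r in rows]


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means; the first k points seed the centroids (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical records: (x, y, day_of_year, attribute_value)
records = [(0.0, 0.0, 10, 5.0), (0.1, 0.2, 12, 5.5),
           (9.8, 10.1, 200, 40.0), (10.0, 9.9, 205, 42.0)]
labels, _ = kmeans(standardize(records), k=2)
print(labels)
```

Seeding from the first k points keeps the example reproducible; in practice a randomized or k-means++ initialization with several restarts is the more common choice.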
Developing web-based data analysis tools for precision farming using R and Shiny
NASA Astrophysics Data System (ADS)
Jahanshiri, Ebrahim; Mohd Shariff, Abdul Rashid
2014-06-01
Technologies that are set to increase the productivity of agricultural practices require more and more data. At the same time, farming data is becoming increasingly cheap to collect and maintain. The bulk of the data collected by sensors and samples needs to be analysed in an efficient and transparent manner. Web technologies have long been used to develop applications that can assist farmers and managers. However, until recently, analysing data in an online environment has not been an easy task, especially in the eyes of data analysts. This barrier is now overcome by the availability of new application programming interfaces that can provide real-time web-based data analysis. In this paper, the development of a prototype web-based application for data analysis using new facilities in the R statistical package and its web development framework, Shiny, is explored. The pros and cons of this type of data analysis environment for precision farming are enumerated and future directions in web application development for agricultural data are discussed.
An integrated GIS application system for soil moisture data assimilation
NASA Astrophysics Data System (ADS)
Wang, Di; Shen, Runping; Huang, Xiaolong; Shi, Chunxiang
2014-11-01
The gaps in knowledge and existing challenges in precisely describing the land surface process make it critical to represent the massive soil moisture data visually and to mine the data for further research. This article introduces a comprehensive soil moisture assimilation data analysis system, built with C#, IDL, ArcSDE, Visual Studio 2008 and SQL Server 2005. The system provides integrated services, efficient graphics visualization, and analysis of land surface data assimilation. The system not only improves the efficiency of data assimilation management but also comprehensively integrates the data processing and analysis tools into the GIS development environment, so that analysis of the soil moisture assimilation data and GIS spatial analysis can be carried out in the same system. This system provides basic GIS map functions, massive data processing, soil moisture product analysis, etc. Besides, it takes full advantage of a spatial data engine called ArcSDE to efficiently manage, retrieve and store all kinds of data. In the system, the temporal and spatial patterns of soil moisture are plotted. By analyzing the soil moisture impact factors, it is possible to obtain the correlation coefficient between the soil moisture value and each individual impact factor. Daily and monthly comparative analysis of soil moisture products among observations, simulation results and assimilations can be made in this system to display the different trends of these products. Furthermore, a soil moisture map production function is realized for business application.
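The per-factor correlation step described above can be sketched as follows. The abstract does not say which coefficient the system computes; this sketch assumes Pearson's r, and the factor names and values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of the same length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def factor_correlations(moisture, factors):
    """Correlate a soil-moisture series with each impact-factor series."""
    return {name: pearson_r(moisture, values) for name, values in factors.items()}


# Hypothetical daily series at one grid cell.
moisture = [0.21, 0.24, 0.30, 0.28, 0.25]
factors = {
    "precipitation_mm": [0.0, 2.5, 12.0, 6.0, 1.0],
    "air_temperature_c": [28.0, 26.5, 22.0, 23.5, 25.0],
}
for name, r in factor_correlations(moisture, factors).items():
    print(f"{name}: r = {r:+.2f}")
```

Pearson's r only captures linear association; a rank-based coefficient such as Spearman's would be a reasonable substitute if the relationship is monotonic but nonlinear.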
Redman-MacLaren, Michelle; Mills, Jane; Tommbe, Rachael
2014-01-01
Background Participatory approaches to qualitative research practice constantly change in response to evolving research environments. Researchers are increasingly encouraged to undertake secondary analysis of qualitative data, despite epistemological and ethical challenges. Interpretive focus groups can be described as a more participative method for groups to analyse qualitative data. Objective To facilitate interpretive focus groups with women in Papua New Guinea to extend analysis of existing qualitative data and co-create new primary data. The purpose of this was to inform a transformational grounded theory and subsequent health promoting action. Design A two-step approach was used in a grounded theory study about how women experience male circumcision in Papua New Guinea. Participants analysed portions or ‘chunks’ of existing qualitative data in story circles and built upon this analysis by using the visual research method of storyboarding. Results New understandings of the data were evoked when women in interpretive focus groups analysed the data ‘chunks’. Interpretive focus groups encouraged women to share their personal experiences about male circumcision. The visual method of storyboarding enabled women to draw pictures to represent their experiences. This provided an additional focus for whole-of-group discussions about the research topic. Conclusions Interpretive focus groups offer an opportunity to enhance trustworthiness of findings when researchers undertake secondary analysis of qualitative data. The co-analysis of existing data and co-generation of new data between research participants and researchers informed an emergent transformational grounded theory and subsequent health promoting action. PMID:25138532
Brix, Tobias Johannes; Bruland, Philipp; Sarfraz, Saad; Ernsting, Jan; Neuhaus, Philipp; Storck, Michael; Doods, Justin; Ständer, Sonja; Dugas, Martin
2018-01-01
A required step for presenting results of clinical studies is the declaration of participants' demographic and baseline characteristics, as required by FDAAA 801. The common workflow to accomplish this task is to export the clinical data from the electronic data capture system in use and import it into statistical software such as SAS software or IBM SPSS. This software requires trained users, who have to implement the analysis individually for each item. These expenditures may become an obstacle for small studies. The objective of this work is to design, implement and evaluate an open source application, called ODM Data Analysis, for the semi-automatic analysis of clinical study data. The system requires clinical data in the CDISC Operational Data Model format. After the file is uploaded, its syntax and the data type conformity of the collected data are validated. The completeness of the study data is determined and basic statistics, including illustrative charts for each item, are generated. Datasets from four clinical studies have been used to evaluate the application's performance and functionality. The system is implemented as an open source web application (available at https://odmanalysis.uni-muenster.de) and also provided as a Docker image, which enables easy distribution and installation on local systems. Study data are only stored in the application as long as the calculations are performed, which is compliant with data protection endeavors. Analysis times are below half an hour, even for larger studies with over 6000 subjects. Medical experts have confirmed the usefulness of this application for gaining an overview of their collected study data for monitoring purposes and for generating descriptive statistics without further user interaction. The semi-automatic analysis has its limitations and cannot replace the complex analysis of statisticians, but it can be used as a starting point for their examination and reporting.
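The completeness-and-statistics step can be sketched with Python's standard library. This is a minimal illustration, not the ODM Data Analysis implementation. It assumes ODM 1.3 XML in the `http://www.cdisc.org/ns/odm/v1.3` namespace, reads the `ItemOID` and `Value` attributes of `ItemData` elements, counts an item as present for a subject when it has a non-empty value, and skips the metadata-driven syntax and data type validation the application performs. The function name, item OIDs, and the trimmed sample file are hypothetical.

```python
import statistics
import xml.etree.ElementTree as ET
from collections import Counter

ODM_NS = {"odm": "http://www.cdisc.org/ns/odm/v1.3"}


def summarize_items(odm_xml, item_oids):
    """Per-item completeness plus simple descriptive statistics."""
    root = ET.fromstring(odm_xml)
    subjects = root.findall(".//odm:SubjectData", ODM_NS)
    summary = {}
    for oid in item_oids:
        values = []
        for subject in subjects:
            # Only the first occurrence per subject is used (no repeating events).
            item = subject.find(f".//odm:ItemData[@ItemOID='{oid}']", ODM_NS)
            if item is not None and item.get("Value", "").strip():
                values.append(item.get("Value").strip())
        entry = {"completeness": len(values) / len(subjects) if subjects else 0.0}
        try:
            numbers = [float(v) for v in values]
        except ValueError:
            numbers = None  # at least one non-numeric value: treat as categorical
        if numbers:
            entry["mean"] = statistics.fmean(numbers)
            entry["sd"] = statistics.stdev(numbers) if len(numbers) > 1 else 0.0
        else:
            entry["counts"] = dict(Counter(values))
        summary[oid] = entry
    return summary


# Hypothetical, heavily trimmed ODM file: subject 003 has no AGE value.
SAMPLE_ODM = """<ODM xmlns="http://www.cdisc.org/ns/odm/v1.3">
  <ClinicalData StudyOID="S1" MetaDataVersionOID="V1">
    <SubjectData SubjectKey="001"><StudyEventData StudyEventOID="BL">
      <FormData FormOID="DM"><ItemGroupData ItemGroupOID="DEMO">
        <ItemData ItemOID="AGE" Value="30"/><ItemData ItemOID="SEX" Value="F"/>
      </ItemGroupData></FormData></StudyEventData></SubjectData>
    <SubjectData SubjectKey="002"><StudyEventData StudyEventOID="BL">
      <FormData FormOID="DM"><ItemGroupData ItemGroupOID="DEMO">
        <ItemData ItemOID="AGE" Value="40"/><ItemData ItemOID="SEX" Value="M"/>
      </ItemGroupData></FormData></StudyEventData></SubjectData>
    <SubjectData SubjectKey="003"><StudyEventData StudyEventOID="BL">
      <FormData FormOID="DM"><ItemGroupData ItemGroupOID="DEMO">
        <ItemData ItemOID="SEX" Value="F"/>
      </ItemGroupData></FormData></StudyEventData></SubjectData>
  </ClinicalData>
</ODM>"""

print(summarize_items(SAMPLE_ODM, ["AGE", "SEX"]))
```

In a real pipeline the item list and expected data types would come from the file's MetaDataVersion section rather than being passed in by hand.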
QUAGOL: a guide for qualitative data analysis.
Dierckx de Casterlé, Bernadette; Gastmans, Chris; Bryon, Els; Denier, Yvonne
2012-03-01
Data analysis is a complex and contested part of the qualitative research process, which has received limited theoretical attention. Researchers are often in need of useful instructions or guidelines on how to analyze the mass of qualitative data, but face a lack of clear guidance for using particular analytic methods. The aim of this paper is to propose and discuss the Qualitative Analysis Guide of Leuven (QUAGOL), a guide that was developed to truly capture the rich insights of qualitative interview data. The article describes six major problems researchers often struggle with during the process of qualitative data analysis. Consequently, the QUAGOL is proposed as a guide to facilitate the process of analysis. Challenges that emerged and lessons learned from our own extensive experience with qualitative data analysis within the Grounded Theory Approach, as well as from the experience of other researchers (as described in the literature), were discussed, and recommendations were presented. Strengths and pitfalls of the proposed method were discussed in detail. The Qualitative Analysis Guide of Leuven (QUAGOL) offers a comprehensive method to guide the process of qualitative data analysis. The process consists of two parts, each consisting of five stages. The method is systematic but not rigid. It is characterized by iterative processes of digging deeper, constantly moving between the various stages of the process. As such, it aims to stimulate the researcher's intuition and creativity as much as possible. The QUAGOL guide is a theory- and practice-based guide that supports and facilitates the process of analysis of qualitative interview data. Although the method can facilitate the process of analysis, it cannot guarantee automatic quality. The skills of the researcher and the quality of the research team remain the most crucial components of a successful process of analysis.
Additionally, the importance of constantly moving between the various stages throughout the research process cannot be overstated. Copyright © 2011 Elsevier Ltd. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Banerjee, Kaushik; Clarity, Justin B; Cumberland, Riley M
This will be licensed via RSICC. A new, integrated data and analysis system has been designed to simplify and automate the performance of accurate and efficient evaluations for characterizing the input to the overall nuclear waste management system: the UNF-Storage, Transportation & Disposal Analysis Resource and Data System (UNF-ST&DARDS). A relational database within UNF-ST&DARDS provides a standard means by which UNF-ST&DARDS can succinctly store and retrieve modeling and simulation (M&S) parameters for specific spent nuclear fuel analysis. A library of various analysis model templates provides the ability to communicate the various sets of M&S parameters to the most appropriate M&S application. Interactive visualization capabilities facilitate data analysis and results interpretation. UNF-ST&DARDS's current analysis capabilities include (1) assembly-specific depletion and decay and (2) spent nuclear fuel cask-specific criticality and shielding. Currently, UNF-ST&DARDS uses the SCALE nuclear analysis code system for performing nuclear analysis.
NASA Astrophysics Data System (ADS)
Istvan Etesi, Laszlo; Tolbert, K.; Schwartz, R.; Zarro, D.; Dennis, B.; Csillaghy, A.
2010-05-01
In our project "Extending the Virtual Solar Observatory (VSO)” we have combined some of the features available in Solar Software (SSW) to produce an integrated environment for data analysis, supporting the complete workflow from data location, retrieval, preparation, and analysis to creating publication-quality figures. Our goal is an integrated analysis experience in IDL, easy-to-use but flexible enough to allow more sophisticated procedures such as multi-instrument analysis. To that end, we have made the transition from a locally oriented setting where all the analysis is done on the user's computer, to an extended analysis environment where IDL has access to services available on the Internet. We have implemented a form of Cloud Computing that uses the VSO search and a new data retrieval and pre-processing server (PrepServer) that provides remote execution of instrument-specific data preparation. We have incorporated the interfaces to the VSO search and the PrepServer into an IDL widget (SHOW_SYNOP) that provides user-friendly searching and downloading of raw solar data and optionally sends search results for pre-processing to the PrepServer prior to downloading the data. The raw and pre-processed data can be displayed with our plotting suite, PLOTMAN, which can handle different data types (light curves, images, and spectra) and perform basic data operations such as zooming, image overlays, solar rotation, etc. PLOTMAN is highly configurable and suited for visual data analysis and for creating publishable figures. PLOTMAN and SHOW_SYNOP work hand-in-hand for a convenient working environment. Our environment supports a growing number of solar instruments that currently includes RHESSI, SOHO/EIT, TRACE, SECCHI/EUVI, HINODE/XRT, and HINODE/EIS.
Analysis of event data recorder data for vehicle safety improvement
DOT National Transportation Integrated Search
2008-04-01
The Volpe Center performed a comprehensive engineering analysis of Event Data Recorder (EDR) data supplied by the National Highway Traffic Safety Administration (NHTSA) to assess its accuracy and usefulness in crash reconstruction and improvement of ...
Improved Data Analysis Tools for the Thermal Emission Spectrometer
NASA Astrophysics Data System (ADS)
Rodriguez, K.; Laura, J.; Fergason, R.; Bogle, R.
2017-06-01
We plan to stand up three different database systems for testing of a new datastore for MGS TES data allowing for more accessible tools supporting high throughput data analysis on the high-dimensionality hyperspectral data set.
An automated data management/analysis system for space shuttle orbiter tiles. [stress analysis]
NASA Technical Reports Server (NTRS)
Giles, G. L.; Ballas, M.
1982-01-01
An engineering data management system was combined with a nonlinear stress analysis program to provide a capability for analyzing a large number of tiles on the space shuttle orbiter. Tile geometry data and all data necessary to define the tile loads environment are accessed automatically as needed for the analysis of a particular tile or a set of tiles. User documentation provided includes: (1) a description of computer programs and data files contained in the system; (2) definitions of all engineering data stored in the data base; (3) characteristics of the tile analytical model; (4) instructions for preparation of user input; and (5) a sample problem to illustrate use of the system. Descriptions of data, computer programs, and analytical models of the tile are sufficiently detailed to guide extension of the system to include additional zones of tiles and/or additional types of analyses.
Large data series: Modeling the usual to identify the unusual
DOE Office of Scientific and Technical Information (OSTI.GOV)
Downing, D.J.; Fedorov, V.V.; Lawkins, W.F.
"Standard" approaches such as regression analysis, Fourier analysis, the Box-Jenkins procedure, et al., which handle a data series as a whole, are not useful for very large data sets for at least two reasons. First, even with the computer hardware available today, including parallel processors and storage devices, there are no effective means for manipulating and analyzing gigabyte or larger data files. Second, in general it cannot be assumed that a very large data set is "stable" by the usual measures, like homogeneity, stationarity, and ergodicity, that standard analysis techniques require. Both reasons dictate the necessity to use "local" data analysis methods, whereby the data is segmented and ordered, where order leads to a sense of "neighbor," and then analyzed segment by segment. The idea of local data analysis is central to the study reported here.
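The segment-by-segment idea can be sketched as follows. The report does not give its exact procedure; as an assumption for illustration, this version streams the series in fixed-length segments, summarizes each by its mean, models the "usual" behaviour with the median of those means, and flags segments whose modified z-score (based on the median absolute deviation) exceeds 3.5, a commonly used cut-off. The segment length and threshold are hypothetical tuning parameters.

```python
import statistics


def segment_means(values, seg_len):
    """Yield (start_index, mean) for consecutive full segments.

    Works on any iterable, so a very large series can be streamed from disk
    without loading it all into memory. A trailing partial segment is ignored.
    """
    buf, start = [], 0
    for i, v in enumerate(values):
        buf.append(v)
        if len(buf) == seg_len:
            yield start, statistics.fmean(buf)
            buf, start = [], i + 1


def flag_unusual_segments(values, seg_len, threshold=3.5):
    """Return start indices of segments whose mean is unusual relative to the rest."""
    stats = list(segment_means(values, seg_len))
    if not stats:
        return []
    means = [m for _, m in stats]
    center = statistics.median(means)  # robust model of "usual" behaviour
    mad = statistics.median([abs(m - center) for m in means])
    if mad == 0:
        # Most segments share the same mean; any departure from it is unusual.
        return [s for s, m in stats if m != center]
    # Modified z-score: 0.6745 rescales the MAD to a standard deviation under normality.
    return [s for s, m in stats if 0.6745 * abs(m - center) / mad > threshold]


series = [1, 1, 2, 2, 1, 1, 2, 2, 1, 1, 50, 50]
print(flag_unusual_segments(series, seg_len=2))
```

Only the per-segment summaries are held in memory, which is the point of local analysis for very large files; richer per-segment models (trend, spectrum) can replace the mean without changing the structure.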
Mayo, Charles; Conners, Steve; Warren, Christopher; Miller, Robert; Court, Laurence; Popple, Richard
2013-01-01
Purpose: With the emergence of clinical outcomes databases as tools utilized routinely within institutions comes the need for software tools to support automated statistical analysis of these large data sets and intrainstitutional exchange from independent federated databases to support data pooling. In this paper, the authors present a design approach and analysis methodology that addresses both issues. Methods: A software application was constructed to automate analysis of patient outcomes data using a wide range of statistical metrics, by combining use of C#.Net and R code. The accuracy and speed of the code were evaluated using benchmark data sets. Results: The approach provides data needed to evaluate combinations of statistical measurements for ability to identify patterns of interest in the data. Through application of the tools to a benchmark data set for dose-response threshold and to SBRT lung data sets, an algorithm was developed that uses receiver operating characteristic curves to identify a threshold value and combines use of contingency tables, Fisher exact tests, Welch t-tests, and Kolmogorov-Smirnov tests to filter the large data set to identify values demonstrating dose-response. Kullback-Leibler divergences were used to provide additional confirmation. Conclusions: The work demonstrates the viability of the design approach and the software tool for analysis of large data sets. PMID:24320426
Mayo, Charles; Conners, Steve; Warren, Christopher; Miller, Robert; Court, Laurence; Popple, Richard
2013-11-01
With the emergence of clinical outcomes databases as tools utilized routinely within institutions comes the need for software tools to support automated statistical analysis of these large data sets and intrainstitutional exchange from independent federated databases to support data pooling. In this paper, the authors present a design approach and analysis methodology that addresses both issues. A software application was constructed to automate analysis of patient outcomes data using a wide range of statistical metrics, by combining use of C#.Net and R code. The accuracy and speed of the code were evaluated using benchmark data sets. The approach provides data needed to evaluate combinations of statistical measurements for ability to identify patterns of interest in the data. Through application of the tools to a benchmark data set for dose-response threshold and to SBRT lung data sets, an algorithm was developed that uses receiver operating characteristic curves to identify a threshold value and combines use of contingency tables, Fisher exact tests, Welch t-tests, and Kolmogorov-Smirnov tests to filter the large data set to identify values demonstrating dose-response. Kullback-Leibler divergences were used to provide additional confirmation. The work demonstrates the viability of the design approach and the software tool for analysis of large data sets.
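The ROC-based threshold step can be illustrated with a small sketch. The abstract does not state how the threshold was read off the curve; this assumes Youden's J statistic (sensitivity + specificity - 1, equivalently TPR - FPR), one common choice, and treats a dose at or above the cut-off as a positive prediction. The dose values and outcomes are hypothetical.

```python
def roc_points(scores, outcomes):
    """(threshold, true-positive rate, false-positive rate) for each distinct
    score, predicting a positive outcome when score >= threshold."""
    pos = sum(1 for y in outcomes if y)
    neg = len(outcomes) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need both positive and negative outcomes")
    points = []
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, outcomes) if s >= t and y)
        fp = sum(1 for s, y in zip(scores, outcomes) if s >= t and not y)
        points.append((t, tp / pos, fp / neg))
    return points


def youden_threshold(scores, outcomes):
    """Cut-off maximizing Youden's J = TPR - FPR (ties go to the lowest threshold)."""
    return max(roc_points(scores, outcomes), key=lambda p: p[1] - p[2])[0]


# Hypothetical per-patient doses (Gy) and whether the toxicity of interest occurred.
doses = [12.0, 18.0, 22.0, 25.0, 31.0, 35.0, 40.0, 44.0]
toxicity = [0, 0, 0, 1, 0, 1, 1, 1]
print(youden_threshold(doses, toxicity))
```

The resulting cut-off splits patients into two groups, which is what makes the follow-up contingency-table and two-sample tests mentioned in the abstract possible.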
Spinal Cord Injury-Induced Dysautonomia via Plasticity in Paravertebral Sympathetic Postganglionic
2015-10-01
for future study. Data analysis and publications. Major Task 3: Data analysis and publications (months; % completion/completion dates). Subtask 1... Data analysis: months 6-36, 25%. Subtask 2: Manuscript writing and submission: months 24-36, 10%. Milestone(s) achieved: Dissemination of scientific results. b. What... ganglia by computational simulation and dynamic-clamp analysis. Journal of Neurophysiology 92, 2659-2671, doi:10.1152/jn.00470.2004 (2004). 14 Llewellyn
NASA Technical Reports Server (NTRS)
Viezee, W.; Russell, P. B.; Hake, R. D., Jr.
1974-01-01
The matching method of lidar data analysis is explained, and the results from two flights studying the stratospheric aerosol using lidar techniques are summarized and interpreted. Support is lent to the matching method of lidar data analysis by the results, but it is not yet apparent that the analysis technique leads to acceptable results on all nights in all seasons.
Analysis strategies for high-resolution UHF-fMRI data.
Polimeni, Jonathan R; Renvall, Ville; Zaretskaya, Natalia; Fischl, Bruce
2018-03-01
Functional MRI (fMRI) benefits from both increased sensitivity and specificity with increasing magnetic field strength, making it a key application for Ultra-High Field (UHF) MRI scanners. Most UHF-fMRI studies utilize the dramatic increases in sensitivity and specificity to acquire high-resolution data reaching sub-millimeter scales, which enable new classes of experiments to probe the functional organization of the human brain. This review article surveys advanced data analysis strategies developed for high-resolution fMRI at UHF. These include strategies designed to mitigate distortion and artifacts associated with higher fields in ways that attempt to preserve spatial resolution of the fMRI data, as well as recently introduced analysis techniques that are enabled by these extremely high-resolution data. Particular focus is placed on anatomically-informed analyses, including cortical surface-based analysis, which are powerful techniques that can guide each step of the analysis from preprocessing to statistical analysis to interpretation and visualization. New intracortical analysis techniques for laminar and columnar fMRI are also reviewed and discussed. Prospects for single-subject individualized analyses are also presented and discussed. Altogether, there are both specific challenges and opportunities presented by UHF-fMRI, and the use of proper analysis strategies can help these valuable data reach their full potential. Copyright © 2017 Elsevier Inc. All rights reserved.
Advancing our thinking in presence-only and used-available analysis.
Warton, David; Aarts, Geert
2013-11-01
1. The problems of analysing used-available data and presence-only data are equivalent, and this paper uses this equivalence as a platform for exploring opportunities for advancing analysis methodology. 2. We suggest some potential methodological advances in used-available analysis, made possible via lessons learnt in the presence-only literature, for example, using modern methods to improve predictive performance. We also consider the converse - potential advances in presence-only analysis inspired by used-available methodology. 3. Notwithstanding these potential advances in methodology, perhaps a greater opportunity is in advancing our thinking about how to apply a given method to a particular data set. 4. It is shown by example that strikingly different results can be achieved for a single data set by applying a given method of analysis in different ways - hence having chosen a method of analysis, the next step of working out how to apply it is critical to performance. 5. We review some key issues to consider in deciding how to apply an analysis method: apply the method in a manner that reflects the study design; consider data properties; and use diagnostic tools to assess how reasonable a given analysis is for the data at hand. © 2013 The Authors. Journal of Animal Ecology © 2013 British Ecological Society.
Evaluation of Graph Pattern Matching Workloads in Graph Analysis Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hong, Seokyong; Lee, Sangkeun; Lim, Seung-Hwan
2016-01-01
Graph analysis has emerged as a powerful method for data scientists to represent, integrate, query, and explore heterogeneous data sources. As a result, graph data management and mining became a popular area of research and led to the development of a plethora of systems in recent years. Unfortunately, the number of emerging graph analysis systems and the wide range of applications, coupled with a lack of apples-to-apples comparisons, make it difficult to understand the trade-offs between different systems and the graph operations for which they are designed. A fair comparison of these systems is a challenging task for the following reasons: multiple data models, non-standardized serialization formats, various query interfaces to users, and the diverse environments they operate in. To address these key challenges, in this paper we present a new benchmark suite by extending the Lehigh University Benchmark (LUBM) to cover the most common capabilities of various graph analysis systems. We provide the design process of the benchmark, which generalizes the workflow for data scientists to conduct the desired graph analysis on different graph analysis systems. Equipped with this extended benchmark suite, we present a performance comparison for nine subgraph pattern retrieval operations over six graph analysis systems, namely NetworkX, Neo4j, Jena, Titan, GraphX, and uRiKA. Through the proposed benchmark suite, this study reveals both quantitative and qualitative findings in (1) implications in loading data into each system; (2) challenges in describing graph patterns for each query interface; and (3) the differing sensitivity of each system to query selectivity. We envision that this study will pave the road for: (i) data scientists to select suitable graph analysis systems, and (ii) data management system designers to advance graph analysis systems.
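To make "subgraph pattern retrieval" concrete, the sketch below matches the simplest cyclic pattern, a triangle, in an undirected graph given as an edge list. It is a plain-Python illustration of the kind of operation being benchmarked, not code from the benchmark suite or from any of the six systems; node names are hypothetical and assumed to be mutually comparable (for example, all strings).

```python
from collections import defaultdict


def find_triangles(edges):
    """Return every triangle (a, b, c) with a < b < c in an undirected graph."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    triangles = []
    for a in sorted(adj):
        # Only extend to larger neighbours so each triangle is reported once.
        for b in sorted(n for n in adj[a] if n > a):
            for c in sorted(n for n in adj[a] & adj[b] if n > b):
                triangles.append((a, b, c))
    return triangles


edges = [("alice", "bob"), ("bob", "carol"), ("carol", "alice"),
         ("carol", "dave"), ("dave", "erin")]
print(find_triangles(edges))  # [('alice', 'bob', 'carol')]
```

Ordering the vertices and intersecting neighbour sets is the basic trick behind many triangle-listing implementations; query selectivity shows up directly here, since dense neighbourhoods produce large intersections.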
Reliability of Long-Term Wave Conditions Predicted with Data Sets of Short Duration
1985-03-01
the validity and reliability of predicted probable wave heights obtained from data of limited duration. BACKGROUND: The basic steps listed by...interest to perform the analysis outlined in steps 2 to 5, the prediction would only be reliable for up to a 3-year return period. For a 5-year data set...for long-term hindcast data. The data retrieval and analysis program known as the Sea State Engineering Analysis System (SEAS) makes handling of the
NASA Technical Reports Server (NTRS)
1972-01-01
The tug design and performance data base for the economic analysis of space tug operation are presented. A compendium of the detailed design and performance information from the data base is developed. The design data are parametric across a range of reusable space tug sizes. The performance curves are generated for selected point designs of expendable orbit injection stages and reusable tugs. Data are presented in the form of graphs for various modes of operation.
Multi-sensor analysis of urban ecosystems
Gallo, Kevin P.; Ji, Lei
2004-01-01
This study examines the synthesis of data from multiple space-based sensors to characterize the urban environment. Single-scene data (e.g., ASTER visible and near-IR surface reflectance, and land surface temperature data), multi-temporal data (e.g., one year of 16-day MODIS and AVHRR vegetation index data), and DMSP-OLS nighttime light data acquired in the early 1990s and 2000 were evaluated for urban ecosystem analysis. The advantages of a multi-sensor approach for the analysis of urban ecosystem processes are discussed.
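The multi-temporal record above is built from 16-day vegetation index composites. As a minimal sketch (not the MODIS or AVHRR production algorithm), the standard NDVI formula and a plain maximum-value composite over one compositing period can be written as follows. The pixel values are made up, and operational composites apply further rules such as view-angle and quality screening.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index: (NIR - Red) / (NIR + Red).
    Returns None where the denominator is zero (e.g. no-data pixels)."""
    denom = nir + red
    return None if denom == 0 else (nir - red) / denom

def max_value_composite(series):
    """Plain maximum-value compositing over one period: keep the highest
    valid NDVI per pixel, which tends to suppress cloud-contaminated looks.
    `series` is a list of scenes; each scene is a list of (nir, red) pixels."""
    n_pixels = len(series[0])
    composite = []
    for i in range(n_pixels):
        values = [ndvi(*scene[i]) for scene in series]
        valid = [v for v in values if v is not None]
        composite.append(max(valid) if valid else None)
    return composite
```

For example, two looks at the same two pixels, `[[(0.5, 0.1), (0, 0)], [(0.3, 0.2), (0.4, 0.4)]]`, composite to roughly `[0.667, 0.0]`.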
A modeling analysis program for the JPL table mountain Io sodium cloud data
NASA Technical Reports Server (NTRS)
Smyth, W. H.; Goldberg, B. A.
1984-01-01
A detailed review of 110 of the 263 Region B/C images of the 1981 data set is undertaken and a preliminary assessment of 39 images of the 1976-79 data set is presented. The basic spatial characteristics of these images are discussed. Modeling analysis of these images after further data processing will provide useful information about Io and the planetary magnetosphere. Plans for data processing and modeling analysis are outlined. Results of very preliminary modeling activities are presented.
The integration of a LANDSAT analysis capability with a geographic information system
NASA Technical Reports Server (NTRS)
Nordstrand, E. A.
1981-01-01
The integration of LANDSAT data was achieved by developing a flexible, compatible analysis tool and by using an existing data base to select the usable data from a LANDSAT analysis. The software package allows manipulation of grid cell data and gives the user the flexibility to include FORTRAN statements for special functions. Using this combination of capabilities, the user can classify a LANDSAT image and then selectively merge the results with other data that may exist for the study area.
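The classify-then-merge workflow can be sketched on grid cells as follows. The single-band thresholds, class names, and the zoning layer are hypothetical; the original package relied on FORTRAN and its own data base, neither of which is reproduced here, and real LANDSAT classification would use several bands and a trained classifier.

```python
# Hypothetical single-band thresholds, for illustration only.
def classify(band_value):
    if band_value < 30:
        return "water"
    if band_value < 90:
        return "vegetation"
    return "bare"

def classify_and_merge(image, other_layer, keep):
    """Classify each grid cell, then keep the class only where the matching
    cell of another GIS layer (e.g. a zoning grid) has a value in `keep`;
    all other cells are set to None."""
    merged = []
    for img_row, gis_row in zip(image, other_layer):
        merged.append([
            classify(v) if g in keep else None
            for v, g in zip(img_row, gis_row)
        ])
    return merged
```

Because both layers share the same grid, merging reduces to a cell-by-cell combination; registering the image to the GIS grid is assumed to have been done beforehand.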
EEG source analysis of data from paralysed subjects
NASA Astrophysics Data System (ADS)
Carabali, Carmen A.; Willoughby, John O.; Fitzgibbon, Sean P.; Grummett, Tyler; Lewis, Trent; DeLosAngeles, Dylan; Pope, Kenneth J.
2015-12-01
One limitation of electroencephalography (EEG) data is its quality, as recordings are usually contaminated with electrical signals from muscle. This research studies the results of two EEG source analysis methods applied to scalp recordings taken under paralysis and under normal conditions during the performance of a cognitive task. The aim is to determine which types of analysis are appropriate for EEG data containing myogenic components. The data used are scalp recordings of six subjects, in normal conditions and during paralysis, performing different cognitive tasks, including the oddball task that is the object of this research. The data were pre-processed by filtering and artefact correction; then one-second epochs for targets and distractors were extracted. Distributed source analysis was performed in BESA Research 6.0; using its results and information from the literature, nine ideal locations for source dipoles were identified. The nine dipoles were used to perform discrete source analysis, fitting them to the averaged epochs to obtain source waveforms. The results were statistically analysed by comparing the outcomes before and after the subjects were paralysed. Finally, frequency analysis was performed to better explain the results. The findings were that distributed source analysis could produce confounded results for EEG contaminated with myogenic signals; conversely, statistical analysis of the results from discrete source analysis showed that this method could help in dealing with EEG data contaminated with muscle electrical signals.
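The epoching and averaging steps described above can be sketched for a single channel as follows. This is an illustration under simplifying assumptions (event markers given as sample indices, epochs starting at the marker with no pre-stimulus baseline, plain lists instead of arrays); it is not BESA's implementation.

```python
def extract_epochs(signal, events, fs, length_s=1.0):
    """Cut fixed-length epochs starting at each event sample index.
    Epochs that would run past the end of the recording are skipped."""
    n = int(round(length_s * fs))
    return [signal[e:e + n] for e in events if 0 <= e and e + n <= len(signal)]

def average_epochs(epochs):
    """Sample-by-sample mean across epochs (the averaged evoked response)."""
    if not epochs:
        raise ValueError("no epochs to average")
    return [sum(samples) / len(epochs) for samples in zip(*epochs)]
```

Targets and distractors would be averaged separately, e.g. `average_epochs(extract_epochs(channel, target_events, fs))`, where `channel` and `target_events` are hypothetical names for one channel's samples and the target marker positions.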
netCDF Operators for Rapid Analysis of Measured and Modeled Swath-like Data
NASA Astrophysics Data System (ADS)
Zender, C. S.
2015-12-01
Swath-like data (hereafter SLD) are defined by non-rectangular and/or time-varying spatial grids in which one or more coordinates are multi-dimensional. It is often challenging and time-consuming to work with SLD, including all Level 2 satellite-retrieved data, non-rectangular subsets of Level 3 data, and model data on curvilinear grids. Researchers and data centers want user-friendly, fast, and powerful methods to specify, extract, serve, manipulate, and thus analyze SLD. To meet these needs, large research-oriented agencies and modeling centers such as NASA, DOE, and NOAA increasingly employ the netCDF Operators (NCO), an open-source scientific data analysis software package applicable to netCDF and HDF data. NCO includes extensive, fast, parallelized regridding features to facilitate analysis and intercomparison of SLD and model data. The remote sensing, weather and climate modeling, and analysis communities face similar problems in handling SLD, including how to easily: 1. Specify and mask irregular regions such as ocean basins and political boundaries in SLD (and rectangular) grids. 2. Bin, interpolate, average, or re-map SLD to regular grids. 3. Derive secondary data from given quality levels of SLD. These common tasks require a data extraction and analysis toolkit that is SLD-friendly and, like NCO, familiar in all these communities. With NCO, users can 1. Quickly project SLD onto the most useful regular grids for intercomparison. 2. Access sophisticated statistical and regridding functions that are robust to missing data and allow easy specification of quality control metrics. These capabilities improve interoperability and software reuse and, because they apply to SLD, minimize transmission, storage, and handling of unwanted data. While SLD analysis still poses many challenges compared to regularly gridded, rectangular data, the custom analysis scripts that SLD once required are now shorter, more powerful, and more user-friendly.
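As an illustration of task 2 (binning SLD onto a regular grid), the following plain-Python sketch averages irregular samples into regular latitude/longitude cells and skips missing values. It is not NCO's regridding code, which offers more sophisticated methods; the simple cell average here ignores cell area and each sample's footprint, and the grid spacings are assumed to divide 180° and 360° evenly.

```python
def bin_to_grid(lats, lons, values, dlat, dlon, fill=None):
    """Average irregular (swath-like) samples onto a regular lat/lon grid.
    Samples whose value is None (missing) are ignored; empty cells get `fill`.
    Row 0 starts at -90 degrees latitude, column 0 at -180 degrees longitude."""
    n_lat = int(round(180 / dlat))
    n_lon = int(round(360 / dlon))
    sums = [[0.0] * n_lon for _ in range(n_lat)]
    counts = [[0] * n_lon for _ in range(n_lat)]
    for lat, lon, v in zip(lats, lons, values):
        if v is None:
            continue
        i = min(int((lat + 90) // dlat), n_lat - 1)  # lat == 90 goes in the top row
        j = int(((lon + 180) % 360) // dlon)          # wrap longitudes into [0, 360)
        sums[i][j] += v
        counts[i][j] += 1
    return [[sums[i][j] / counts[i][j] if counts[i][j] else fill
             for j in range(n_lon)] for i in range(n_lat)]
```

A quality-control mask (task 3) could be applied the same way, by setting rejected samples to None before binning.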
Navigating complex sample analysis using national survey data.
Saylor, Jennifer; Friedmann, Erika; Lee, Hyeon Joo
2012-01-01
The National Center for Health Statistics conducts the National Health and Nutrition Examination Survey and other national surveys with probability-based complex sample designs. The goal of national surveys is to provide valid data for the population of the United States. Analyses of data from population surveys present unique challenges in the research process but are valuable avenues to study the health of the United States population. The aim of this study was to demonstrate the importance of using complex data analysis techniques for data obtained with a complex multistage sampling design and to provide an example of analysis using the SPSS Complex Samples procedure. Challenges and solutions specific to secondary data analysis of national databases are illustrated using the National Health and Nutrition Examination Survey as the exemplar. Oversampling of small or sensitive groups provides necessary estimates of variability within small groups. Using weights without the complex samples procedure accurately estimates population means and frequencies from the sample after accounting for over- or undersampling of specific groups. Weighting alone, however, leads to inappropriate population estimates of variability, because these are computed as if the measures were from the entire population rather than a sample in the data set. The SPSS Complex Samples procedure allows inclusion of all sampling design elements: stratification, clusters, and weights. Use of national data sets allows use of extensive, expensive, and well-documented survey data for exploratory questions but limits analysis to those variables included in the data set. The large sample permits examination of multiple predictors and interactive relationships. Merging data files, availability of data in several waves of surveys, and complex sampling are techniques used to provide a representative sample but present unique challenges. With sophisticated data analysis techniques, the use of these data is optimized.
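The contrast between weighting alone and a full design-based analysis can be sketched in plain Python. Assuming, as a common textbook approximation, that primary sampling units (PSUs) were drawn with replacement within strata, the variance of a weighted mean can be estimated by Taylor linearization from stratum and PSU identifiers. This is an illustrative sketch, not the SPSS Complex Samples implementation.

```python
from collections import defaultdict

def weighted_mean(y, w):
    """Design-weighted estimate of a population mean."""
    return sum(wi * yi for yi, wi in zip(y, w)) / sum(w)

def linearized_variance(y, w, strata, psus):
    """Taylor-linearization variance of the weighted mean, assuming PSUs
    were sampled with replacement within strata."""
    total_w = sum(w)
    ybar = weighted_mean(y, w)
    # Sum the linearized score w * (y - ybar) / W within each (stratum, PSU).
    psu_totals = defaultdict(float)
    for yi, wi, h, c in zip(y, w, strata, psus):
        psu_totals[(h, c)] += wi * (yi - ybar) / total_w
    by_stratum = defaultdict(list)
    for (h, _), z in psu_totals.items():
        by_stratum[h].append(z)
    var = 0.0
    for h, zs in by_stratum.items():
        n_h = len(zs)
        if n_h < 2:
            raise ValueError(f"stratum {h!r} needs at least two PSUs")
        z_mean = sum(zs) / n_h
        # Between-PSU variability within the stratum.
        var += n_h / (n_h - 1) * sum((z - z_mean) ** 2 for z in zs)
    return var
```

With a single stratum, equal weights, and each respondent as its own PSU, the estimate reduces to the familiar s²/n; grouping respondents into clusters changes it, which is exactly the information that weighting alone discards.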
Striped Data Server for Scalable Parallel Data Analysis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chang, Jin; Gutsche, Oliver; Mandrichenko, Igor
A columnar data representation is known to be an efficient way to store data, specifically when analysis is often based on only a small fragment of the available data structures. A data representation like Apache Parquet goes a step beyond a purely columnar representation: it also splits data horizontally to allow for easy parallelization of data analysis. Based on the general idea of columnar data storage, working on the [LDRD Project], we have developed a striped data representation, which, we believe, is better suited to the needs of High Energy Physics data analysis. A traditional columnar approach allows for efficient data analysis of complex structures. While keeping all the benefits of columnar data representations, the striped mechanism goes further by enabling easy parallelization of computations without requiring special hardware. We will present an implementation and some performance characteristics of such a data representation mechanism using a distributed no-SQL database or a local file system, unified under the same API and data representation model. The representation is efficient and at the same time simple, so that it allows for a common data model and APIs for a wide range of underlying storage mechanisms such as distributed no-SQL databases and local file systems. Striped storage adopts NumPy arrays as its basic data representation format, which makes it easy and efficient to use in Python applications. The Striped Data Server is a web service, which hides the server implementation details from the end user, easily exposes data to WAN users, and makes it possible to use well-known, mature data caching solutions to further increase data access efficiency. We are considering the Striped Data Server as the core of an enterprise-scale data analysis platform for High Energy Physics and similar areas of data processing.
We have been testing this architecture with a 2 TB dataset from a CMS dark matter search and plan to expand it to multiple 100 TB or even PB scale. We will present the striped format, the Striped Data Server architecture, and performance test results.
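To make the striping idea concrete, here is a minimal sketch assuming two hypothetical per-event columns, `pt` and `met`, held as plain Python lists (the Striped format itself uses NumPy arrays served through a web service). Each column is cut into stripes of the same size, stripe i of every column is processed together by a worker, and the per-stripe partial results are summed.

```python
from concurrent.futures import ThreadPoolExecutor

def make_stripes(column, stripe_size):
    """Split one column (a flat list of values) into fixed-size stripes."""
    return [column[i:i + stripe_size] for i in range(0, len(column), stripe_size)]

def count_selected(stripe_pair, threshold):
    """Per-stripe work: count events whose two columns both pass a cut."""
    pt, met = stripe_pair
    return sum(1 for a, b in zip(pt, met) if a > threshold and b > threshold)

def parallel_count(columns, stripe_size, threshold, workers=4):
    """Stripe each column identically, process matching stripes together,
    and sum the per-stripe partial counts."""
    pt_stripes = make_stripes(columns["pt"], stripe_size)
    met_stripes = make_stripes(columns["met"], stripe_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(lambda pair: count_selected(pair, threshold),
                            zip(pt_stripes, met_stripes))
        return sum(partials)
```

Only the two columns the selection needs are read, and because stripes are independent the partial counts can be computed in any order or on different machines before being combined.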
Fully automatic and precise data analysis developed for time-of-flight mass spectrometry.
Meyer, Stefan; Riedo, Andreas; Neuland, Maike B; Tulej, Marek; Wurz, Peter
2017-09-01
Scientific objectives of current and future space missions focus on the investigation of the origin and evolution of the solar system, with particular emphasis on habitability and signatures of past and present life. For in situ measurements of the chemical composition of solid samples on planetary surfaces, of the neutral atmospheric gas, and of the thermal plasma of planetary atmospheres, mass spectrometers using time-of-flight mass analysers are widely applied. However, such investigations require measurements with good statistics and, thus, produce a large amount of data to be analysed. Therefore, faster and, especially, robust automated data analysis with enhanced accuracy is required. In this contribution, an automatic data analysis software, which allows fast and precise quantitative analysis of time-of-flight mass spectrometric data, is presented and discussed in detail. A crucial part of this software is a robust and fast peak-finding algorithm followed by a numerical integration method, allowing precise data analysis. We tested our analysis software with data from different time-of-flight mass spectrometers and different measurement campaigns thereof. The quantitative analysis of isotopes, using automatic data analysis, yields results with an accuracy of isotope ratios up to 100 ppm for a signal-to-noise ratio (SNR) of 10^4. We show that the accuracy of isotope ratios is in fact proportional to SNR^-1. Furthermore, we observe that the accuracy of isotope ratios is inversely proportional to the mass resolution. Additionally, we show that the accuracy of isotope ratios depends on the sample width T_s as T_s^0.5. Copyright © 2017 John Wiley & Sons, Ltd.
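The peak-finding plus numerical-integration step can be illustrated with a deliberately simple sketch: local maxima above a threshold are taken as peaks, each peak is integrated with the trapezoidal rule over a fixed window, and the ratio of areas serves as an isotope ratio. The threshold and window are hypothetical tuning parameters, and baseline handling is omitted. The published software's algorithm is more robust than this.

```python
def find_peaks(signal, threshold):
    """Indices of local maxima that rise above `threshold`."""
    return [i for i in range(1, len(signal) - 1)
            if signal[i] > threshold
            and signal[i] >= signal[i - 1]
            and signal[i] > signal[i + 1]]

def integrate_peak(t, signal, center, half_width):
    """Trapezoidal area over a window of +/- half_width samples around center."""
    lo = max(center - half_width, 0)
    hi = min(center + half_width, len(signal) - 1)
    return sum(0.5 * (signal[i] + signal[i + 1]) * (t[i + 1] - t[i])
               for i in range(lo, hi))

def isotope_ratio(t, signal, threshold, half_width):
    """Area of the second detected peak divided by the area of the first."""
    peaks = find_peaks(signal, threshold)
    if len(peaks) < 2:
        raise ValueError("need at least two peaks")
    a1 = integrate_peak(t, signal, peaks[0], half_width)
    a2 = integrate_peak(t, signal, peaks[1], half_width)
    return a2 / a1
```

On a synthetic spectrum with two Gaussian peaks of equal width and amplitudes 10 and 1, the returned ratio is close to 0.1; with real data, noise sets the achievable accuracy, which is why the abstract relates ratio accuracy to SNR.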
Christley, Scott; Scarborough, Walter; Salinas, Eddie; Rounds, William H; Toby, Inimary T; Fonner, John M; Levin, Mikhail K; Kim, Min; Mock, Stephen A; Jordan, Christopher; Ostmeyer, Jared; Buntzman, Adam; Rubelt, Florian; Davila, Marco L; Monson, Nancy L; Scheuermann, Richard H; Cowell, Lindsay G
2018-01-01
Recent technological advances in immune repertoire sequencing have created tremendous potential for advancing our understanding of adaptive immune response dynamics in various states of health and disease. Immune repertoire sequencing produces large, highly complex data sets, however, which require specialized methods and software tools for their effective analysis and interpretation. VDJServer is a cloud-based analysis portal for immune repertoire sequence data that provides access to a suite of tools for a complete analysis workflow, including modules for preprocessing and quality control of sequence reads, V(D)J gene segment assignment, repertoire characterization, and repertoire comparison. VDJServer also provides sophisticated visualizations for exploratory analysis. It is accessible through a standard web browser via a graphical user interface designed for use by immunologists, clinicians, and bioinformatics researchers. VDJServer provides a data commons for public sharing of repertoire sequencing data, as well as private sharing of data between users. We describe the main functionality and architecture of VDJServer and demonstrate its capabilities with use cases from cancer immunology and autoimmunity. VDJServer provides a complete analysis suite for human and mouse T-cell and B-cell receptor repertoire sequencing data. The combination of its user-friendly interface and high-performance computing allows large immune repertoire sequencing projects to be analyzed with no programming or software installation required. VDJServer is a web-accessible cloud platform that provides access through a graphical user interface to a data management infrastructure, a collection of analysis tools covering all steps in an analysis, and an infrastructure for sharing data along with workflows, results, and computational provenance. VDJServer is a free, publicly available, and open-source licensed resource.
Big Data in HEP: A comprehensive use case study
Gutsche, Oliver; Cremonesi, Matteo; Elmer, Peter; ...
2017-11-23
Experimental Particle Physics has been at the forefront of analyzing the world's largest datasets for decades. The HEP community was the first to develop suitable software and computing tools for this task. In recent times, new toolkits and systems collectively called Big Data technologies have emerged to support the analysis of Petabyte and Exabyte datasets in industry. While the principles of data analysis in HEP have not changed (filtering and transforming experiment-specific data formats), these new technologies use different approaches and promise a fresh look at analysis of very large datasets and could potentially reduce the time-to-physics with increased interactivity. In this talk, we present an active LHC Run 2 analysis, searching for dark matter with the CMS detector, as a testbed for Big Data technologies. We directly compare the traditional NTuple-based analysis with an equivalent analysis using Apache Spark on the Hadoop ecosystem and beyond. In both cases, we start the analysis with the official experiment data formats and produce publication physics plots. Lastly, we will discuss advantages and disadvantages of each approach and give an outlook on further studies needed.
Off-the-shelf Control of Data Analysis Software
NASA Astrophysics Data System (ADS)
Wampler, S.
The Gemini Project must provide convenient access to data analysis facilities to a wide user community. The international nature of this community makes the selection of data analysis software particularly interesting, with staunch advocates of systems such as ADAM and IRAF among the users. Additionally, the continuing trends towards increased use of networked systems and distributed processing impose additional complexity. To meet these needs, the Gemini Project is proposing the novel approach of using low-cost, off-the-shelf software to abstract out both the control and distribution of data analysis from the functionality of the data analysis software. For example, the orthogonal nature of control versus function means that users might select analysis routines from both ADAM and IRAF as appropriate, distributing these routines across a network of machines. It is the belief of the Gemini Project that this approach results in a system that is highly flexible, maintainable, and inexpensive to develop. The Khoros visualization system is presented as an example of control software that is currently available for providing the control and distribution within a data analysis system. The visual programming environment provided with Khoros is also discussed as a means of providing convenient access to this control.
ERIC Educational Resources Information Center
Boone, Harry N., Jr.; Boone, Deborah A.
2012-01-01
This article provides information for Extension professionals on the correct analysis of Likert data. The analyses of Likert-type and Likert scale data require unique data analysis procedures, and as a result, misuses and/or mistakes often occur. This article discusses the differences between Likert-type and Likert scale data and provides…
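The abstract is truncated before its recommendations, so the following only illustrates one common convention for the distinction it draws: a single Likert-type item is ordinal and is summarized with the median and mode, while a Likert scale (several items summed into a composite score per respondent) is usually treated as interval data and summarized with the mean and standard deviation. The response data are hypothetical.

```python
from statistics import mean, median, mode, stdev

def item_summary(responses):
    """Likert-type item (one ordinal question): report median and mode."""
    return {"median": median(responses), "mode": mode(responses)}

def scale_summary(respondents):
    """Likert scale: sum each respondent's item scores into a composite,
    then describe the composites with mean and standard deviation."""
    composites = [sum(items) for items in respondents]
    return {"mean": mean(composites), "sd": stdev(composites)}
```

For instance, `scale_summary([[1, 2, 3], [3, 4, 5], [2, 3, 4]])` works on the composites 6, 12, and 9.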
Data synthesis and display programs for wave distribution function analysis
NASA Technical Reports Server (NTRS)
Storey, L. R. O.; Yeh, K. J.
1992-01-01
At the National Space Science Data Center (NSSDC) software was written to synthesize and display artificial data for use in developing the methodology of wave distribution analysis. The software comprises two separate interactive programs, one for data synthesis and the other for data display.
Dai, Yilin; Guo, Ling; Li, Meng; Chen, Yi-Bu
2012-06-08
Microarray data analysis presents a significant challenge to researchers who are unable to use the powerful Bioconductor and its numerous tools because they lack knowledge of the R language. Among the few existing software programs that offer a graphical user interface to Bioconductor packages, none have implemented a comprehensive strategy to address the accuracy and reliability issues of microarray data analysis caused by the well-known probe design problems associated with many widely used microarray chips. There is also a lack of tools that would expedite the functional analysis of microarray results. We present Microarray Я US, an R-based graphical user interface that implements over a dozen popular Bioconductor packages to offer researchers a streamlined workflow for routine differential microarray expression data analysis without the need to learn the R language. To enable a more accurate analysis and interpretation of microarray data, we incorporated the latest custom probe re-definition and re-annotation for Affymetrix and Illumina chips. A versatile microarray results output utility tool was also implemented for easy and fast generation of input files for over 20 of the most widely used functional analysis software programs. Coupled with a well-designed user interface, Microarray Я US leverages cutting-edge Bioconductor packages for researchers with no knowledge of the R language. It also enables a more reliable and accurate microarray data analysis and expedites downstream functional analysis of microarray results.
Mesoscale and severe storms (MASS) data management and analysis system
NASA Technical Reports Server (NTRS)
Hickey, J. S.; Karitani, S.; Dickerson, M.
1984-01-01
Progress on the Mesoscale and Severe Storms (MASS) data management and analysis system is described. An interactive atmospheric data base management software package to convert four types of data (Sounding, Single Level, Grid, Image) into standard random access formats is implemented and integrated with the MASS AVE80 Series general purpose plotting and graphics display data analysis software package. An interactive analysis and display graphics software package (AVE80) to analyze large volumes of conventional and satellite-derived meteorological data is enhanced to provide imaging/color graphics display utilizing color video hardware integrated into the MASS computer system. Local and remote smart-terminal capability is provided by installing APPLE III computer systems within individual scientists' offices and integrating them with the MASS system, thus providing color video display, graphics, and character display of the four data types.
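What makes a format "random access" can be shown with fixed-length records: record i always starts at byte i × record size, so a reader can seek straight to it. The sketch below uses a hypothetical sounding-level record of three little-endian float32 values (pressure, height, temperature); the actual MASS record layouts are not given in the abstract.

```python
import struct

# Hypothetical fixed-length sounding-level record:
# pressure (hPa), height (m), temperature (degC) as little-endian float32.
RECORD = struct.Struct("<3f")

def write_records(stream, records):
    """Append records back to back; every record has exactly RECORD.size bytes."""
    for rec in records:
        stream.write(RECORD.pack(*rec))

def read_record(stream, index):
    """Random access: seek straight to record `index` without reading the others."""
    stream.seek(index * RECORD.size)
    data = stream.read(RECORD.size)
    if len(data) < RECORD.size:
        raise IndexError(index)
    return RECORD.unpack(data)
```

The same functions work with a real file opened in binary mode or with an in-memory `io.BytesIO` buffer.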
Analysis of measured data of human body based on error correcting frequency
NASA Astrophysics Data System (ADS)
Jin, Aiyan; Peipei, Gao; Shang, Xiaomei
2014-04-01
Anthropometry is the measurement of all parts of the human body surface, and the measured data are the basis for analysing and studying the human body, establishing and modifying garment sizes, and formulating and implementing online clothing stores. In this paper, several groups of measured data are collected, and the data errors are analysed through their error frequency and with the analysis-of-variance method of mathematical statistics. The paper also addresses the accuracy of the measured data, the difficulty of measuring particular parts of the human body, the causes of data errors, and the key points for minimizing errors. By analysing the measured data on the basis of error frequency, the paper provides reference points that may promote the development of the garment industry.
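The abstract names analysis of variance as the tool for comparing measurement errors. A minimal one-way ANOVA F statistic in plain Python looks like this, assuming the errors have already been grouped, for example by (hypothetical) body part or by measurer. Converting F to a p-value needs the F distribution (e.g. `scipy.stats.f.sf`), which is left out to keep the sketch dependency-free.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way analysis of variance."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within
```

A large F indicates that the error level differs between groups more than the scatter within groups would explain.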
National survey on dose data analysis in computed tomography.
Heilmaier, Christina; Treier, Reto; Merkle, Elmar Max; Alkhadi, Hatem; Weishaupt, Dominik; Schindera, Sebastian
2018-05-28
A nationwide survey was performed assessing current practice of dose data analysis in computed tomography (CT). All radiological departments in Switzerland were asked to participate in the online survey composed of 19 questions (16 multiple choice, 3 free text). It consisted of four sections: (1) general information on the department, (2) dose data analysis, (3) use of a dose management software (DMS) and (4) radiation protection activities. In total, 152 out of 241 Swiss radiological departments filled in the whole questionnaire (return rate, 63%). Seventy-nine per cent of the departments (n = 120/152) analyse dose data on a regular basis, with considerable heterogeneity in the frequency (1-2 times per year, 45%, n = 54/120; every month, 35%, n = 42/120) and method of analysis. Manual analysis is carried out by 58% (n = 70/120) compared with 42% (n = 50/120) of departments using a DMS. Purchase of a DMS is planned by 43% (n = 30/70) of the departments with manual analysis. Real-time analysis of dose data is performed by 42% (n = 21/50) of the departments with a DMS; however, residents can access the DMS in clinical routine in only 20% (n = 10/50) of the departments. An interdisciplinary dose team, which among other things communicates dose data internally (63%, n = 76/120) and externally, is already implemented in 57% (n = 68/120) of departments. Swiss radiological departments are committed to radiation safety. However, there is high heterogeneity among them regarding the frequency and method of dose data analysis as well as the use of DMS and radiation protection activities.
• Swiss radiological departments are committed to and interested in radiation safety, as shown by the 63% return rate of the survey.
• Seventy-nine per cent of departments analyse dose data on a regular basis, with differences in the frequency and method of analysis: 42% use a dose management software, while 58% currently perform manual dose data analysis. Of the latter, 43% plan to buy a dose management software.
• Currently, only 25% of the departments add radiation exposure data to the final CT report.
Randomization Procedures Applied to Analysis of Ballistic Data
1991-06-01
...be 0.13. Any reasonable statistical procedure would fail to support the notion of improvement of dynamic over standard indexing based on this data... Technical Report BRL-TR-3245 (AD-A238 389), Randomization Procedures Applied to Analysis of Ballistic Data, by Malcolm S. Taylor and Barry A. Bodt, June 1991. Keywords: data analysis; computationally intensive statistics; randomization tests; permutation tests; nonparametric statistics.
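The keywords point to randomization (permutation) tests. As a generic sketch, not the report's own procedure, an exact two-sided permutation test for a difference in means enumerates every relabeling of the pooled observations into groups of the original sizes and reports the fraction at least as extreme as the observed difference. Full enumeration grows as C(n, n_a), so for larger samples a random subset of relabelings (a Monte Carlo randomization test) is used instead.

```python
from itertools import combinations

def permutation_test(a, b):
    """Exact two-sided permutation p-value for a difference in group means."""
    pooled = list(a) + list(b)
    n_a, n_b = len(a), len(b)
    observed = sum(b) / n_b - sum(a) / n_a
    total = sum(pooled)
    extreme = 0
    n_splits = 0
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = (total - sum_a) / n_b - sum_a / n_a
        # Small tolerance so ties with the observed value count as extreme.
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
        n_splits += 1
    return extreme / n_splits
```

With `a = [1, 2, 3]` and `b = [4, 5, 6]`, only 2 of the 20 possible splits are as extreme as the observed one, giving p = 0.1; no distributional assumption about the data is needed.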
Evan Brooks; Valerie Thomas; Wynne Randolph; John Coulston
2012-01-01
With the advent of free Landsat data stretching back decades, there has been a surge of interest in utilizing remotely sensed data in multitemporal analysis for estimation of biophysical parameters. Such analysis is confounded by cloud cover and other image-specific problems, which result in missing data at various aperiodic times of the year. While there is a wealth...
Statistical analysis and interpolation of compositional data in materials science.
Pesenson, Misha Z; Suram, Santosh K; Gregoire, John M
2015-02-09
Compositional data are ubiquitous in chemistry and materials science: analysis of elements in multicomponent systems, combinatorial problems, etc., lead to data that are non-negative and sum to a constant (for example, atomic concentrations). The constant sum constraint restricts the sampling space to a simplex instead of the usual Euclidean space. Since statistical measures such as mean and standard deviation are defined for the Euclidean space, traditional correlation studies, multivariate analysis, and hypothesis testing may lead to erroneous dependencies and incorrect inferences when applied to compositional data. Furthermore, composition measurements that are used for data analytics may not include all of the elements contained in the material; that is, the measurements may be subcompositions of a higher-dimensional parent composition. Physically meaningful statistical analysis must yield results that are invariant under the number of composition elements, requiring the application of specialized statistical tools. We present specifics and subtleties of compositional data processing through discussion of illustrative examples. We introduce basic concepts, terminology, and methods required for the analysis of compositional data and utilize them for the spatial interpolation of composition in a sputtered thin film. The results demonstrate the importance of this mathematical framework for compositional data analysis (CDA) in the fields of materials science and chemistry.
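One standard tool from the Aitchison framework for compositional data is the centered log-ratio (clr) transform, which maps a composition off the simplex into ordinary real coordinates that sum to zero. The sketch below is illustrative only and does not reproduce the paper's interpolation procedure; it shows closure and clr, and that clr ignores the overall scale, so atomic fractions and atomic percent give identical coordinates.

```python
import math

def closure(parts, total=1.0):
    """Rescale non-negative parts so they sum to `total` (e.g. 1 or 100)."""
    s = sum(parts)
    return [total * p / s for p in parts]

def clr(parts):
    """Centered log-ratio transform: log of each part over the geometric mean.
    Requires strictly positive parts; the results sum to zero."""
    if any(p <= 0 for p in parts):
        raise ValueError("clr requires strictly positive parts")
    logs = [math.log(p) for p in parts]
    g = sum(logs) / len(logs)
    return [x - g for x in logs]
```

Because only ratios between parts enter the result, the constant-sum constraint no longer distorts means and correlations computed on the clr coordinates.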
Troy, Karen L; Edwards, W Brent
2018-05-01
Quantitative CT (QCT) analysis involves the calculation of specific parameters such as bone volume and density from CT image data, and can be a powerful tool for understanding bone quality and quantity. However, without careful attention to detail during all steps of the acquisition and analysis process, data can be of poor to unusable quality. Good quality QCT for research requires meticulous attention to detail and standardization of all aspects of data collection and analysis to a degree that is uncommon in a clinical setting. Here, we review the literature to summarize practical and technical considerations for obtaining high quality QCT data, and provide examples of how each recommendation affects calculated variables. We also provide an overview of the QCT analysis technique to illustrate additional opportunities to improve data reproducibility and reliability. Key recommendations include: standardizing the scanner and data acquisition settings, minimizing image artifacts, selecting an appropriate reconstruction algorithm, and maximizing repeatability and objectivity during QCT analysis. The goal of the recommendations is to reduce potential sources of error throughout the analysis, from scan acquisition to the interpretation of results. Copyright © 2018 Elsevier Inc. All rights reserved.
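Density values in QCT are commonly obtained by calibrating CT numbers (HU) against a phantom with inserts of known density. The following least-squares sketch shows such a linear calibration; the phantom readings are invented, and a real workflow may need more than a single linear fit (for example, per-scan calibration with an in-scan phantom).

```python
def fit_calibration(hu, density):
    """Ordinary least-squares line: density = slope * HU + intercept,
    fitted to phantom inserts of known density."""
    n = len(hu)
    mx = sum(hu) / n
    my = sum(density) / n
    sxx = sum((x - mx) ** 2 for x in hu)
    sxy = sum((x - mx) * (y - my) for x, y in zip(hu, density))
    slope = sxy / sxx
    return slope, my - slope * mx

def hu_to_density(values, slope, intercept):
    """Apply the calibration to voxel HU values."""
    return [slope * v + intercept for v in values]
```

Keeping scanner settings fixed between the phantom scan and the subject scans, as the recommendations above stress, is what makes a single calibration line transferable.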
The GONG Data Reduction and Analysis System. [solar oscillations]
NASA Technical Reports Server (NTRS)
Pintar, James A.; Andersen, Bo Nyborg; Andersen, Edwin R.; Armet, David B.; Brown, Timothy M.; Hathaway, David H.; Hill, Frank; Jones, Harrison P.
1988-01-01
Each of the six GONG observing stations will produce three 16-bit, 256 × 256 images of the Sun every 60 sec of sunlight. These data will be transferred from the observing sites to the GONG Data Management and Analysis Center (DMAC), in Tucson, on high-density tapes at a combined rate of over 1 gigabyte per day. The contemporaneous processing of these data will produce several standard data products and will require a sustained throughput in excess of 7 megaflops. Peak rates may exceed 50 megaflops. Archives will accumulate at the rate of approximately 1 terabyte per year, reaching nearly 3 terabytes in 3 yr of observing. Researchers will access the data products with a machine-independent GONG Reduction and Analysis Software Package (GRASP). Based on the Image Reduction and Analysis Facility, this package will include database facilities and helioseismic analysis tools. Users may access the data as visitors in Tucson, access DMAC remotely through networks, or process subsets of the data at their local institutions using GRASP or other systems of their choice. Elements of the system will reach the prototype stage by the end of 1988. Full operation is expected in 1992 when data acquisition begins.
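The quoted daily data rate can be checked from the image format. Assuming, hypothetically, that each station averages 12 sunlit hours per day and that images are stored uncompressed without headers, the raw network volume works out as follows, consistent with the "over 1 gigabyte per day" figure.

```python
BYTES_PER_PIXEL = 2            # 16-bit images
PIXELS_PER_IMAGE = 256 * 256
IMAGES_PER_MINUTE = 3          # three images every 60 s of sunlight
STATIONS = 6

def raw_bytes_per_day(sunlit_hours_per_station):
    """Raw image volume per day for the whole network, assuming every station
    observes for the given number of sunlit hours (an assumption)."""
    minutes = sunlit_hours_per_station * 60
    per_station = BYTES_PER_PIXEL * PIXELS_PER_IMAGE * IMAGES_PER_MINUTE * minutes
    return STATIONS * per_station

print(raw_bytes_per_day(12))  # 1698693120, i.e. about 1.7 GB per day
```

Weather losses would lower this figure, while derived data products add to the archive, so the sketch checks only the order of magnitude.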
A generic Transcriptomics Reporting Framework (TRF) for 'omics data processing and analysis.
Gant, Timothy W; Sauer, Ursula G; Zhang, Shu-Dong; Chorley, Brian N; Hackermüller, Jörg; Perdichizzi, Stefania; Tollefsen, Knut E; van Ravenzwaay, Ben; Yauk, Carole; Tong, Weida; Poole, Alan
2017-12-01
A generic Transcriptomics Reporting Framework (TRF) is presented that lists parameters that should be reported in 'omics studies used in a regulatory context. The TRF covers transcriptome profiling from data generation to a processed list of differentially expressed genes (DEGs) ready for interpretation. Included within the TRF is a reference baseline analysis (RBA) that encompasses raw data selection, data normalisation, recognition of outliers, and statistical analysis. The TRF itself does not dictate the methodology for data processing, but deals with what should be reported. Its principles are also applicable to sequencing data and other 'omics. In contrast, the RBA specifies a simple data processing and analysis methodology that is designed to provide a comparison point for other approaches and is exemplified here by a case study. By providing transparency on the steps applied during 'omics data processing and analysis, the TRF will increase confidence in the processing of 'omics data and in their regulatory use. Applicability of the TRF is ensured by its simplicity and generality. The TRF can be applied to all types of regulatory 'omics studies, and it can be executed using different commonly available software tools. Crown Copyright © 2017. Published by Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Hassan, A. H.; Fluke, C. J.; Barnes, D. G.
2012-09-01
Upcoming and future astronomy research facilities will systematically generate terabyte-sized data sets, moving astronomy into the petascale data era. While such facilities will provide astronomers with unprecedented levels of accuracy and coverage, the increases in dataset size and dimensionality will pose serious computational challenges for many current astronomy data analysis and visualization tools. With such data sizes, even simple data analysis tasks (e.g. calculating a histogram or computing data minimum/maximum) may not be achievable without access to a supercomputing facility. To effectively handle such dataset sizes, which exceed today's single-machine memory and processing limits, we present a framework that exploits the distributed power of GPUs and many-core CPUs, with a goal of providing data analysis and visualization tasks as a service for astronomers. By mixing shared and distributed memory architectures, our framework effectively utilizes the underlying hardware infrastructure, handling both batched and real-time data analysis and visualization tasks. Offering such functionality in a “software as a service” manner will reduce the total cost of ownership, provide an easy-to-use tool to the wider astronomical community, and enable a more optimized utilization of the underlying hardware infrastructure.
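Histogram and min/max calculations can scale to data that no single machine can hold because each worker can summarize its own chunk and the partial results can be merged in any order. The Python sketch below shows only that partial-result-and-merge pattern on one machine; the chunking, the fixed bin range agreed in advance and the function names are illustrative assumptions, not the GPU and many-core framework the authors describe.

```python
from functools import reduce

def chunk_stats(chunk, lo_edge, hi_edge, nbins):
    """Min, max and fixed-range histogram counts for one non-empty chunk.
    Values outside [lo_edge, hi_edge] are left out of the counts."""
    width = (hi_edge - lo_edge) / nbins
    counts = [0] * nbins
    for v in chunk:
        if lo_edge <= v <= hi_edge:
            # Clamp so values at (or rounding to) hi_edge land in the last bin.
            counts[min(int((v - lo_edge) / width), nbins - 1)] += 1
    return min(chunk), max(chunk), counts

def merge(a, b):
    """Combine two partial results. The operation is associative and commutative,
    so partial results can be reduced in any order or tree shape."""
    return min(a[0], b[0]), max(a[1], b[1]), [x + y for x, y in zip(a[2], b[2])]

chunks = [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]   # stand-ins for per-node data
partials = [chunk_stats(c, 0.0, 10.0, 5) for c in chunks]
print(reduce(merge, partials))  # (0, 9, [2, 2, 2, 2, 2])
```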
Namkoong, Sun; Hong, Seung Phil; Kim, Myung Hwa; Park, Byung Cheol
2013-02-01
Although its clinical value remains controversial, hair mineral analysis is now used by many institutions. Arguments about the reliability of hair mineral analysis persist, and there have been evaluations of commercial laboratories performing hair mineral analysis. The objective of this study was to assess the reliability of intra-laboratory and inter-laboratory data at three commercial laboratories conducting hair mineral analysis, compared with serum mineral analysis. Two divided hair samples, taken from near the scalp of one healthy volunteer, were submitted to all laboratories at the same time. Each laboratory sent a report consisting of quantitative results and its interpretation of the health implications. Differences among intra-laboratory and inter-laboratory data were analyzed using SPSS version 12.0 (SPSS Inc., USA). All the laboratories used identical methods for quantitative analysis, and they generated consistent numerical results according to Friedman analysis of variance. However, the normal reference ranges of each laboratory varied, and as a result each laboratory interpreted the patient's health differently. For intra-laboratory data, Wilcoxon analysis suggested that the laboratories generated relatively coherent data, but laboratory B did not for one element, so its reliability was doubtful. Compared with the blood test, laboratory C generated identical results, but laboratories A and B did not. Given its inter- and intra-laboratory reliability relative to blood analysis, hair mineral analysis has limitations, and clinicians should be cautious when applying it as an ancillary tool. Each laboratory included in this study requires continued refinement toward standardized normal reference levels.
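The Friedman test used for the inter-laboratory comparison ranks the laboratories within each block and compares the rank sums. The Python sketch below computes the basic statistic, treating each element as a block and each laboratory as a treatment; that layout is an assumption about how such data would be arranged, the values in the usage line are invented, and the sketch omits the tie correction. SciPy's `scipy.stats.friedmanchisquare` provides a tie-corrected statistic together with a p-value.

```python
def average_ranks(values):
    """Rank values 1..k, giving tied values the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # sorted positions i..j share ranks i+1..j+1
        for m in range(i, j + 1):
            ranks[order[m]] = avg
        i = j + 1
    return ranks

def friedman_statistic(blocks):
    """Friedman chi-square without tie correction.
    blocks: one row per block (e.g. element), one column per treatment (e.g. laboratory)."""
    n, k = len(blocks), len(blocks[0])
    rank_sums = [0.0] * k
    for row in blocks:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    return 12 * sum(R * R for R in rank_sums) / (n * k * (k + 1)) - 3 * n * (k + 1)

# Hypothetical: three elements measured by laboratories A, B and C.
print(friedman_statistic([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]))  # 6.0
```

Under the null hypothesis of no laboratory effect, the statistic is approximately chi-square with k - 1 degrees of freedom. In the example every element ranks the laboratories identically, which gives the maximum possible value n(k - 1) = 6.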
[Applications of meta-analysis in multi-omics].
Han, Mingfei; Zhu, Yunping
2014-07-01
As a statistical method for integrating multiple features and multiple datasets, meta-analysis was introduced to the life sciences in the 1990s. With rapid advances in high-throughput technologies, life omics, whose core comprises genomics, transcriptomics and proteomics, is becoming a new focus of life science. Although the rapid output of massive data has promoted the development of omics studies, it also produces more data than can easily be integrated systematically. Meta-analysis is therefore frequently applied to analyze different types of data and is continually being improved. Here, we first systematically summarize the representative meta-analysis methods, then review the current applications of meta-analysis in various omics fields, and finally discuss the remaining problems and the future development of meta-analysis.
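One of the most common building blocks of meta-analysis is the fixed-effect, inverse-variance pooled estimate, in which each study's effect size is weighted by the reciprocal of its variance. The Python sketch below shows that calculation only; it is a generic textbook method offered for illustration, not one of the specific omics methods the review surveys, and random-effects models (which add a between-study variance term) are not shown.

```python
import math

def fixed_effect(effects, variances):
    """Inverse-variance fixed-effect meta-analysis.
    Returns (pooled effect, standard error of the pooled effect)."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / total
    return pooled, math.sqrt(1.0 / total)

# Hypothetical effect sizes (e.g. log fold changes) and their variances from two studies.
print(fixed_effect([0.0, 4.0], [1.0, 3.0]))
```

With these inputs the more precise first study dominates: the pooled effect is 1.0 with a standard error of about 0.87, smaller than either study's own standard error.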
Iterative categorization (IC): a systematic technique for analysing qualitative data.
Neale, Joanne
2016-06-01
The processes of analysing qualitative data, particularly the stage between coding and publication, are often vague and/or poorly explained within addiction science and research more broadly. A simple but rigorous and transparent technique for analysing qualitative textual data, developed within the field of addiction, is described. The technique, iterative categorization (IC), is suitable for use with inductive and deductive codes and can support a range of common analytical approaches, e.g. thematic analysis, Framework, constant comparison, analytical induction, content analysis, conversational analysis, discourse analysis, interpretative phenomenological analysis and narrative analysis. Once the data have been coded, the only software required is a standard word processing package. Worked examples are provided. © 2016 The Authors. Addiction published by John Wiley & Sons Ltd on behalf of Society for the Study of Addiction.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Van Benthem, Mark Hilary; Mowry, Curtis Dale; Kotula, Paul Gabriel
Thermal decomposition of the polydimethylsiloxane compounds Sylgard® 184 and 186 was examined using thermal desorption coupled gas chromatography-mass spectrometry (TD/GC-MS) and multivariate analysis. This work describes a method of producing multiway data using a stepped thermal desorption. The technique involves sequentially heating a sample of the material of interest with subsequent analysis in a commercial GC/MS system. The decomposition chromatograms were analyzed using multivariate analysis tools including principal component analysis (PCA), factor rotation employing the varimax criterion, and multivariate curve resolution. The results of the analysis show seven components related to offgassing of various fractions of siloxanes that vary as a function of temperature. Thermal desorption coupled with gas chromatography-mass spectrometry (TD/GC-MS) is a powerful analytical technique for analyzing chemical mixtures. It has great potential in numerous analytic areas, including materials analysis, sports medicine, the detection of designer drugs, and biological research for metabolomics. Data analysis is complicated and far from automated, and can result in high false positive or false negative rates. We have demonstrated a step-wise TD/GC-MS technique that removes more volatile compounds from a sample before extracting the less volatile compounds. This creates an additional dimension of separation before the GC column, while simultaneously generating three-way data. Sandia's proven multivariate analysis methods, when applied to these data, have several advantages over current commercial options. They have also demonstrated potential for finding and enabling identification of trace compounds.
Several challenges remain, however, including understanding the sources of noise in the data, outlier detection, improving the data pretreatment and analysis methods, developing a software tool for ease of use by the chemist, and demonstrating that, as we believe, this multivariate analysis will enable superior differentiation capabilities. In addition, noise and system artifacts challenge the analysis of GC-MS data collected on lower-cost equipment, which is ubiquitous in commercial laboratories. This research has the potential to affect many areas of analytical chemistry, including materials analysis, medical testing, and environmental surveillance. It could also provide a method to measure adsorption parameters for chemical interactions on various surfaces by measuring desorption as a function of temperature for mixtures. We have presented results of a novel method for examining offgas products of a common PDMS material. Our method couples a stepped TD/GC-MS data acquisition scheme, which may be almost totally automated, with multivariate analysis schemes. This method of data generation and analysis can be applied to a number of materials aging and thermal degradation studies.
Iorgulescu, E; Voicu, V A; Sârbu, C; Tache, F; Albu, F; Medvedovici, A
2016-08-01
The influence of experimental variability (instrumental repeatability, instrumental intermediate precision and sample preparation variability) and of data pre-processing (normalization, peak alignment, background subtraction) on the discrimination power of multivariate data analysis methods (principal component analysis, PCA, and cluster analysis, CA), as well as of a new algorithm based on linear regression, was studied. Data used in the study were obtained through positive or negative ion monitoring electrospray mass spectrometry (+/-ESI/MS) and reversed-phase liquid chromatography with UV spectrometric detection (RPLC/UV) applied to green tea extracts. Extraction in ethanol and heated water infusion were used as sample preparation procedures. The multivariate methods were applied directly to mass spectra and chromatograms, involving strictly a holistic comparison of shapes, without assigning any structural identity to compounds. An alternative data interpretation based on linear regression analysis applied pairwise to data series is also discussed. Slopes, intercepts and correlation coefficients produced by linear regression analysis of pairs of very large experimental data series successfully retain information resulting from high-frequency instrumental acquisition rates, better defining the profiles being compared. Consequently, each type of sample, or each comparison between samples, produces in Cartesian space an ellipsoidal volume defined by the normal variation intervals of the slope, intercept and correlation coefficient. Distances between volumes graphically illustrate (dis)similarities between the compared data. Instrumental intermediate precision had the major effect on the discrimination power of the multivariate data analysis methods.
Mass spectra of bulk complex mixtures extracted from materials of natural origin, produced by ionization from the liquid state at atmospheric pressure, provided an excellent data basis for multivariate analysis methods, equivalent to data resulting from chromatographic separations. The alternative evaluation of very large data series based on linear regression analysis produced information equivalent to the results obtained by applying PCA and CA. Copyright © 2016 Elsevier B.V. All rights reserved.
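The regression-based comparison described above reduces a pair of aligned data series (for example, two spectra sampled at the same m/z points) to a slope, an intercept and a correlation coefficient. A minimal Python sketch of that reduction is below; it assumes the two series are already aligned point by point and does not reproduce the authors' construction of ellipsoidal variation volumes.

```python
import math

def pairwise_regression(x, y):
    """Least-squares slope and intercept of y regressed on x, plus Pearson r.
    x and y must be aligned point by point and non-constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx, sxy / math.sqrt(sxx * syy)

# Two hypothetical profiles: y is x scaled by 2 and offset by 1.
print(pairwise_regression([1, 2, 3, 4], [3, 5, 7, 9]))  # (2.0, 1.0, 1.0)
```

Two identical profiles give a slope of 1, an intercept of 0 and r = 1; departures from those values are what distinguish one sample from another.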
Funding Opportunity: Genomic Data Centers
Meta-Analysis for Primary and Secondary Data Analysis: The Super-Experiment Metaphor.
ERIC Educational Resources Information Center
Jackson, Sally
1991-01-01
Considers the relation between meta-analysis statistics and analysis of variance statistics. Discusses the advantages and disadvantages of meta-analysis as a primary data analysis tool. Argues that the two approaches are partial paraphrases of one another. Advocates an integrative approach that introduces the best of meta-analytic thinking into primary analysis…
Data Envelopment Analysis: Measurement of Educational Efficiency in Texas
ERIC Educational Resources Information Center
Carter, Lacy
2012-01-01
The purpose of this study was to examine the efficiency of Texas public school districts through Data Envelopment Analysis. The Data Envelopment Analysis estimation method calculated and assigned efficiency scores to each of the 931 school districts considered in the study. The efficiency scores were utilized in two phases. First, the school…
An Introductory Application of Principal Components to Cricket Data
ERIC Educational Resources Information Center
Manage, Ananda B. W.; Scariano, Stephen M.
2013-01-01
Principal Component Analysis is widely used in applied multivariate data analysis, and this article shows how to motivate student interest in this topic using cricket sports data. Here, principal component analysis is successfully used to rank the cricket batsmen and bowlers who played in the 2012 Indian Premier League (IPL) competition. In…
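A common way to rank players with principal component analysis is to standardize each performance statistic, project the players onto the first principal component and sort by that score. The NumPy sketch below follows that recipe; the choice of standardization, the sign convention for the first component and the toy data are assumptions for illustration and may differ from the article's exact procedure.

```python
import numpy as np

def pc1_ranking(X):
    """Rank rows of X (players x statistics) by their first principal component score.
    Assumes every column is oriented so that larger values are better and is non-constant."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardize each statistic
    _, s, vt = np.linalg.svd(Z, full_matrices=False)   # rows of vt are principal axes
    loading = vt[0]
    if loading.sum() < 0:          # the sign of a principal axis is arbitrary; fix it
        loading = -loading
    scores = Z @ loading
    explained = s[0] ** 2 / np.sum(s ** 2)              # share of variance on PC1
    return np.argsort(-scores), scores, explained

# Hypothetical batting statistics (runs, strike rate) for three players.
order, scores, explained = pc1_ranking([[300, 120], [450, 140], [200, 110]])
print(order)  # indices of players, best first
```

For bowlers, statistics where smaller is better (such as economy rate) would need to be negated before standardizing so that a higher score still means a better player.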
48 CFR 15.404-2 - Data to support proposal analysis.
Code of Federal Regulations, 2010 CFR
2010-10-01
... 48 Federal Acquisition Regulations System 1 (2010-10-01): Federal Acquisition Regulation, Contracting Methods and Contract Types, Contracting by Negotiation, Contract Pricing, 15.404-2 Data to support proposal analysis. (a) Field pricing...
ERIC Educational Resources Information Center
Coad, Jane; Evans, Ruth
2008-01-01
This article reflects on key methodological issues emerging from children and young people's involvement in data analysis processes. We outline a pragmatic framework illustrating different approaches to engaging children, using two case studies of children's experiences of participating in data analysis. The article highlights methods of…
Federal Register 2010, 2011, 2012, 2013, 2014
2013-07-25
... analysis of data submitted by Monsanto, a review of other scientific data, field tests conducted under... EA. Based on APHIS' analysis of field and laboratory data submitted by Monsanto, references provided...-2012-0027, Regulatory Analysis and Development, PPD, APHIS, Station 3A-03.8, 4700 River Road Unit 118...
The Role of Data Analysis Software in Graduate Programs in Education and Post-Graduate Research
ERIC Educational Resources Information Center
Harwell, Michael
2018-01-01
The importance of data analysis software in graduate programs in education and post-graduate educational research is self-evident. However, the role of this software in facilitating supererogated statistical practice versus "cookbookery" is unclear. The need to rigorously document the role of data analysis software in students' graduate…
Early Fuel Cell Market Demonstrations | Hydrogen and Fuel Cells | NREL
Handling Equipment Data Collection and Analysis: 2015 Report, DOE Hydrogen and Fuel Cells Program Annual Progress Report (December 2015) Material Handling Equipment Data Collection and Analysis: 2015 Review, DOE Technical Report (March 2015) 2014 Forklift and Backup Power Data Collection and Analysis: 2014 Report, DOE
Water-Resources Information for the Withlacoochee River Region, West-Central Florida.
1981-08-01
Water use; Analysis of recent water-use data ... secondary artesian aquifers; Analysis of water-rich areas; Effects of mining on ... records; no new data were collected. If two or more published reports were found to be in conflict regarding data or analysis, no attempts were made to
Data Base Reexamination as Part of IDS Secondary Analysis.
ERIC Educational Resources Information Center
Curry, Blair H.; And Others
Data reexamination is a critical component for any study. The complexity of the study, the time available for data base development and analysis, and the relationship of the study to educational policy-making can all increase the criticality of such reexamination. Analysis of the error levels in the National Institute of Education's Instructional…
Code of Federal Regulations, 2014 CFR
2014-07-01
... initial analysis, processing, or interpretation of any geological data and information. Initial analysis and processing are the stages of analysis or processing where the data and information first become... information are available for submission, inspection, and selection? Section 580.40, Mineral Resources...
Code of Federal Regulations, 2012 CFR
2012-07-01
... initial analysis, processing, or interpretation of any geological data and information. Initial analysis and processing are the stages of analysis or processing where the data and information first become... information are available for submission, inspection, and selection? Section 580.40, Mineral Resources...
Code of Federal Regulations, 2013 CFR
2013-07-01
... initial analysis, processing, or interpretation of any geological data and information. Initial analysis and processing are the stages of analysis or processing where the data and information first become... information are available for submission, inspection, and selection? Section 580.40, Mineral Resources...
Code of Federal Regulations, 2011 CFR
2011-07-01
... complete the initial analysis, processing, or interpretation of any geological data and information. Initial analysis and processing are the stages of analysis or processing where the data and information... information are available for submission, inspection, and selection? Section 280.40, Mineral Resources...
Efficient Analysis of Mass Spectrometry Data Using the Isotope Wavelet
NASA Astrophysics Data System (ADS)
Hussong, Rene; Tholey, Andreas; Hildebrandt, Andreas
2007-09-01
Mass spectrometry (MS) has become today's de facto standard for high-throughput analysis in proteomics research. Its applications range from toxicity analysis to MS-based diagnostics. Often, the time spent on the MS experiment itself is significantly less than the time necessary to interpret the measured signals, since the amount of data can easily exceed several gigabytes. In addition, automated analysis is hampered by baseline artifacts, chemical as well as electrical noise, and an irregular spacing of data points. Thus, filtering techniques originating from signal and image analysis are commonly employed to address these problems. Unfortunately, smoothing, baseline reduction, and in particular a resampling of data points can affect important characteristics of the experimental signal. To overcome these problems, we propose a new family of wavelet functions based on the isotope wavelet, which is hand-tailored for the analysis of mass spectrometry data. The resulting technique is theoretically well-founded and compares very well with standard peak picking tools, since it is highly robust against noise in the data but at the same time sufficiently sensitive to detect even low-abundance peptides.
A method for generating new datasets based on copy number for cancer analysis.
Kim, Shinuk; Kon, Mark; Kang, Hyunsik
2015-01-01
New data sources for the analysis of cancer data are rapidly supplementing the large number of gene-expression markers used for current methods of analysis. Significant among these new sources are copy number variation (CNV) datasets, which typically enumerate several hundred thousand CNVs distributed throughout the genome. Several useful algorithms allow systems-level analyses of such datasets. However, these rich data sources have not yet been analyzed as deeply as gene-expression data. To address this issue, the extensive toolsets used for analyzing expression data in cancerous and noncancerous tissue (e.g., gene set enrichment analysis and phenotype prediction) could be redirected to extract a great deal of predictive information from CNV data, in particular those derived from cancers. Here we present a software package capable of preprocessing standard Agilent copy number datasets into a form to which essentially all expression analysis tools can be applied. We illustrate the use of this toolset in predicting the survival time of patients with ovarian cancer or glioblastoma multiforme and also provide an analysis of gene- and pathway-level deletions in these two types of cancer.
The Statistical Analysis Techniques to Support the NGNP Fuel Performance Experiments
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bihn T. Pham; Jeffrey J. Einerson
2010-06-01
This paper describes the development and application of statistical analysis techniques to support the AGR experimental program on NGNP fuel performance. The experiments conducted in the Idaho National Laboratory’s Advanced Test Reactor employ fuel compacts placed in a graphite cylinder shrouded by a steel capsule. The tests are instrumented with thermocouples embedded in graphite blocks, and the target quantity (fuel/graphite temperature) is regulated by the He-Ne gas mixture that fills the gap volume. Three techniques for statistical analysis, namely control charting, correlation analysis, and regression analysis, are implemented in the SAS-based NGNP Data Management and Analysis System (NDMAS) for automated processing and qualification of the AGR measured data. The NDMAS also stores daily neutronic (power) and thermal (heat transfer) code simulation results along with the measurement data, allowing for their combined use and comparative scrutiny. The ultimate objective of this work includes (a) a multi-faceted system for data monitoring and data accuracy testing, (b) identification of possible modes of diagnostics deterioration and changes in experimental conditions, (c) qualification of data for use in code validation, and (d) identification and use of data trends to support effective control of test conditions with respect to the test target. Analysis results and examples given in the paper show that the three statistical analysis techniques provide a complementary capability to warn of thermocouple failures. The results also suggest that regression analysis models relating calculated fuel temperatures and thermocouple readings can enable online regulation of experimental parameters (i.e. gas mixture content) to effectively maintain the target quantity (fuel temperature) within a given range.
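Control charting, the first of the three techniques, flags readings that fall outside limits derived from an in-control baseline. The Python sketch below uses the baseline mean ± 3 sample standard deviations as the limits; this is a simplified Shewhart-style rule chosen for illustration (an individuals chart would usually estimate sigma from moving ranges), the thermocouple values are invented, and it is not the SAS implementation in NDMAS.

```python
import statistics

def control_limits(baseline, k=3.0):
    """Lower and upper limits: baseline mean -/+ k sample standard deviations."""
    m = statistics.mean(baseline)
    s = statistics.stdev(baseline)
    return m - k * s, m + k * s

def out_of_control(readings, limits):
    """Indices of readings outside the limits (candidate thermocouple problems)."""
    lo, hi = limits
    return [i for i, r in enumerate(readings) if not lo <= r <= hi]

# Hypothetical daily thermocouple readings (degrees C).
limits = control_limits([1000, 1002, 998, 1001, 999])
print(out_of_control([1000, 1003, 1010, 990], limits))  # [2, 3]
```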
DOT National Transportation Integrated Search
1998-01-01
This paper is about a statistical research analysis of 1995-96 classification and weigh-in-motion (WIM) data from seventeen continuous traffic-monitoring sites in New England. Data screening is discussed briefly, and a cusum data quality control ...
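A cusum (cumulative sum) chart accumulates small deviations from an expected level so that a sustained shift is detected even when no single value looks unusual. The Python sketch below is the standard two-sided tabular CUSUM; the target, the allowance `k` and the decision interval `h` are hypothetical tuning values, and the paper's own screening rules are not reproduced here.

```python
def tabular_cusum(xs, target, k, h):
    """Two-sided tabular CUSUM. Returns the indices at which either the upper or
    lower cumulative sum exceeds h; both sums are reset after each signal."""
    c_plus = c_minus = 0.0
    alarms = []
    for i, x in enumerate(xs):
        c_plus = max(0.0, c_plus + x - (target + k))    # accumulates upward drift
        c_minus = max(0.0, c_minus + (target - k) - x)  # accumulates downward drift
        if c_plus > h or c_minus > h:
            alarms.append(i)
            c_plus = c_minus = 0.0
    return alarms

# Hypothetical daily values from one site: on target, then a shift of +2 units.
print(tabular_cusum([10, 10, 10, 12, 12, 12, 12], target=10, k=0.5, h=4))  # [5]
```

No single value in the example is more than 2 units from the target, yet the accumulated shift triggers a signal on the third shifted day.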
Data warehousing as a basis for web-based documentation of data mining and analysis.
Karlsson, J; Eklund, P; Hallgren, C G; Sjödin, J G
1999-01-01
In this paper we present a case study for data warehousing intended to support data mining and analysis. We also describe a prototype for data retrieval. Further we discuss some technical issues related to a particular choice of a patient record environment.
FSSC Science Tools: Pulsar Analysis
NASA Technical Reports Server (NTRS)
Thompson, Dave
2010-01-01
This slide presentation reviews the typical pulsar analysis, giving tips for screening of the data, the use of time series analysis, and utility tools. Specific information about analyzing Vela data is reviewed.
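A basic step in pulsar timing analysis is epoch folding: converting photon arrival times to rotational phase at a trial frequency and histogramming the phases into a pulse profile. The Python sketch below assumes a constant spin frequency and arrival times that have already been corrected to a suitable reference frame; real analyses also account for frequency derivatives and barycentric corrections, which are omitted here.

```python
def fold(arrival_times, frequency, nbins, t0=0.0):
    """Pulse profile from epoch folding: counts of photons per phase bin,
    with phase = ((t - t0) * frequency) mod 1 and a constant frequency."""
    counts = [0] * nbins
    for t in arrival_times:
        phase = ((t - t0) * frequency) % 1.0
        counts[min(int(phase * nbins), nbins - 1)] += 1
    return counts

# Hypothetical arrival times (s) for a 2 Hz pulsar, photons at phases 0 and 0.5.
print(fold([0.0, 0.5, 1.0, 0.25, 0.75, 1.25], frequency=2.0, nbins=4))  # [3, 0, 3, 0]
```

Folding at the wrong trial frequency smears the counts across bins, which is why a search over trial frequencies is usually part of this kind of analysis.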
NASA standard: Trend analysis techniques
NASA Technical Reports Server (NTRS)
1988-01-01
This Standard presents descriptive and analytical techniques for NASA trend analysis applications. Trend analysis is applicable in all organizational elements of NASA connected with, or supporting, developmental/operational programs. Use of this Standard is not mandatory; however, it should be consulted for any data analysis activity requiring the identification or interpretation of trends. Trend Analysis is neither a precise term nor a circumscribed methodology, but rather connotes, generally, quantitative analysis of time-series data. For NASA activities, the appropriate and applicable techniques include descriptive and graphical statistics, and the fitting or modeling of data by linear, quadratic, and exponential models. Usually, but not always, the data is time-series in nature. Concepts such as autocorrelation and techniques such as Box-Jenkins time-series analysis would only rarely apply and are not included in this Standard. The document presents the basic ideas needed for qualitative and quantitative assessment of trends, together with relevant examples. A list of references provides additional sources of information.
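The linear, quadratic and exponential trend models named in the Standard can all be fitted by least squares. The NumPy sketch below fits each to a time series and reports the residual sum of squares; fitting the exponential model as a straight line in log space is a common shortcut assumed here (it requires positive data and weights errors differently from a direct nonlinear fit), and the Standard's own guidance on model selection is not reproduced.

```python
import numpy as np

def fit_trends(t, y):
    """Least-squares linear, quadratic and exponential (y = a * exp(b * t)) fits.
    Returns {model: (coefficients, residual sum of squares in original units)}."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    results = {}
    for name, deg in (("linear", 1), ("quadratic", 2)):
        coef = np.polyfit(t, y, deg)                 # highest power first
        results[name] = (coef, float(np.sum((np.polyval(coef, t) - y) ** 2)))
    b, log_a = np.polyfit(t, np.log(y), 1)          # straight line in log space
    a = float(np.exp(log_a))
    fitted = a * np.exp(b * t)
    results["exponential"] = ((a, float(b)), float(np.sum((fitted - y) ** 2)))
    return results

# Hypothetical monthly values growing exponentially.
t = np.arange(6)
fits = fit_trends(t, 2.0 * np.exp(0.5 * t))
for name, (coef, rss) in fits.items():
    print(name, rss)
```

Because the quadratic model contains the linear one, its residual sum of squares can never be larger, so a smaller residual alone does not justify the extra term.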
The response of numerical weather prediction analysis systems to FGGE 2b data
NASA Technical Reports Server (NTRS)
Hollingsworth, A.; Lorenc, A.; Tracton, S.; Arpe, K.; Cats, G.; Uppala, S.; Kallberg, P.
1985-01-01
An intercomparison of analyses of the main FGGE Level IIb data set, produced by three advanced analysis systems, is presented. The aims of the work are to estimate the extent and magnitude of the differences between the analyses, to identify the reasons for the differences, and finally to estimate the significance of the differences. Only extratropical analyses are considered. Objective evaluations of analysis quality, such as fit to observations, statistics of analysis differences, and mean fields, are discussed. In addition, substantial emphasis is placed on subjective evaluation of a series of case studies that were selected to illustrate the importance of different aspects of the analysis procedures, such as quality control, data selection, resolution, dynamical balance, and the role of the assimilating forecast model. In some cases, the forecast models are used as selective amplifiers of analysis differences to assist in deciding which analysis was more nearly correct in the treatment of particular data.
The Data Analysis in Gravitational Wave Detection
NASA Astrophysics Data System (ADS)
Wang, Xiao-ge; Lebigot, Eric; Du, Zhi-hui; Cao, Jun-wei; Wang, Yun-yong; Zhang, Fan; Cai, Yong-zhi; Li, Mu-zi; Zhu, Zong-hong; Qian, Jin; Yin, Cong; Wang, Jian-bo; Zhao, Wen; Zhang, Yang; Blair, David; Ju, Li; Zhao, Chun-nong; Wen, Lin-qing
2017-01-01
Gravitational wave (GW) astronomy, based on GW detection, is a rising interdisciplinary field and a new window through which humanity can observe the universe, following traditional astronomy, which uses electromagnetic waves as its means of detection. It is of great significance for studying the origin and evolution of the universe and for extending the field of astronomical research. The advent of the laser interferometer GW detector has opened a new era of GW detection, and methods for processing and analyzing GW data have developed quickly around the world, providing a powerful tool for GW astronomy. This paper systematically introduces the software tools commonly used for GW data analysis and discusses in detail the basic methods used in GW data analysis, such as time-frequency analysis, composite analysis, pulsar timing analysis, matched filtering, templates, the χ² test, and Monte Carlo simulation.
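Matched filtering, one of the basic methods listed above, slides a known waveform template along the data and looks for the offset with the largest correlation. The Python sketch below assumes white noise, a single short template and real-valued time-domain samples; practical GW searches instead whiten the data using the detector's noise spectrum, work in the frequency domain and scan large template banks, none of which is shown.

```python
import math

def matched_filter(data, template):
    """Correlation of a unit-norm template with every aligned window of the data.
    For white noise this is the matched-filter output up to a constant factor."""
    norm = math.sqrt(sum(h * h for h in template))
    unit = [h / norm for h in template]
    m = len(unit)
    return [sum(d * h for d, h in zip(data[i:i + m], unit))
            for i in range(len(data) - m + 1)]

# Hypothetical data containing a scaled copy of the template starting at index 3.
out = matched_filter([0, 0, 0, 2, -2, 2, 0, 0], [1, -1, 1])
best = max(range(len(out)), key=out.__getitem__)
print(best)  # 3
```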
Data reduction and analysis of HELIOS plasma wave data
NASA Technical Reports Server (NTRS)
Anderson, Roger R.
1988-01-01
Reduction of data acquired from the HELIOS Solar Wind Plasma Wave Experiments on HELIOS 1 and 2 was continued. Production of 24-hour survey plots of the HELIOS 1 plasma wave data was continued, and microfilm copies were submitted to the National Space Science Data Center. Much of the effort involved the shock memory from both HELIOS 1 and 2. These data had to be deconvoluted and time-ordered before they could be displayed and plotted in an organized form. The UNIVAC 418-III computer was replaced by a DEC VAX 11/780 computer. In order to continue the reduction and analysis of the data set, all data reduction and analysis computer programs had to be rewritten.
Integrating diverse databases into a unified analysis framework: a Galaxy approach
Blankenberg, Daniel; Coraor, Nathan; Von Kuster, Gregory; Taylor, James; Nekrutenko, Anton
2011-01-01
Recent technological advances have led to the ability to generate large amounts of data for model and non-model organisms. Whereas in the past a relatively small number of central repositories served genomic data, an increasing number of distinct specialized data repositories and resources have been established. Here, we describe a generic approach that provides for the integration of a diverse spectrum of data resources into a unified analysis framework, Galaxy (http://usegalaxy.org). This approach allows the simplified coupling of external data resources with the data analysis tools available to Galaxy users, while leveraging the native data mining facilities of the external data resources. Database URL: http://usegalaxy.org PMID:21531983
Towards human-computer synergetic analysis of large-scale biological data.
Singh, Rahul; Yang, Hui; Dalziel, Ben; Asarnow, Daniel; Murad, William; Foote, David; Gormley, Matthew; Stillman, Jonathan; Fisher, Susan
2013-01-01
Advances in technology have led to the generation of massive amounts of complex and multifarious biological data in areas ranging from genomics to structural biology. The volume and complexity of such data lead to significant challenges in terms of their analysis, especially when one seeks to generate hypotheses or explore the underlying biological processes. At present, the application of automated algorithms followed by perusal and analysis of the results by an expert continues to be the predominant paradigm for analyzing biological data. This paradigm works well in many problem domains. However, it is also limiting, since domain experts are forced to apply their instincts and expertise, such as contextual reasoning, hypothesis formulation, and exploratory analysis, after the algorithm has produced its results. In many areas where the organization and interaction of the biological processes are poorly understood and exploratory analysis is crucial, what is needed is to integrate domain expertise during the data analysis process and use it to drive the analysis itself. Against this background, the results presented in this paper describe advancements along two methodological directions. First, given the context of biological data, we utilize and extend a design approach called experiential computing from multimedia information system design. This paradigm combines information visualization and human-computer interaction with algorithms for exploratory analysis of large-scale and complex data.
In the proposed approach, emphasis is laid on: (1) allowing users to directly visualize, interact, experience, and explore the data through interoperable visualization-based and algorithmic components, (2) supporting unified query and presentation spaces to facilitate experimentation and exploration, (3) providing external contextual information by assimilating relevant supplementary data, and (4) encouraging user-directed information visualization, data exploration, and hypothesis formulation. Second, to illustrate the proposed design paradigm and measure its efficacy, we describe two prototype web applications. The first, called XMAS (Experiential Microarray Analysis System), is designed for analysis of time-series transcriptional data. The second system, called PSPACE (Protein Space Explorer), is designed for holistic analysis of structural and structure-function relationships using interactive low-dimensional maps of the protein structure space. Both of these systems promote and facilitate human-computer synergy, where cognitive elements such as domain knowledge, contextual reasoning, and purpose-driven exploration are integrated with a host of powerful algorithmic operations that support large-scale data analysis, multifaceted data visualization, and multi-source information integration. The proposed design philosophy combines visualization, algorithmic components and cognitive expertise into a seamless processing-analysis-exploration framework that facilitates sense-making, exploration, and discovery. Using XMAS, we present case studies that analyze transcriptional data from two highly complex domains: gene expression in the placenta during human pregnancy and the reaction of marine organisms to heat stress. With PSPACE, we demonstrate how complex structure-function relationships can be explored. These results demonstrate the novelty, advantages, and distinctions of the proposed paradigm.
Furthermore, the results highlight how domain insights can be combined with algorithms to discover meaningful knowledge and formulate evidence-based hypotheses during the data analysis process. Finally, user studies comparing XMAS and PSPACE with similar systems indicate that both deliver more interpretable results while placing a lower cognitive load on users. XMAS is available at: http://tintin.sfsu.edu:8080/xmas. PSPACE is available at: http://pspace.info/.
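The abstract says XMAS is designed for the analysis of time-series transcriptional data but does not describe its algorithms. As a minimal, hypothetical sketch of one basic operation such a tool might expose, the Python below compares genes by the shape of their expression time courses (z-normalization followed by Pearson correlation) and greedily groups co-expressed genes. The function names `z_normalize`, `pearson`, and `group_by_shape`, the 0.9 correlation threshold, and the example gene profiles are all illustrative assumptions, not part of XMAS.

```python
"""Hypothetical sketch: grouping genes by the shape of their expression time courses.

Not XMAS's documented method; the threshold and example profiles are illustrative.
"""
from math import sqrt


def z_normalize(profile: list[float]) -> list[float]:
    """Rescale a time course to mean 0 and population standard deviation 1."""
    n = len(profile)
    mean = sum(profile) / n
    sd = sqrt(sum((x - mean) ** 2 for x in profile) / n)
    if sd == 0.0:
        return [0.0] * n  # a flat profile has no shape to compare
    return [(x - mean) / sd for x in profile]


def pearson(a: list[float], b: list[float]) -> float:
    """Pearson correlation of two equal-length time courses (0.0 if either is flat)."""
    za, zb = z_normalize(a), z_normalize(b)
    return sum(x * y for x, y in zip(za, zb)) / len(a)


def group_by_shape(profiles: dict[str, list[float]], threshold: float = 0.9) -> list[list[str]]:
    """Greedy grouping: a gene joins the first group whose seed gene it correlates
    with at or above `threshold`; otherwise it starts a new group."""
    groups: list[list[str]] = []
    for gene, profile in profiles.items():
        for group in groups:
            if pearson(profiles[group[0]], profile) >= threshold:
                group.append(gene)
                break
        else:
            groups.append([gene])
    return groups


if __name__ == "__main__":
    # Hypothetical expression values at four time points.
    profiles = {
        "geneA": [1.0, 2.0, 4.0, 8.0],
        "geneB": [10.0, 20.0, 40.0, 80.0],  # same shape as geneA at 10x the level
        "geneC": [8.0, 4.0, 2.0, 1.0],      # the reverse trend
    }
    print(group_by_shape(profiles))  # → [['geneA', 'geneB'], ['geneC']]
```

Z-normalizing first means genes are grouped by the direction and timing of their changes rather than by absolute expression level, which is why geneA and geneB fall together despite a tenfold difference in magnitude.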
Towards human-computer synergetic analysis of large-scale biological data
2013-01-01
Background: Advances in technology have led to the generation of massive amounts of complex and multifarious biological data in areas ranging from genomics to structural biology. The volume and complexity of such data lead to significant challenges for its analysis, especially when one seeks to generate hypotheses or explore the underlying biological processes. In the current state of the art, the predominant paradigm for analyzing biological data is to apply automated algorithms and then have an expert peruse and analyze the results. This paradigm works well in many problem domains. However, it is also limiting, since domain experts can apply their instincts and expertise, such as contextual reasoning, hypothesis formulation, and exploratory analysis, only after the algorithm has produced its results. In many areas where the organization and interaction of biological processes are poorly understood and exploratory analysis is crucial, what is needed is to integrate domain expertise into the data analysis process and use it to drive the analysis itself.
Results: Against this background, the results presented in this paper describe advances along two methodological directions. First, in the context of biological data, we use and extend a design approach from multimedia information system design called experiential computing. This paradigm combines information visualization and human-computer interaction with algorithms for exploratory analysis of large-scale, complex data. In the proposed approach, the emphasis is on: (1) allowing users to directly visualize, interact with, experience, and explore the data through interoperable visualization-based and algorithmic components; (2) supporting unified query and presentation spaces to facilitate experimentation and exploration; (3) providing external contextual information by assimilating relevant supplementary data; and (4) encouraging user-directed information visualization, data exploration, and hypothesis formulation. Second, to illustrate the proposed design paradigm and measure its efficacy, we describe two prototype web applications. The first, called XMAS (Experiential Microarray Analysis System), is designed for the analysis of time-series transcriptional data. The second, called PSPACE (Protein Space Explorer), is designed for holistic analysis of structural relationships and structure-function relationships using interactive low-dimensional maps of the protein structure space. Both systems promote and facilitate human-computer synergy, in which cognitive elements such as domain knowledge, contextual reasoning, and purpose-driven exploration are integrated with a host of powerful algorithmic operations that support large-scale data analysis, multifaceted data visualization, and multi-source information integration.
Conclusions: The proposed design philosophy combines visualization, algorithmic components, and cognitive expertise into a seamless processing-analysis-exploration framework that facilitates sense-making, exploration, and discovery. Using XMAS, we present case studies that analyze transcriptional data from two highly complex domains: gene expression in the placenta during human pregnancy and the response of marine organisms to heat stress. With PSPACE, we demonstrate how complex structure-function relationships can be explored. These results demonstrate the novelty and advantages of the proposed paradigm and what distinguishes it from existing approaches. Furthermore, the results highlight how domain insights can be combined with algorithms to discover meaningful knowledge and formulate evidence-based hypotheses during the data analysis process. Finally, user studies comparing XMAS and PSPACE with similar systems indicate that both deliver more interpretable results while placing a lower cognitive load on users. XMAS is available at: http://tintin.sfsu.edu:8080/xmas. PSPACE is available at: http://pspace.info/. PMID:24267485
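PSPACE is described as using interactive low-dimensional maps of the protein structure space, but the abstract does not say how those maps are computed. One standard way to turn pairwise structural distances into a 2-D map is classical (Torgerson) multidimensional scaling, used below purely as a stand-in technique. This is a minimal sketch that assumes a precomputed, roughly Euclidean distance matrix is already available; the protein names `P1` to `P4`, the distance values, and the helper functions are hypothetical.

```python
"""Hypothetical sketch: a 2-D map of "protein structure space" via classical MDS.

Classical (Torgerson) MDS is a stand-in technique, not PSPACE's documented method.
Assumes a precomputed, roughly Euclidean matrix of pairwise structural distances.
"""
from math import sqrt

Matrix = list[list[float]]


def double_center(dist: Matrix) -> Matrix:
    """Return B = -1/2 * J D^2 J (J = centering matrix, D^2 = squared distances)."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_means = [sum(row) / n for row in sq]
    col_means = [sum(sq[i][j] for i in range(n)) / n for j in range(n)]
    grand_mean = sum(row_means) / n
    return [
        [-0.5 * (sq[i][j] - row_means[i] - col_means[j] + grand_mean) for j in range(n)]
        for i in range(n)
    ]


def top_eigenpair(m: Matrix, iters: int = 500) -> tuple[float, list[float]]:
    """Power iteration: dominant (largest-magnitude) eigenvalue and unit eigenvector."""
    n = len(m)
    v = [1.0 + 0.01 * i for i in range(n)]  # deterministic, non-uniform start vector
    for _ in range(iters):
        w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0, [0.0] * n  # no remaining non-null component
        v = [x / norm for x in w]
    # Rayleigh quotient v^T M v gives the eigenvalue for a unit vector v.
    eigval = sum(v[i] * sum(m[i][j] * v[j] for j in range(n)) for i in range(n))
    return eigval, v


def classical_mds(dist: Matrix, dims: int = 2) -> Matrix:
    """Embed n items in `dims` dimensions so Euclidean distances approximate `dist`."""
    b = double_center(dist)
    n = len(b)
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        lam, vec = top_eigenpair(b)
        scale = sqrt(max(lam, 0.0))  # a negative eigenvalue signals non-Euclidean input
        for i in range(n):
            coords[i][k] = vec[i] * scale
        # Deflate B so the next power iteration finds the next eigenpair.
        b = [[b[i][j] - lam * vec[i] * vec[j] for j in range(n)] for i in range(n)]
    return coords


if __name__ == "__main__":
    # Hypothetical distances between four proteins, chosen to be exactly
    # Euclidean (the corners of a 3 x 4 rectangle) so the map is lossless.
    names = ["P1", "P2", "P3", "P4"]
    dist = [
        [0.0, 3.0, 4.0, 5.0],
        [3.0, 0.0, 5.0, 4.0],
        [4.0, 5.0, 0.0, 3.0],
        [5.0, 4.0, 3.0, 0.0],
    ]
    for name, (x, y) in zip(names, classical_mds(dist)):
        print(f"{name}: ({x:+.3f}, {y:+.3f})")
```

The map is only determined up to rotation and reflection, so the signs of the printed coordinates are not meaningful; pairwise distances are what MDS preserves. Because power iteration picks the largest-magnitude eigenvalue, the sketch assumes the double-centered matrix is (nearly) positive semidefinite and does not handle strongly non-Euclidean distances.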