Sample records for automated multi-dataset analysis

  1. Heterogeneous Optimization Framework: Reproducible Preprocessing of Multi-Spectral Clinical MRI for Neuro-Oncology Imaging Research.

    PubMed

    Milchenko, Mikhail; Snyder, Abraham Z; LaMontagne, Pamela; Shimony, Joshua S; Benzinger, Tammie L; Fouke, Sarah Jost; Marcus, Daniel S

    2016-07-01

    Neuroimaging research often relies on clinically acquired magnetic resonance imaging (MRI) datasets that can originate from multiple institutions. Such datasets are characterized by high heterogeneity of modalities and variability of sequence parameters. This heterogeneity complicates the automation of image processing tasks such as spatial co-registration and physiological or functional image analysis. Given this heterogeneity, conventional processing workflows developed for research purposes are not optimal for clinical data. In this work, we describe an approach called Heterogeneous Optimization Framework (HOF) for developing image analysis pipelines that can handle the high degree of clinical data non-uniformity. HOF provides a set of guidelines for configuration, algorithm development, deployment, interpretation of results and quality control for such pipelines. At each step, we illustrate the HOF approach using the implementation of an automated pipeline for Multimodal Glioma Analysis (MGA) as an example. The MGA pipeline computes tissue diffusion characteristics of diffusion tensor imaging (DTI) acquisitions, hemodynamic characteristics using a perfusion model of susceptibility contrast (DSC) MRI, and spatial cross-modal co-registration of available anatomical, physiological and derived patient images. Developing MGA within HOF enabled the processing of neuro-oncology MR imaging studies to be fully automated. MGA has been successfully used to analyze over 160 clinical tumor studies to date within several research projects. Introduction of the MGA pipeline improved image processing throughput and, most importantly, effectively produced co-registered datasets that were suitable for advanced analysis despite high heterogeneity in acquisition protocols.

  2. Ontology-aided feature correlation for multi-modal urban sensing

    NASA Astrophysics Data System (ADS)

    Misra, Archan; Lantra, Zaman; Jayarajah, Kasthuri

    2016-05-01

    The paper explores the use of correlation across features extracted from different sensing channels to help in urban situational understanding. We use real-world datasets to show how such correlation can improve the accuracy of detection of city-wide events by combining metadata analysis with image analysis of Instagram content. We demonstrate this through a case study on the Singapore Haze. We show that simple ontological relationships and reasoning can significantly help in automating such correlation-based understanding of transient urban events.

  3. An automated quasi-continuous capillary refill timing device

    PubMed Central

    Blaxter, L L; Morris, D E; Crowe, J A; Henry, C; Hill, S; Sharkey, D; Vyas, H; Hayes-Gill, B R

    2016-01-01

    Capillary refill time (CRT) is a simple means of cardiovascular assessment which is widely used in clinical care. Currently, CRT is measured through manual assessment of the time taken for skin tone to return to normal colour following blanching of the skin surface. There is evidence to suggest that manually assessed CRT is subject to bias from ambient light conditions, a lack of standardisation of both blanching time and manually applied pressure, subjectiveness of return to normal colour, and variability in the manual assessment of time. We present a novel automated system for CRT measurement, incorporating three components: a non-invasive adhesive sensor incorporating a pneumatic actuator, a diffuse multi-wavelength reflectance measurement device, and a temperature sensor; a battery-operated datalogger unit containing a self-contained pneumatic supply; and PC-based data analysis software for the extraction of refill time, patient skin surface temperature, and sensor signal quality. Through standardisation of the test, it is hoped that some of the shortcomings of manual CRT can be overcome. In addition, an automated system will facilitate easier integration of CRT into electronic record keeping and clinical monitoring or scoring systems, as well as reducing demands on clinicians. A summary analysis of volunteer (n = 30) automated CRT datasets is presented, from 15 healthy adults and 15 healthy children (aged from 5 to 15 years), as their arms were cooled from ambient temperature to 5°C. A more detailed analysis of two typical datasets is also presented, demonstrating that the response of automated CRT to cooling matches that of previously published studies. PMID:26642080

  4. Real time reconstruction of quasiperiodic multi parameter physiological signals

    NASA Astrophysics Data System (ADS)

    Ganeshapillai, Gartheeban; Guttag, John

    2012-12-01

    A modern intensive care unit (ICU) has automated analysis systems that depend on continuous uninterrupted real time monitoring of physiological signals such as electrocardiogram (ECG), arterial blood pressure (ABP), and photo-plethysmogram (PPG). These signals are often corrupted by noise, artifacts, and missing data. We present an automated learning framework for real time reconstruction of corrupted multi-parameter nonstationary quasiperiodic physiological signals. The key idea is to learn a patient-specific model of the relationships between signals, and then reconstruct corrupted segments using the information available in correlated signals. We evaluated our method on MIT-BIH arrhythmia data, a two-channel ECG dataset with many clinically significant arrhythmias, and on the CinC challenge 2010 data, a multi-parameter dataset containing ECG, ABP, and PPG. For each, we evaluated both the residual distance between the original signals and the reconstructed signals, and the performance of a heartbeat classifier on a reconstructed ECG signal. At an SNR of 0 dB, the average residual distance on the CinC data was roughly 3% of the energy in the signal, and on the arrhythmia database it was roughly 16%. The difference is attributable to the large amount of diversity in the arrhythmia database. Remarkably, despite the relatively high residual difference, the classification accuracy on the arrhythmia database was still 98%, indicating that our method restored the physiologically important aspects of the signal.
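
    The residual-distance figures quoted above (roughly 3% versus 16% of the signal energy) correspond to a relative residual-energy measure. The sketch below is a minimal, hedged illustration of that idea on synthetic data; the paper's exact normalisation may differ.

        import numpy as np

        def relative_residual_energy(original, reconstructed):
            # Residual energy divided by the energy of the original signal.
            original = np.asarray(original, dtype=float)
            reconstructed = np.asarray(reconstructed, dtype=float)
            residual = original - reconstructed
            return residual.dot(residual) / original.dot(original)

        # Toy usage: a synthetic quasiperiodic signal and a noisy "reconstruction".
        t = np.linspace(0.0, 10.0, 2000)
        signal = np.sin(2 * np.pi * 1.2 * t) + 0.3 * np.sin(2 * np.pi * 2.4 * t)
        reconstruction = signal + 0.05 * np.random.default_rng(0).standard_normal(t.size)
        print(relative_residual_energy(signal, reconstruction))  # small fraction of signal energy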

  5. Experiences Building Globus Genomics: A Next-Generation Sequencing Analysis Service using Galaxy, Globus, and Amazon Web Services

    PubMed Central

    Madduri, Ravi K.; Sulakhe, Dinanath; Lacinski, Lukasz; Liu, Bo; Rodriguez, Alex; Chard, Kyle; Dave, Utpal J.; Foster, Ian T.

    2014-01-01

    We describe Globus Genomics, a system that we have developed for rapid analysis of large quantities of next-generation sequencing (NGS) genomic data. This system achieves a high degree of end-to-end automation that encompasses every stage of data analysis including initial data retrieval from remote sequencing centers or storage (via the Globus file transfer system); specification, configuration, and reuse of multi-step processing pipelines (via the Galaxy workflow system); creation of custom Amazon Machine Images and on-demand resource acquisition via a specialized elastic provisioner (on Amazon EC2); and efficient scheduling of these pipelines over many processors (via the HTCondor scheduler). The system allows biomedical researchers to perform rapid analysis of large NGS datasets in a fully automated manner, without software installation or a need for any local computing infrastructure. We report performance and cost results for some representative workloads. PMID:25342933

  6. Experiences Building Globus Genomics: A Next-Generation Sequencing Analysis Service using Galaxy, Globus, and Amazon Web Services.

    PubMed

    Madduri, Ravi K; Sulakhe, Dinanath; Lacinski, Lukasz; Liu, Bo; Rodriguez, Alex; Chard, Kyle; Dave, Utpal J; Foster, Ian T

    2014-09-10

    We describe Globus Genomics, a system that we have developed for rapid analysis of large quantities of next-generation sequencing (NGS) genomic data. This system achieves a high degree of end-to-end automation that encompasses every stage of data analysis including initial data retrieval from remote sequencing centers or storage (via the Globus file transfer system); specification, configuration, and reuse of multi-step processing pipelines (via the Galaxy workflow system); creation of custom Amazon Machine Images and on-demand resource acquisition via a specialized elastic provisioner (on Amazon EC2); and efficient scheduling of these pipelines over many processors (via the HTCondor scheduler). The system allows biomedical researchers to perform rapid analysis of large NGS datasets in a fully automated manner, without software installation or a need for any local computing infrastructure. We report performance and cost results for some representative workloads.

  7. A multi-phenotypic imaging screen to identify bacterial effectors by exogenous expression in a HeLa cell line.

    PubMed

    Collins, Adam; Huett, Alan

    2018-05-15

    We present a high-content screen (HCS) for the simultaneous analysis of multiple phenotypes in HeLa cells expressing an autophagy reporter (mcherry-LC3) and one of 224 GFP-fused proteins from the Crohn's Disease (CD)-associated bacterium, Adherent Invasive E. coli (AIEC) strain LF82. Using automated confocal microscopy and image analysis (CellProfiler), we localised GFP fusions within cells, and monitored their effects upon autophagy (an important innate cellular defence mechanism), cellular and nuclear morphology, and the actin cytoskeleton. This data will provide an atlas for the localisation of 224 AIEC proteins within human cells, as well as a dataset to analyse their effects upon many aspects of host cell morphology. We also describe an open-source, automated, image-analysis workflow to identify bacterial effectors and their roles via the perturbations induced in reporter cell lines when candidate effectors are exogenously expressed.

  8. Multi-atlas segmentation enables robust multi-contrast MRI spleen segmentation for splenomegaly

    NASA Astrophysics Data System (ADS)

    Huo, Yuankai; Liu, Jiaqi; Xu, Zhoubing; Harrigan, Robert L.; Assad, Albert; Abramson, Richard G.; Landman, Bennett A.

    2017-02-01

    Non-invasive spleen volume estimation is essential in detecting splenomegaly. Magnetic resonance imaging (MRI) has been used to facilitate splenomegaly diagnosis in vivo. However, achieving accurate spleen volume estimation from MR images is challenging given the great inter-subject variance of human abdomens and wide variety of clinical images/modalities. Multi-atlas segmentation has been shown to be a promising approach to handle heterogeneous data and difficult anatomical scenarios. In this paper, we propose to use multi-atlas segmentation frameworks for MRI spleen segmentation for splenomegaly. To the best of our knowledge, this is the first work that integrates multi-atlas segmentation for splenomegaly as seen on MRI. To address the particular concerns of spleen MRI, automated and novel semi-automated atlas selection approaches are introduced. The automated approach interactively selects a subset of atlases using selective and iterative method for performance level estimation (SIMPLE) approach. To further control the outliers, semi-automated craniocaudal length based SIMPLE atlas selection (L-SIMPLE) is proposed to introduce a spatial prior in a fashion to guide the iterative atlas selection. A dataset from a clinical trial containing 55 MRI volumes (28 T1 weighted and 27 T2 weighted) was used to evaluate different methods. Both automated and semi-automated methods achieved median DSC > 0.9. The outliers were alleviated by the L-SIMPLE (≈1 min manual efforts per scan), which achieved 0.9713 Pearson correlation compared with the manual segmentation. The results demonstrated that the multi-atlas segmentation is able to achieve accurate spleen segmentation from the multi-contrast splenomegaly MRI scans.
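
    The Dice similarity coefficient (DSC) reported above is a standard overlap measure between an automated and a manual segmentation. A minimal sketch of its computation from two binary masks (not the authors' code) is:

        import numpy as np

        def dice_coefficient(mask_a, mask_b):
            # DSC = 2|A ∩ B| / (|A| + |B|) for two binary masks of equal shape.
            a = np.asarray(mask_a, dtype=bool)
            b = np.asarray(mask_b, dtype=bool)
            total = a.sum() + b.sum()
            return 2.0 * np.logical_and(a, b).sum() / total if total > 0 else 1.0

        # Toy example: two overlapping "spleen" masks on a 10 x 10 grid.
        auto = np.zeros((10, 10), dtype=bool)
        auto[2:8, 2:8] = True
        manual = np.zeros((10, 10), dtype=bool)
        manual[3:9, 3:9] = True
        print(dice_coefficient(auto, manual))  # ~0.69 for this toy overlap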

  9. Multi-atlas Segmentation Enables Robust Multi-contrast MRI Spleen Segmentation for Splenomegaly.

    PubMed

    Huo, Yuankai; Liu, Jiaqi; Xu, Zhoubing; Harrigan, Robert L; Assad, Albert; Abramson, Richard G; Landman, Bennett A

    2017-02-11

    Non-invasive spleen volume estimation is essential in detecting splenomegaly. Magnetic resonance imaging (MRI) has been used to facilitate splenomegaly diagnosis in vivo. However, achieving accurate spleen volume estimation from MR images is challenging given the great inter-subject variance of human abdomens and wide variety of clinical images/modalities. Multi-atlas segmentation has been shown to be a promising approach to handle heterogeneous data and difficult anatomical scenarios. In this paper, we propose to use multi-atlas segmentation frameworks for MRI spleen segmentation for splenomegaly. To the best of our knowledge, this is the first work that integrates multi-atlas segmentation for splenomegaly as seen on MRI. To address the particular concerns of spleen MRI, automated and novel semi-automated atlas selection approaches are introduced. The automated approach interactively selects a subset of atlases using selective and iterative method for performance level estimation (SIMPLE) approach. To further control the outliers, semi-automated craniocaudal length based SIMPLE atlas selection (L-SIMPLE) is proposed to introduce a spatial prior in a fashion to guide the iterative atlas selection. A dataset from a clinical trial containing 55 MRI volumes (28 T1 weighted and 27 T2 weighted) was used to evaluate different methods. Both automated and semi-automated methods achieved median DSC > 0.9. The outliers were alleviated by the L-SIMPLE (≈1 min manual efforts per scan), which achieved 0.9713 Pearson correlation compared with the manual segmentation. The results demonstrated that the multi-atlas segmentation is able to achieve accurate spleen segmentation from the multi-contrast splenomegaly MRI scans.

  10. Mapping permafrost change hot-spots with Landsat time-series

    NASA Astrophysics Data System (ADS)

    Grosse, G.; Nitze, I.

    2016-12-01

    Recent and projected future climate warming strongly affects permafrost stability over large parts of the terrestrial Arctic with local, regional and global scale consequences. The monitoring and quantification of permafrost and associated land surface changes in these areas is crucial for the analysis of hydrological and biogeochemical cycles as well as vegetation and ecosystem dynamics. However, detailed knowledge of the spatial distribution and the temporal dynamics of these processes is scarce, and key locations of permafrost landscape dynamics may therefore remain unnoticed. As part of the ERC-funded PETA-CARB and ESA GlobPermafrost projects, we developed an automated processing chain based on data from the entire Landsat archive (excluding MSS) for the detection of permafrost change-related processes and hotspots. The automated method enables us to analyze thousands of Landsat scenes, which allows for a multi-scale spatio-temporal analysis at 30 meter spatial resolution. All necessary processing steps are carried out automatically with minimal user interaction, including data extraction, masking, reprojection, subsetting, data stacking, and calculation of multi-spectral indices. These indices, e.g. Landsat Tasseled Cap and NDVI among others, are used as proxies for land surface conditions, such as vegetation status, moisture or albedo. Finally, a robust trend analysis is applied to each multi-spectral index and each pixel over the entire observation period of up to 30 years from 1985 to 2015, depending on data availability. Large transects of around 2 million km² across different permafrost types in Siberia and North America have been processed. Permafrost-related or influencing landscape dynamics were detected within the trend analysis, including thermokarst lake dynamics, fires, thaw slumps, and coastal dynamics. The produced datasets will be distributed to the community as part of the ERC PETA-CARB and ESA GlobPermafrost projects. Users are encouraged to provide feedback and ground truth data for a continuous improvement of our methodology and datasets, which will lead to a better understanding of the spatial and temporal distribution of changes within the vulnerable permafrost zone.
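
    As a hedged illustration of the per-pixel index-and-trend idea described above (NDVI is one of the named indices; the Theil-Sen estimator is used here only as one possible robust trend estimator, not necessarily the project's choice), a sketch on synthetic data might look like:

        import numpy as np
        from scipy.stats import theilslopes

        def ndvi(nir, red):
            # Normalised Difference Vegetation Index for a single scene.
            nir = nir.astype(float)
            red = red.astype(float)
            return (nir - red) / (nir + red + 1e-9)

        def per_pixel_trend(index_stack, years):
            # Robust (Theil-Sen) slope of a multi-spectral index per pixel over time.
            # index_stack: (n_years, rows, cols) array; years: 1-D array of acquisition years.
            _, rows, cols = index_stack.shape
            slopes = np.empty((rows, cols))
            for r in range(rows):
                for c in range(cols):
                    slopes[r, c] = theilslopes(index_stack[:, r, c], years)[0]
            return slopes

        # Toy usage: 30 annual index composites on a 4 x 4 pixel window.
        years = np.arange(1985, 2015)
        stack = np.random.default_rng(1).uniform(0.1, 0.6, size=(30, 4, 4))
        print(per_pixel_trend(stack, years))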

  11. Automated Leaf Tracking using Multi-view Image Sequences of Maize Plants for Leaf-growth Monitoring

    NASA Astrophysics Data System (ADS)

    Das Choudhury, S.; Awada, T.; Samal, A.; Stoerger, V.; Bashyam, S.

    2017-12-01

    Extraction of phenotypes of botanical importance by analyzing plant image sequences has the desirable advantage of enabling non-destructive temporal phenotypic measurements of a large number of plants with little or no manual intervention in a relatively short period of time. The health of a plant is best interpreted by the emergence timing and temporal growth of individual leaves. For automated leaf growth monitoring, it is essential to track each leaf throughout the life cycle of the plant. Plants are constantly changing organisms with increasing complexity in architecture due to variations in self-occlusion and phyllotaxy, i.e., the arrangement of leaves around the stem. Leaf cross-overs make it challenging to accurately track each leaf using a single-view image sequence. Thus, we introduce a novel automated leaf tracking algorithm that uses a graph-theoretic approach to multi-view image sequence analysis based on the determination of leaf-tips and leaf-junctions in 3D space. The leaf tracking algorithm rests on two observations: the leaves of a maize plant emerge in a bottom-up order, and the direction of leaf emergence strictly alternates. The algorithm labels the individual parts of a plant, i.e., leaves and stem, following a graphical representation of the plant skeleton, i.e., the one-pixel-wide connected line obtained from the binary image. The length of a leaf is measured by the number of pixels in the leaf skeleton. To evaluate the performance of the algorithm, a benchmark dataset is indispensable. Thus, we publicly release the University of Nebraska-Lincoln Component Plant Phenotyping dataset-2 (UNL-CPPD-2), consisting of images of 20 maize plants captured by the visible light camera of the Lemnatec Scanalyzer 3D high-throughput plant phenotyping facility once daily for 60 days from 10 different views. The dataset is aimed at facilitating the development and evaluation of leaf tracking algorithms and their uniform comparison.

  12. Multi-channel acoustic recording and automated analysis of Drosophila courtship songs

    PubMed Central

    2013-01-01

    Background Drosophila melanogaster has served as a powerful model system for genetic studies of courtship songs. To accelerate research on the genetic and neural mechanisms underlying courtship song, we have developed a sensitive recording system to simultaneously capture the acoustic signals from 32 separate pairs of courting flies as well as software for automated segmentation of songs. Results Our novel hardware design enables recording of low amplitude sounds in most laboratory environments. We demonstrate the power of this system by collecting, segmenting and analyzing over 18 hours of courtship song from 75 males from five wild-type strains of Drosophila melanogaster. Our analysis reveals previously undetected modulation of courtship song features and extensive natural genetic variation for most components of courtship song. Despite having a large dataset with sufficient power to detect subtle modulations of song, we were unable to identify previously reported periodic rhythms in the inter-pulse interval of song. We provide detailed instructions for assembling the hardware and for using our open-source segmentation software. Conclusions Analysis of a large dataset of acoustic signals from Drosophila melanogaster provides novel insight into the structure and dynamics of species-specific courtship songs. Our new system for recording and analyzing fly acoustic signals should therefore greatly accelerate future studies of the genetics, neurobiology and evolution of courtship song. PMID:23369160

  13. Large-scale automated image analysis for computational profiling of brain tissue surrounding implanted neuroprosthetic devices using Python.

    PubMed

    Rey-Villamizar, Nicolas; Somasundar, Vinay; Megjhani, Murad; Xu, Yan; Lu, Yanbin; Padmanabhan, Raghav; Trett, Kristen; Shain, William; Roysam, Badri

    2014-01-01

    In this article, we describe the use of Python for large-scale automated server-based bio-image analysis in FARSIGHT, a free and open-source toolkit of image analysis methods for quantitative studies of complex and dynamic tissue microenvironments imaged by modern optical microscopes, including confocal, multi-spectral, multi-photon, and time-lapse systems. The core FARSIGHT modules for image segmentation, feature extraction, tracking, and machine learning are written in C++, leveraging widely used libraries including ITK, VTK, Boost, and Qt. For solving complex image analysis tasks, these modules must be combined into scripts using Python. As a concrete example, we consider the problem of analyzing 3-D multi-spectral images of brain tissue surrounding implanted neuroprosthetic devices, acquired using high-throughput multi-spectral spinning disk step-and-repeat confocal microscopy. The resulting images typically contain 5 fluorescent channels. Each channel consists of 6000 × 10,000 × 500 voxels with 16 bits/voxel, implying image sizes exceeding 250 GB. These images must be mosaicked, pre-processed to overcome imaging artifacts, and segmented to enable cellular-scale feature extraction. The features are used to identify cell types, and perform large-scale analysis for identifying spatial distributions of specific cell types relative to the device. Python was used to build a server-based script (Dell 910 PowerEdge servers with 4 sockets/server with 10 cores each, 2 threads per core and 1TB of RAM running on Red Hat Enterprise Linux linked to a RAID 5 SAN) capable of routinely handling image datasets at this scale and performing all these processing steps in a collaborative multi-user multi-platform environment. Our Python script enables efficient data storage and movement between computers and storage servers, logs all the processing steps, and performs full multi-threaded execution of all codes, including open and closed-source third party libraries.
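
    The abstract describes Python glue code that logs every processing step and runs the pipeline in a fully multi-threaded way. A minimal, generic sketch of that pattern (not the FARSIGHT script itself; file names and the per-tile function are hypothetical placeholders) might look like:

        import logging
        from concurrent.futures import ThreadPoolExecutor, as_completed

        logging.basicConfig(filename="pipeline.log", level=logging.INFO,
                            format="%(asctime)s %(levelname)s %(message)s")

        def process_tile(tile_path):
            # Placeholder for one per-tile stage (mosaicking, artifact correction, segmentation).
            logging.info("started %s", tile_path)
            # ... call the compiled C++/ITK-based modules here ...
            logging.info("finished %s", tile_path)
            return tile_path

        tiles = [f"tile_{i:04d}.nrrd" for i in range(8)]  # hypothetical tile file names
        with ThreadPoolExecutor(max_workers=4) as pool:
            futures = [pool.submit(process_tile, t) for t in tiles]
            for done in as_completed(futures):
                logging.info("completed: %s", done.result())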

  14. An automated assay for the assessment of cardiac arrest in fish embryo.

    PubMed

    Puybareau, Elodie; Genest, Diane; Barbeau, Emilie; Léonard, Marc; Talbot, Hugues

    2017-02-01

    Studies on fish embryo models are widely developed in research. They are used in several research fields including drug discovery or environmental toxicology. In this article, we propose an entirely automated assay to detect cardiac arrest in Medaka (Oryzias latipes) based on image analysis. We propose a multi-scale pipeline based on mathematical morphology. Starting from video sequences of entire wells in 24-well plates, we focus on the embryo, detect its heart, and ascertain whether or not the heart is beating based on intensity variation analysis. Our image analysis pipeline only uses commonly available operators. It has a low computational cost, allowing analysis at the same rate as acquisition. From an initial dataset of 3192 videos, 660 were discarded as unusable (20.7%), 655 of them correctly so (99.25%) and only 5 incorrectly so (0.75%). The 2532 remaining videos were used for our test. On these, 45 errors were made, leading to a success rate of 98.23%. Copyright © 2016 Elsevier Ltd. All rights reserved.

  15. GENESIS SciFlo: Choreographing Interoperable Web Services on the Grid using a Semantically-Enabled Dataflow Execution Environment

    NASA Astrophysics Data System (ADS)

    Wilson, B. D.; Manipon, G.; Xing, Z.

    2007-12-01

    The General Earth Science Investigation Suite (GENESIS) project is a NASA-sponsored partnership between the Jet Propulsion Laboratory, academia, and NASA data centers to develop a new suite of Web Services tools to facilitate multi-sensor investigations in Earth System Science. The goal of GENESIS is to enable large-scale, multi-instrument atmospheric science using combined datasets from the AIRS, MODIS, MISR, and GPS sensors. Investigations include cross-comparison of spaceborne climate sensors, cloud spectral analysis, study of upper troposphere-stratosphere water transport, study of the aerosol indirect cloud effect, and global climate model validation. The challenges are to bring together very large datasets, reformat and understand the individual instrument retrievals, co-register or re-grid the retrieved physical parameters, perform computationally-intensive data fusion and data mining operations, and accumulate complex statistics over months to years of data. To meet these challenges, we have developed a Grid computing and dataflow framework, named SciFlo, in which we are deploying a set of versatile and reusable operators for data access, subsetting, registration, mining, fusion, compression, and advanced statistical analysis. SciFlo leverages remote Web Services, called via Simple Object Access Protocol (SOAP) or REST (one-line) URLs, and the Grid Computing standards (WS-* & Globus Alliance toolkits), and enables scientists to do multi-instrument Earth Science by assembling reusable Web Services and native executables into a distributed computing flow (tree of operators). The SciFlo client & server engines optimize the execution of such distributed data flows and allow the user to transparently find and use datasets and operators without worrying about the actual location of the Grid resources. In particular, SciFlo exploits the wealth of datasets accessible by OpenGIS Consortium (OGC) Web Mapping Servers & Web Coverage Servers (WMS/WCS), and by Open Data Access Protocol (OpenDAP) servers. SciFlo also publishes its own SOAP services for space/time query and subsetting of Earth Science datasets, and automated access to large datasets via lists of (FTP, HTTP, or DAP) URLs which point to on-line HDF or netCDF files. Typical distributed workflows obtain datasets by calling standard WMS/WCS servers or discovering and fetching data granules from ftp sites; invoke remote analysis operators available as SOAP services (interface described by a WSDL document); and merge results into binary containers (netCDF or HDF files) for further analysis using local executable operators. Naming conventions (HDFEOS and CF-1.0 for netCDF) are exploited to automatically understand and read on-line datasets. More interoperable conventions, and broader adoption of existing conventions, are vital if we are to "scale up" automated choreography of Web Services beyond toy applications. Recently, the ESIP Federation sponsored a collaborative activity in which several ESIP members developed some collaborative science scenarios for atmospheric and aerosol science, and then choreographed services from multiple groups into demonstration workflows using the SciFlo engine and a Business Process Execution Language (BPEL) workflow engine.
We will discuss the lessons learned from this activity, the need for standardized interfaces (like WMS/WCS), the difficulty in agreeing on even simple XML formats and interfaces, the benefits of doing collaborative science analysis at the "touch of a button" once services are connected, and further collaborations that are being pursued.

  16. 3-D interactive visualisation tools for Hi spectral line imaging

    NASA Astrophysics Data System (ADS)

    van der Hulst, J. M.; Punzo, D.; Roerdink, J. B. T. M.

    2017-06-01

    Upcoming HI surveys will deliver such large datasets that automated processing using the full 3-D information to find and characterize HI objects is unavoidable. Full 3-D visualization is an essential tool for enabling qualitative and quantitative inspection and analysis of the 3-D data, which is often complex in nature. Here we present SlicerAstro, an open-source extension of 3DSlicer, a multi-platform open source software package for visualization and medical image processing, which we developed for the inspection and analysis of HI spectral line data. We describe its initial capabilities, including 3-D filtering, 3-D selection and comparative modelling.

  17. Automated Analysis of Fluorescence Microscopy Images to Identify Protein-Protein Interactions

    DOE PAGES

    Venkatraman, S.; Doktycz, M. J.; Qi, H.; ...

    2006-01-01

    The identification of protein interactions is important for elucidating biological networks. One obstacle in comprehensive interaction studies is the analysis of large datasets, particularly those containing images. Development of an automated system to analyze an image-based protein interaction dataset is needed. Such an analysis system is described here, to automatically extract features from fluorescence microscopy images obtained from a bacterial protein interaction assay. These features are used to relay quantitative values that aid in the automated scoring of positive interactions. Experimental observations indicate that identifying at least 50% positive cells in an image is sufficient to detect a protein interaction. Based on this criterion, the automated system presents 100% accuracy in detecting positive interactions for a dataset of 16 images. Algorithms were implemented using MATLAB and the software developed is available on request from the authors.
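
    The scoring rule quoted above (an interaction is called positive when at least 50% of the cells in an image are scored positive) is simple enough to state in a few lines. The original system was implemented in MATLAB; the Python below is only an illustrative re-statement with hypothetical inputs.

        def score_image(cell_is_positive, threshold=0.5):
            # cell_is_positive: iterable of booleans, one per cell detected in the image.
            cells = list(cell_is_positive)
            if not cells:
                return False
            return sum(cells) / len(cells) >= threshold

        # Example: 7 of 12 cells scored positive, so the interaction is called positive.
        print(score_image([True] * 7 + [False] * 5))  # True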

  18. Addressing multi-label imbalance problem of surgical tool detection using CNN.

    PubMed

    Sahu, Manish; Mukhopadhyay, Anirban; Szengel, Angelika; Zachow, Stefan

    2017-06-01

    A fully automated surgical tool detection framework is proposed for endoscopic video streams. State-of-the-art surgical tool detection methods rely on supervised one-vs-all or multi-class classification techniques, completely ignoring the co-occurrence relationship of the tools and the associated class imbalance. In this paper, we formulate tool detection as a multi-label classification task where tool co-occurrences are treated as separate classes. In addition, imbalance on tool co-occurrences is analyzed and stratification techniques are employed to address the imbalance during convolutional neural network (CNN) training. Moreover, temporal smoothing is introduced as an online post-processing step to enhance runtime prediction. Quantitative analysis is performed on the M2CAI16 tool detection dataset to highlight the importance of stratification, temporal smoothing and the overall framework for tool detection. The analysis on tool imbalance, backed by the empirical results, indicates the need and superiority of the proposed framework over state-of-the-art techniques.
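
    The online temporal smoothing mentioned above can be approximated by a causal moving average over the per-frame class probabilities; this is a hedged sketch of that general idea, not the paper's exact smoothing scheme.

        import numpy as np

        def causal_moving_average(frame_probs, window=5):
            # frame_probs: (n_frames, n_classes) per-frame class probabilities from the CNN.
            # Averages over the current and previous frames only, so the smoothing can
            # run online on a live endoscopic video stream.
            probs = np.asarray(frame_probs, dtype=float)
            smoothed = np.empty_like(probs)
            for t in range(len(probs)):
                start = max(0, t - window + 1)
                smoothed[t] = probs[start:t + 1].mean(axis=0)
            return smoothed

        predictions = np.random.default_rng(2).random((100, 7))  # 7 tool co-occurrence classes
        print(causal_moving_average(predictions).shape)           # (100, 7)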

  19. Multi-atlas segmentation of subcortical brain structures via the AutoSeg software pipeline

    PubMed Central

    Wang, Jiahui; Vachet, Clement; Rumple, Ashley; Gouttard, Sylvain; Ouziel, Clémentine; Perrot, Emilie; Du, Guangwei; Huang, Xuemei; Gerig, Guido; Styner, Martin

    2014-01-01

    Automated segmentation and labeling of individual brain anatomical regions in MRI are challenging due to individual structural variability. Although atlas-based segmentation has shown its potential for both tissue and structure segmentation, due to the inherent natural variability as well as disease-related changes in MR appearance, a single atlas image is often inappropriate to represent the full population of datasets processed in a given neuroimaging study. As an alternative to single-atlas segmentation, the use of multiple atlases alongside label fusion techniques has been introduced, using a set of individual “atlases” that encompasses the expected variability in the studied population. In our study, we proposed a multi-atlas segmentation scheme with a novel graph-based atlas selection technique. We first paired and co-registered all atlases and the subject MR scans. A directed graph with edge weights based on intensity and shape similarity between all MR scans is then computed. The set of neighboring templates is selected via clustering of the graph. Finally, weighted majority voting is employed to create the final segmentation over the selected atlases. This multi-atlas segmentation scheme is used to extend a single-atlas-based segmentation toolkit entitled AutoSeg, which is an open-source, extensible C++-based software pipeline employing BatchMake for its pipeline scripting, developed at the Neuro Image Research and Analysis Laboratories of the University of North Carolina at Chapel Hill. AutoSeg performs N4 intensity inhomogeneity correction, rigid registration to a common template space, automated brain tissue classification-based skull-stripping, and the multi-atlas segmentation. The multi-atlas-based AutoSeg has been evaluated on subcortical structure segmentation with a testing dataset of 20 adult brain MRI scans and 15 atlas MRI scans. The AutoSeg achieved mean Dice coefficients of 81.73% for the subcortical structures. PMID:24567717

  20. DOTAGWA: A CASE STUDY IN WEB-BASED ARCHITECTURES FOR CONNECTING SURFACE WATER MODELS TO SPATIALLY ENABLED WEB APPLICATIONS

    EPA Science Inventory

    The Automated Geospatial Watershed Assessment (AGWA) tool is a desktop application that uses widely available standardized spatial datasets to derive inputs for multi-scale hydrologic models (Miller et al., 2007). The required data sets include topography (DEM data), soils, clima...

  1. DigOut: viewing differential expression genes as outliers.

    PubMed

    Yu, Hui; Tu, Kang; Xie, Lu; Li, Yuan-Yuan

    2010-12-01

    With regard to well-replicated two-conditional microarray datasets, the selection of differentially expressed (DE) genes is a well-studied computational topic, but for multi-conditional microarray datasets with limited or no replication, the same task is not properly addressed by previous studies. This paper adopts multivariate outlier analysis to analyze replication-lacking multi-conditional microarray datasets, finding that it performs significantly better than the widely used limit fold change (LFC) model in a simulated comparative experiment. Compared with the LFC model, the multivariate outlier analysis also demonstrates improved stability against sample variations in a series of manipulated real expression datasets. The reanalysis of a real non-replicated multi-conditional expression dataset series leads to satisfactory results. In conclusion, a multivariate outlier analysis algorithm, like DigOut, is particularly useful for selecting DE genes from non-replicated multi-conditional gene expression datasets.
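
    DigOut's multivariate outlier analysis is not specified in detail here; as a hedged illustration of the general idea only (scoring each gene's expression profile across conditions by its Mahalanobis distance from the bulk), one could write:

        import numpy as np

        def mahalanobis_outlier_scores(expression):
            # expression: (n_genes, n_conditions) matrix with one measurement per condition.
            # Each gene is scored by its Mahalanobis distance from the overall centroid;
            # high-scoring genes are candidate differentially expressed outliers.
            x = np.asarray(expression, dtype=float)
            centered = x - x.mean(axis=0)
            cov_inv = np.linalg.pinv(np.cov(centered, rowvar=False))  # pseudo-inverse for stability
            return np.sqrt(np.einsum("ij,jk,ik->i", centered, cov_inv, centered))

        expr = np.random.default_rng(3).normal(size=(500, 6))  # 500 genes, 6 conditions
        expr[0] += 4.0                                         # one injected outlier gene
        scores = mahalanobis_outlier_scores(expr)
        print(scores[0], scores[1:].mean())                    # the injected gene scores far higher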

  2. MRIQC: Advancing the automatic prediction of image quality in MRI from unseen sites

    PubMed Central

    2017-01-01

    Quality control of MRI is essential for excluding problematic acquisitions and avoiding bias in subsequent image processing and analysis. Visual inspection is subjective and impractical for large scale datasets. Although automated quality assessments have been demonstrated on single-site datasets, it is unclear that solutions can generalize to unseen data acquired at new sites. Here, we introduce the MRI Quality Control tool (MRIQC), a tool for extracting quality measures and fitting a binary (accept/exclude) classifier. Our tool can be run both locally and as a free online service via the OpenNeuro.org portal. The classifier is trained on a publicly available, multi-site dataset (17 sites, N = 1102). We perform model selection evaluating different normalization and feature exclusion approaches aimed at maximizing across-site generalization and estimate an accuracy of 76%±13% on new sites, using leave-one-site-out cross-validation. We confirm that result on a held-out dataset (2 sites, N = 265) also obtaining a 76% accuracy. Even though the performance of the trained classifier is statistically above chance, we show that it is susceptible to site effects and unable to account for artifacts specific to new sites. MRIQC performs with high accuracy in intra-site prediction, but performance on unseen sites leaves space for improvement which might require more labeled data and new approaches to the between-site variability. Overcoming these limitations is crucial for a more objective quality assessment of neuroimaging data, and to enable the analysis of extremely large and multi-site samples. PMID:28945803
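
    Leave-one-site-out cross-validation, as used above to estimate generalization to unseen sites, maps directly onto scikit-learn's LeaveOneGroupOut splitter. The sketch below uses synthetic features and a generic random forest, not MRIQC's actual feature set or classifier settings.

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

        rng = np.random.default_rng(4)
        X = rng.normal(size=(300, 20))         # image quality measures (synthetic)
        y = rng.integers(0, 2, size=300)       # accept (0) / exclude (1) labels
        sites = rng.integers(0, 10, size=300)  # acquisition site of each scan

        clf = RandomForestClassifier(n_estimators=200, random_state=0)
        scores = cross_val_score(clf, X, y, groups=sites, cv=LeaveOneGroupOut())
        print(scores.mean(), scores.std())     # one accuracy per left-out site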

  3. Multi-tissue and multi-scale approach for nuclei segmentation in H&E stained images.

    PubMed

    Salvi, Massimo; Molinari, Filippo

    2018-06-20

    Accurate nuclei detection and segmentation in histological images is essential for many clinical purposes. While manual annotations are time-consuming and operator-dependent, fully automated segmentation remains a challenging task due to the high variability of cell intensity, size and morphology. Most of the proposed algorithms for the automated segmentation of nuclei were designed for a specific organ or tissue. The aim of this study was to develop and validate a fully automated multiscale method, named MANA (Multiscale Adaptive Nuclei Analysis), for nuclei segmentation in different tissues and magnifications. MANA was tested on a dataset of H&E stained tissue images with more than 59,000 annotated nuclei, taken from six organs (colon, liver, bone, prostate, adrenal gland and thyroid) and three magnifications (10×, 20×, 40×). Automatic results were compared with manual segmentations and three open-source software packages designed for nuclei detection. For each organ, MANA always obtained an F1-score higher than 0.91, with an average F1 of 0.9305 ± 0.0161. The average computational time was about 20 s, independent of the number of nuclei to be detected (always higher than 1000), indicating the efficiency of the proposed technique. To the best of our knowledge, MANA is the first fully automated multi-scale and multi-tissue algorithm for nuclei detection. Overall, the robustness and versatility of MANA allowed it to achieve, on different organs and magnifications, performance in line with or better than that of state-of-the-art algorithms optimized for single tissues.
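
    The F1-score used above to compare automatic and manual annotations is the harmonic mean of precision and recall over matched nuclei; a minimal sketch of the computation from match counts (the matching criterion itself is not specified here) is:

        def detection_f1(true_positives, false_positives, false_negatives):
            # Precision, recall and F1 for nucleus detection, given counts of matched
            # nuclei (TP), spurious detections (FP) and missed annotations (FN).
            precision = true_positives / (true_positives + false_positives)
            recall = true_positives / (true_positives + false_negatives)
            return 2 * precision * recall / (precision + recall)

        # Example: 930 matched nuclei, 50 spurious detections, 70 missed annotations.
        print(detection_f1(930, 50, 70))  # ~0.94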

  4. Satellite Imagery Analysis for Automated Global Food Security Forecasting

    NASA Astrophysics Data System (ADS)

    Moody, D.; Brumby, S. P.; Chartrand, R.; Keisler, R.; Mathis, M.; Beneke, C. M.; Nicholaeff, D.; Skillman, S.; Warren, M. S.; Poehnelt, J.

    2017-12-01

    The recent computing performance revolution has driven improvements in sensor, communication, and storage technology. Multi-decadal remote sensing datasets at the petabyte scale are now available in commercial clouds, with new satellite constellations generating petabytes/year of daily high-resolution global coverage imagery. Cloud computing and storage, combined with recent advances in machine learning, are enabling understanding of the world at a scale and at a level of detail never before feasible. We present results from an ongoing effort to develop satellite imagery analysis tools that aggregate temporal, spatial, and spectral information and that can scale with the high-rate and dimensionality of imagery being collected. We focus on the problem of monitoring food crop productivity across the Middle East and North Africa, and show how an analysis-ready, multi-sensor data platform enables quick prototyping of satellite imagery analysis algorithms, from land use/land cover classification and natural resource mapping, to yearly and monthly vegetative health change trends at the structural field level.

  5. On the Multi-Modal Object Tracking and Image Fusion Using Unsupervised Deep Learning Methodologies

    NASA Astrophysics Data System (ADS)

    LaHaye, N.; Ott, J.; Garay, M. J.; El-Askary, H. M.; Linstead, E.

    2017-12-01

    The number of different modalities of remote sensors has been on the rise, resulting in large datasets with different complexity levels. Such complex datasets can provide valuable information separately, yet there is a bigger value in having a comprehensive view of them combined. As such, hidden information can be deduced through applying data mining techniques on the fused data. The curse of dimensionality of such fused data, due to the potentially vast dimension space, hinders our ability to understand them deeply. This is because each dataset requires a user to have instrument-specific and dataset-specific knowledge for optimum and meaningful usage. Once a user decides to use multiple datasets together, deeper understanding of translating and combining these datasets in a correct and effective manner is needed. Although data-centric techniques exist, generic automated methodologies that can solve this problem completely do not. Here we are developing a system that aims to gain a detailed understanding of different data modalities. Such a system will provide an analysis environment that gives the user useful feedback and can aid in research tasks. In our current work, we show the initial outputs of our system implementation, which leverages unsupervised deep learning techniques so as not to burden the user with the task of labeling input data, while still allowing for a detailed machine understanding of the data. Our goal is to be able to track objects, like cloud systems or aerosols, across different image-like data-modalities. The proposed system is flexible, scalable and robust in understanding complex likenesses within multi-modal data in a similar spatio-temporal range, and is also able to co-register and fuse these images when needed.

  6. Learning to recognize rat social behavior: Novel dataset and cross-dataset application.

    PubMed

    Lorbach, Malte; Kyriakou, Elisavet I; Poppe, Ronald; van Dam, Elsbeth A; Noldus, Lucas P J J; Veltkamp, Remco C

    2018-04-15

    Social behavior is an important aspect of rodent models. Automated measuring tools that make use of video analysis and machine learning are an increasingly attractive alternative to manual annotation. Because machine learning-based methods need to be trained, it is important that they are validated using data from different experiment settings. To develop and validate automated measuring tools, there is a need for annotated rodent interaction datasets. Currently, the availability of such datasets is limited to two mouse datasets. We introduce the first, publicly available rat social interaction dataset, RatSI. We demonstrate the practical value of the novel dataset by using it as the training set for a rat interaction recognition method. We show that behavior variations induced by the experiment setting can lead to reduced performance, which illustrates the importance of cross-dataset validation. Consequently, we add a simple adaptation step to our method and improve the recognition performance. Most existing methods are trained and evaluated in one experimental setting, which limits the predictive power of the evaluation to that particular setting. We demonstrate that cross-dataset experiments provide more insight in the performance of classifiers. With our novel, public dataset we encourage the development and validation of automated recognition methods. We are convinced that cross-dataset validation enhances our understanding of rodent interactions and facilitates the development of more sophisticated recognition methods. Combining them with adaptation techniques may enable us to apply automated recognition methods to a variety of animals and experiment settings. Copyright © 2017 Elsevier B.V. All rights reserved.

  7. Automated unsupervised multi-parametric classification of adipose tissue depots in skeletal muscle

    PubMed Central

    Valentinitsch, Alexander; Karampinos, Dimitrios C.; Alizai, Hamza; Subburaj, Karupppasamy; Kumar, Deepak; Link, Thomas M.; Majumdar, Sharmila

    2012-01-01

    Purpose To introduce and validate an automated unsupervised multi-parametric method for segmentation of the subcutaneous fat and muscle regions in order to determine subcutaneous adipose tissue (SAT) and intermuscular adipose tissue (IMAT) areas based on data from a quantitative chemical shift-based water-fat separation approach. Materials and Methods Unsupervised standard k-means clustering was employed to define sets of similar features (k = 2) within the whole multi-modal image after the water-fat separation. The automated image processing chain was composed of three primary stages including tissue, muscle and bone region segmentation. The algorithm was applied on calf and thigh datasets to compute SAT and IMAT areas and was compared to a manual segmentation. Results The IMAT area using the automatic segmentation had excellent agreement with the IMAT area using the manual segmentation for all the cases in the thigh (R2: 0.96) and for cases with up to moderate IMAT area in the calf (R2: 0.92). The group with the highest grade of muscle fat infiltration in the calf had the highest error in the inner SAT contour calculation. Conclusion The proposed multi-parametric segmentation approach combined with quantitative water-fat imaging provides an accurate and reliable method for an automated calculation of the SAT and IMAT areas reducing considerably the total post-processing time. PMID:23097409
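
    The unsupervised step described above, standard k-means with k = 2 on the multi-parametric water-fat features of each voxel, can be sketched with scikit-learn. The feature array below is synthetic and only illustrative; the study's exact feature set and morphological post-processing are not reproduced.

        import numpy as np
        from sklearn.cluster import KMeans

        rng = np.random.default_rng(5)
        # Hypothetical per-voxel features from the water-fat separation,
        # e.g. [water signal, fat signal, fat fraction], flattened over the image.
        voxels = np.vstack([
            rng.normal([0.9, 0.1, 0.10], 0.05, size=(1000, 3)),  # muscle-like voxels
            rng.normal([0.2, 0.8, 0.80], 0.05, size=(1000, 3)),  # adipose-like voxels
        ])

        labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(voxels)
        print(np.bincount(labels))  # two clusters of roughly 1000 voxels each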

  8. A detailed comparison of analysis processes for MCC-IMS data in disease classification—Automated methods can replace manual peak annotations

    PubMed Central

    Horsch, Salome; Kopczynski, Dominik; Kuthe, Elias; Baumbach, Jörg Ingo; Rahmann, Sven

    2017-01-01

    Motivation Disease classification from molecular measurements typically requires an analysis pipeline from raw noisy measurements to final classification results. Multi capillary column—ion mobility spectrometry (MCC-IMS) is a promising technology for the detection of volatile organic compounds in the air of exhaled breath. From raw measurements, the peak regions representing the compounds have to be identified, quantified, and clustered across different experiments. Currently, several steps of this analysis process require manual intervention of human experts. Our goal is to identify a fully automatic pipeline that yields competitive disease classification results compared to an established but subjective and tedious semi-manual process. Method We combine a large number of modern methods for peak detection, peak clustering, and multivariate classification into analysis pipelines for raw MCC-IMS data. We evaluate all combinations on three different real datasets in an unbiased cross-validation setting. We determine which specific algorithmic combinations lead to high AUC values in disease classifications across the different medical application scenarios. Results The best fully automated analysis process achieves even better classification results than the established manual process. The best algorithms for the three analysis steps are (i) SGLTR (Savitzky-Golay Laplace-operator filter thresholding regions) and LM (Local Maxima) for automated peak identification, (ii) EM clustering (Expectation Maximization) and DBSCAN (Density-Based Spatial Clustering of Applications with Noise) for the clustering step and (iii) RF (Random Forest) for multivariate classification. Thus, automated methods can replace the manual steps in the analysis process to enable an unbiased high throughput use of the technology. PMID:28910313
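
    As a loose sketch of the three-stage pipeline structure described above (smoothing plus peak picking, peak clustering, multivariate classification), the snippet below uses generic scipy/scikit-learn stand-ins on synthetic data: a Savitzky-Golay filter in place of SGLTR/LM, DBSCAN for the clustering step, and a random forest for classification. It is not the authors' implementation.

        import numpy as np
        from scipy.signal import savgol_filter
        from sklearn.cluster import DBSCAN
        from sklearn.ensemble import RandomForestClassifier

        rng = np.random.default_rng(6)

        # (i) Smooth a raw spectrum before local-maximum peak picking (stand-in for SGLTR/LM).
        spectrum = rng.normal(size=2048).cumsum()
        smoothed = savgol_filter(spectrum, window_length=31, polyorder=3)

        # (ii) Cluster detected peak coordinates across measurements (stand-in for EM/DBSCAN).
        peaks_a = rng.normal([1.2, 40.0], 0.05, size=(100, 2))
        peaks_b = rng.normal([2.5, 90.0], 0.05, size=(100, 2))
        cluster_ids = DBSCAN(eps=0.3, min_samples=5).fit_predict(np.vstack([peaks_a, peaks_b]))

        # (iii) Classify subjects from per-cluster peak intensities with a random forest.
        X = rng.normal(size=(60, cluster_ids.max() + 1))  # hypothetical intensity matrix
        y = rng.integers(0, 2, size=60)                   # disease / control labels
        clf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, y)
        print(clf.score(X, y))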

  9. A multi-atlas based method for automated anatomical Macaca fascicularis brain MRI segmentation and PET kinetic extraction.

    PubMed

    Ballanger, Bénédicte; Tremblay, Léon; Sgambato-Faure, Véronique; Beaudoin-Gobert, Maude; Lavenne, Franck; Le Bars, Didier; Costes, Nicolas

    2013-08-15

    MRI templates and digital atlases are needed for automated and reproducible quantitative analysis of non-human primate PET studies. Segmenting brain images via multiple atlases outperforms single-atlas labelling in humans. We present a set of atlases manually delineated on brain MRI scans of the monkey Macaca fascicularis. We use this multi-atlas dataset to evaluate two automated methods in terms of accuracy, robustness and reliability in segmenting brain structures on MRI and extracting regional PET measures. Twelve individual Macaca fascicularis high-resolution 3DT1 MR images were acquired. Four individual atlases were created by manually drawing 42 anatomical structures, including cortical and sub-cortical structures, white matter regions, and ventricles. To create the MRI template, we first chose one MRI to define a reference space, and then performed a two-step iterative procedure: affine registration of individual MRIs to the reference MRI, followed by averaging of the twelve resampled MRIs. Automated segmentation in native space was obtained in two ways: 1) Maximum probability atlases were created by decision fusion of two to four individual atlases in the reference space, and transformation back into the individual native space (MAXPROB). 2) One to four individual atlases were registered directly to the individual native space, and combined by decision fusion (PROPAG). Accuracy was evaluated by computing the Dice similarity index and the volume difference. The robustness and reproducibility of PET regional measurements obtained via automated segmentation was evaluated on four co-registered MRI/PET datasets, which included test-retest data. Dice indices were always over 0.7 and reached maximal values of 0.9 for PROPAG with all four individual atlases. There was no significant mean volume bias. The standard deviation of the bias decreased significantly when increasing the number of individual atlases. MAXPROB performed better when increasing the number of atlases used. When all four atlases were used for the MAXPROB creation, the accuracy of morphometric segmentation approached that of the PROPAG method. PET measures extracted either via automatic methods or via the manually defined regions were strongly correlated, with no significant regional differences between methods. Intra-class correlation coefficients for test-retest data were over 0.87. Compared to single atlas extractions, multi-atlas methods improve the accuracy of region definition. They also perform comparably to manually defined regions for PET quantification. Multiple atlases of Macaca fascicularis brains are now available and allow reproducible and simplified analyses. Copyright © 2013 Elsevier Inc. All rights reserved.
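
    Decision fusion of several propagated atlas labelings, as used in both the MAXPROB and PROPAG variants above, reduces in its simplest (unweighted) form to a per-voxel majority vote. The sketch below is a generic illustration of that step on toy arrays, not the authors' implementation.

        import numpy as np

        def majority_vote_fusion(label_volumes):
            # label_volumes: list of integer label arrays (one per registered atlas),
            # all with the same shape. Returns the per-voxel majority label
            # (ties resolved in favour of the lowest label value).
            stacked = np.stack(label_volumes, axis=0)   # (n_atlases, *image_shape)
            n_labels = int(stacked.max()) + 1
            votes = np.stack([(stacked == lab).sum(axis=0) for lab in range(n_labels)])
            return votes.argmax(axis=0)

        # Toy example: three 2 x 3 "atlas" labelings of an image with three structures.
        atlases = [np.array([[0, 1, 1], [2, 2, 0]]),
                   np.array([[0, 1, 2], [2, 2, 0]]),
                   np.array([[1, 1, 1], [2, 0, 0]])]
        print(majority_vote_fusion(atlases))  # [[0 1 1]
                                              #  [2 2 0]]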

  10. Multi-categorical deep learning neural network to classify retinal images: A pilot study employing small database.

    PubMed

    Choi, Joon Yul; Yoo, Tae Keun; Seo, Jeong Gi; Kwak, Jiyong; Um, Terry Taewoong; Rim, Tyler Hyungtaek

    2017-01-01

    Deep learning emerges as a powerful tool for analyzing medical images. Retinal disease detection using computer-aided diagnosis from fundus images has emerged as a new approach. We applied a deep learning convolutional neural network, using MatConvNet, for automated detection of multiple retinal diseases in fundus photographs from the STructured Analysis of the REtina (STARE) database. The dataset was built by expanding data on 10 categories, including normal retina and nine retinal diseases. The optimal outcomes were acquired by using random forest transfer learning based on the VGG-19 architecture. The classification results depended greatly on the number of categories. As the number of categories increased, the performance of the deep learning models diminished. When all 10 categories were included, we obtained results with an accuracy of 30.5%, relative classifier information (RCI) of 0.052, and Cohen's kappa of 0.224. Considering three integrated categories (normal, background diabetic retinopathy, and dry age-related macular degeneration), the multi-categorical classifier showed an accuracy of 72.8%, 0.283 RCI, and 0.577 kappa. In addition, several ensemble classifiers enhanced the multi-categorical classification performance. Transfer learning incorporated with an ensemble classifier using a clustering-and-voting approach presented the best performance, with an accuracy of 36.7%, 0.053 RCI, and 0.225 kappa on the 10-category retinal disease classification problem. First, due to the small size of the datasets, the deep learning techniques in this study were not effective enough for application in clinics, where numerous patients suffering from various types of retinal disorders visit for diagnosis and treatment. Second, we found that transfer learning incorporated with ensemble classifiers can improve classification performance for detecting multi-categorical retinal diseases. Further studies should confirm the effectiveness of algorithms with large datasets obtained from hospitals.

  11. Properties of induced seismicity at the geothermal reservoir Insheim, Germany

    NASA Astrophysics Data System (ADS)

    Olbert, Kai; Küperkoch, Ludger; Meier, Thomas

    2017-04-01

    Within the framework of the German MAGS2 project, the processing of induced events at the geothermal power plant Insheim, Germany, has been reassessed and evaluated. The power plant is located close to the western rim of the Upper Rhine Graben in a region with a strongly heterogeneous subsurface. Therefore, the location of seismic events, particularly the depth estimation, is challenging. The seismic network, consisting of up to 50 stations, has an aperture of approximately 15 km around the power plant. Consequently, manual processing is time-consuming. Using a waveform similarity detection algorithm, the existing dataset from 2012 to 2016 has been reprocessed to complete the catalog of induced seismic events. Based on the waveform similarity, clusters of similar events have been detected. Automated P- and S-arrival time determination using an improved multi-component autoregressive prediction algorithm yields approximately 14,000 P- and S-arrivals for 758 events. Applying a dataset of manual picks as a reference, the automated picking algorithm has been optimized, resulting in a standard deviation of the residuals between automated and manual picks of about 0.02 s. The automated locations show uncertainties comparable to locations of the manual reference dataset. 90% of the automated relocations fall within the error ellipsoid of the manual locations. The remaining locations are either badly resolved due to low numbers of picks or so well resolved that the automatic location is outside the error ellipsoid although located close to the manual location. The developed automated processing scheme proved to be a useful tool to supplement real-time monitoring. The event clusters are located at small patches of faults known from reflection seismic studies. The clusters are observed close to both the injection and the production wells.

  12. SOAP based web services and their future role in VO projects

    NASA Astrophysics Data System (ADS)

    Topf, F.; Jacquey, C.; Génot, V.; Cecconi, B.; André, N.; Zhang, T. L.; Kallio, E.; Lammer, H.; Facsko, G.; Stöckler, R.; Khodachenko, M.

    2011-10-01

    Modern state-of-the-art web services are of crucial importance for the interoperability of different VO tools existing in the planetary community. SOAP-based web services assure the interconnection of different data sources and tools by providing a common protocol for communication. This paper will point out a best-practice approach with the Automated Multi-Dataset Analysis Tool (AMDA) developed by CDPP, Toulouse, and the provision of VEX/MAG data from a remote database located at IWF, Graz. Furthermore, a new FP7 project, IMPEx, will be introduced with a potential usage example of AMDA web services in conjunction with simulation models.

  13. Glaciated valleys in Europe and western Asia

    PubMed Central

    Prasicek, Günther; Otto, Jan-Christoph; Montgomery, David R.; Schrott, Lothar

    2015-01-01

    In recent years, remote sensing, morphometric analysis, and other computational concepts and tools have invigorated the field of geomorphological mapping. Automated interpretation of digital terrain data based on impartial rules holds substantial promise for large dataset processing and objective landscape classification. However, the geomorphological realm presents tremendous complexity and challenges in the translation of qualitative descriptions into geomorphometric semantics. Here, the simple, conventional distinction of V-shaped fluvial and U-shaped glacial valleys was analyzed quantitatively using multi-scale curvature and a novel morphometric variable termed Difference of Minimum Curvature (DMC). We used this automated terrain analysis approach to produce a raster map at a scale of 1:6,000,000 showing the distribution of glaciated valleys across Europe and western Asia. The data set has a cell size of 3 arc seconds and consists of more than 40 billion grid cells. Glaciated U-shaped valleys commonly associated with erosion by warm-based glaciers are abundant in the alpine regions of mid Europe and western Asia but also occur at the margins of mountain ice sheets in Scandinavia. The high-level correspondence with field mapping and the fully transferable semantics validate this approach for automated analysis of yet unexplored terrain around the globe and qualify for potential applications on other planetary bodies like Mars. PMID:27019665

  14. Large-Scale, Multi-Sensor Atmospheric Data Fusion Using Hybrid Cloud Computing

    NASA Astrophysics Data System (ADS)

    Wilson, Brian; Manipon, Gerald; Hua, Hook; Fetzer, Eric

    2014-05-01

    NASA's Earth Observing System (EOS) is an ambitious facility for studying global climate change. The mandate now is to combine measurements from the instruments on the "A-Train" platforms (AIRS, AMSR-E, MODIS, MISR, MLS, and CloudSat) and other Earth probes to enable large-scale studies of climate change over decades. Moving to multi-sensor, long-duration analyses of important climate variables presents serious challenges for large-scale data mining and fusion. For example, one might want to compare temperature and water vapor retrievals from one instrument (AIRS) to another (MODIS), and to a model (ECMWF), stratify the comparisons using a classification of the "cloud scenes" from CloudSat, and repeat the entire analysis over 10 years of data. To efficiently assemble such datasets, we are utilizing Elastic Computing in the Cloud and parallel map-reduce-based algorithms. However, these are data-intensive computing problems, so data transfer times and storage costs (for caching) are key issues. SciReduce is a Hadoop-like parallel analysis system, programmed in parallel python, that is designed from the ground up for Earth science. SciReduce executes inside VMWare images and scales to any number of nodes in a hybrid Cloud (private Eucalyptus and public Amazon). Unlike Hadoop, SciReduce operates on bundles of named numeric arrays, which can be passed in memory or serialized to disk in netCDF4 or HDF5. Multi-year datasets are automatically "sharded" by time and space across a cluster of nodes so that years of data (millions of files) can be processed in a massively parallel way. Input variables (arrays) are pulled on-demand into the Cloud using OPeNDAP URLs or other subsetting services, thereby minimizing the size of the cached input and intermediate datasets. We are using SciReduce to automate the production of multiple versions of a ten-year A-Train water vapor climatology under a NASA MEASURES grant. We will present the architecture of SciReduce, describe the achieved "clock time" speedups in fusing datasets on our own nodes and in the Cloud, and discuss the Cloud cost tradeoffs for storage, compute, and data transfer. We will also present a concept and prototype for staging NASA's A-Train Atmospheric datasets (Levels 2 & 3) in the Amazon Cloud so that any number of compute jobs can be executed "near" the multi-sensor data. Given such a system, multi-sensor climate studies over 10-20 years of data could be performed in an efficient way, with the researcher paying only his own Cloud compute bill.
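    The shard-then-reduce pattern described above can be sketched in plain Python: each time shard is mapped to a partial result in parallel, and the partials are then reduced into a climatology. The shard loader, grid size, and variable below are synthetic placeholders and not SciReduce code.

```python
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def load_shard(year_month):
    # Hypothetical shard loader: a real deployment would pull each shard on demand,
    # e.g. via an OPeNDAP URL, instead of generating synthetic data.
    rng = np.random.default_rng(abs(hash(year_month)) % 2**32)
    return rng.normal(loc=240.0, scale=10.0, size=(180, 360))   # e.g. a monthly field

def map_shard(year_month):
    """Map step: reduce one time shard to (sum, count) arrays."""
    data = load_shard(year_month)
    return data, np.ones_like(data)

def reduce_pairs(pairs):
    """Reduce step: combine per-shard partial sums into a climatology."""
    total = sum(p[0] for p in pairs)
    count = sum(p[1] for p in pairs)
    return total / count

if __name__ == "__main__":
    shards = [f"2003-{m:02d}" for m in range(1, 13)]
    with ProcessPoolExecutor() as pool:               # shards processed in parallel
        partials = list(pool.map(map_shard, shards))
    climatology = reduce_pairs(partials)
    print(climatology.shape, round(float(climatology.mean()), 2))
```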

  15. Combination of minimum enclosing balls classifier with SVM in coal-rock recognition.

    PubMed

    Song, QingJun; Jiang, HaiYan; Song, Qinghui; Zhao, XieGuang; Wu, Xiaoxuan

    2017-01-01

    Top-coal caving technology is a productive and efficient method in modern mechanized coal mining, and the study of coal-rock recognition is key to realizing automation in fully mechanized coal mining. In this paper we propose a new discriminant analysis framework for coal-rock recognition. In the framework, a data acquisition model with vibration and acoustic signals is designed, and a caving dataset with 10 feature variables and three classes is obtained. The optimal combination of feature variables can be decided automatically using multi-class F-score (MF-Score) feature selection. To address the nonlinear mapping encountered in real-world optimization problems, an effective minimum enclosing ball (MEB) algorithm combined with a support vector machine (SVM) is proposed for rapid detection of coal-rock in the caving process. In particular, we illustrate how to construct the MEB-SVM classifier for coal-rock recognition, whose data exhibit inherently complex distributions. The proposed method is examined on UCI data sets and the caving dataset, and compared with several recent SVM classifiers. We conduct experiments using accuracy and the Friedman test to compare multiple classifiers over the UCI data sets. Experimental results demonstrate that the proposed algorithm has good robustness and generalization ability. The experiments on the caving dataset show better performance, which points to promising feature selection and multi-class recognition in coal-rock recognition.
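    As a rough illustration of the pipeline above, the sketch below uses scikit-learn's ANOVA F-score as a stand-in for the MF-Score feature selection and a standard RBF SVM in place of the MEB-SVM classifier, evaluated on a UCI-style three-class dataset.

```python
# Sketch under stated assumptions: f_classif stands in for the paper's MF-Score,
# and a plain RBF SVM stands in for the MEB-SVM classifier.
from sklearn.datasets import load_wine           # UCI-style 3-class dataset as a placeholder
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)                # three classes, like the caving dataset

clf = make_pipeline(
    StandardScaler(),
    SelectKBest(score_func=f_classif, k=6),      # keep the 6 highest-scoring features
    SVC(kernel="rbf", C=10.0, gamma="scale"),
)
scores = cross_val_score(clf, X, y, cv=5)
print(f"5-fold accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```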

  16. Combination of minimum enclosing balls classifier with SVM in coal-rock recognition

    PubMed Central

    Song, QingJun; Jiang, HaiYan; Song, Qinghui; Zhao, XieGuang; Wu, Xiaoxuan

    2017-01-01

    Top-coal caving technology is a productive and efficient method in modern mechanized coal mining, and the study of coal-rock recognition is key to realizing automation in fully mechanized coal mining. In this paper we propose a new discriminant analysis framework for coal-rock recognition. In the framework, a data acquisition model with vibration and acoustic signals is designed, and a caving dataset with 10 feature variables and three classes is obtained. The optimal combination of feature variables can be decided automatically using multi-class F-score (MF-Score) feature selection. To address the nonlinear mapping encountered in real-world optimization problems, an effective minimum enclosing ball (MEB) algorithm combined with a support vector machine (SVM) is proposed for rapid detection of coal-rock in the caving process. In particular, we illustrate how to construct the MEB-SVM classifier for coal-rock recognition, whose data exhibit inherently complex distributions. The proposed method is examined on UCI data sets and the caving dataset, and compared with several recent SVM classifiers. We conduct experiments using accuracy and the Friedman test to compare multiple classifiers over the UCI data sets. Experimental results demonstrate that the proposed algorithm has good robustness and generalization ability. The experiments on the caving dataset show better performance, which points to promising feature selection and multi-class recognition in coal-rock recognition. PMID:28937987

  17. CANEapp: a user-friendly application for automated next generation transcriptomic data analysis.

    PubMed

    Velmeshev, Dmitry; Lally, Patrick; Magistri, Marco; Faghihi, Mohammad Ali

    2016-01-13

    Next generation sequencing (NGS) technologies are indispensable for molecular biology research, but data analysis represents the bottleneck in their application. Users need to be familiar with computer terminal commands, the Linux environment, and various software tools and scripts. Analysis workflows have to be optimized and experimentally validated to extract biologically meaningful data. Moreover, as larger datasets are being generated, their analysis requires the use of high-performance servers. To address these needs, we developed CANEapp (application for Comprehensive automated Analysis of Next-generation sequencing Experiments), a unique suite that combines a Graphical User Interface (GUI) and an automated server-side analysis pipeline that is platform-independent, making it suitable for any server architecture. The GUI runs on a PC or Mac and seamlessly connects to the server to provide full GUI control of RNA-sequencing (RNA-seq) project analysis. The server-side analysis pipeline contains a framework that is implemented on a Linux server through completely automated installation of software components and reference files. Analysis with CANEapp is also fully automated and performs differential gene expression analysis and novel noncoding RNA discovery through alternative workflows (Cuffdiff and R packages edgeR and DESeq2). We compared CANEapp to other similar tools, and it significantly improves on previous developments. We experimentally validated CANEapp's performance by applying it to data derived from different experimental paradigms and confirming the results with quantitative real-time PCR (qRT-PCR). CANEapp adapts to any server architecture by effectively using available resources and thus handles large amounts of data efficiently. CANEapp performance has been experimentally validated on various biological datasets. CANEapp is available free of charge at http://psychiatry.med.miami.edu/research/laboratory-of-translational-rna-genomics/CANE-app. We believe that CANEapp will serve both biologists with no computational experience and bioinformaticians as a simple, time-saving, yet accurate and powerful tool to analyze large RNA-seq datasets and will provide foundations for future development of integrated and automated high-throughput genomics data analysis tools. Due to its inherently standardized pipeline and combination of automated analysis and platform-independence, CANEapp is ideal for large-scale collaborative RNA-seq projects between different institutions and research groups.

  18. Automated mapping of persistent ice and snow cover across the western U.S. with Landsat

    NASA Astrophysics Data System (ADS)

    Selkowitz, David J.; Forster, Richard R.

    2016-07-01

    We implemented an automated approach for mapping persistent ice and snow cover (PISC) across the conterminous western U.S. using all available Landsat TM and ETM+ scenes acquired during the late summer/early fall period between 2010 and 2014. Two separate validation approaches indicate this dataset provides a more accurate representation of glacial ice and perennial snow cover for the region than either the U.S. glacier database derived from US Geological Survey (USGS) Digital Raster Graphics (DRG) maps (based on aerial photography primarily from the 1960s-1980s) or the National Land Cover Database 2011 perennial ice and snow cover class. Our 2010-2014 Landsat-derived dataset indicates 28% less glacier and perennial snow cover than the USGS DRG dataset. There are larger differences between the datasets in some regions, such as the Rocky Mountains of Northwest Wyoming and Southwest Montana, where the Landsat dataset indicates 54% less PISC area. Analysis of Landsat scenes from 1987-1988 and 2008-2010 for three regions using a more conventional, semi-automated approach indicates substantial decreases in glaciers and perennial snow cover that correlate with differences between PISC mapped by the USGS DRG dataset and the automated Landsat-derived dataset. This suggests that most of the differences in PISC between the USGS DRG and the Landsat-derived dataset can be attributed to decreases in PISC, as opposed to differences between mapping techniques. While the dataset produced by the automated Landsat mapping approach is not designed to serve as a conventional glacier inventory that provides glacier outlines and attribute information, it allows for an updated estimate of PISC for the conterminous U.S. as well as for smaller regions. Additionally, the new dataset highlights areas where decreases in PISC have been most significant over the past 25-50 years.
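    One common way to map snow and ice from Landsat is to threshold the Normalized Difference Snow Index and require persistence across the late-summer scene stack; the sketch below illustrates that idea with synthetic reflectance data. The bands, thresholds, and persistence criterion are common defaults and may differ from the published mapping rules.

```python
import numpy as np

def ndsi(green, swir1):
    """Normalized Difference Snow Index from Landsat green and SWIR1 reflectance."""
    green = np.asarray(green, dtype=float)
    swir1 = np.asarray(swir1, dtype=float)
    return (green - swir1) / (green + swir1 + 1e-6)

def persistent_ice_snow(green_stack, swir1_stack, ndsi_threshold=0.4, min_fraction=0.9):
    """Flag pixels classified as snow/ice in at least `min_fraction` of late-summer scenes."""
    snow_per_scene = ndsi(green_stack, swir1_stack) > ndsi_threshold   # (scenes, rows, cols)
    snow_frequency = snow_per_scene.mean(axis=0)
    return snow_frequency >= min_fraction

# Toy example: 10 "scenes" of a 100x100 tile with one persistent bright patch.
rng = np.random.default_rng(1)
green = rng.uniform(0.05, 0.2, size=(10, 100, 100))
swir1 = rng.uniform(0.05, 0.2, size=(10, 100, 100))
green[:, 40:60, 40:60] = 0.8      # snow is bright in the green band
swir1[:, 40:60, 40:60] = 0.1      # and dark in SWIR1
print(persistent_ice_snow(green, swir1).sum(), "pixels mapped as PISC")
```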

  19. Automated mapping of persistent ice and snow cover across the western U.S. with Landsat

    USGS Publications Warehouse

    Selkowitz, David J.; Forster, Richard R.

    2016-01-01

    We implemented an automated approach for mapping persistent ice and snow cover (PISC) across the conterminous western U.S. using all available Landsat TM and ETM+ scenes acquired during the late summer/early fall period between 2010 and 2014. Two separate validation approaches indicate this dataset provides a more accurate representation of glacial ice and perennial snow cover for the region than either the U.S. glacier database derived from US Geological Survey (USGS) Digital Raster Graphics (DRG) maps (based on aerial photography primarily from the 1960s–1980s) or the National Land Cover Database 2011 perennial ice and snow cover class. Our 2010–2014 Landsat-derived dataset indicates 28% less glacier and perennial snow cover than the USGS DRG dataset. There are larger differences between the datasets in some regions, such as the Rocky Mountains of Northwest Wyoming and Southwest Montana, where the Landsat dataset indicates 54% less PISC area. Analysis of Landsat scenes from 1987–1988 and 2008–2010 for three regions using a more conventional, semi-automated approach indicates substantial decreases in glaciers and perennial snow cover that correlate with differences between PISC mapped by the USGS DRG dataset and the automated Landsat-derived dataset. This suggests that most of the differences in PISC between the USGS DRG and the Landsat-derived dataset can be attributed to decreases in PISC, as opposed to differences between mapping techniques. While the dataset produced by the automated Landsat mapping approach is not designed to serve as a conventional glacier inventory that provides glacier outlines and attribute information, it allows for an updated estimate of PISC for the conterminous U.S. as well as for smaller regions. Additionally, the new dataset highlights areas where decreases in PISC have been most significant over the past 25–50 years.

  20. Automated classification of Permanent Scatterers time-series based on statistical characterization tests

    NASA Astrophysics Data System (ADS)

    Berti, Matteo; Corsini, Alessandro; Franceschini, Silvia; Iannacone, Jean Pascal

    2013-04-01

    The application of spaceborne synthetic aperture radar interferometry has progressed, over the last two decades, from the pioneering use of single interferograms for analyzing changes on the earth's surface to the development of advanced multi-interferogram techniques to analyze any sort of natural phenomena which involve movement of the ground. The success of multi-interferogram techniques in the analysis of natural hazards such as landslides and subsidence is widely documented in the scientific literature and demonstrated by the consensus among the end-users. Despite the great potential of this technique, radar interpretation of slope movements is generally based on the sole analysis of average displacement velocities, while the information embedded in multi-interferogram time series is often overlooked if not completely neglected. The underuse of PS time series is probably due to the detrimental effect of residual atmospheric errors, which leave the PS time series characterized by erratic, irregular fluctuations that are often difficult to interpret, and also to the difficulty of performing a visual, supervised analysis of the time series for a large dataset. In this work we present a procedure for automatic classification of PS time series based on a series of statistical characterization tests. The procedure classifies the time series into six distinctive target trends (0=uncorrelated; 1=linear; 2=quadratic; 3=bilinear; 4=discontinuous without constant velocity; 5=discontinuous with change in velocity) and retrieves for each trend a series of descriptive parameters which can be efficiently used to characterize the temporal changes of ground motion. The classification algorithms were developed and tested using an ENVISAT dataset available in the frame of the EPRS-E project (Extraordinary Plan of Environmental Remote Sensing) of the Italian Ministry of Environment (track "Modena", Northern Apennines). This dataset was generated using standard processing, so the time series are typically affected by a significant noise-to-signal ratio. The results of the analysis show that even with such a rough-quality dataset, our automated classification procedure can greatly improve radar interpretation of mass movements. In general, uncorrelated PS (type 0) are concentrated in flat areas such as fluvial terraces and valley bottoms, and along stable watershed divides; linear PS (type 1) are mainly located on slopes (both inside and outside mapped landslides) or near the edge of scarps or steep slopes; non-linear PS (types 2 to 5) typically fall inside landslide deposits or in the surrounding areas. The spatial distribution of classified PS allows the detection of deformation phenomena not visible by considering the average velocity alone, and provides important information on the temporal evolution of the phenomena, such as acceleration, deceleration, seasonal fluctuations, and abrupt or continuous changes of the displacement rate. Based on these encouraging results, we integrated all the classification algorithms into a Graphical User Interface (called PSTime) which is freely available as a standalone application.
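    A minimal sketch of the trend-classification idea follows: candidate polynomial models are fitted to a displacement time series and compared with an information criterion. Only three of the six published classes are shown, and the statistics, thresholds, and sampling below are illustrative assumptions rather than the published procedure.

```python
import numpy as np

def classify_trend(t, d):
    """Assign a PS displacement series to 'uncorrelated', 'linear' or 'quadratic'
    by comparing polynomial models with the Bayesian Information Criterion.
    Simplified sketch; the published procedure uses six classes and further tests."""
    t = np.asarray(t, dtype=float)
    d = np.asarray(d, dtype=float)
    n = len(d)
    best_label, best_bic = "uncorrelated", None
    for label, degree in [("uncorrelated", 0), ("linear", 1), ("quadratic", 2)]:
        coeffs = np.polyfit(t, d, degree)
        resid = d - np.polyval(coeffs, t)
        rss = np.sum(resid ** 2)
        bic = n * np.log(rss / n + 1e-12) + (degree + 1) * np.log(n)
        if best_bic is None or bic < best_bic - 2.0:   # require a clear improvement
            best_label, best_bic = label, bic
    return best_label

# Example: noisy linear subsidence sampled at ENVISAT-like 35-day intervals.
t = np.arange(0, 6 * 365, 35) / 365.0
rng = np.random.default_rng(2)
d = -8.0 * t + rng.normal(0, 3.0, t.size)       # mm, velocity around -8 mm/yr
print(classify_trend(t, d))                      # expected: 'linear'
```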

  1. Automated pre-processing and multivariate vibrational spectra analysis software for rapid results in clinical settings

    NASA Astrophysics Data System (ADS)

    Bhattacharjee, T.; Kumar, P.; Fillipe, L.

    2018-02-01

    Vibrational spectroscopy, especially FTIR and Raman, has shown enormous potential in disease diagnosis, particularly in cancers. Its potential for detecting varied pathological conditions is regularly reported. However, to prove its applicability in clinics, large multi-center, multi-national studies need to be undertaken, and these will result in enormous amounts of data. A parallel effort to develop analytical methods, including user-friendly software that can quickly pre-process data and subject them to the required multivariate analysis, is warranted in order to obtain results in real time. This study reports a MATLAB-based script that can automatically import data, preprocess spectra (interpolation, derivatives, normalization), and then carry out Principal Component Analysis (PCA) followed by Linear Discriminant Analysis (LDA) of the first 10 PCs, all with a single click. The software has been verified on data obtained from cell lines, animal models, and in vivo patient datasets, and gives results comparable to the Minitab 16 software. The software can import a variety of file formats (.asc, .txt, .xls, and many others). Options to ignore noisy data, plot all possible graphs with PCA factors 1 to 5, and save loading factors, confusion matrices, and other parameters are also present. The software can provide results for a dataset of 300 spectra within 0.01 s. We believe that the software will be vital not only in clinical trials using vibrational spectroscopic data, but also for obtaining rapid results when these tools are translated into clinics.
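    Since the published tool is MATLAB-based, the following is a Python sketch of an equivalent workflow: derivative and normalization preprocessing followed by PCA (10 components) and LDA. The synthetic spectra, filter settings, and class labels are placeholders, not the tool's actual defaults.

```python
import numpy as np
from scipy.signal import savgol_filter
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer, Normalizer

def first_derivative(X):
    # Savitzky-Golay first derivative along the wavenumber axis.
    return savgol_filter(X, window_length=9, polyorder=3, deriv=1, axis=1)

# Synthetic two-class spectral dataset (300 spectra x 500 wavenumbers).
rng = np.random.default_rng(0)
wavenumbers = np.linspace(800, 1800, 500)
y = rng.integers(0, 2, size=300)
X = np.exp(-((wavenumbers - 1200) / 80) ** 2)[None, :] * (1 + 0.2 * y[:, None])
X = X + rng.normal(0, 0.02, X.shape)

model = make_pipeline(
    FunctionTransformer(first_derivative),   # preprocessing: derivative
    Normalizer(norm="l2"),                   # preprocessing: vector normalization
    PCA(n_components=10),                    # first 10 principal components
    LinearDiscriminantAnalysis(),            # LDA on the PC scores
)
print(cross_val_score(model, X, y, cv=5).mean())
```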

  2. Video and accelerometer-based motion analysis for automated surgical skills assessment.

    PubMed

    Zia, Aneeq; Sharma, Yachna; Bettadapura, Vinay; Sarin, Eric L; Essa, Irfan

    2018-03-01

    Basic surgical skills of suturing and knot tying are an essential part of medical training. Having an automated system for surgical skills assessment could help save experts time and improve training efficiency. There have been some recent attempts at automated surgical skills assessment using either video analysis or acceleration data. In this paper, we present a novel approach for automated assessment of OSATS-like surgical skills and provide an analysis of different features on multi-modal data (video and accelerometer data). We conduct a large study for basic surgical skill assessment on a dataset that contained video and accelerometer data for suturing and knot-tying tasks. We introduce "entropy-based" features, approximate entropy and cross-approximate entropy, which quantify the amount of predictability and regularity of fluctuations in time series data. The proposed features are compared to the existing Sequential Motion Texture, Discrete Cosine Transform, and Discrete Fourier Transform methods for surgical skills assessment. We report the average performance of the different features across all applicable OSATS-like criteria for suturing and knot-tying tasks. Our analysis shows that the proposed entropy-based features outperform previous state-of-the-art methods using video data, achieving average classification accuracies of 95.1% and 92.2% for suturing and knot tying, respectively. For accelerometer data, our method performs better for suturing, achieving 86.8% average accuracy. We also show that fusion of video and acceleration features can improve overall performance for skill assessment. Automated surgical skills assessment can be achieved with high accuracy using the proposed entropy features. Such a system can significantly improve the efficiency of surgical training in medical schools and teaching hospitals.
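    Approximate entropy, the key feature introduced above, quantifies how often patterns of length m recur compared with patterns of length m+1. A standard NumPy implementation is sketched below; the window length m and tolerance r are conventional defaults and not necessarily the values used in the paper.

```python
import numpy as np

def approximate_entropy(x, m=2, r=None):
    """Approximate entropy ApEn(m, r) of a 1-D time series: lower values indicate
    more regular, predictable motion. Standard textbook formulation."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    if r is None:
        r = 0.2 * np.std(x)                       # common choice for the tolerance

    def phi(m):
        # Embed the series into overlapping windows of length m.
        emb = np.array([x[i:i + m] for i in range(n - m + 1)])
        # Chebyshev distance between every pair of windows.
        dist = np.max(np.abs(emb[:, None, :] - emb[None, :, :]), axis=2)
        c = (dist <= r).mean(axis=1)              # fraction of similar windows
        return np.mean(np.log(c))

    return phi(m) - phi(m + 1)

# A regular sine wave is more predictable (lower ApEn) than white noise.
t = np.linspace(0, 8 * np.pi, 500)
rng = np.random.default_rng(3)
print(approximate_entropy(np.sin(t)), approximate_entropy(rng.normal(size=500)))
```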

  3. 101 Labeled Brain Images and a Consistent Human Cortical Labeling Protocol

    PubMed Central

    Klein, Arno; Tourville, Jason

    2012-01-01

    We introduce the Mindboggle-101 dataset, the largest and most complete set of free, publicly accessible, manually labeled human brain images. To manually label the macroscopic anatomy in magnetic resonance images of 101 healthy participants, we created a new cortical labeling protocol that relies on robust anatomical landmarks and minimal manual edits after initialization with automated labels. The “Desikan–Killiany–Tourville” (DKT) protocol is intended to improve the ease, consistency, and accuracy of labeling human cortical areas. Given how difficult it is to label brains, the Mindboggle-101 dataset is intended to serve as a set of brain atlases for use in labeling other brains, as a normative dataset to establish morphometric variation in a healthy population for comparison against clinical populations, and to contribute to the development, training, testing, and evaluation of automated registration and labeling algorithms. To this end, we also introduce benchmarks for the evaluation of such algorithms by comparing our manual labels with labels automatically generated by probabilistic and multi-atlas registration-based approaches. All data, related software, and updated information are available on the http://mindboggle.info/data website. PMID:23227001

  4. Automated retinal image quality assessment on the UK Biobank dataset for epidemiological studies.

    PubMed

    Welikala, R A; Fraz, M M; Foster, P J; Whincup, P H; Rudnicka, A R; Owen, C G; Strachan, D P; Barman, S A

    2016-04-01

    Morphological changes in the retinal vascular network are associated with future risk of many systemic and vascular diseases. However, uncertainty over the presence and nature of some of these associations exists. Analysis of data from large population based studies will help to resolve these uncertainties. The QUARTZ (QUantitative Analysis of Retinal vessel Topology and siZe) retinal image analysis system allows automated processing of large numbers of retinal images. However, an image quality assessment module is needed to achieve full automation. In this paper, we propose such an algorithm, which uses the segmented vessel map to determine the suitability of retinal images for use in the creation of vessel morphometric data suitable for epidemiological studies. This includes an effective 3-dimensional feature set and support vector machine classification. A random subset of 800 retinal images from UK Biobank (a large prospective study of 500,000 middle aged adults; where 68,151 underwent retinal imaging) was used to examine the performance of the image quality algorithm. The algorithm achieved a sensitivity of 95.33% and a specificity of 91.13% for the detection of inadequate images. The strong performance of this image quality algorithm will make rapid automated analysis of vascular morphometry feasible on the entire UK Biobank dataset (and other large retinal datasets), with minimal operator involvement, and at low cost. Copyright © 2016 Elsevier Ltd. All rights reserved.

  5. Correlative Tomography

    PubMed Central

    Burnett, T. L.; McDonald, S. A.; Gholinia, A.; Geurts, R.; Janus, M.; Slater, T.; Haigh, S. J.; Ornek, C.; Almuaili, F.; Engelberg, D. L.; Thompson, G. E.; Withers, P. J.

    2014-01-01

    Increasingly researchers are looking to bring together perspectives across multiple scales, or to combine insights from different techniques, for the same region of interest. To this end, correlative microscopy has already yielded substantial new insights in two dimensions (2D). Here we develop correlative tomography where the correlative task is somewhat more challenging because the volume of interest is typically hidden beneath the sample surface. We have threaded together x-ray computed tomography, serial section FIB-SEM tomography, electron backscatter diffraction and finally TEM elemental analysis all for the same 3D region. This has allowed observation of the competition between pitting corrosion and intergranular corrosion at multiple scales revealing the structural hierarchy, crystallography and chemistry of veiled corrosion pits in stainless steel. With automated correlative workflows and co-visualization of the multi-scale or multi-modal datasets the technique promises to provide insights across biological, geological and materials science that are impossible using either individual or multiple uncorrelated techniques. PMID:24736640

  6. Benchmark datasets for phylogenomic pipeline validation, applications for foodborne pathogen surveillance

    PubMed Central

    Rand, Hugh; Shumway, Martin; Trees, Eija K.; Simmons, Mustafa; Agarwala, Richa; Davis, Steven; Tillman, Glenn E.; Defibaugh-Chavez, Stephanie; Carleton, Heather A.; Klimke, William A.; Katz, Lee S.

    2017-01-01

    Background: As next generation sequence technology has advanced, there have been parallel advances in genome-scale analysis programs for determining evolutionary relationships as proxies for epidemiological relationship in public health. Most new programs skip traditional steps of ortholog determination and multi-gene alignment, instead identifying variants across a set of genomes, then summarizing results in a matrix of single-nucleotide polymorphisms or alleles for standard phylogenetic analysis. However, public health authorities need to document the performance of these methods with appropriate and comprehensive datasets so they can be validated for specific purposes, e.g., outbreak surveillance. Here we propose a set of benchmark datasets to be used for comparison and validation of phylogenomic pipelines. Methods: We identified four well-documented foodborne pathogen events in which the epidemiology was concordant with routine phylogenomic analyses (reference-based SNP and wgMLST approaches). These are ideal benchmark datasets, as the trees, WGS data, and epidemiological data for each are all in agreement. We have placed these sequence data, sample metadata, and “known” phylogenetic trees in publicly-accessible databases and developed a standard descriptive spreadsheet format describing each dataset. To facilitate easy downloading of these benchmarks, we developed an automated script that uses the standard descriptive spreadsheet format. Results: Our “outbreak” benchmark datasets represent the four major foodborne bacterial pathogens (Listeria monocytogenes, Salmonella enterica, Escherichia coli, and Campylobacter jejuni) and one simulated dataset where the “known tree” can be accurately called the “true tree”. The downloading script and associated table files are available on GitHub: https://github.com/WGS-standards-and-analysis/datasets. Discussion: These five benchmark datasets will help standardize comparison of current and future phylogenomic pipelines, and facilitate important cross-institutional collaborations. Our work is part of a global effort to provide collaborative infrastructure for sequence data and analytic tools—we welcome additional benchmark datasets in our recommended format, and, if relevant, we will add these on our GitHub site. Together, these datasets, dataset format, and the underlying GitHub infrastructure present a recommended path for worldwide standardization of phylogenomic pipelines. PMID:29372115

  7. Benchmark datasets for phylogenomic pipeline validation, applications for foodborne pathogen surveillance.

    PubMed

    Timme, Ruth E; Rand, Hugh; Shumway, Martin; Trees, Eija K; Simmons, Mustafa; Agarwala, Richa; Davis, Steven; Tillman, Glenn E; Defibaugh-Chavez, Stephanie; Carleton, Heather A; Klimke, William A; Katz, Lee S

    2017-01-01

    As next generation sequence technology has advanced, there have been parallel advances in genome-scale analysis programs for determining evolutionary relationships as proxies for epidemiological relationship in public health. Most new programs skip traditional steps of ortholog determination and multi-gene alignment, instead identifying variants across a set of genomes, then summarizing results in a matrix of single-nucleotide polymorphisms or alleles for standard phylogenetic analysis. However, public health authorities need to document the performance of these methods with appropriate and comprehensive datasets so they can be validated for specific purposes, e.g., outbreak surveillance. Here we propose a set of benchmark datasets to be used for comparison and validation of phylogenomic pipelines. We identified four well-documented foodborne pathogen events in which the epidemiology was concordant with routine phylogenomic analyses (reference-based SNP and wgMLST approaches). These are ideal benchmark datasets, as the trees, WGS data, and epidemiological data for each are all in agreement. We have placed these sequence data, sample metadata, and "known" phylogenetic trees in publicly-accessible databases and developed a standard descriptive spreadsheet format describing each dataset. To facilitate easy downloading of these benchmarks, we developed an automated script that uses the standard descriptive spreadsheet format. Our "outbreak" benchmark datasets represent the four major foodborne bacterial pathogens (Listeria monocytogenes, Salmonella enterica, Escherichia coli, and Campylobacter jejuni) and one simulated dataset where the "known tree" can be accurately called the "true tree". The downloading script and associated table files are available on GitHub: https://github.com/WGS-standards-and-analysis/datasets. These five benchmark datasets will help standardize comparison of current and future phylogenomic pipelines, and facilitate important cross-institutional collaborations. Our work is part of a global effort to provide collaborative infrastructure for sequence data and analytic tools; we welcome additional benchmark datasets in our recommended format, and, if relevant, we will add these on our GitHub site. Together, these datasets, dataset format, and the underlying GitHub infrastructure present a recommended path for worldwide standardization of phylogenomic pipelines.

  8. Automated anatomical labeling of bronchial branches using multiple classifiers and its application to bronchoscopy guidance based on fusion of virtual and real bronchoscopy

    NASA Astrophysics Data System (ADS)

    Ota, Shunsuke; Deguchi, Daisuke; Kitasaka, Takayuki; Mori, Kensaku; Suenaga, Yasuhito; Hasegawa, Yoshinori; Imaizumi, Kazuyoshi; Takabatake, Hirotsugu; Mori, Masaki; Natori, Hiroshi

    2008-03-01

    This paper presents a method for automated anatomical labeling of bronchial branches (ALBB) extracted from 3D CT datasets. The proposed method constructs classifiers that output the anatomical names of bronchial branches by employing a machine-learning approach. We also present its application to a bronchoscopy guidance system. Since the bronchus has a complex tree structure, bronchoscopists easily become disoriented and lose their way to a target location, so a bronchoscopy guidance system is strongly desired to assist them. In such a guidance system, automated presentation of anatomical names is quite useful information for bronchoscopy. Although several methods for automated ALBB have been reported, most of them construct models that take only variations in branching patterns into account and do not consider variations in running direction. Since the running directions of bronchial branches differ greatly among individuals, these methods cannot perform ALBB accurately when the running directions differ from those of the models. Our method addresses this problem by utilizing a machine-learning approach. The actual procedure consists of three steps: (a) extraction of bronchial tree structures from 3D CT datasets, (b) construction of classifiers using the multi-class AdaBoost technique, and (c) automated classification of bronchial branches using the constructed classifiers. We applied the proposed method to 51 3D CT datasets, and the constructed classifiers were evaluated by a leave-one-out scheme. The experimental results showed that the proposed method assigned correct anatomical names to 89.1% of bronchial branches up to the segmental level. We also confirmed that presenting the anatomical names of bronchial branches on real bronchoscopic views is quite useful for assisting bronchoscopy.

  9. The PREP pipeline: standardized preprocessing for large-scale EEG analysis.

    PubMed

    Bigdely-Shamlo, Nima; Mullen, Tim; Kothe, Christian; Su, Kyung-Min; Robbins, Kay A

    2015-01-01

    The technology to collect brain imaging and physiological measures has become portable and ubiquitous, opening the possibility of large-scale analysis of real-world human imaging. By its nature, such data is large and complex, making automated processing essential. This paper shows how lack of attention to the very early stages of an EEG preprocessing pipeline can reduce the signal-to-noise ratio and introduce unwanted artifacts into the data, particularly for computations done in single precision. We demonstrate that ordinary average referencing improves the signal-to-noise ratio, but that noisy channels can contaminate the results. We also show that identification of noisy channels depends on the reference and examine the complex interaction of filtering, noisy channel identification, and referencing. We introduce a multi-stage robust referencing scheme to deal with the noisy channel-reference interaction. We propose a standardized early-stage EEG processing pipeline (PREP) and discuss the application of the pipeline to more than 600 EEG datasets. The pipeline includes an automatically generated report for each dataset processed. Users can download the PREP pipeline as a freely available MATLAB library from http://eegstudy.org/prepcode.
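    The interaction between noisy channels and the reference motivates an iterative scheme: estimate an average reference from channels that currently look clean, re-detect noisy channels, and repeat. The sketch below applies only a simple deviation criterion and is a simplified illustration of the idea, not the actual PREP implementation (which also uses correlation, RANSAC, and high-frequency noise criteria).

```python
import numpy as np

def robust_average_reference(eeg, z_thresh=5.0, max_iter=4):
    """eeg: (channels, samples) array. Returns re-referenced data and bad-channel indices."""
    data = np.asarray(eeg, dtype=float)
    bad = np.zeros(data.shape[0], dtype=bool)
    for _ in range(max_iter):
        ref = data[~bad].mean(axis=0)             # average of currently "good" channels
        referenced = data - ref
        # Robust z-score of each channel's amplitude spread (deviation criterion).
        spread = np.median(np.abs(referenced - np.median(referenced, axis=1, keepdims=True)), axis=1)
        mad = 1.4826 * np.median(np.abs(spread - np.median(spread))) + 1e-12
        z = (spread - np.median(spread)) / mad
        new_bad = bad | (np.abs(z) > z_thresh)
        if new_bad.sum() == bad.sum():            # no new noisy channels found
            break
        bad = new_bad
    return data - data[~bad].mean(axis=0), np.flatnonzero(bad)

# Example: 32 clean channels plus one very noisy channel.
rng = np.random.default_rng(4)
eeg = rng.normal(0, 10, size=(33, 10_000))
eeg[7] += rng.normal(0, 200, size=10_000)
_, bad_channels = robust_average_reference(eeg)
print("detected noisy channels:", bad_channels)   # expected: [7]
```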

  10. 'The surface management system' (SuMS) database: a surface-based database to aid cortical surface reconstruction, visualization and analysis

    NASA Technical Reports Server (NTRS)

    Dickson, J.; Drury, H.; Van Essen, D. C.

    2001-01-01

    Surface reconstructions of the cerebral cortex are increasingly widely used in the analysis and visualization of cortical structure, function and connectivity. From a neuroinformatics perspective, dealing with surface-related data poses a number of challenges. These include the multiplicity of configurations in which surfaces are routinely viewed (e.g. inflated maps, spheres and flat maps), plus the diversity of experimental data that can be represented on any given surface. To address these challenges, we have developed a surface management system (SuMS) that allows automated storage and retrieval of complex surface-related datasets. SuMS provides a systematic framework for the classification, storage and retrieval of many types of surface-related data and associated volume data. Within this classification framework, it serves as a version-control system capable of handling large numbers of surface and volume datasets. With built-in database management system support, SuMS provides rapid search and retrieval capabilities across all the datasets, while also incorporating multiple security levels to regulate access. SuMS is implemented in Java and can be accessed via a Web interface (WebSuMS) or using downloaded client software. Thus, SuMS is well positioned to act as a multiplatform, multi-user 'surface request broker' for the neuroscience community.

  11. CARES: Completely Automated Robust Edge Snapper for carotid ultrasound IMT measurement on a multi-institutional database of 300 images: a two stage system combining an intensity-based feature approach with first order absolute moments

    NASA Astrophysics Data System (ADS)

    Molinari, Filippo; Acharya, Rajendra; Zeng, Guang; Suri, Jasjit S.

    2011-03-01

    The carotid intima-media thickness (IMT) is the most widely used marker for the progression of atherosclerosis and the onset of cardiovascular disease. Computer-aided measurements improve accuracy but usually require user interaction. In this paper we characterize a new and completely automated technique for carotid segmentation and IMT measurement based on the merits of two previously developed techniques. We use an integrated approach of intelligent image feature extraction and line fitting to automatically locate the carotid artery in the image frame, followed by wall interface extraction based on a Gaussian edge operator. We call our system CARES. We validated CARES on a multi-institutional database of 300 carotid ultrasound images. The IMT measurement bias was 0.032 +/- 0.141 mm, better than other automated techniques and comparable to that of user-driven methodologies. CARES processed 96% of the images, leading to a figure of merit of 95.7%. CARES ensures complete automation and high accuracy in IMT measurement; hence it could be a suitable clinical tool for processing large datasets in multicenter studies involving atherosclerosis.

  12. An overview of the U.S. Army Research Laboratory's Sensor Information Testbed for Collaborative Research Environment (SITCORE) and Automated Online Data Repository (AODR) capabilities

    NASA Astrophysics Data System (ADS)

    Ward, Dennis W.; Bennett, Kelly W.

    2017-05-01

    The Sensor Information Testbed COllaborative Research Environment (SITCORE) and the Automated Online Data Repository (AODR) are significant enablers of the U.S. Army Research Laboratory (ARL)'s Open Campus Initiative and together create a highly collaborative research laboratory and testbed environment focused on sensor data and information fusion. SITCORE creates a virtual research and development environment allowing collaboration from other locations, including DoD, industry, academia, and coalition facilities. SITCORE combined with AODR provides end-to-end algorithm development, experimentation, demonstration, and validation. The AODR enterprise allows ARL, as well as other government organizations, industry, and academia, to store and disseminate multiple-intelligence (Multi-INT) datasets collected at field exercises and demonstrations, and to facilitate research and development (R&D) and the advancement of analytical tools and algorithms supporting the Intelligence, Surveillance, and Reconnaissance (ISR) community. The AODR provides a potential central repository for standards-compliant datasets to serve as the "go-to" location for lessons learned and reference products. Many of the AODR datasets have associated ground truth and other metadata, which provides a rich and robust data suite for researchers to develop, test, and refine their algorithms. Researchers download the test data to their own environments using a sophisticated web interface. The AODR allows researchers to request copies of stored datasets and the government to process the requests and approvals in an automated fashion. Access to the AODR requires two-factor authentication in the form of a Common Access Card (CAC) or External Certificate Authority (ECA)

  13. Combining semi-automated image analysis techniques with machine learning algorithms to accelerate large-scale genetic studies.

    PubMed

    Atkinson, Jonathan A; Lobet, Guillaume; Noll, Manuel; Meyer, Patrick E; Griffiths, Marcus; Wells, Darren M

    2017-10-01

    Genetic analyses of plant root systems require large datasets of extracted architectural traits. To quantify such traits from images of root systems, researchers often have to choose between automated tools (that are prone to error and extract only a limited number of architectural traits) or semi-automated ones (that are highly time consuming). We trained a Random Forest algorithm to infer architectural traits from automatically extracted image descriptors. The training was performed on a subset of the dataset, then applied to its entirety. This strategy allowed us to (i) decrease the image analysis time by 73% and (ii) extract meaningful architectural traits based on image descriptors. We also show that these traits are sufficient to identify the quantitative trait loci that had previously been discovered using a semi-automated method. We have shown that combining semi-automated image analysis with machine learning algorithms has the power to increase the throughput of large-scale root studies. We expect that such an approach will enable the quantification of more complex root systems for genetic studies. We also believe that our approach could be extended to other areas of plant phenotyping. © The Authors 2017. Published by Oxford University Press.
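    The strategy described above, training on a measured subset and predicting the rest, can be sketched as follows. The image descriptors and trait values are synthetic placeholders; the published work used Random Forests on descriptors extracted from real root images.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(5)
descriptors = rng.normal(size=(2000, 40))                            # automatically extracted image descriptors
trait = descriptors[:, :5].sum(axis=1) + rng.normal(0, 0.3, 2000)    # e.g. a root architectural trait

# Train on a manually measured subset (25%), then predict the remaining images.
X_train, X_rest, y_train, y_rest = train_test_split(
    descriptors, trait, train_size=0.25, random_state=0)

model = RandomForestRegressor(n_estimators=300, random_state=0)
model.fit(X_train, y_train)
print("R^2 on the remaining images:", round(r2_score(y_rest, model.predict(X_rest)), 3))
```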

  14. Combining semi-automated image analysis techniques with machine learning algorithms to accelerate large-scale genetic studies

    PubMed Central

    Atkinson, Jonathan A.; Lobet, Guillaume; Noll, Manuel; Meyer, Patrick E.; Griffiths, Marcus

    2017-01-01

    Genetic analyses of plant root systems require large datasets of extracted architectural traits. To quantify such traits from images of root systems, researchers often have to choose between automated tools (that are prone to error and extract only a limited number of architectural traits) or semi-automated ones (that are highly time consuming). We trained a Random Forest algorithm to infer architectural traits from automatically extracted image descriptors. The training was performed on a subset of the dataset, then applied to its entirety. This strategy allowed us to (i) decrease the image analysis time by 73% and (ii) extract meaningful architectural traits based on image descriptors. We also show that these traits are sufficient to identify the quantitative trait loci that had previously been discovered using a semi-automated method. We have shown that combining semi-automated image analysis with machine learning algorithms has the power to increase the throughput of large-scale root studies. We expect that such an approach will enable the quantification of more complex root systems for genetic studies. We also believe that our approach could be extended to other areas of plant phenotyping. PMID:29020748

  15. Large-Scale, Parallel, Multi-Sensor Atmospheric Data Fusion Using Cloud Computing

    NASA Astrophysics Data System (ADS)

    Wilson, B. D.; Manipon, G.; Hua, H.; Fetzer, E.

    2013-05-01

    NASA's Earth Observing System (EOS) is an ambitious facility for studying global climate change. The mandate now is to combine measurements from the instruments on the "A-Train" platforms (AIRS, AMSR-E, MODIS, MISR, MLS, and CloudSat) and other Earth probes to enable large-scale studies of climate change over decades. Moving to multi-sensor, long-duration analyses of important climate variables presents serious challenges for large-scale data mining and fusion. For example, one might want to compare temperature and water vapor retrievals from one instrument (AIRS) to another (MODIS), and to a model (ECMWF), stratify the comparisons using a classification of the "cloud scenes" from CloudSat, and repeat the entire analysis over 10 years of data. To efficiently assemble such datasets, we are utilizing Elastic Computing in the Cloud and parallel map/reduce-based algorithms. However, these are data-intensive computing problems, so data transfer times and storage costs (for caching) are key issues. SciReduce is a Hadoop-like parallel analysis system, programmed in parallel python, that is designed from the ground up for Earth science. SciReduce executes inside VMWare images and scales to any number of nodes in the Cloud. Unlike Hadoop, SciReduce operates on bundles of named numeric arrays, which can be passed in memory or serialized to disk in netCDF4 or HDF5. Figure 1 shows the architecture of the full computational system, with SciReduce at the core. Multi-year datasets are automatically "sharded" by time and space across a cluster of nodes so that years of data (millions of files) can be processed in a massively parallel way. Input variables (arrays) are pulled on-demand into the Cloud using OPeNDAP URLs or other subsetting services, thereby minimizing the size of the cached input and intermediate datasets. We are using SciReduce to automate the production of multiple versions of a ten-year A-Train water vapor climatology under a NASA MEASURES grant. We will present the architecture of SciReduce, describe the achieved "clock time" speedups in fusing datasets on our own nodes and in the Cloud, and discuss the Cloud cost tradeoffs for storage, compute, and data transfer. We will also present a concept/prototype for staging NASA's A-Train Atmospheric datasets (Levels 2 & 3) in the Amazon Cloud so that any number of compute jobs can be executed "near" the multi-sensor data. Given such a system, multi-sensor climate studies over 10-20 years of data could be performed in an efficient way, with the researcher paying only his own Cloud compute bill.

  16. Large-Scale, Parallel, Multi-Sensor Atmospheric Data Fusion Using Cloud Computing

    NASA Astrophysics Data System (ADS)

    Wilson, B. D.; Manipon, G.; Hua, H.; Fetzer, E. J.

    2013-12-01

    NASA's Earth Observing System (EOS) is an ambitious facility for studying global climate change. The mandate now is to combine measurements from the instruments on the 'A-Train' platforms (AIRS, AMSR-E, MODIS, MISR, MLS, and CloudSat) and other Earth probes to enable large-scale studies of climate change over decades. Moving to multi-sensor, long-duration analyses of important climate variables presents serious challenges for large-scale data mining and fusion. For example, one might want to compare temperature and water vapor retrievals from one instrument (AIRS) to another (MODIS), and to a model (MERRA), stratify the comparisons using a classification of the 'cloud scenes' from CloudSat, and repeat the entire analysis over 10 years of data. To efficiently assemble such datasets, we are utilizing Elastic Computing in the Cloud and parallel map/reduce-based algorithms. However, these are data-intensive computing problems, so data transfer times and storage costs (for caching) are key issues. SciReduce is a Hadoop-like parallel analysis system, programmed in parallel python, that is designed from the ground up for Earth science. SciReduce executes inside VMWare images and scales to any number of nodes in the Cloud. Unlike Hadoop, SciReduce operates on bundles of named numeric arrays, which can be passed in memory or serialized to disk in netCDF4 or HDF5. Figure 1 shows the architecture of the full computational system, with SciReduce at the core. Multi-year datasets are automatically 'sharded' by time and space across a cluster of nodes so that years of data (millions of files) can be processed in a massively parallel way. Input variables (arrays) are pulled on-demand into the Cloud using OPeNDAP URLs or other subsetting services, thereby minimizing the size of the cached input and intermediate datasets. We are using SciReduce to automate the production of multiple versions of a ten-year A-Train water vapor climatology under a NASA MEASURES grant. We will present the architecture of SciReduce, describe the achieved 'clock time' speedups in fusing datasets on our own compute nodes and in the public Cloud, and discuss the Cloud cost tradeoffs for storage, compute, and data transfer. We will also present a concept/prototype for staging NASA's A-Train Atmospheric datasets (Levels 2 & 3) in the Amazon Cloud so that any number of compute jobs can be executed 'near' the multi-sensor data. Given such a system, multi-sensor climate studies over 10-20 years of data could be performed in an efficient way, with the researcher paying only his own Cloud compute bill.

  17. Towards more reliable automated multi-dose dispensing: retrospective follow-up study on medication dose errors and product defects.

    PubMed

    Palttala, Iida; Heinämäki, Jyrki; Honkanen, Outi; Suominen, Risto; Antikainen, Osmo; Hirvonen, Jouni; Yliruusi, Jouko

    2013-03-01

    To date, little is known about the applicability of different types of pharmaceutical dosage forms in an automated high-speed multi-dose dispensing process. The purpose of the present study was to identify and further investigate various process-induced and/or product-related limitations associated with the multi-dose dispensing process. The rates of product defects and dose dispensing errors in automated multi-dose dispensing were retrospectively investigated during a 6-month follow-up period. The study was based on the analysis of process data from a total of nine automated high-speed multi-dose dispensing systems. Special attention was paid to the dependence of multi-dose dispensing errors and product defects on pharmaceutical tablet properties (such as shape, dimensions, weight, scored lines, and coatings) in order to profile the tablet forms most suitable for automated dose dispensing systems. The relationship between the risk of dose dispensing errors and tablet characteristics was visualized by creating a principal component analysis (PCA) model for the outcome of dispensed tablets. The two most common process-induced failures identified in multi-dose dispensing are tablet defects and unexpected product transitions in the medication cassette (dose dispensing errors). The tablet defects are product-dependent failures, while the tablet transitions depend on the automated multi-dose dispensing system used. The occurrence of tablet defects is approximately twice as common as tablet transitions. An optimal tablet for high-speed multi-dose dispensing would be round, relatively small to medium-sized, film-coated, and without a scored line. Commercial tablet products can be profiled and classified based on their suitability for a high-speed multi-dose dispensing process.

  18. Automation process for morphometric analysis of volumetric CT data from pulmonary vasculature in rats.

    PubMed

    Shingrani, Rahul; Krenz, Gary; Molthen, Robert

    2010-01-01

    With advances in medical imaging scanners, it has become commonplace to generate large multidimensional datasets. These datasets require tools for a rapid, thorough analysis. To address this need, we have developed an automated algorithm for morphometric analysis incorporating A Visualization Workshop computational and image processing libraries for three-dimensional segmentation, vascular tree generation and structural hierarchical ordering with a two-stage numeric optimization procedure for estimating vessel diameters. We combine this new technique with our mathematical models of pulmonary vascular morphology to quantify structural and functional attributes of lung arterial trees. Our physiological studies require repeated measurements of vascular structure to determine differences in vessel biomechanical properties between animal models of pulmonary disease. Automation provides many advantages including significantly improved speed and minimized operator interaction and biasing. The results are validated by comparison with previously published rat pulmonary arterial micro-CT data analysis techniques, in which vessels were manually mapped and measured using intense operator intervention. Published by Elsevier Ireland Ltd.

  19. Multi-categorical deep learning neural network to classify retinal images: A pilot study employing small database

    PubMed Central

    Seo, Jeong Gi; Kwak, Jiyong; Um, Terry Taewoong; Rim, Tyler Hyungtaek

    2017-01-01

    Deep learning has emerged as a powerful tool for analyzing medical images, and retinal disease detection by computer-aided diagnosis from fundus images has emerged as a new method. We applied a deep learning convolutional neural network, using MatConvNet, for automated detection of multiple retinal diseases in fundus photographs from the STructured Analysis of the REtina (STARE) database. The dataset was built by expanding data on 10 categories, including the normal retina and nine retinal diseases. The optimal outcomes were obtained using random forest transfer learning based on the VGG-19 architecture. The classification results depended greatly on the number of categories: as the number of categories increased, the performance of the deep learning models diminished. When all 10 categories were included, we obtained an accuracy of 30.5%, relative classifier information (RCI) of 0.052, and Cohen’s kappa of 0.224. When only three categories were considered (normal, background diabetic retinopathy, and dry age-related macular degeneration), the multi-categorical classifier showed an accuracy of 72.8%, 0.283 RCI, and 0.577 kappa. In addition, several ensemble classifiers enhanced the multi-categorical classification performance. Transfer learning combined with an ensemble classifier using a clustering and voting approach gave the best performance on the 10-category problem, with an accuracy of 36.7%, 0.053 RCI, and 0.225 kappa. First, due to the small size of the dataset, the deep learning techniques in this study were not effective enough to be applied in clinics, where numerous patients suffering from various types of retinal disorders visit for diagnosis and treatment. Second, we found that transfer learning combined with ensemble classifiers can improve classification performance for detecting multi-categorical retinal diseases. Further studies should confirm the effectiveness of these algorithms with large datasets obtained from hospitals. PMID:29095872

  20. Designing an End-to-End System for Data Storage, Analysis, and Visualization for an Urban Environmental Observatory

    NASA Astrophysics Data System (ADS)

    McGuire, M. P.; Welty, C.; Gangopadhyay, A.; Karabatis, G.; Chen, Z.

    2006-05-01

    The urban environment is formed by complex interactions between natural and human dominated systems, the study of which requires the collection and analysis of very large datasets that span many disciplines. Recent advances in sensor technology and automated data collection have improved the ability to monitor urban environmental systems and are making the idea of an urban environmental observatory a reality. This in turn has created a number of potential challenges in data management and analysis. We present the design of an end-to-end system to store, analyze, and visualize data from a prototype urban environmental observatory based at the Baltimore Ecosystem Study, a National Science Foundation Long Term Ecological Research site (BES LTER). We first present an object-relational design of an operational database to store high resolution spatial datasets as well as data from sensor networks, archived data from the BES LTER, data from external sources such as USGS NWIS, EPA Storet, and metadata. The second component of the system design includes a spatiotemporal data warehouse consisting of a data staging plan and a multidimensional data model designed for the spatiotemporal analysis of monitoring data. The system design also includes applications for multi-resolution exploratory data analysis, multi-resolution data mining, and spatiotemporal visualization based on the spatiotemporal data warehouse. Also the system design includes interfaces with water quality models such as HSPF, SWMM, and SWAT, and applications for real-time sensor network visualization, data discovery, data download, QA/QC, and backup and recovery, all of which are based on the operational database. The system design includes both internet and workstation-based interfaces. Finally we present the design of a laboratory for spatiotemporal analysis and visualization as well as real-time monitoring of the sensor network.

  1. Spectral multi-energy CT texture analysis with machine learning for tissue classification: an investigation using classification of benign parotid tumours as a testing paradigm.

    PubMed

    Al Ajmi, Eiman; Forghani, Behzad; Reinhold, Caroline; Bayat, Maryam; Forghani, Reza

    2018-06-01

    There is a rich amount of quantitative information in spectral datasets generated from dual-energy CT (DECT). In this study, we compare the performance of texture analysis performed on multi-energy datasets to that of virtual monochromatic images (VMIs) at 65 keV only, using classification of the two most common benign parotid neoplasms as a testing paradigm. Forty-two patients with pathologically proven Warthin tumour (n = 25) or pleomorphic adenoma (n = 17) were evaluated. Texture analysis was performed on VMIs ranging from 40 to 140 keV in 5-keV increments (multi-energy analysis) or 65-keV VMIs only, which is typically considered equivalent to single-energy CT. Random forest (RF) models were constructed for outcome prediction using separate randomly selected training and testing sets or the entire patient set. Using multi-energy texture analysis, tumour classification in the independent testing set had accuracy, sensitivity, specificity, positive predictive value, and negative predictive value of 92%, 86%, 100%, 100%, and 83%, compared to 75%, 57%, 100%, 100%, and 63%, respectively, for single-energy analysis. Multi-energy texture analysis demonstrates superior performance compared to single-energy texture analysis of VMIs at 65 keV for classification of benign parotid tumours. • We present and validate a paradigm for texture analysis of DECT scans. • Multi-energy dataset texture analysis is superior to single-energy dataset texture analysis. • DECT texture analysis has high accuracy for diagnosis of benign parotid tumours. • DECT texture analysis with machine learning can enhance non-invasive diagnostic tumour evaluation.
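    The multi-energy analysis above amounts to concatenating texture features computed on VMIs from 40 to 140 keV (5-keV steps) into one feature vector per tumour, versus using only the 65-keV block. The sketch below shows that feature-matrix construction with a random forest classifier; the feature values are synthetic stand-ins, so the printed accuracies are not meaningful and do not reproduce the published results.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(6)
energies = np.arange(40, 145, 5)                 # 21 virtual monochromatic energy levels
n_patients, n_textures = 42, 12                  # texture measures per energy level (assumed)

# Multi-energy analysis: one block of texture features per energy level, concatenated.
X_multi = rng.normal(size=(n_patients, len(energies) * n_textures))
# Single-energy analysis: only the 65 keV block of the same matrix.
idx_65 = int(np.where(energies == 65)[0][0])
X_single = X_multi[:, idx_65 * n_textures:(idx_65 + 1) * n_textures]
y = rng.integers(0, 2, size=n_patients)          # Warthin tumour vs pleomorphic adenoma

rf = RandomForestClassifier(n_estimators=500, random_state=0)
for name, X in [("multi-energy", X_multi), ("65 keV only", X_single)]:
    print(name, cross_val_score(rf, X, y, cv=5).mean())
```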

  2. Large-Scale, Parallel, Multi-Sensor Data Fusion in the Cloud

    NASA Astrophysics Data System (ADS)

    Wilson, B. D.; Manipon, G.; Hua, H.

    2012-12-01

    NASA's Earth Observing System (EOS) is an ambitious facility for studying global climate change. The mandate now is to combine measurements from the instruments on the "A-Train" platforms (AIRS, AMSR-E, MODIS, MISR, MLS, and CloudSat) and other Earth probes to enable large-scale studies of climate change over periods of years to decades. However, moving from predominantly single-instrument studies to a multi-sensor, measurement-based model for long-duration analysis of important climate variables presents serious challenges for large-scale data mining and data fusion. For example, one might want to compare temperature and water vapor retrievals from one instrument (AIRS) to another instrument (MODIS), and to a model (ECMWF), stratify the comparisons using a classification of the "cloud scenes" from CloudSat, and repeat the entire analysis over years of AIRS data. To perform such an analysis, one must discover & access multiple datasets from remote sites, find the space/time "matchups" between instrument swaths and model grids, understand the quality flags and uncertainties for retrieved physical variables, assemble merged datasets, and compute fused products for further scientific and statistical analysis. To efficiently assemble such decade-scale datasets in a timely manner, we are utilizing Elastic Computing in the Cloud and parallel map/reduce-based algorithms. "SciReduce" is a Hadoop-like parallel analysis system, programmed in parallel python, that is designed from the ground up for Earth science. SciReduce executes inside VMWare images and scales to any number of nodes in the Cloud. Unlike Hadoop, in which simple tuples (keys & values) are passed between the map and reduce functions, SciReduce operates on bundles of named numeric arrays, which can be passed in memory or serialized to disk in netCDF4 or HDF5. Thus, SciReduce uses the native datatypes (geolocated grids, swaths, and points) that geo-scientists are familiar with. We are deploying within SciReduce a versatile set of python operators for data lookup, access, subsetting, co-registration, mining, fusion, and statistical analysis. All operators take in sets of geo-located arrays and generate more arrays. Large, multi-year satellite and model datasets are automatically "sharded" by time and space across a cluster of nodes so that years of data (millions of granules) can be compared or fused in a massively parallel way. Input variables (arrays) are pulled on-demand into the Cloud using OPeNDAP or webification URLs, thereby minimizing the size of the stored input and intermediate datasets. A typical map function might assemble and quality control AIRS Level-2 water vapor profiles for a year of data in parallel, then a reduce function would average the profiles in lat/lon bins (again, in parallel), and a final reduce would aggregate the climatology and write it to output files. We are using SciReduce to automate the production of multiple versions of a multi-year water vapor climatology (AIRS & MODIS), stratified by CloudSat cloud classification, and compare it to models (ECMWF & MERRA reanalysis). We will present the architecture of SciReduce, describe the achieved "clock time" speedups in fusing huge datasets on our own nodes and in the Amazon Cloud, and discuss the Cloud cost tradeoffs for storage, compute, and data transfer.
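    The map/reduce pattern described above can be sketched in plain Python: the example below quality-controls toy granules in a map step and bin-averages them into a 2-degree latitude/longitude climatology in a reduce step. The granule layout, QC convention, and bin size are assumptions, not SciReduce's actual operators.

```python
import numpy as np
from functools import reduce

def map_granule(granule):
    """Map: quality-control one granule and bin its water-vapor retrievals."""
    lat, lon, wv, qc = (granule[k] for k in ("lat", "lon", "wv", "qc"))
    good = qc == 0                                                # assumed QC convention
    ilat = np.clip(((lat[good] + 90) // 2).astype(int), 0, 89)    # 2-degree bins
    ilon = np.clip(((lon[good] + 180) // 2).astype(int), 0, 179)
    s, n = np.zeros((90, 180)), np.zeros((90, 180))
    np.add.at(s, (ilat, ilon), wv[good])
    np.add.at(n, (ilat, ilon), 1)
    return s, n

def reduce_bins(a, b):
    """Reduce: accumulate binned sums and counts from two partial results."""
    return a[0] + b[0], a[1] + b[1]

# Toy granules standing in for a year of Level-2 swaths.
rng = np.random.default_rng(1)
granules = [dict(lat=rng.uniform(-90, 90, 1000), lon=rng.uniform(-180, 180, 1000),
                 wv=rng.gamma(2.0, 10.0, 1000), qc=rng.integers(0, 2, 1000))
            for _ in range(12)]

s, n = reduce(reduce_bins, map(map_granule, granules))
climatology = np.divide(s, n, out=np.full_like(s, np.nan), where=n > 0)
print("mean binned water vapor:", np.nanmean(climatology))
```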

  3. Large-Scale, Parallel, Multi-Sensor Data Fusion in the Cloud

    NASA Astrophysics Data System (ADS)

    Wilson, B.; Manipon, G.; Hua, H.

    2012-04-01

    NASA's Earth Observing System (EOS) is an ambitious facility for studying global climate change. The mandate now is to combine measurements from the instruments on the "A-Train" platforms (AIRS, AMSR-E, MODIS, MISR, MLS, and CloudSat) and other Earth probes to enable large-scale studies of climate change over periods of years to decades. However, moving from predominantly single-instrument studies to a multi-sensor, measurement-based model for long-duration analysis of important climate variables presents serious challenges for large-scale data mining and data fusion. For example, one might want to compare temperature and water vapor retrievals from one instrument (AIRS) to another instrument (MODIS), and to a model (ECMWF), stratify the comparisons using a classification of the "cloud scenes" from CloudSat, and repeat the entire analysis over years of AIRS data. To perform such an analysis, one must discover & access multiple datasets from remote sites, find the space/time "matchups" between instrument swaths and model grids, understand the quality flags and uncertainties for retrieved physical variables, assemble merged datasets, and compute fused products for further scientific and statistical analysis. To efficiently assemble such decade-scale datasets in a timely manner, we are utilizing Elastic Computing in the Cloud and parallel map/reduce-based algorithms. "SciReduce" is a Hadoop-like parallel analysis system, programmed in parallel python, that is designed from the ground up for Earth science. SciReduce executes inside VMWare images and scales to any number of nodes in the Cloud. Unlike Hadoop, in which simple tuples (keys & values) are passed between the map and reduce functions, SciReduce operates on bundles of named numeric arrays, which can be passed in memory or serialized to disk in netCDF4 or HDF5. Thus, SciReduce uses the native datatypes (geolocated grids, swaths, and points) that geo-scientists are familiar with. We are deploying within SciReduce a versatile set of python operators for data lookup, access, subsetting, co-registration, mining, fusion, and statistical analysis. All operators take in sets of geo-arrays and generate more arrays. Large, multi-year satellite and model datasets are automatically "sharded" by time and space across a cluster of nodes so that years of data (millions of granules) can be compared or fused in a massively parallel way. Input variables (arrays) are pulled on-demand into the Cloud using OPeNDAP or webification URLs, thereby minimizing the size of the stored input and intermediate datasets. A typical map function might assemble and quality control AIRS Level-2 water vapor profiles for a year of data in parallel, then a reduce function would average the profiles in bins (again, in parallel), and a final reduce would aggregate the climatology and write it to output files. We are using SciReduce to automate the production of multiple versions of a multi-year water vapor climatology (AIRS & MODIS), stratified by CloudSat cloud classification, and compare it to models (ECMWF & MERRA reanalysis). We will present the architecture of SciReduce, describe the achieved "clock time" speedups in fusing huge datasets on our own nodes and in the Amazon Cloud, and discuss the Cloud cost tradeoffs for storage, compute, and data transfer.

  4. Automated road network extraction from high spatial resolution multi-spectral imagery

    NASA Astrophysics Data System (ADS)

    Zhang, Qiaoping

    For the last three decades, the Geomatics Engineering and Computer Science communities have considered automated road network extraction from remotely-sensed imagery to be a challenging and important research topic. The main objective of this research is to investigate the theory and methodology of automated feature extraction for image-based road database creation, refinement or updating, and to develop a series of algorithms for road network extraction from high resolution multi-spectral imagery. The proposed framework for road network extraction from multi-spectral imagery begins with an image segmentation using the k-means algorithm. This step mainly concerns the exploitation of the spectral information for feature extraction. The road cluster is automatically identified using a fuzzy classifier based on a set of predefined road surface membership functions. These membership functions are established based on the general spectral signature of road pavement materials and the corresponding normalized digital numbers on each multi-spectral band. Shape descriptors of the Angular Texture Signature are defined and used to reduce the misclassifications between roads and other spectrally similar objects (e.g., crop fields, parking lots, and buildings). An iterative and localized Radon transform is developed for the extraction of road centerlines from the classified images. The purpose of the transform is to accurately and completely detect the road centerlines. It is able to find short, long, and even curvilinear lines. The input image is partitioned into a set of subset images called road component images. An iterative Radon transform is locally applied to each road component image. At each iteration, road centerline segments are detected based on an accurate estimation of the line parameters and line widths. Three localization approaches are implemented and compared using qualitative and quantitative methods. Finally, the road centerline segments are grouped into a road network. The extracted road network is evaluated against a reference dataset using a line segment matching algorithm. The entire process is unsupervised and fully automated. Based on extensive experimentation on a variety of remotely-sensed multi-spectral images, the proposed methodology achieves moderate success in automating road network extraction from high spatial resolution multi-spectral imagery.
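    A hedged sketch of the first stage of this framework, k-means clustering of multi-spectral pixels followed by a crude stand-in for the road-cluster selection, is shown below. The synthetic image, number of clusters, and assumed pavement signature are illustrative only.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
bands, h, w = 4, 128, 128
image = rng.random((h, w, bands))                 # stand-in multi-spectral image

pixels = image.reshape(-1, bands)
labels = KMeans(n_clusters=6, n_init=10, random_state=0).fit_predict(pixels)
segmented = labels.reshape(h, w)

# Crude stand-in for the fuzzy road classifier: pick the cluster whose mean
# spectrum is closest to an assumed "road pavement" signature.
road_signature = np.array([0.35, 0.35, 0.35, 0.30])
centroids = np.array([pixels[labels == k].mean(axis=0) for k in range(6)])
road_cluster = int(np.argmin(np.linalg.norm(centroids - road_signature, axis=1)))
road_mask = segmented == road_cluster
print("road-candidate pixels:", int(road_mask.sum()))
```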

  5. MAAMD: a workflow to standardize meta-analyses and comparison of affymetrix microarray data

    PubMed Central

    2014-01-01

    Background: Mandatory deposit of raw microarray data files for public access, prior to study publication, provides significant opportunities to conduct new bioinformatics analyses within and across multiple datasets. Analysis of raw microarray data files (e.g. Affymetrix CEL files) can be time-consuming and complex, and requires fundamental computational and bioinformatics skills. The development of analytical workflows to automate these tasks simplifies the processing of, improves the efficiency of, and serves to standardize multiple and sequential analyses. Once installed, workflows facilitate the tedious steps required to run rapid intra- and inter-dataset comparisons. Results: We developed a workflow to facilitate and standardize Meta-Analysis of Affymetrix Microarray Data analysis (MAAMD) in Kepler. Two freely available stand-alone software tools, R and AltAnalyze, were embedded in MAAMD. The inputs of MAAMD are user-editable csv files, which contain sample information and parameters describing the locations of input files and required tools. MAAMD was tested by analyzing four different GEO datasets from mice and Drosophila. MAAMD automates data downloading, data organization, data quality control assessment, differential gene expression analysis, clustering analysis, pathway visualization, gene-set enrichment analysis, and cross-species orthologous-gene comparisons. MAAMD was utilized to identify gene orthologues responding to hypoxia or hyperoxia in both mice and Drosophila. The entire set of analyses for the four datasets (34 total microarrays) finished in approximately one hour. Conclusions: MAAMD saves time, minimizes the required computer skills, and offers a standardized procedure for users to analyze microarray datasets and make new intra- and inter-dataset comparisons. PMID:24621103

  6. Gene ARMADA: an integrated multi-analysis platform for microarray data implemented in MATLAB.

    PubMed

    Chatziioannou, Aristotelis; Moulos, Panagiotis; Kolisis, Fragiskos N

    2009-10-27

    The microarray data analysis realm is ever-growing through the development of various tools, open source and commercial. However, there is an absence of predefined, rational algorithmic analysis workflows or standardized batch processing to incorporate all steps, from raw data import up to the derivation of significantly differentially expressed gene lists. This absence obfuscates the analytical procedure and obstructs the massive comparative processing of genomic microarray datasets. Moreover, the available solutions depend heavily on the programming skills of the user, whereas in the case of GUI-embedded solutions, they do not provide direct support of various raw image analysis formats or a versatile and simultaneously flexible combination of signal processing methods. We describe here Gene ARMADA (Automated Robust MicroArray Data Analysis), a MATLAB-implemented platform with a Graphical User Interface. This suite integrates all steps of microarray data analysis including automated data import, noise correction and filtering, normalization, statistical selection of differentially expressed genes, clustering, classification and annotation. In its current version, Gene ARMADA fully supports two-colour cDNA and Affymetrix oligonucleotide arrays, plus custom arrays for which experimental details are given in tabular form (Excel spreadsheet, comma-separated values, tab-delimited text formats). It also supports the analysis of already processed results through its versatile import editor. Besides being fully automated, Gene ARMADA incorporates numerous functionalities of the Statistics and Bioinformatics Toolboxes of MATLAB. In addition, it provides numerous visualization and exploration tools plus customizable export data formats for seamless integration with other analysis tools or MATLAB, for further processing. Gene ARMADA requires MATLAB 7.4 (R2007a) or higher and is also distributed as a stand-alone application with MATLAB Component Runtime. Gene ARMADA provides a highly adaptable, integrative, yet flexible tool which can be used for automated quality control, analysis, annotation and visualization of microarray data, constituting a starting point for further data interpretation and integration with numerous other tools.

  7. A Fully-Automated Subcortical and Ventricular Shape Generation Pipeline Preserving Smoothness and Anatomical Topology

    PubMed Central

    Tang, Xiaoying; Luo, Yuan; Chen, Zhibin; Huang, Nianwei; Johnson, Hans J.; Paulsen, Jane S.; Miller, Michael I.

    2018-01-01

    In this paper, we present a fully-automated subcortical and ventricular shape generation pipeline that acts on structural magnetic resonance images (MRIs) of the human brain. Principally, the proposed pipeline consists of three steps: (1) automated structure segmentation using the diffeomorphic multi-atlas likelihood-fusion algorithm; (2) study-specific shape template creation based on the Delaunay triangulation; (3) deformation-based shape filtering using the large deformation diffeomorphic metric mapping for surfaces. The proposed pipeline is shown to provide high accuracy, sufficient smoothness, and accurate anatomical topology. Two datasets focused upon Huntington's disease (HD) were used for evaluating the performance of the proposed pipeline. The first of these contains a total of 16 MRI scans, each with a gold standard available, on which the proposed pipeline's outputs were observed to be highly accurate and smooth when compared with the gold standard. Visual examinations and outlier analyses on the second dataset, which contains a total of 1,445 MRI scans, revealed 100% success rates for the putamen, the thalamus, the globus pallidus, the amygdala, and the lateral ventricle in both hemispheres and rates no smaller than 97% for the bilateral hippocampus and caudate. Another independent dataset, consisting of 15 atlas images and 20 testing images, was also used to quantitatively evaluate the proposed pipeline, with high accuracy having been obtained. In short, the proposed pipeline is herein demonstrated to be effective, both quantitatively and qualitatively, using a large collection of MRI scans. PMID:29867332

  8. A Fully-Automated Subcortical and Ventricular Shape Generation Pipeline Preserving Smoothness and Anatomical Topology.

    PubMed

    Tang, Xiaoying; Luo, Yuan; Chen, Zhibin; Huang, Nianwei; Johnson, Hans J; Paulsen, Jane S; Miller, Michael I

    2018-01-01

    In this paper, we present a fully-automated subcortical and ventricular shape generation pipeline that acts on structural magnetic resonance images (MRIs) of the human brain. Principally, the proposed pipeline consists of three steps: (1) automated structure segmentation using the diffeomorphic multi-atlas likelihood-fusion algorithm; (2) study-specific shape template creation based on the Delaunay triangulation; (3) deformation-based shape filtering using the large deformation diffeomorphic metric mapping for surfaces. The proposed pipeline is shown to provide high accuracy, sufficient smoothness, and accurate anatomical topology. Two datasets focused upon Huntington's disease (HD) were used for evaluating the performance of the proposed pipeline. The first of these contains a total of 16 MRI scans, each with a gold standard available, on which the proposed pipeline's outputs were observed to be highly accurate and smooth when compared with the gold standard. Visual examinations and outlier analyses on the second dataset, which contains a total of 1,445 MRI scans, revealed 100% success rates for the putamen, the thalamus, the globus pallidus, the amygdala, and the lateral ventricle in both hemispheres and rates no smaller than 97% for the bilateral hippocampus and caudate. Another independent dataset, consisting of 15 atlas images and 20 testing images, was also used to quantitatively evaluate the proposed pipeline, with high accuracy having been obtained. In short, the proposed pipeline is herein demonstrated to be effective, both quantitatively and qualitatively, using a large collection of MRI scans.

  9. Machine Learning, Sentiment Analysis, and Tweets: An Examination of Alzheimer's Disease Stigma on Twitter.

    PubMed

    Oscar, Nels; Fox, Pamela A; Croucher, Racheal; Wernick, Riana; Keune, Jessica; Hooker, Karen

    2017-09-01

    Social scientists need practical methods for harnessing large, publicly available datasets that inform the social context of aging. We describe our development of a semi-automated text coding method and use a content analysis of Alzheimer's disease (AD) and dementia portrayal on Twitter to demonstrate its use. The approach improves the feasibility of examining large publicly available datasets. Machine learning techniques modeled stigmatization expressed in 31,150 AD-related tweets collected via Twitter's search API based on 9 AD-related keywords. Two researchers manually coded 311 random tweets on six dimensions. This input from 1% of the dataset was used to train a classifier against the tweet text and code the remaining 99% of the dataset. Our automated process identified that 21.13% of the AD-related tweets used AD-related keywords to perpetuate public stigma, which could impact stereotypes and negative expectations for individuals with the disease and increase "excess disability". This technique could be applied to questions in social gerontology related to how social media outlets reflect and shape attitudes bearing on other developmental outcomes. Recommendations for the collection and analysis of large Twitter datasets are discussed.
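    The semi-automated coding step can be illustrated with a small text classifier trained on a manually coded subset and applied to the remainder of the corpus, as sketched below. The example tweets, labels, and the choice of logistic regression over TF-IDF features are placeholders, not the study's actual model.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hand-coded subset (invented examples): 1 = perpetuates stigma, 0 = does not.
coded_tweets = [
    "he is so alzheimers today, totally lost it",
    "walking this weekend to raise funds for alzheimer's research",
    "can't remember anything, i'm basically demented lol",
    "new study on early detection of dementia looks promising",
]
labels = [1, 0, 1, 0]

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2), min_df=1),
                      LogisticRegression(max_iter=1000))
model.fit(coded_tweets, labels)

# Apply the trained classifier to the uncoded remainder of the corpus.
uncoded = ["another fundraiser for dementia care this week"]
print(model.predict(uncoded))
```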

  10. An open, multi-vendor, multi-field-strength brain MR dataset and analysis of publicly available skull stripping methods agreement.

    PubMed

    Souza, Roberto; Lucena, Oeslle; Garrafa, Julia; Gobbi, David; Saluzzi, Marina; Appenzeller, Simone; Rittner, Letícia; Frayne, Richard; Lotufo, Roberto

    2018-04-15

    This paper presents an open, multi-vendor, multi-field strength magnetic resonance (MR) T1-weighted volumetric brain imaging dataset, named Calgary-Campinas-359 (CC-359). The dataset is composed of images of older healthy adults (29-80 years) acquired on scanners from three vendors (Siemens, Philips and General Electric) at both 1.5 T and 3 T. CC-359 comprises 359 datasets, approximately 60 subjects per vendor and magnetic field strength. The dataset is approximately age and gender balanced, subject to the constraints of the available images. It provides consensus brain extraction masks for all volumes generated using supervised classification. Manual segmentation results for twelve randomly selected subjects performed by an expert are also provided. The CC-359 dataset allows investigation of 1) the influences of both vendor and magnetic field strength on quantitative analysis of brain MR; 2) parameter optimization for automatic segmentation methods; and potentially 3) machine learning classifiers with big data, specifically those based on deep learning methods, as these approaches require a large amount of data. To illustrate the utility of this dataset, we compared the results of eight publicly available skull stripping methods and one publicly available consensus algorithm against the results of a supervised classifier. A linear mixed-effects model analysis indicated that vendor (p-value<0.001) and magnetic field strength (p-value<0.001) have statistically significant impacts on skull stripping results.
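    The kind of linear mixed-effects analysis reported above can be sketched with statsmodels as below. The column names, the Dice outcome, and the grouping by subject are assumptions made for illustration; the toy data are random.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 359
df = pd.DataFrame({
    "dice": rng.normal(0.95, 0.02, n),                       # hypothetical skull-stripping Dice
    "vendor": rng.choice(["Siemens", "Philips", "GE"], n),
    "field_strength": rng.choice(["1.5T", "3T"], n),
    "subject": np.arange(n) % 60,                            # assumed grouping structure
})

# Fixed effects for vendor and field strength, random intercept per subject.
model = smf.mixedlm("dice ~ vendor + field_strength", df, groups=df["subject"])
result = model.fit()
print(result.summary())
```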

  11. Farseer-NMR: automatic treatment, analysis and plotting of large, multi-variable NMR data.

    PubMed

    Teixeira, João M C; Skinner, Simon P; Arbesú, Miguel; Breeze, Alexander L; Pons, Miquel

    2018-05-11

    We present Farseer-NMR (https://git.io/vAueU), a software package to treat, evaluate and combine NMR spectroscopic data from sets of protein-derived peaklists covering a range of experimental conditions. The combined advances in NMR and molecular biology enable the study of complex biomolecular systems such as flexible proteins or large multibody complexes, which display a strong and functionally relevant response to their environmental conditions, e.g. the presence of ligands, site-directed mutations, post-translational modifications, molecular crowders or the chemical composition of the solution. These advances have created a growing need to analyse those systems' responses to multiple variables. The combined analysis of NMR peaklists from large and multivariable datasets has become a new bottleneck in the NMR analysis pipeline, whereby information-rich NMR-derived parameters have to be manually generated, which can be tedious, repetitive and prone to human error, or even unfeasible for very large datasets. There is a persistent gap in the development and distribution of software focused on peaklist treatment, analysis and representation, and specifically able to handle large multivariable datasets, which are becoming more commonplace. In this regard, Farseer-NMR aims to close this longstanding gap in the automated NMR user pipeline and, altogether, reduce the time burden of analysis of large sets of peaklists from days/weeks to seconds/minutes. We have implemented some of the most common, as well as new, routines for calculation of NMR parameters and several publication-quality plotting templates to improve NMR data representation. Farseer-NMR has been written entirely in Python and its modular code base enables facile extension.
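    One example of the information-rich parameters such a tool derives from peaklists is the combined 1H/15N chemical shift perturbation; a minimal stand-alone calculation is sketched below. The weighting factor and the toy peaklists are assumptions, not Farseer-NMR's internal routines.

```python
import numpy as np

ALPHA = 0.2  # commonly used 15N scaling factor (assumed here)

def csp(d_h, d_n, alpha=ALPHA):
    """Combined 1H/15N chemical shift perturbation for one residue."""
    return np.sqrt(d_h**2 + (alpha * d_n)**2)

# Toy peaklists: residue -> (1H ppm, 15N ppm) at two experimental conditions.
reference = {5: (8.12, 120.4), 6: (7.95, 118.2), 7: (8.40, 125.1)}
titrated  = {5: (8.15, 120.9), 6: (7.95, 118.3), 7: (8.31, 124.0)}

perturbations = {res: csp(titrated[res][0] - reference[res][0],
                          titrated[res][1] - reference[res][1])
                 for res in reference}
print(perturbations)
```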

  12. Automated daily quality control analysis for mammography in a multi-unit imaging center.

    PubMed

    Sundell, Veli-Matti; Mäkelä, Teemu; Meaney, Alexander; Kaasalainen, Touko; Savolainen, Sauli

    2018-01-01

    Background: The high requirements for mammography image quality necessitate a systematic quality assurance process. Digital imaging allows automation of the image quality analysis, which can potentially improve repeatability and objectivity compared to a visual evaluation made by the users. Purpose: To develop automated image quality analysis software for daily mammography quality control in a multi-unit imaging center. Material and Methods: An automated image quality analysis software tool using the discrete wavelet transform and multiresolution analysis was developed for the American College of Radiology accreditation phantom. The software was validated by analyzing 60 randomly selected phantom images from six mammography systems and 20 phantom images with different dose levels from one mammography system. The results were compared to a visual analysis made by four reviewers. Additionally, long-term image quality trends of a full-field digital mammography system and a computed radiography mammography system were investigated. Results: The automated software produced feature detection levels comparable to visual analysis. The agreement was good in the case of fibers, while the software detected somewhat more microcalcifications and characteristic masses. Long-term follow-up via a quality assurance web portal demonstrated the feasibility of using the software for monitoring the performance of mammography systems in a multi-unit imaging center. Conclusion: Automated image quality analysis enables monitoring the performance of digital mammography systems in an efficient, centralized manner.
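    The core signal-processing step mentioned above can be sketched with PyWavelets: a 2D discrete wavelet decomposition of a phantom image, from which per-level detail energies could serve as crude feature-detection scores. The wavelet choice, decomposition level, scoring rule, and synthetic phantom are assumptions, not the published algorithm.

```python
import numpy as np
import pywt

rng = np.random.default_rng(0)
phantom = rng.normal(0.5, 0.05, (256, 256))       # stand-in for an ACR phantom image

# Multi-level 2D discrete wavelet transform; coeffs[0] is the approximation,
# coeffs[1:] are detail coefficient tuples ordered from coarsest to finest.
coeffs = pywt.wavedec2(phantom, wavelet="db2", level=3)
details = coeffs[1:]

for i, (cH, cV, cD) in enumerate(details):
    energy = sum(float(np.sum(c**2)) for c in (cH, cV, cD))
    print(f"detail band {i} (coarse->fine): energy = {energy:.1f}")
```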

  13. The PREP pipeline: standardized preprocessing for large-scale EEG analysis

    PubMed Central

    Bigdely-Shamlo, Nima; Mullen, Tim; Kothe, Christian; Su, Kyung-Min; Robbins, Kay A.

    2015-01-01

    The technology to collect brain imaging and physiological measures has become portable and ubiquitous, opening the possibility of large-scale analysis of real-world human imaging. By its nature, such data is large and complex, making automated processing essential. This paper shows how lack of attention to the very early stages of an EEG preprocessing pipeline can reduce the signal-to-noise ratio and introduce unwanted artifacts into the data, particularly for computations done in single precision. We demonstrate that ordinary average referencing improves the signal-to-noise ratio, but that noisy channels can contaminate the results. We also show that identification of noisy channels depends on the reference and examine the complex interaction of filtering, noisy channel identification, and referencing. We introduce a multi-stage robust referencing scheme to deal with the noisy channel-reference interaction. We propose a standardized early-stage EEG processing pipeline (PREP) and discuss the application of the pipeline to more than 600 EEG datasets. The pipeline includes an automatically generated report for each dataset processed. Users can download the PREP pipeline as a freely available MATLAB library from http://eegstudy.org/prepcode. PMID:26150785
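    A much-simplified sketch of the robust referencing idea is given below: the average reference is estimated iteratively while channels flagged as noisy are excluded. The noise criterion (a robust z-score of channel variance) and the iteration count are assumptions and do not reproduce the full PREP procedure.

```python
import numpy as np

def robust_average_reference(eeg, n_iter=4, z_thresh=5.0):
    """eeg: (n_channels, n_samples). Returns re-referenced data and flagged channel indices."""
    bad = np.zeros(eeg.shape[0], dtype=bool)
    for _ in range(n_iter):
        ref = eeg[~bad].mean(axis=0)                 # average of currently-good channels
        var = (eeg - ref).var(axis=1)
        med = np.median(var[~bad])
        mad = np.median(np.abs(var[~bad] - med)) + 1e-12
        bad = 0.6745 * (var - med) / mad > z_thresh  # robust z-score of channel variance
    return eeg - eeg[~bad].mean(axis=0), np.where(bad)[0]

rng = np.random.default_rng(0)
eeg = rng.normal(0.0, 10.0, (64, 5000))
eeg[7] += rng.normal(0.0, 300.0, 5000)               # simulate one very noisy channel
referenced, bad_channels = robust_average_reference(eeg)
print("flagged channels:", bad_channels)
```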

  14. Performance of an Artificial Multi-observer Deep Neural Network for Fully Automated Segmentation of Polycystic Kidneys.

    PubMed

    Kline, Timothy L; Korfiatis, Panagiotis; Edwards, Marie E; Blais, Jaime D; Czerwiec, Frank S; Harris, Peter C; King, Bernard F; Torres, Vicente E; Erickson, Bradley J

    2017-08-01

    Deep learning techniques are being rapidly applied to medical imaging tasks, from organ and lesion segmentation to tissue and tumor classification. These techniques are becoming the leading algorithmic approaches to solve inherently difficult image processing tasks. Currently, the most critical requirement for successful implementation lies in the need for relatively large datasets that can be used for training the deep learning networks. Based on our initial studies of MR imaging examinations of the kidneys of patients affected by polycystic kidney disease (PKD), we have generated a unique database of imaging data and corresponding reference standard segmentations of polycystic kidneys. In the study of PKD, segmentation of the kidneys is needed in order to measure total kidney volume (TKV). Automated methods to segment the kidneys and measure TKV are needed to increase measurement throughput and alleviate the inherent variability of human-derived measurements. We hypothesize that deep learning techniques can be leveraged to perform fast, accurate, reproducible, and fully automated segmentation of polycystic kidneys. Here, we describe a fully automated approach for segmenting PKD kidneys within MR images that simulates a multi-observer approach in order to create an accurate and robust method for the task of segmentation and computation of TKV for PKD patients. A total of 2000 cases were used for training and validation, and 400 cases were used for testing. The multi-observer ensemble method had a mean ± SD percent volume difference of 0.68 ± 2.2% compared with the reference standard segmentations. The complete framework performs fully automated segmentation at a level comparable with interobserver variability and could be considered as a replacement for the task of segmentation of PKD kidneys by a human.
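    The evaluation metric quoted above can be computed directly from binary masks: total kidney volume from voxel counts and spacing, and the percent volume difference against a reference standard. The voxel spacing and toy masks below are illustrative.

```python
import numpy as np

def total_kidney_volume(mask, voxel_size_mm=(1.5, 1.5, 3.0)):
    """TKV in millilitres from a binary mask and voxel spacing in mm (assumed spacing)."""
    voxel_ml = np.prod(voxel_size_mm) / 1000.0
    return mask.sum() * voxel_ml

def percent_volume_difference(pred, ref, voxel_size_mm=(1.5, 1.5, 3.0)):
    tkv_pred = total_kidney_volume(pred, voxel_size_mm)
    tkv_ref = total_kidney_volume(ref, voxel_size_mm)
    return 100.0 * (tkv_pred - tkv_ref) / tkv_ref

rng = np.random.default_rng(0)
ref = rng.random((64, 64, 32)) > 0.7                 # toy "reference standard" mask
pred = ref.copy()
pred[:2] = False                                     # simulate a small under-segmentation
print(f"TKV (ref): {total_kidney_volume(ref):.1f} mL, "
      f"%diff: {percent_volume_difference(pred, ref):+.2f}%")
```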

  15. Image segmentation evaluation for very-large datasets

    NASA Astrophysics Data System (ADS)

    Reeves, Anthony P.; Liu, Shuang; Xie, Yiting

    2016-03-01

    With the advent of modern machine learning methods and fully automated image analysis, there is a need for very large image datasets having documented segmentations for both computer algorithm training and evaluation. Current approaches of visual inspection and manual markings do not scale well to big data. We present a new approach that depends on fully automated algorithm outcomes for segmentation documentation, requires no manual marking, and provides quantitative evaluation for computer algorithms. The documentation of new image segmentations and new algorithm outcomes is achieved by visual inspection. The burden of visual inspection on large datasets is minimized by (a) customized visualizations for rapid review and (b) reducing the number of cases to be reviewed through analysis of quantitative segmentation evaluation. This method has been applied to a dataset of 7,440 whole-lung CT images for six different segmentation algorithms designed to facilitate the fully automated measurement of a number of very important quantitative image biomarkers. The results indicate that we could achieve 93% to 99% successful segmentation for these algorithms on this relatively large image database. The presented evaluation method may be scaled to much larger image databases.

  16. Development and Experimental Evaluation of an Automated Multi-Media Course on Transistors.

    ERIC Educational Resources Information Center

    Whitted, J.H., Jr.; And Others

    A completely automated multi-media self-study program for teaching a portion of electronic solid-state fundamentals was developed. The subject matter areas included were fundamental theory of transistors, transistor amplifier fundamentals, and simple mathematical analysis of transistors including equivalent circuits, parameters, and characteristic…

  17. Remote visual analysis of large turbulence databases at multiple scales

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pulido, Jesus; Livescu, Daniel; Kanov, Kalin

    The remote analysis and visualization of raw large turbulence datasets is challenging. Current accurate direct numerical simulations (DNS) of turbulent flows generate datasets with billions of points per time-step and several thousand time-steps per simulation. Until recently, the analysis and visualization of such datasets was restricted to scientists with access to large supercomputers. The public Johns Hopkins Turbulence database simplifies access to multi-terabyte turbulence datasets and facilitates the computation of statistics and extraction of features through the use of commodity hardware. In this paper, we present a framework designed around wavelet-based compression for high-speed visualization of large datasets and methods supporting multi-resolution analysis of turbulence. By integrating common technologies, this framework enables remote access to tools available on supercomputers and over 230 terabytes of DNS data over the Web. Finally, the database toolset is expanded by providing access to exploratory data analysis tools, such as wavelet decomposition capabilities and coherent feature extraction.

  18. Remote visual analysis of large turbulence databases at multiple scales

    DOE PAGES

    Pulido, Jesus; Livescu, Daniel; Kanov, Kalin; ...

    2018-06-15

    The remote analysis and visualization of raw large turbulence datasets is challenging. Current accurate direct numerical simulations (DNS) of turbulent flows generate datasets with billions of points per time-step and several thousand time-steps per simulation. Until recently, the analysis and visualization of such datasets was restricted to scientists with access to large supercomputers. The public Johns Hopkins Turbulence database simplifies access to multi-terabyte turbulence datasets and facilitates the computation of statistics and extraction of features through the use of commodity hardware. In this paper, we present a framework designed around wavelet-based compression for high-speed visualization of large datasets and methods supporting multi-resolution analysis of turbulence. By integrating common technologies, this framework enables remote access to tools available on supercomputers and over 230 terabytes of DNS data over the Web. Finally, the database toolset is expanded by providing access to exploratory data analysis tools, such as wavelet decomposition capabilities and coherent feature extraction.

  19. Large-scale image region documentation for fully automated image biomarker algorithm development and evaluation.

    PubMed

    Reeves, Anthony P; Xie, Yiting; Liu, Shuang

    2017-04-01

    With the advent of fully automated image analysis and modern machine learning methods, there is a need for very large image datasets having documented segmentations for both computer algorithm training and evaluation. This paper presents a method and implementation for facilitating such datasets that addresses the critical issue of size scaling for algorithm validation and evaluation; current evaluation methods that are usually used in academic studies do not scale to large datasets. This method includes protocols for the documentation of many regions in very large image datasets; the documentation may be incrementally updated by new image data and by improved algorithm outcomes. This method has been used for 5 years in the context of chest health biomarkers from low-dose chest CT images that are now being used with increasing frequency in lung cancer screening practice. The lung scans are segmented into over 100 different anatomical regions, and the method has been applied to a dataset of over 20,000 chest CT images. Using this framework, the computer algorithms have been developed to achieve over 90% acceptable image segmentation on the complete dataset.

  20. Semi-automated brain tumor segmentation on multi-parametric MRI using regularized non-negative matrix factorization.

    PubMed

    Sauwen, Nicolas; Acou, Marjan; Sima, Diana M; Veraart, Jelle; Maes, Frederik; Himmelreich, Uwe; Achten, Eric; Huffel, Sabine Van

    2017-05-04

    Segmentation of gliomas in multi-parametric (MP-)MR images is challenging due to their heterogeneous nature in terms of size, appearance and location. Manual tumor segmentation is a time-consuming task and clinical practice would benefit from (semi-) automated segmentation of the different tumor compartments. We present a semi-automated framework for brain tumor segmentation based on non-negative matrix factorization (NMF) that does not require prior training of the method. L1-regularization is incorporated into the NMF objective function to promote spatial consistency and sparseness of the tissue abundance maps. The pathological sources are initialized through user-defined voxel selection. Knowledge about the spatial location of the selected voxels is combined with tissue adjacency constraints in a post-processing step to enhance segmentation quality. The method is applied to an MP-MRI dataset of 21 high-grade glioma patients, including conventional, perfusion-weighted and diffusion-weighted MRI. To assess the effect of using MP-MRI data and the L1-regularization term, analyses are also run using only conventional MRI and without L1-regularization. Robustness against user input variability is verified by considering the statistical distribution of the segmentation results when repeatedly analyzing each patient's dataset with a different set of random seeding points. Using L1-regularized semi-automated NMF segmentation, mean Dice scores of 65%, 74%, and 80% are found for active tumor, the tumor core and the whole tumor region. Mean Hausdorff distances of 6.1 mm, 7.4 mm and 8.2 mm are found for active tumor, the tumor core and the whole tumor region. Lower Dice scores and higher Hausdorff distances are found without L1-regularization and when only considering conventional MRI data. Based on the mean Dice scores and Hausdorff distances, segmentation results are competitive with the state of the art in the literature. Robust results were found for most patients, although careful voxel selection is mandatory to avoid sub-optimal segmentation.
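    The two evaluation metrics reported above can be computed from binary masks with NumPy and SciPy, as sketched below; the toy 2D masks stand in for the study's 3D MP-MRI labels.

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def dice(a, b):
    """Dice overlap between two binary masks."""
    a, b = a.astype(bool), b.astype(bool)
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

def hausdorff(a, b):
    """Symmetric Hausdorff distance between the coordinates of two binary masks."""
    pa, pb = np.argwhere(a), np.argwhere(b)
    return max(directed_hausdorff(pa, pb)[0], directed_hausdorff(pb, pa)[0])

rng = np.random.default_rng(0)
seg = rng.random((128, 128)) > 0.6
ref = np.roll(seg, shift=2, axis=0)                  # shifted copy as a toy "reference"
print(f"Dice: {dice(seg, ref):.3f}, Hausdorff: {hausdorff(seg, ref):.1f} voxels")
```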

  1. Application of the actor model to large scale NDE data analysis

    NASA Astrophysics Data System (ADS)

    Coughlin, Chris

    2018-03-01

    The Actor model of concurrent computation discretizes a problem into a series of independent units or actors that interact only through the exchange of messages. Without direct coupling between individual components, an Actor-based system is inherently concurrent and fault-tolerant. These traits lend themselves to so-called "Big Data" applications in which the volume of data to analyze requires a distributed multi-system design. For a practical demonstration of the Actor computational model, a system was developed to assist with the automated analysis of Nondestructive Evaluation (NDE) datasets using the open source Myriad Data Reduction Framework. A machine learning model trained to detect damage in two-dimensional slices of C-Scan data was deployed in a streaming data processing pipeline. To demonstrate the flexibility of the Actor model, the pipeline was deployed on a local system and re-deployed as a distributed system without recompiling, reconfiguring, or restarting the running application.

  2. Evaluation of Cross-Protocol Stability of a Fully Automated Brain Multi-Atlas Parcellation Tool.

    PubMed

    Liang, Zifei; He, Xiaohai; Ceritoglu, Can; Tang, Xiaoying; Li, Yue; Kutten, Kwame S; Oishi, Kenichi; Miller, Michael I; Mori, Susumu; Faria, Andreia V

    2015-01-01

    Brain parcellation tools based on multiple-atlas algorithms have recently emerged as a promising method with which to accurately define brain structures. When dealing with data from various sources, it is crucial that these tools are robust for many different imaging protocols. In this study, we tested the robustness of a multiple-atlas, likelihood fusion algorithm using Alzheimer's Disease Neuroimaging Initiative (ADNI) data with six different protocols, comprising three manufacturers and two magnetic field strengths. The entire brain was parceled into five different levels of granularity. In each level, which defines a set of brain structures, ranging from eight to 286 regions, we evaluated the variability of brain volumes related to the protocol, age, and diagnosis (healthy or Alzheimer's disease). Our results indicated that, with proper pre-processing steps, the impact of different protocols is minor compared to biological effects, such as age and pathology. A precise knowledge of the sources of data variation enables sufficient statistical power and ensures the reliability of an anatomical analysis when using this automated brain parcellation tool on datasets from various imaging protocols, such as clinical databases.

  3. Real-Time Three-Dimensional Cell Segmentation in Large-Scale Microscopy Data of Developing Embryos.

    PubMed

    Stegmaier, Johannes; Amat, Fernando; Lemon, William C; McDole, Katie; Wan, Yinan; Teodoro, George; Mikut, Ralf; Keller, Philipp J

    2016-01-25

    We present the Real-time Accurate Cell-shape Extractor (RACE), a high-throughput image analysis framework for automated three-dimensional cell segmentation in large-scale images. RACE is 55-330 times faster and 2-5 times more accurate than state-of-the-art methods. We demonstrate the generality of RACE by extracting cell-shape information from entire Drosophila, zebrafish, and mouse embryos imaged with confocal and light-sheet microscopes. Using RACE, we automatically reconstructed cellular-resolution tissue anisotropy maps across developing Drosophila embryos and quantified differences in cell-shape dynamics in wild-type and mutant embryos. We furthermore integrated RACE with our framework for automated cell lineaging and performed joint segmentation and cell tracking in entire Drosophila embryos. RACE processed these terabyte-sized datasets on a single computer within 1.4 days. RACE is easy to use, as it requires adjustment of only three parameters, takes full advantage of state-of-the-art multi-core processors and graphics cards, and is available as open-source software for Windows, Linux, and Mac OS.

  4. Multi-output decision trees for lesion segmentation in multiple sclerosis

    NASA Astrophysics Data System (ADS)

    Jog, Amod; Carass, Aaron; Pham, Dzung L.; Prince, Jerry L.

    2015-03-01

    Multiple Sclerosis (MS) is a disease of the central nervous system in which the protective myelin sheath of the neurons is damaged. MS leads to the formation of lesions, predominantly in the white matter of the brain and the spinal cord. The number and volume of lesions visible in magnetic resonance (MR) imaging (MRI) are important criteria for diagnosing and tracking the progression of MS. Locating and delineating lesions manually requires the tedious and expensive efforts of highly trained raters. In this paper, we propose an automated algorithm to segment lesions in MR images using multi-output decision trees. We evaluated our algorithm on the publicly available MICCAI 2008 MS Lesion Segmentation Challenge training dataset of 20 subjects, and showed improved results in comparison to state-of-the-art methods. We also evaluated our algorithm on an in-house dataset of 49 subjects with a true positive rate of 0.41 and a positive predictive value of 0.36.
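    The multi-output idea can be sketched with scikit-learn, which supports multi-output targets in a single decision tree: here one tree predicts a flattened 3x3 patch of lesion labels from patch features. The feature design, patch size, and synthetic data are assumptions, not the paper's setup.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n_patches, n_features, patch_labels = 2000, 20, 9    # 3x3 output neighbourhood

X = rng.normal(size=(n_patches, n_features))
# Correlated multi-output targets: lesion-like patches tend to be all-on or all-off.
base = (X[:, 0] + 0.5 * rng.normal(size=n_patches)) > 1.0
Y = np.repeat(base[:, None], patch_labels, axis=1).astype(int)
Y ^= (rng.random(Y.shape) < 0.05).astype(int)        # sprinkle in a little label noise

tree = DecisionTreeClassifier(max_depth=8, random_state=0).fit(X[:1500], Y[:1500])
pred = tree.predict(X[1500:])                        # shape (500, 9): one 3x3 patch each
print("per-voxel accuracy:", (pred == Y[1500:]).mean())
```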

  5. STOP using just GO: a multi-ontology hypothesis generation tool for high throughput experimentation

    PubMed Central

    2013-01-01

    Background: Gene Ontology (GO) enrichment analysis remains one of the most common methods for hypothesis generation from high-throughput datasets. However, we believe that researchers strive to test other hypotheses that fall outside of GO. Here, we developed and evaluated a tool for hypothesis generation from gene or protein lists using ontological concepts present in manually curated text that describes those genes and proteins. Results: As a consequence, we have developed the method Statistical Tracking of Ontological Phrases (STOP), which expands the realm of testable hypotheses in gene set enrichment analyses by integrating automated annotations of genes to terms from over 200 biomedical ontologies. While not as precise as manually curated terms, we find that the additional enriched concepts have value when coupled with traditional enrichment analyses using curated terms. Conclusion: Multiple ontologies have been developed for gene and protein annotation; by using a dataset of both manually curated GO terms and automatically recognized concepts from curated text, we can expand the realm of hypotheses that can be discovered. The web application STOP is available at http://mooneygroup.org/stop/. PMID:23409969
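    The statistical core of such term enrichment is typically a hypergeometric (one-sided Fisher) test of over-representation, sketched below with illustrative counts; this is a generic formulation, not necessarily the exact statistic used by STOP.

```python
from scipy.stats import hypergeom

M = 20000    # background genes
n = 150      # background genes annotated to the concept
N = 300      # genes in the submitted list
k = 12       # list genes annotated to the concept

# P(X >= k) under the hypergeometric null of no enrichment.
p_value = hypergeom.sf(k - 1, M, n, N)
print(f"enrichment p-value: {p_value:.3g}")
```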

  6. Using multiclass classification to automate the identification of patient safety incident reports by type and severity.

    PubMed

    Wang, Ying; Coiera, Enrico; Runciman, William; Magrabi, Farah

    2017-06-12

    Approximately 10% of admissions to acute-care hospitals are associated with an adverse event. Analysis of incident reports helps to understand how and why incidents occur and can inform policy and practice for safer care. Unfortunately, our capacity to monitor and respond to incident reports in a timely manner is limited by the sheer volumes of data collected. In this study, we aim to evaluate the feasibility of using multiclass classification to automate the identification of patient safety incidents in hospitals. Text-based classifiers were applied to identify 10 incident types and 4 severity levels. Using the one-versus-one (OvsO) and one-versus-all (OvsA) ensemble strategies, we evaluated regularized logistic regression, linear support vector machine (SVM) and SVM with a radial-basis function (RBF) kernel. Classifiers were trained and tested with "balanced" datasets (n_Type = 2860, n_SeverityLevel = 1160) from a state-wide incident reporting system. Testing was also undertaken with imbalanced "stratified" datasets (n_Type = 6000, n_SeverityLevel = 5950) from the state-wide system and an independent hospital reporting system. Classifier performance was evaluated using a confusion matrix, as well as F-score, precision and recall. The most effective combination was an OvsO ensemble of binary SVM RBF classifiers with binary count feature extraction. For incident type, classifiers performed well on balanced and stratified datasets (F-scores: 78.3% and 73.9%), but were worse on independent datasets (68.5%). Reports about falls, medications, pressure injury, aggression and blood products were identified with high recall and precision. "Documentation" was the hardest type to identify. For severity level, the F-score for severity assessment code (SAC) 1 (extreme risk) was 87.3%, and 64% for SAC4 (low risk), on balanced data. With stratified data, high recall was achieved for SAC1 (82.8-84%) but precision was poor (6.8-11.2%). High-risk incidents (SAC2) were confused with medium-risk incidents (SAC3). Binary classifier ensembles appear to be a feasible method for identifying incidents by type and severity level. Automated identification should enable safety problems to be detected and addressed in a more timely manner. Multi-label classifiers may be necessary for reports that relate to more than one incident type.
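    The best-performing configuration described above, a one-vs-one ensemble of RBF-kernel SVMs over binary token-count features, can be sketched with scikit-learn as below. The example reports and labels are invented placeholders, not the incident reporting data.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.multiclass import OneVsOneClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

reports = [
    "patient fell out of bed during the night shift",
    "wrong dose of medication administered to patient",
    "pressure injury noted on heel at routine check",
    "patient fell in bathroom, no injury sustained",
    "medication chart missing, dose delayed",
    "stage two pressure injury documented on admission",
]
incident_type = ["falls", "medications", "pressure injury",
                 "falls", "medications", "pressure injury"]

model = make_pipeline(CountVectorizer(binary=True),            # binary count features
                      OneVsOneClassifier(SVC(kernel="rbf", gamma="scale")))
model.fit(reports, incident_type)
print(model.predict(["patient slipped near the nurses station"]))
```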

  7. Improved Hourly and Sub-Hourly Gauge Data for Assessing Precipitation Extremes in the U.S.

    NASA Astrophysics Data System (ADS)

    Lawrimore, J. H.; Wuertz, D.; Palecki, M. A.; Kim, D.; Stevens, S. E.; Leeper, R.; Korzeniewski, B.

    2017-12-01

    The NOAA/National Weather Service (NWS) Fischer-Porter (F&P) weighing bucket precipitation gauge network consists of approximately 2000 stations that comprise a subset of the NWS Cooperative Observers Program network. This network has operated since the mid-20th century, providing one of the longest records of hourly and 15-minute precipitation observations in the U.S. The lengthy record of this dataset, combined with its relatively high spatial density, provides an important source of data for many hydrological applications including understanding trends and variability in the frequency and intensity of extreme precipitation events. In recent years, NOAA's National Centers for Environmental Information initiated an upgrade of its end-to-end processing and quality control system for these data. This involved a change from a largely manual review and edit process to a fully automated system that removes the subjectivity that was previously a necessary part of dataset quality control and processing. An overview of improvements to this dataset is provided along with the results of an analysis of observed variability and trends in U.S. precipitation extremes since the mid-20th century. Multi-decadal trends in many parts of the nation are consistent with model projections of an increase in the frequency and intensity of heavy precipitation in a warming world.

  8. A machine learning pipeline for automated registration and classification of 3D lidar data

    NASA Astrophysics Data System (ADS)

    Rajagopal, Abhejit; Chellappan, Karthik; Chandrasekaran, Shivkumar; Brown, Andrew P.

    2017-05-01

    Despite the large availability of geospatial data, registration and exploitation of these datasets remains a persistent challenge in geoinformatics. Popular signal processing and machine learning algorithms, such as non-linear SVMs and neural networks, rely on well-formatted input models as well as reliable output labels, which are not always immediately available. In this paper we outline a pipeline for gathering, registering, and classifying initially unlabeled wide-area geospatial data. As an illustrative example, we demonstrate the training and testing of a convolutional neural network to recognize 3D models in the OGRIP 2007 LiDAR dataset using fuzzy labels derived from OpenStreetMap as well as other datasets available on OpenTopography.org. When auxiliary label information is required, various text and natural language processing filters are used to extract and cluster keywords useful for identifying potential target classes. A subset of these keywords is subsequently used to form multi-class labels, with no assumption of independence. Finally, we employ class-dependent geometry extraction routines to identify candidates from both training and testing datasets. Our regression networks are able to identify the presence of 6 structural classes, including roads, walls, and buildings, in volumes as big as 8000 m³ in as little as 1.2 seconds on a commodity 4-core Intel CPU. The presented framework is neither dataset nor sensor-modality limited due to the registration process, and is capable of multi-sensor data-fusion.

  9. Integrated Multi-process Microfluidic Systems for Automating Analysis

    PubMed Central

    Yang, Weichun; Woolley, Adam T.

    2010-01-01

    Microfluidic technologies have been applied extensively in rapid sample analysis. Some current challenges for standard microfluidic systems are relatively high detection limits and reduced resolving power and peak capacity compared to conventional approaches. The integration of multiple functions and components onto a single platform can overcome these separation and detection limitations of microfluidics. Multiplexed systems can greatly increase peak capacity in multidimensional separations and can increase sample throughput by analyzing many samples simultaneously. On-chip sample preparation steps, including labeling, preconcentration, cleanup and amplification, can all serve to speed up and automate processes in integrated microfluidic systems. This paper summarizes advances in integrated multi-process microfluidic systems for automated analysis, their benefits, and areas needing improvement. PMID:20514343

  10. Parallel workflow tools to facilitate human brain MRI post-processing

    PubMed Central

    Cui, Zaixu; Zhao, Chenxi; Gong, Gaolang

    2015-01-01

    Multi-modal magnetic resonance imaging (MRI) techniques are widely applied in human brain studies. To obtain specific brain measures of interest from MRI datasets, a number of complex image post-processing steps are typically required. Parallel workflow tools have recently been developed, concatenating individual processing steps and enabling fully automated processing of raw MRI data to obtain the final results. These workflow tools are also designed to make optimal use of available computational resources and to support the parallel processing of different subjects or of independent processing steps for a single subject. Automated, parallel MRI post-processing tools can greatly facilitate relevant brain investigations and are being increasingly applied. In this review, we briefly summarize these parallel workflow tools and discuss relevant issues. PMID:26029043

  11. Automated Analysis of Renewable Energy Datasets ('EE/RE Data Mining')

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bush, Brian; Elmore, Ryan; Getman, Dan

    This poster illustrates methods to substantially improve the understanding of renewable energy data sets and the depth and efficiency of their analysis through the application of statistical learning methods ('data mining') in the intelligent processing of these often large and messy information sources. The six examples apply methods for anomaly detection, data cleansing, and pattern mining to time-series data (measurements from metering points in buildings) and spatiotemporal data (renewable energy resource datasets).

  12. STAMPS: Software Tool for Automated MRI Post-processing on a supercomputer.

    PubMed

    Bigler, Don C; Aksu, Yaman; Miller, David J; Yang, Qing X

    2009-08-01

    This paper describes a Software Tool for Automated MRI Post-processing (STAMP) of multiple types of brain MRIs on a workstation and for parallel processing on a supercomputer (STAMPS). This software tool enables the automation of nonlinear registration for a large image set and for multiple MR image types. The tool uses standard brain MRI post-processing tools (such as SPM, FSL, and HAMMER) for multiple MR image types in a pipeline fashion. It also contains novel MRI post-processing features. The STAMP image outputs can be used to perform brain analysis using Statistical Parametric Mapping (SPM) or single-/multi-image modality brain analysis using Support Vector Machines (SVMs). Since STAMPS is PBS-based, the supercomputer may be a multi-node computer cluster or one of the latest multi-core computers.

  13. The study of muscle remodeling in Drosophila metamorphosis using in vivo microscopy and bioimage informatics

    PubMed Central

    2012-01-01

    Background: Metamorphosis in insects transforms the larval into an adult body plan and comprises the destruction and remodeling of larval and the generation of adult tissues. The remodeling of larval into adult muscles promises to be a genetic model for human atrophy since it is associated with dramatic alteration in cell size. Furthermore, muscle development is amenable to 3D in vivo microscopy at high cellular resolution. However, multi-dimensional image acquisition leads to sizeable amounts of data that demand novel approaches in image processing and analysis. Results: To handle, visualize and quantify time-lapse datasets recorded in multiple locations, we designed a workflow comprising three major modules. First, the previously introduced TLM-converter concatenates stacks of single time-points. The second module, TLM-2D-Explorer, creates maximum intensity projections for rapid inspection and allows the temporal alignment of multiple datasets. The transition between prepupal and pupal stage serves as reference point to compare datasets of different genotypes or treatments. We demonstrate how the temporal alignment can reveal novel insights into the east gene which is involved in muscle remodeling. The third module, TLM-3D-Segmenter, performs semi-automated segmentation of selected muscle fibers over multiple frames. 3D image segmentation consists of 3 stages. First, the user places a seed into a muscle of a key frame and performs surface detection based on level-set evolution. Second, the surface is propagated to subsequent frames. Third, automated segmentation detects nuclei inside the muscle fiber. The detected surfaces can be used to visualize and quantify the dynamics of cellular remodeling. To estimate the accuracy of our segmentation method, we performed a comparison with a manually created ground truth. Key and predicted frames achieved a performance of 84% and 80%, respectively. Conclusions: We describe an analysis pipeline for the efficient handling and analysis of time-series microscopy data that enhances productivity and facilitates the phenotypic characterization of genetic perturbations. Our methodology can easily be scaled up for genome-wide genetic screens using readily available resources for RNAi based gene silencing in Drosophila and other animal models. PMID:23282138

  14. Managing Multi-center Flow Cytometry Data for Immune Monitoring

    PubMed Central

    White, Scott; Laske, Karoline; Welters, Marij JP; Bidmon, Nicole; van der Burg, Sjoerd H; Britten, Cedrik M; Enzor, Jennifer; Staats, Janet; Weinhold, Kent J; Gouttefangeas, Cécile; Chan, Cliburn

    2014-01-01

    With the recent results of promising cancer vaccines and immunotherapy [1-5], immune monitoring has become increasingly relevant for measuring treatment-induced effects on T cells, and an essential tool for shedding light on the mechanisms responsible for a successful treatment. Flow cytometry is the canonical multi-parameter assay for the fine characterization of single cells in solution, and is ubiquitously used in pre-clinical tumor immunology and in cancer immunotherapy trials. Current state-of-the-art polychromatic flow cytometry involves multi-step, multi-reagent assays followed by sample acquisition on sophisticated instruments capable of capturing up to 20 parameters per cell at a rate of tens of thousands of cells per second. Given the complexity of flow cytometry assays, reproducibility is a major concern, especially for multi-center studies. A promising approach for improving reproducibility is the use of automated analysis borrowing from statistics, machine learning and information visualization [21-23], as these methods directly address the subjectivity, operator dependence, labor intensiveness and low fidelity of manual analysis. However, it is quite time-consuming to investigate and test new automated analysis techniques on large data sets without some centralized information management system. For large-scale automated analysis to be practical, the presence of consistent and high-quality data linked to the raw FCS files is indispensable. In particular, the use of machine-readable standard vocabularies to characterize channel metadata is essential when constructing analytic pipelines to avoid errors in processing, analysis and interpretation of results. For automation, this high-quality metadata needs to be programmatically accessible, implying the need for a consistent Application Programming Interface (API). In this manuscript, we propose that upfront time spent normalizing flow cytometry data to conform to carefully designed data models enables automated analysis, potentially saving time in the long run. The ReFlow informatics framework was developed to address these data management challenges. PMID:26085786

  15. Deep multi-scale location-aware 3D convolutional neural networks for automated detection of lacunes of presumed vascular origin.

    PubMed

    Ghafoorian, Mohsen; Karssemeijer, Nico; Heskes, Tom; Bergkamp, Mayra; Wissink, Joost; Obels, Jiri; Keizer, Karlijn; de Leeuw, Frank-Erik; Ginneken, Bram van; Marchiori, Elena; Platel, Bram

    2017-01-01

    Lacunes of presumed vascular origin (lacunes) are associated with an increased risk of stroke, gait impairment, and dementia and are a primary imaging feature of small vessel disease. Quantification of lacunes may be of great importance to elucidate the mechanisms behind neuro-degenerative disorders and is recommended as part of study standards for small vessel disease research. However, due to the different appearance of lacunes in various brain regions and the existence of other similar-looking structures, such as perivascular spaces, manual annotation is a difficult, laborious and subjective task, which can potentially be greatly improved by reliable and consistent computer-aided detection (CAD) routines. In this paper, we propose an automated two-stage method using deep convolutional neural networks (CNN). We show that this method has good performance and can considerably benefit readers. We first use a fully convolutional neural network to detect initial candidates. In the second step, we employ a 3D CNN as a false positive reduction tool. As the location information is important to the analysis of candidate structures, we further equip the network with contextual information using multi-scale analysis and integration of explicit location features. We trained, validated and tested our networks on a large dataset of 1075 cases obtained from two different studies. Subsequently, we conducted an observer study with four trained observers and compared our method with them using a free-response operating characteristic analysis. On a test set of 111 cases, the resulting CAD system exhibits performance similar to that of the trained human observers and achieves a sensitivity of 0.974 with 0.13 false positives per slice. A feasibility study also showed that a trained human observer would considerably benefit once aided by the CAD system.
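
    The integration of explicit location features into a 3D CNN candidate classifier can be sketched as follows. This is a toy PyTorch model whose layer sizes and names are illustrative assumptions, not the architecture reported in the paper; it only shows the general pattern of concatenating patch features with location features before the fully connected layers.

      import torch
      import torch.nn as nn

      class CandidateClassifier(nn.Module):
          """Toy 3D CNN that scores a candidate patch and concatenates explicit
          location features (e.g. normalized x/y/z coordinates) before the
          fully connected layers. Sizes are illustrative only."""
          def __init__(self, n_location_features: int = 3):
              super().__init__()
              self.features = nn.Sequential(
                  nn.Conv3d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
                  nn.MaxPool3d(2),
                  nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
                  nn.AdaptiveAvgPool3d(1),          # -> (N, 32, 1, 1, 1)
              )
              self.classifier = nn.Sequential(
                  nn.Linear(32 + n_location_features, 64), nn.ReLU(),
                  nn.Linear(64, 1),                 # lacune vs. non-lacune logit
              )

          def forward(self, patch, location):
              x = self.features(patch).flatten(1)   # (N, 32)
              x = torch.cat([x, location], dim=1)   # append location features
              return self.classifier(x)

      model = CandidateClassifier()
      patches = torch.randn(4, 1, 32, 32, 32)       # 4 candidate patches
      locations = torch.rand(4, 3)                  # normalized coordinates
      print(model(patches, locations).shape)        # torch.Size([4, 1])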

  16. Analysing and correcting the differences between multi-source and multi-scale spatial remote sensing observations.

    PubMed

    Dong, Yingying; Luo, Ruisen; Feng, Haikuan; Wang, Jihua; Zhao, Jinling; Zhu, Yining; Yang, Guijun

    2014-01-01

    Differences exist among analysis results of agriculture monitoring and crop production based on remote sensing observations, which are obtained at different spatial scales from multiple remote sensors in the same time period and processed by the same algorithms, models or methods. These differences can mainly be described quantitatively from three aspects, i.e. multiple remote sensing observations, crop parameter estimation models, and spatial scale effects of surface parameters. Our research proposed a new method to analyse and correct the differences between multi-source and multi-scale spatial remote sensing surface reflectance datasets, aiming to provide references for further studies in agricultural applications with multiple remotely sensed observations from different sources. The new method was constructed on the basis of the physical and mathematical properties of multi-source and multi-scale reflectance datasets. Statistical theory was applied to extract the statistical characteristics of multiple surface reflectance datasets and to quantitatively analyse the spatial variations of these characteristics at multiple spatial scales. Then, taking the surface reflectance at the small spatial scale as the baseline data, Gaussian distribution theory was used to correct the multiple surface reflectance datasets based on the obtained physical characteristics, mathematical distribution properties and their spatial variations. The proposed method was verified on two sets of multiple satellite images, which were obtained in two experimental fields located in Inner Mongolia and Beijing, China, with different degrees of homogeneity of the underlying surfaces. Experimental results indicate that differences between surface reflectance datasets at multiple spatial scales could be effectively corrected over non-homogeneous underlying surfaces, which provides a basis for further multi-source and multi-scale crop growth monitoring and yield prediction and for the corresponding consistency analysis and evaluation.
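
    A heavily simplified stand-in for the Gaussian-distribution-based correction is to rescale a coarse-scale reflectance band so that its mean and standard deviation match the fine-scale baseline. The sketch below assumes this moment-matching view and ignores the spatial-variation modelling described in the abstract.

      import numpy as np

      def match_gaussian_stats(coarse: np.ndarray, baseline: np.ndarray) -> np.ndarray:
          """Linearly rescale a coarse-scale reflectance band so that its mean and
          standard deviation match those of the fine-scale baseline band. A
          simplified stand-in for the paper's Gaussian-distribution-based correction."""
          mu_c, sd_c = coarse.mean(), coarse.std()
          mu_b, sd_b = baseline.mean(), baseline.std()
          return (coarse - mu_c) / sd_c * sd_b + mu_b

      rng = np.random.default_rng(1)
      fine = rng.normal(0.25, 0.05, size=(100, 100))     # baseline reflectance
      coarse = rng.normal(0.30, 0.08, size=(100, 100))   # band to be corrected
      corrected = match_gaussian_stats(coarse, fine)
      print(round(corrected.mean(), 3), round(corrected.std(), 3))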

  17. Analysing and Correcting the Differences between Multi-Source and Multi-Scale Spatial Remote Sensing Observations

    PubMed Central

    Dong, Yingying; Luo, Ruisen; Feng, Haikuan; Wang, Jihua; Zhao, Jinling; Zhu, Yining; Yang, Guijun

    2014-01-01

    Differences exist among analysis results of agriculture monitoring and crop production based on remote sensing observations, which are obtained at different spatial scales from multiple remote sensors in the same time period and processed by the same algorithms, models or methods. These differences can mainly be described quantitatively from three aspects, i.e. multiple remote sensing observations, crop parameter estimation models, and spatial scale effects of surface parameters. Our research proposed a new method to analyse and correct the differences between multi-source and multi-scale spatial remote sensing surface reflectance datasets, aiming to provide references for further studies in agricultural applications with multiple remotely sensed observations from different sources. The new method was constructed on the basis of the physical and mathematical properties of multi-source and multi-scale reflectance datasets. Statistical theory was applied to extract the statistical characteristics of multiple surface reflectance datasets and to quantitatively analyse the spatial variations of these characteristics at multiple spatial scales. Then, taking the surface reflectance at the small spatial scale as the baseline data, Gaussian distribution theory was used to correct the multiple surface reflectance datasets based on the obtained physical characteristics, mathematical distribution properties and their spatial variations. The proposed method was verified on two sets of multiple satellite images, which were obtained in two experimental fields located in Inner Mongolia and Beijing, China, with different degrees of homogeneity of the underlying surfaces. Experimental results indicate that differences between surface reflectance datasets at multiple spatial scales could be effectively corrected over non-homogeneous underlying surfaces, which provides a basis for further multi-source and multi-scale crop growth monitoring and yield prediction and for the corresponding consistency analysis and evaluation. PMID:25405760

  18. Multi-Sectional Views Textural Based SVM for MS Lesion Segmentation in Multi-Channels MRIs

    PubMed Central

    Abdullah, Bassem A; Younis, Akmal A; John, Nigel M

    2012-01-01

    In this paper, a new technique is proposed for automatic segmentation of multiple sclerosis (MS) lesions from brain magnetic resonance imaging (MRI) data. The technique uses a trained support vector machine (SVM) to discriminate between blocks in regions of MS lesions and blocks in non-lesion regions, based mainly on textural features with the aid of other features. The classification is done on each of the axial, sagittal and coronal sectional brain views independently, and the resultant segmentations are aggregated to provide a more accurate output segmentation. The main contribution of the proposed technique described in this paper is the use of textural features to detect MS lesions in a fully automated approach that does not rely on manually delineating the MS lesions. In addition, the technique introduces the concept of multi-sectional view segmentation to produce verified segmentation. The proposed textural-based SVM technique was evaluated using three simulated datasets and more than fifty real MRI datasets. The results were compared with state-of-the-art methods. The obtained results indicate that the proposed method would be viable for use in clinical practice for the detection of MS lesions in MRI. PMID:22741026
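
    The block-wise classification idea can be sketched with simple intensity and gradient statistics standing in for the paper's textural (Haralick-style) features and a scikit-learn SVM as the classifier. All data and feature choices below are synthetic illustrations, not the published method.

      import numpy as np
      from sklearn.svm import SVC

      def block_features(block: np.ndarray) -> np.ndarray:
          """Very small stand-in for textural descriptors: intensity statistics
          and gradient-magnitude statistics of an image block."""
          gy, gx = np.gradient(block.astype(float))
          grad = np.hypot(gx, gy)
          return np.array([block.mean(), block.std(), grad.mean(), grad.std()])

      rng = np.random.default_rng(2)
      # Synthetic training blocks: "lesion" blocks are brighter and noisier.
      lesion = [rng.normal(180, 30, (8, 8)) for _ in range(50)]
      normal = [rng.normal(100, 10, (8, 8)) for _ in range(50)]
      X = np.array([block_features(b) for b in lesion + normal])
      y = np.array([1] * 50 + [0] * 50)

      clf = SVC(kernel="rbf", gamma="scale").fit(X, y)
      print(clf.predict([block_features(rng.normal(180, 30, (8, 8)))]))  # likely [1]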

  19. Comparative Microbial Modules Resource: Generation and Visualization of Multi-species Biclusters

    PubMed Central

    Bate, Ashley; Eichenberger, Patrick; Bonneau, Richard

    2011-01-01

    The increasing abundance of large-scale, high-throughput datasets for many closely related organisms provides opportunities for comparative analysis via the simultaneous biclustering of datasets from multiple species. These analyses require a reformulation of how to organize multi-species datasets and visualize comparative genomics data analyses results. Recently, we developed a method, multi-species cMonkey, which integrates heterogeneous high-throughput datatypes from multiple species to identify conserved regulatory modules. Here we present an integrated data visualization system, built upon the Gaggle, enabling exploration of our method's results (available at http://meatwad.bio.nyu.edu/cmmr.html). The system can also be used to explore other comparative genomics datasets and outputs from other data analysis procedures – results from other multiple-species clustering programs or from independent clustering of different single-species datasets. We provide an example use of our system for two bacteria, Escherichia coli and Salmonella Typhimurium. We illustrate the use of our system by exploring conserved biclusters involved in nitrogen metabolism, uncovering a putative function for yjjI, a currently uncharacterized gene that we predict to be involved in nitrogen assimilation. PMID:22144874

  20. Comparative microbial modules resource: generation and visualization of multi-species biclusters.

    PubMed

    Kacmarczyk, Thadeous; Waltman, Peter; Bate, Ashley; Eichenberger, Patrick; Bonneau, Richard

    2011-12-01

    The increasing abundance of large-scale, high-throughput datasets for many closely related organisms provides opportunities for comparative analysis via the simultaneous biclustering of datasets from multiple species. These analyses require a reformulation of how to organize multi-species datasets and visualize comparative genomics data analyses results. Recently, we developed a method, multi-species cMonkey, which integrates heterogeneous high-throughput datatypes from multiple species to identify conserved regulatory modules. Here we present an integrated data visualization system, built upon the Gaggle, enabling exploration of our method's results (available at http://meatwad.bio.nyu.edu/cmmr.html). The system can also be used to explore other comparative genomics datasets and outputs from other data analysis procedures - results from other multiple-species clustering programs or from independent clustering of different single-species datasets. We provide an example use of our system for two bacteria, Escherichia coli and Salmonella Typhimurium. We illustrate the use of our system by exploring conserved biclusters involved in nitrogen metabolism, uncovering a putative function for yjjI, a currently uncharacterized gene that we predict to be involved in nitrogen assimilation. © 2011 Kacmarczyk et al.

  1. Large-scale image region documentation for fully automated image biomarker algorithm development and evaluation

    PubMed Central

    Reeves, Anthony P.; Xie, Yiting; Liu, Shuang

    2017-01-01

    Abstract. With the advent of fully automated image analysis and modern machine learning methods, there is a need for very large image datasets having documented segmentations for both computer algorithm training and evaluation. This paper presents a method and implementation for facilitating the creation of such datasets that addresses the critical issue of size scaling for algorithm validation and evaluation; the evaluation methods usually used in academic studies do not scale to large datasets. This method includes protocols for the documentation of many regions in very large image datasets; the documentation may be incrementally updated by new image data and by improved algorithm outcomes. This method has been used for 5 years in the context of chest health biomarkers from low-dose chest CT images that are now being used with increasing frequency in lung cancer screening practice. The lung scans are segmented into over 100 different anatomical regions, and the method has been applied to a dataset of over 20,000 chest CT images. Using this framework, computer algorithms have been developed that achieve over 90% acceptable image segmentation on the complete dataset. PMID:28612037

  2. Computerized detection of breast lesions in multi-centre and multi-instrument DCE-MR data using 3D principal component maps and template matching

    NASA Astrophysics Data System (ADS)

    Ertas, Gokhan; Doran, Simon; Leach, Martin O.

    2011-12-01

    In this study, we introduce a novel, robust and accurate computerized algorithm based on volumetric principal component maps and template matching that facilitates lesion detection on dynamic contrast-enhanced MR. The study dataset comprises 24 204 contrast-enhanced breast MR images corresponding to 4034 axial slices from 47 women in the UK multi-centre study of MRI screening for breast cancer and categorized as high risk. The scans analysed here were performed on six different models of scanner from three commercial vendors, sited in 13 clinics around the UK. 1952 slices from this dataset, containing 15 benign and 13 malignant lesions, were used for training. The remaining 2082 slices, with 14 benign and 12 malignant lesions, were used for test purposes. To prevent false positives being detected from other tissues and regions of the body, breast volumes are segmented from pre-contrast images using a fast semi-automated algorithm. Principal component analysis is applied to the centred intensity vectors formed from the dynamic contrast-enhanced T1-weighted images of the segmented breasts, followed by automatic thresholding to eliminate fatty tissues and slowly enhancing normal parenchyma and a convolution and filtering process to minimize artefacts from moderately enhanced normal parenchyma and blood vessels. Finally, suspicious lesions are identified through a volumetric sixfold neighbourhood connectivity search and calculation of two morphological features: volume and volumetric eccentricity, to exclude highly enhanced blood vessels, nipples and normal parenchyma and to localize lesions. This provides satisfactory lesion localization. For a detection sensitivity of 100%, the overall false-positive detection rate of the system is 1.02/lesion, 1.17/case and 0.08/slice, comparing favourably with previous studies. This approach may facilitate detection of lesions in multi-centre and multi-instrument dynamic contrast-enhanced breast MR data.
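
    The core idea of building principal component maps from voxel time-intensity curves can be illustrated with scikit-learn PCA on synthetic dynamic data. The uptake curve, enhancement fraction and threshold below are invented for the toy example and do not reproduce the published pipeline.

      import numpy as np
      from sklearn.decomposition import PCA

      rng = np.random.default_rng(3)
      n_timepoints, n_voxels = 5, 10000

      # Synthetic dynamic series: each voxel has a time-intensity curve; enhancing
      # voxels get an extra uptake pattern added on top of baseline noise.
      curves = rng.normal(100, 5, size=(n_voxels, n_timepoints))
      uptake = np.array([0, 20, 35, 30, 25], dtype=float)
      enhancing = rng.random(n_voxels) < 0.02
      curves[enhancing] += uptake

      # Centre each curve and project onto principal components; voxels with a
      # strong first-component score are flagged as enhancement candidates.
      pca = PCA(n_components=2)
      scores = pca.fit_transform(curves - curves.mean(axis=1, keepdims=True))
      strength = np.abs(scores[:, 0])
      candidate_mask = strength > np.percentile(strength, 98)
      print(candidate_mask.sum(), "candidate voxels flagged")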

  3. Normalized gradient fields cross-correlation for automated detection of prostate in magnetic resonance images

    NASA Astrophysics Data System (ADS)

    Fotin, Sergei V.; Yin, Yin; Periaswamy, Senthil; Kunz, Justin; Haldankar, Hrishikesh; Muradyan, Naira; Cornud, François; Turkbey, Baris; Choyke, Peter L.

    2012-02-01

    Fully automated prostate segmentation helps to address several problems in prostate cancer diagnosis and treatment: it can assist in objective evaluation of multiparametric MR imagery, provides a prostate contour for MR-ultrasound (or CT) image fusion for computer-assisted image-guided biopsy or therapy planning, may facilitate reporting and enables direct prostate volume calculation. Among the challenges in automated analysis of MR images of the prostate are the variations of overall image intensities across scanners, the presence of nonuniform multiplicative bias field within scans and differences in acquisition setup. Furthermore, images acquired with the presence of an endorectal coil suffer from localized high-intensity artifacts at the posterior part of the prostate. In this work, a three-dimensional method for fast automated prostate detection based on normalized gradient fields cross-correlation, insensitive to intensity variations and coil-induced artifacts, is presented and evaluated. The components of the method, offline template learning and the localization algorithm, are described in detail. The method was validated on a dataset of 522 T2-weighted MR images acquired at the National Cancer Institute, USA, that was split into two halves for development and testing. In addition, a second dataset of 29 MR exams from Centre d'Imagerie Médicale Tourville, France, was used to test the algorithm. The 95% confidence intervals for the mean Euclidean distance between automatically and manually identified prostate centroids were 4.06 +/- 0.33 mm and 3.10 +/- 0.43 mm for the first and second test datasets, respectively. Moreover, the algorithm provided the centroid within the true prostate volume in 100% of images from both datasets. The obtained results demonstrate the high utility of the detection method for fully automated prostate segmentation.
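
    A 2D toy version of normalized gradient fields cross-correlation is sketched below: gradients are normalized to near-unit length, which makes the representation insensitive to intensity scaling, and the fields of image and template are correlated via FFT convolution. This is an illustrative simplification, not the authors' 3D implementation.

      import numpy as np
      from scipy.signal import fftconvolve

      def normalized_gradient_field(img: np.ndarray, eps: float = 1e-3) -> np.ndarray:
          """Return the 2D image gradient normalized to (near) unit length."""
          gy, gx = np.gradient(img.astype(float))
          mag = np.sqrt(gx ** 2 + gy ** 2 + eps ** 2)
          return np.stack([gx / mag, gy / mag])

      def ngf_cross_correlation(image: np.ndarray, template: np.ndarray) -> np.ndarray:
          """Cross-correlate the normalized gradient fields of image and template;
          high responses indicate plausible object locations (2D toy version)."""
          F, T = normalized_gradient_field(image), normalized_gradient_field(template)
          return sum(fftconvolve(F[c], T[c, ::-1, ::-1], mode="same") for c in range(2))

      rng = np.random.default_rng(4)
      scene = rng.normal(0, 1, (128, 128))
      scene[60:80, 50:70] += 5.0                     # bright object
      template = scene[58:82, 48:72].copy()          # crop around the object
      resp = ngf_cross_correlation(scene, template)
      print(np.unravel_index(np.argmax(resp), resp.shape))  # peak close to the object centre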

  4. Medical imaging informatics based solutions for human performance analytics

    NASA Astrophysics Data System (ADS)

    Verma, Sneha; McNitt-Gray, Jill; Liu, Brent J.

    2018-03-01

    For human performance analysis, extensive experimental trials are often conducted to identify the underlying cause or long-term consequences of certain pathologies and to improve motor functions by examining the movement patterns of affected individuals. Data collected for human performance analysis include high-speed video, surveys, spreadsheets, force data recordings from instrumented surfaces, etc. These datasets are recorded from various standalone sources and are therefore captured in different folder structures as well as in varying formats depending on the hardware configurations. Therefore, data integration and synchronization present a major challenge when handling these multimedia datasets, especially at large scale. Another challenge faced by researchers is querying large quantities of unstructured data and designing feedback/reporting tools for users who need to work with the datasets at various levels. In the past, database server storage solutions have been introduced to securely store these datasets. However, to automate the process of uploading raw files, various file manipulation steps are required. In the current workflow, this file manipulation and structuring is done manually and is not feasible for large amounts of data. By attaching metadata files and data dictionaries to these raw datasets, however, the information and structure needed for automated server upload can be provided. We introduce one such system for metadata creation for unstructured multimedia data based on the DICOM data model design. We discuss the design and implementation of this system and evaluate it with a dataset collected for a movement analysis study. The broader aim of this paper is to present a solution space, based on medical imaging informatics design and methods, for improving the workflow of human performance analysis in a biomechanics research lab.

  5. Automated multi-plug filtration cleanup for liquid chromatographic-tandem mass spectrometric pesticide multi-residue analysis in representative crop commodities.

    PubMed

    Qin, Yuhong; Zhang, Jingru; Zhang, Yuan; Li, Fangbing; Han, Yongtao; Zou, Nan; Xu, Haowei; Qian, Meiyuan; Pan, Canping

    2016-09-02

    An automated multi-plug filtration cleanup (m-PFC) method for modified QuEChERS (quick, easy, cheap, effective, rugged, and safe) extracts was developed. The automated device aims to reduce the labor-intensive manual workload in the cleanup steps. It can accurately control the volume and the speed of the pulling and pushing cycles. In this work, m-PFC was based on multi-walled carbon nanotubes (MWCNTs) mixed with other sorbents and anhydrous magnesium sulfate (MgSO4) in a packed tip for the analysis of pesticide multi-residues in crop commodities, followed by liquid chromatography with tandem mass spectrometric (LC-MS/MS) detection. It was validated by analyzing 25 pesticides in six representative matrices spiked at two concentration levels of 10 and 100 μg/kg. Salts, sorbents, the m-PFC procedure, the automated pulling and pushing volume, the automated pulling speed, and the pushing speed were optimized for each matrix. After optimization, two general automated m-PFC methods were introduced for relatively simple (apple, citrus fruit, peanut) and relatively complex (spinach, leek, green tea) matrices. Spike recoveries were between 83 and 108%, with 1-14% RSD, for most analytes in the tested matrices. Matrix-matched calibrations were performed with coefficients of determination >0.997 over the concentration range of 10-1000 μg/kg. The developed method was successfully applied to the determination of pesticide residues in market samples. Copyright © 2016 Elsevier B.V. All rights reserved.

  6. NOTE: Entropy-based automated classification of independent components separated from fMCG

    NASA Astrophysics Data System (ADS)

    Comani, S.; Srinivasan, V.; Alleva, G.; Romani, G. L.

    2007-03-01

    Fetal magnetocardiography (fMCG) is a noninvasive technique suitable for the prenatal diagnosis of fetal heart function. Reliable fetal cardiac signals can be reconstructed from multi-channel fMCG recordings by means of independent component analysis (ICA). However, the identification of the separated components is usually accomplished by visual inspection. This paper discusses a novel automated system based on entropy estimators, namely approximate entropy (ApEn) and sample entropy (SampEn), for the classification of independent components (ICs). The system was validated on 40 fMCG datasets of normal fetuses with the gestational age ranging from 22 to 37 weeks. Both ApEn and SampEn were able to measure the stability and predictability of the physiological signals separated with ICA, and the entropy values of the three categories were significantly different at p < 0.01. The system's performance was compared with that of a method based on the analysis of the time and frequency content of the components. The outcomes of this study showed a superior performance of the entropy-based system, in particular for early gestation, with overall IC detection rates of 98.75% and 97.92% for ApEn and SampEn, respectively, against a value of 94.50% obtained with the time-frequency-based system.
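
    Sample entropy (SampEn) itself is straightforward to compute; the educational numpy sketch below follows the usual definition -ln(A/B) with self-matches excluded, and is not the authors' implementation. On a periodic, heart-like signal it yields a much lower value than on white noise, which is the property the classifier exploits.

      import numpy as np

      def sample_entropy(x: np.ndarray, m: int = 2, r_factor: float = 0.2) -> float:
          """Plain-numpy SampEn: -ln(A/B), where B counts pairs of length-m template
          vectors within tolerance r and A counts the same for length m+1.
          Self-matches are excluded. Educational implementation, not optimized."""
          x = np.asarray(x, dtype=float)
          r = r_factor * x.std()

          def count_matches(length: int) -> int:
              templates = np.array([x[i:i + length] for i in range(len(x) - length)])
              dist = np.max(np.abs(templates[:, None, :] - templates[None, :, :]), axis=2)
              n = len(templates)
              return int((dist <= r).sum() - n) // 2   # exclude self-matches, count pairs

          B, A = count_matches(m), count_matches(m + 1)
          return -np.log(A / B) if A > 0 and B > 0 else np.inf

      rng = np.random.default_rng(5)
      t = np.linspace(0, 10, 1000)
      regular = np.sin(2 * np.pi * 1.4 * t)            # heart-like periodic signal
      noise = rng.normal(size=1000)                    # irregular component
      print(round(sample_entropy(regular), 3), "<", round(sample_entropy(noise), 3))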

  7. End-to-end workflow for finite element analysis of tumor treating fields in glioblastomas

    NASA Astrophysics Data System (ADS)

    Timmons, Joshua J.; Lok, Edwin; San, Pyay; Bui, Kevin; Wong, Eric T.

    2017-11-01

    Tumor Treating Fields (TTFields) therapy is an approved modality of treatment for glioblastoma. Patient anatomy-based finite element analysis (FEA) has the potential to reveal not only how these fields affect tumor control but also how to improve efficacy. While automated segmentation tools speed up the generation of FEA models, multi-step manual corrections are required, including removal of disconnected voxels, incorporation of unsegmented structures and the addition of 36 electrodes plus gel layers matching the TTFields transducers. Existing approaches are also not scalable for the high-throughput analysis of large patient volumes. A semi-automated workflow was developed to prepare FEA models for TTFields mapping in the human brain. Magnetic resonance imaging (MRI) pre-processing, segmentation, electrode and gel placement, and post-processing were all automated. The material properties of each tissue were applied to their corresponding mask in silico using COMSOL Multiphysics (COMSOL, Burlington, MA, USA). The fidelity of the segmentations with and without post-processing was compared against the full semi-automated segmentation workflow approach using Dice coefficient analysis. The average relative differences for the electric fields generated by COMSOL were calculated in addition to observed differences in electric field-volume histograms. Furthermore, the mesh file formats in MPHTXT and NASTRAN were also compared using the differences in the electric field-volume histogram. The Dice coefficient was lower for auto-segmentation without post-processing than with it, indicating convergence towards the manually corrected model. A marginal relative difference between electric field maps from models with and without manual correction was identified, and a clear advantage of using the NASTRAN mesh file format was found. The software and workflow outlined in this article may be used to accelerate the investigation of TTFields in glioblastoma patients by facilitating the creation of FEA models derived from patient MRI datasets.
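
    The Dice coefficient used to compare segmentations is easy to state in code; the sketch below is a generic implementation with synthetic masks, not the study's evaluation script.

      import numpy as np

      def dice_coefficient(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
          """Dice similarity between two binary masks: 2|A∩B| / (|A| + |B|)."""
          a, b = mask_a.astype(bool), mask_b.astype(bool)
          denom = a.sum() + b.sum()
          return 2.0 * np.logical_and(a, b).sum() / denom if denom else 1.0

      auto = np.zeros((64, 64, 64), dtype=bool)
      auto[20:40, 20:40, 20:40] = True            # automated segmentation
      corrected = np.zeros_like(auto)
      corrected[22:40, 20:40, 20:40] = True       # manually corrected segmentation
      print(round(dice_coefficient(auto, corrected), 3))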

  8. LIME: 3D visualisation and interpretation of virtual geoscience models

    NASA Astrophysics Data System (ADS)

    Buckley, Simon; Ringdal, Kari; Dolva, Benjamin; Naumann, Nicole; Kurz, Tobias

    2017-04-01

    Three-dimensional and photorealistic acquisition of surface topography, using methods such as laser scanning and photogrammetry, has become widespread across the geosciences over the last decade. With recent innovations in photogrammetric processing software, robust and automated data capture hardware, and novel sensor platforms, including unmanned aerial vehicles, obtaining 3D representations of exposed topography has never been easier. In addition to 3D datasets, fusion of surface geometry with imaging sensors, such as multi/hyperspectral, thermal and ground-based InSAR, and geophysical methods, creates novel and highly visual datasets that provide a fundamental spatial framework to address open geoscience research questions. Although data capture and processing routines are becoming well-established and widely reported in the scientific literature, challenges remain related to the analysis, co-visualisation and presentation of 3D photorealistic models, especially for new users (e.g. students and scientists new to geomatics methods). Interpretation and measurement are essential for quantitative analysis of 3D datasets, and qualitative methods are valuable for presentation purposes, for planning and in education. Motivated by this background, the current contribution presents LIME, lightweight, high-performance 3D software for interpreting and co-visualising 3D models and related image data in geoscience applications. The software focuses on novel data integration and visualisation of 3D topography with image sources such as hyperspectral imagery, logs and interpretation panels, geophysical datasets and georeferenced maps and images. High quality visual output can be generated for dissemination purposes, to aid researchers with communication of their research results. The background of the software is described, and case studies from outcrop geology, hyperspectral mineral mapping and geophysical-geospatial data integration are used to showcase the novel methods developed.

  9. Multichannel microscale system for high throughput preparative separation with comprehensive collection and analysis

    DOEpatents

    Karger, Barry L.; Kotler, Lev; Foret, Frantisek; Minarik, Marek; Kleparnik, Karel

    2003-12-09

    A modular multiple lane or capillary electrophoresis (chromatography) system that permits automated parallel separation and comprehensive collection of all fractions from samples in all lanes or columns, with the option of further on-line automated sample fraction analysis, is disclosed. Preferably, fractions are collected in a multi-well fraction collection unit, or plate (40). The multi-well collection plate (40) is preferably made of a solvent permeable gel, most preferably a hydrophilic, polymeric gel such as agarose or cross-linked polyacrylamide.

  10. HCS-Neurons: identifying phenotypic changes in multi-neuron images upon drug treatments of high-content screening.

    PubMed

    Charoenkwan, Phasit; Hwang, Eric; Cutler, Robert W; Lee, Hua-Chin; Ko, Li-Wei; Huang, Hui-Ling; Ho, Shinn-Ying

    2013-01-01

    High-content screening (HCS) has become a powerful tool for drug discovery. However, the discovery of drugs targeting neurons is still hampered by the inability to accurately identify and quantify the phenotypic changes of multiple neurons in a single image (named a multi-neuron image) of a high-content screen. Therefore, it is desirable to develop an automated image analysis method for analyzing multi-neuron images. We propose an automated analysis method with novel descriptors of neuromorphology features for analyzing HCS-based multi-neuron images, called HCS-neurons. To observe multiple phenotypic changes of neurons, we propose two kinds of descriptors: a neuron feature descriptor (NFD) of 13 neuromorphology features, e.g., neurite length, and generic feature descriptors (GFDs), e.g., Haralick texture. HCS-neurons can 1) automatically extract all quantitative phenotype features in both NFD and GFDs, 2) identify statistically significant phenotypic changes upon drug treatments using ANOVA and regression analysis, and 3) generate an accurate classifier to group neurons treated with different drug concentrations using a support vector machine and an intelligent feature selection method. To evaluate HCS-neurons, we treated P19 neurons with nocodazole (a microtubule-depolymerizing drug which has been shown to impair neurite development) at six concentrations ranging from 0 to 1000 ng/mL. The experimental results show that all 13 features of the NFD differed statistically significantly across the levels of nocodazole drug concentration (NDC), and the phenotypic changes of neurites were consistent with the known effect of nocodazole in promoting neurite retraction. Three identified features, total neurite length, average neurite length, and average neurite area, were able to achieve an independent test accuracy of 90.28% for the six-dosage classification problem. The NFD module and neuron image datasets are provided as a freely downloadable MATLAB project at http://iclab.life.nctu.edu.tw/HCS-Neurons. Few automatic methods focus on analyzing multi-neuron images collected from HCS used in drug discovery. We provide an automatic HCS-based method for generating accurate classifiers to classify neurons based on their phenotypic changes upon drug treatments. The proposed HCS-neurons method is helpful in identifying and classifying chemical or biological molecules that alter the morphology of a group of neurons in HCS.

  11. Automated Student Aid Processing: The Challenge and Opportunity.

    ERIC Educational Resources Information Center

    St. John, Edward P.

    1985-01-01

    To utilize automated technology for student aid processing, it is necessary to work with multi-institutional offices (student aid, admissions, registration, and business) and to develop automated interfaces with external processing systems at state and federal agencies and perhaps at need-analysis organizations and lenders. (MLW)

  12. Unsupervised multiple kernel learning for heterogeneous data integration.

    PubMed

    Mariette, Jérôme; Villa-Vialaneix, Nathalie

    2018-03-15

    Recent high-throughput sequencing advances have expanded the breadth of available omics datasets, and the integrated analysis of multiple datasets obtained on the same samples has provided important insights in a wide range of applications. However, the integration of various sources of information remains a challenge for systems biology, since the produced datasets are often of heterogeneous types, requiring generic methods that take their different specificities into account. We propose a multiple kernel framework that allows the integration of multiple datasets of various types into a single exploratory analysis. Several solutions are provided to learn either a consensus meta-kernel or a meta-kernel that preserves the original topology of the datasets. We applied our framework to analyse two public multi-omics datasets. First, the multiple metagenomic datasets collected during the TARA Oceans expedition were explored to demonstrate that our method is able to retrieve previous findings in a single kernel PCA as well as to provide a new image of the sample structures when a larger number of datasets are included in the analysis. To perform this analysis, a generic procedure is also proposed to improve the interpretability of the kernel PCA with regard to the original data. Second, the multi-omics breast cancer datasets provided by The Cancer Genome Atlas are analysed using kernel Self-Organizing Maps with both single- and multi-omics strategies. The comparison of these two approaches demonstrates the benefit of our integration method for improving the representation of the studied biological system. The proposed methods are available in the R package mixKernel, released on CRAN. It is fully compatible with the mixOmics package and a tutorial describing the approach can be found on the mixOmics web site http://mixomics.org/mixkernel/. jerome.mariette@inra.fr or nathalie.villa-vialaneix@inra.fr. Supplementary data are available at Bioinformatics online.
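
    mixKernel is an R package, but the underlying idea of a consensus meta-kernel can be sketched in Python: compute one kernel per data type on the same samples, combine them, and embed with kernel PCA. The unweighted average below is a deliberate simplification; mixKernel learns the combination weights.

      import numpy as np
      from sklearn.metrics.pairwise import rbf_kernel
      from sklearn.decomposition import KernelPCA

      rng = np.random.default_rng(6)
      n_samples = 40
      omics_a = rng.normal(size=(n_samples, 100))   # e.g. expression-like features
      omics_b = rng.normal(size=(n_samples, 30))    # e.g. abundance-like features

      # One kernel per data type, computed on the same samples.
      K_a = rbf_kernel(omics_a)
      K_b = rbf_kernel(omics_b)

      # Naive consensus meta-kernel: an unweighted average, used here only as a
      # simple stand-in for the learned combination.
      K_meta = 0.5 * K_a + 0.5 * K_b

      embedding = KernelPCA(n_components=2, kernel="precomputed").fit_transform(K_meta)
      print(embedding.shape)   # (40, 2)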

  13. APEX_SCOPE: A graphical user interface for visualization of multi-modal data in inter-disciplinary studies.

    PubMed

    Kanbar, Lara J; Shalish, Wissam; Precup, Doina; Brown, Karen; Sant'Anna, Guilherme M; Kearney, Robert E

    2017-07-01

    In multi-disciplinary studies, different forms of data are often collected for analysis. For example, APEX, a study on the automated prediction of extubation readiness in extremely preterm infants, collects clinical parameters and cardiorespiratory signals. A variety of cardiorespiratory metrics are computed from these signals and used to assign a cardiorespiratory pattern at each time point. In such a situation, exploratory analysis requires a visualization tool capable of displaying these different types of acquired and computed signals in an integrated environment. Thus, we developed APEX_SCOPE, a graphical tool for the visualization of multi-modal data comprising cardiorespiratory signals, automated cardiorespiratory metrics, automated respiratory patterns, manually classified respiratory patterns, and manual annotations by clinicians during data acquisition. This MATLAB-based application provides a means for collaborators to view combinations of signals to promote discussion, generate hypotheses and develop features.

  14. An Open Source approach to automated hydrological analysis of ungauged drainage basins in Serbia using R and SAGA

    NASA Astrophysics Data System (ADS)

    Zlatanovic, Nikola; Milovanovic, Irina; Cotric, Jelena

    2014-05-01

    Drainage basins are for the most part ungauged or poorly gauged, not only in Serbia but in most parts of the world, due not only to insufficient funds but also to the decommissioning of river gauges in upland catchments in favour of downstream areas which are more populated. Very often, design discharges are needed for these streams or rivers where no streamflow data are available, for various applications. Examples include river training works for flood protection or erosion control, and the design of culverts, water supply facilities, small hydropower plants etc. The estimation of discharges in ungauged basins is most often performed using rainfall-runoff models, whose parameters rely heavily on geomorphometric attributes of the basin (e.g. catchment area, elevation, slopes of channels and hillslopes etc.). The calculation of these, as well as other parameters, is most often done in GIS (Geographic Information System) software environments. This study deals with the application of freely available and open source software and datasets for automating the rainfall-runoff analysis of ungauged basins using methodologies currently in use in hydrological practice. The R programming language was used for scripting and automating the hydrological calculations, coupled with SAGA GIS (System for Automated Geoscientific Analysis) for geocomputing functions and terrain analysis. Datasets used in the analyses include the freely available SRTM (Shuttle Radar Topography Mission) terrain data, CORINE (Coordination of Information on the Environment) Land Cover data, as well as soil maps and rainfall data. The choice of free and open source software and datasets makes the project ideal for academic and research purposes and cross-platform projects. The geomorphometric module was tested on more than 100 catchments throughout Serbia and compared to manually calculated values (using topographic maps). The discharge estimation module was tested on 21 catchments where data were available and compared to results obtained by frequency analysis of annual maximum discharges. The geomorphometric module of the calculation system showed excellent results, saving a great deal of time that would otherwise have been spent on manual processing of geospatial data. The type of automated analysis presented in this study will enable much quicker hydrologic analysis of multiple watersheds, providing a platform for further research into the spatial variability of runoff.

  15. Automated analysis of high-content microscopy data with deep learning.

    PubMed

    Kraus, Oren Z; Grys, Ben T; Ba, Jimmy; Chong, Yolanda; Frey, Brendan J; Boone, Charles; Andrews, Brenda J

    2017-04-18

    Existing computational pipelines for quantitative analysis of high-content microscopy data rely on traditional machine learning approaches that fail to accurately classify more than a single dataset without substantial tuning and training, requiring extensive analysis. Here, we demonstrate that the application of deep learning to biological image data can overcome the pitfalls associated with conventional machine learning classifiers. Using a deep convolutional neural network (DeepLoc) to analyze yeast cell images, we show improved performance over traditional approaches in the automated classification of protein subcellular localization. We also demonstrate the ability of DeepLoc to classify highly divergent image sets, including images of pheromone-arrested cells with abnormal cellular morphology, as well as images generated in different genetic backgrounds and in different laboratories. We offer an open-source implementation that enables updating DeepLoc on new microscopy datasets. This study highlights deep learning as an important tool for the expedited analysis of high-content microscopy data. © 2017 The Authors. Published under the terms of the CC BY 4.0 license.

  16. Artificial intelligence for multi-mission planetary operations

    NASA Technical Reports Server (NTRS)

    Atkinson, David J.; Lawson, Denise L.; James, Mark L.

    1990-01-01

    A brief introduction is given to an automated system called the Spacecraft Health Automated Reasoning Prototype (SHARP). SHARP is designed to demonstrate automated health and status analysis for multi-mission spacecraft and ground data systems operations. The SHARP system combines conventional computer science methodologies with artificial intelligence techniques to produce an effective method for detecting and analyzing potential spacecraft and ground systems problems. The system performs real-time analysis of spacecraft and other related telemetry, and is also capable of examining data in historical context. Telecommunications link analysis of the Voyager II spacecraft is the initial focus for evaluation of the prototype in a real-time operations setting during the Voyager spacecraft encounter with Neptune in August, 1989. The preliminary results of the SHARP project and plans for future application of the technology are discussed.

  17. Automated brain tumor segmentation using spatial accuracy-weighted hidden Markov Random Field.

    PubMed

    Nie, Jingxin; Xue, Zhong; Liu, Tianming; Young, Geoffrey S; Setayesh, Kian; Guo, Lei; Wong, Stephen T C

    2009-09-01

    A variety of algorithms have been proposed for brain tumor segmentation from multi-channel sequences; however, most of them require isotropic or pseudo-isotropic resolution of the MR images. Although co-registration and interpolation of low-resolution sequences, such as T2-weighted images, onto the space of the high-resolution image, such as the T1-weighted image, can be performed prior to segmentation, the results are usually limited by partial volume effects due to the interpolation of low-resolution images. To improve the quality of tumor segmentation in clinical applications, where low-resolution sequences are commonly used together with high-resolution images, we propose an algorithm based on the Spatial accuracy-weighted Hidden Markov random field and Expectation maximization (SHE) approach for both automated tumor and enhanced-tumor segmentation. SHE incorporates the spatial interpolation accuracy of low-resolution images into the optimization procedure of the Hidden Markov Random Field (HMRF) to segment tumor using multi-channel MR images with different resolutions, e.g., high-resolution T1-weighted and low-resolution T2-weighted images. In experiments, we evaluated this algorithm using a set of simulated multi-channel brain MR images with known ground-truth tissue segmentation and also applied it to a dataset of MR images obtained during clinical trials of brain tumor chemotherapy. The results show that more accurate tumor segmentation results can be obtained compared with conventional multi-channel segmentation algorithms.

  18. Intellicount: High-Throughput Quantification of Fluorescent Synaptic Protein Puncta by Machine Learning

    PubMed Central

    Fantuzzo, J. A.; Mirabella, V. R.; Zahn, J. D.

    2017-01-01

    Abstract Synapse formation analyses can be performed by imaging and quantifying fluorescent signals of synaptic markers. Traditionally, these analyses are done using simple or multiple thresholding and segmentation approaches or by labor-intensive manual analysis by a human observer. Here, we describe Intellicount, a high-throughput, fully-automated synapse quantification program which applies a novel machine learning (ML)-based image processing algorithm to systematically improve region of interest (ROI) identification over simple thresholding techniques. Through processing large datasets from both human and mouse neurons, we demonstrate that this approach allows image processing to proceed independently of carefully set thresholds, thus reducing the need for human intervention. As a result, this method can efficiently and accurately process large image datasets with minimal interaction by the experimenter, making it less prone to bias and less liable to human error. Furthermore, Intellicount is integrated into an intuitive graphical user interface (GUI) that provides a set of valuable features, including automated and multifunctional figure generation, routine statistical analyses, and the ability to run full datasets through nested folders, greatly expediting the data analysis process. PMID:29218324

  19. MUSE: MUlti-atlas region Segmentation utilizing Ensembles of registration algorithms and parameters, and locally optimal atlas selection

    PubMed Central

    Ou, Yangming; Resnick, Susan M.; Gur, Ruben C.; Gur, Raquel E.; Satterthwaite, Theodore D.; Furth, Susan; Davatzikos, Christos

    2016-01-01

    Atlas-based automated anatomical labeling is a fundamental tool in medical image segmentation, as it defines regions of interest for subsequent analysis of structural and functional image data. The extensive investigation of multi-atlas warping and fusion techniques over the past 5 or more years has clearly demonstrated the advantages of consensus-based segmentation. However, the common approach is to use multiple atlases with a single registration method and parameter set, which is not necessarily optimal for every individual scan, anatomical region, and problem/data-type. Different registration criteria and parameter sets yield different solutions, each providing complementary information. Herein, we present a consensus labeling framework that generates a broad ensemble of labeled atlases in target image space via the use of several warping algorithms, regularization parameters, and atlases. The label fusion integrates two complementary sources of information: a local similarity ranking to select locally optimal atlases and a boundary modulation term to refine the segmentation consistently with the target image's intensity profile. The ensemble approach consistently outperforms segmentations using individual warping methods alone, achieving high accuracy on several benchmark datasets. The MUSE methodology has been used for processing thousands of scans from various datasets, producing robust and consistent results. MUSE is publicly available both as a downloadable software package, and as an application that can be run on the CBICA Image Processing Portal (https://ipp.cbica.upenn.edu), a web based platform for remote processing of medical images. PMID:26679328
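
    The label-fusion idea can be illustrated with locally weighted voting over an ensemble of warped atlases. This numpy sketch covers only the weighted-voting part; MUSE's actual fusion additionally includes locally optimal atlas selection and boundary modulation, and the array shapes below are assumptions for the toy example.

      import numpy as np

      def weighted_label_fusion(atlas_labels: np.ndarray, weights: np.ndarray) -> np.ndarray:
          """Fuse labels from an ensemble of warped atlases by locally weighted voting.
          atlas_labels: (n_atlases, X, Y) integer label maps already warped to the target.
          weights:      (n_atlases, X, Y) local similarity of each atlas to the target."""
          labels = np.unique(atlas_labels)
          votes = np.stack([
              ((atlas_labels == lab) * weights).sum(axis=0) for lab in labels
          ])                                        # (n_labels, X, Y)
          return labels[np.argmax(votes, axis=0)]

      rng = np.random.default_rng(7)
      n_atlases, shape = 5, (32, 32)
      atlas_labels = rng.integers(0, 3, size=(n_atlases, *shape))
      weights = rng.random((n_atlases, *shape))     # e.g. local intensity similarity
      fused = weighted_label_fusion(atlas_labels, weights)
      print(fused.shape, np.unique(fused))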

  20. The GAAIN Entity Mapper: An Active-Learning System for Medical Data Mapping.

    PubMed

    Ashish, Naveen; Dewan, Peehoo; Toga, Arthur W

    2015-01-01

    This work is focused on mapping biomedical datasets to a common representation, as an integral part of data harmonization for integrated biomedical data access and sharing. We present GEM, an intelligent software assistant for automated data mapping across different datasets or from a dataset to a common data model. The GEM system automates data mapping by providing precise suggestions for data element mappings. It leverages the detailed metadata about elements in associated dataset documentation such as data dictionaries that are typically available with biomedical datasets. It employs unsupervised text mining techniques to determine similarity between data elements and also employs machine-learning classifiers to identify element matches. It further provides an active-learning capability where the process of training the GEM system is optimized. Our experimental evaluations show that the GEM system provides highly accurate data mappings (over 90% accuracy) for real datasets of thousands of data elements each, in the Alzheimer's disease research domain. Further, the effort in training the system for new datasets is also optimized. We are currently employing the GEM system to map Alzheimer's disease datasets from around the globe into a common representation, as part of a global Alzheimer's disease integrated data sharing and analysis network called GAAIN. GEM achieves significantly higher data mapping accuracy for biomedical datasets compared to other state-of-the-art tools for database schema matching that have similar functionality. With the use of active-learning capabilities, the user effort in training the system is minimal.
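
    The unsupervised text-similarity component of such a mapping assistant can be sketched with TF-IDF and cosine similarity over data-dictionary descriptions. The element names and texts below are invented for illustration; GEM additionally uses machine-learning classifiers and active learning on top of this kind of similarity signal.

      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.metrics.pairwise import cosine_similarity

      # Data-dictionary descriptions from a study dataset and a common data model.
      # Element names and texts are hypothetical, for illustration only.
      source_elements = {
          "DOB": "date of birth of the study participant",
          "MMSE_TOT": "total score on the mini mental state examination",
      }
      model_elements = {
          "birth_date": "participant date of birth",
          "mmse_total": "mini mental state examination total score",
          "apoe_genotype": "apolipoprotein E genotype of the participant",
      }

      vectorizer = TfidfVectorizer()
      docs = list(source_elements.values()) + list(model_elements.values())
      tfidf = vectorizer.fit_transform(docs)
      n_src = len(source_elements)
      sims = cosine_similarity(tfidf[:n_src], tfidf[n_src:])

      # Suggest, for each source element, the most similar model element.
      for i, src in enumerate(source_elements):
          best = sims[i].argmax()
          print(f"{src} -> {list(model_elements)[best]} (score {sims[i, best]:.2f})")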

  1. The GAAIN Entity Mapper: An Active-Learning System for Medical Data Mapping

    PubMed Central

    Ashish, Naveen; Dewan, Peehoo; Toga, Arthur W.

    2016-01-01

    This work is focused on mapping biomedical datasets to a common representation, as an integral part of data harmonization for integrated biomedical data access and sharing. We present GEM, an intelligent software assistant for automated data mapping across different datasets or from a dataset to a common data model. The GEM system automates data mapping by providing precise suggestions for data element mappings. It leverages the detailed metadata about elements in associated dataset documentation such as data dictionaries that are typically available with biomedical datasets. It employs unsupervised text mining techniques to determine similarity between data elements and also employs machine-learning classifiers to identify element matches. It further provides an active-learning capability where the process of training the GEM system is optimized. Our experimental evaluations show that the GEM system provides highly accurate data mappings (over 90% accuracy) for real datasets of thousands of data elements each, in the Alzheimer's disease research domain. Further, the effort in training the system for new datasets is also optimized. We are currently employing the GEM system to map Alzheimer's disease datasets from around the globe into a common representation, as part of a global Alzheimer's disease integrated data sharing and analysis network called GAAIN. GEM achieves significantly higher data mapping accuracy for biomedical datasets compared to other state-of-the-art tools for database schema matching that have similar functionality. With the use of active-learning capabilities, the user effort in training the system is minimal. PMID:26793094

  2. Discovering the Past through Data: Promoting the Design and Analysis of Original Data-Sets in History Undergraduate Courses in Hong Kong

    ERIC Educational Resources Information Center

    Kim, Loretta; Wong, Shun Han Rebekah

    2015-01-01

    This article discusses the objectives and outcomes of a project to enhance digital humanities training at the undergraduate level in a Hong Kong university. The co-investigators re-designed a multi-source data-set as an example and then taught a multi-step curriculum about gathering, organizing, and presenting original data to an introductory…

  3. Methodological challenges and analytic opportunities for modeling and interpreting Big Healthcare Data.

    PubMed

    Dinov, Ivo D

    2016-01-01

    Managing, processing and understanding big healthcare data is challenging, costly and demanding. Without a robust fundamental theory for representation, analysis and inference, a roadmap for the uniform handling and analysis of such complex data remains elusive. In this article, we outline various big data challenges, opportunities, modeling methods and software techniques for blending complex healthcare data, advanced analytic tools, and distributed scientific computing. Using imaging, genetic and healthcare data we provide examples of processing heterogeneous datasets using distributed cloud services, automated and semi-automated classification techniques, and open-science protocols. Despite substantial advances, new innovative technologies need to be developed that enhance, scale and optimize the management and processing of large, complex and heterogeneous data. Stakeholder investments in data acquisition, research and development, computational infrastructure and education will be critical to realize the huge potential of big data, to reap the expected information benefits and to build lasting knowledge assets. Multi-faceted proprietary, open-source, and community developments will be essential to enable broad, reliable, sustainable and efficient data-driven discovery and analytics. Big data will affect every sector of the economy, and its hallmark will be 'team science'.

  4. Atlas-guided prostate intensity modulated radiation therapy (IMRT) planning.

    PubMed

    Sheng, Yang; Li, Taoran; Zhang, You; Lee, W Robert; Yin, Fang-Fang; Ge, Yaorong; Wu, Q Jackie

    2015-09-21

    An atlas-based IMRT planning technique for prostate cancer was developed and evaluated. A multi-dose atlas was built based on the anatomy patterns of the patients, more specifically, the percent distance to the prostate and the concaveness angle formed by the seminal vesicles relative to the anterior-posterior axis. A 70-case dataset was classified using a k-medoids clustering analysis to recognize anatomy pattern variations in the dataset. The best classification, defined by the number of classes or medoids, was determined by the largest value of the average silhouette width. Reference plans from each class formed a multi-dose atlas. The atlas-guided planning (AGP) technique started with matching the new case anatomy pattern to one of the reference cases in the atlas; then a deformable registration between the atlas and new case anatomies transferred the dose from the atlas to the new case to guide inverse planning with full automation. Twenty additional clinical cases were re-planned to evaluate the AGP technique. Dosimetric properties between AGP and clinical plans were evaluated. The classification analysis determined that the 5-case atlas would best represent anatomy patterns for the patient cohort. AGP took approximately 1 min on average (corresponding to 70 iterations of optimization) for all cases. When dosimetric parameters were compared, the differences between AGP and clinical plans were less than 3.5%, albeit with some statistically significant differences observed: homogeneity index (p > 0.05), conformity index (p < 0.01), bladder gEUD (p < 0.01), and rectum gEUD (p = 0.02). Atlas-guided treatment planning is feasible and efficient. Atlas-predicted dose can effectively guide the optimizer to achieve plan quality comparable to that of clinical plans.
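
    Selecting the number of anatomy classes by maximizing the average silhouette width can be sketched as follows. Note that k-means is used here in place of k-medoids (which is not in core scikit-learn), and the two anatomy features and their values are synthetic stand-ins for the percent distance to the prostate and the seminal-vesicle concaveness angle.

      import numpy as np
      from sklearn.cluster import KMeans
      from sklearn.metrics import silhouette_score

      rng = np.random.default_rng(8)
      # Synthetic anatomy features per case (values are illustrative only).
      features = np.vstack([
          rng.normal([0.2, 30], [0.05, 5], size=(25, 2)),
          rng.normal([0.5, 60], [0.05, 5], size=(25, 2)),
          rng.normal([0.8, 90], [0.05, 5], size=(20, 2)),
      ])

      # Pick the number of classes that maximizes the average silhouette width.
      best_k, best_score = None, -1.0
      for k in range(2, 8):
          labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(features)
          score = silhouette_score(features, labels)
          if score > best_score:
              best_k, best_score = k, score
      print(best_k, round(best_score, 3))   # expected to select k = 3 for this toy data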

  5. Multi-Stage System for Automatic Target Recognition

    NASA Technical Reports Server (NTRS)

    Chao, Tien-Hsin; Lu, Thomas T.; Ye, David; Edens, Weston; Johnson, Oliver

    2010-01-01

    A multi-stage automated target recognition (ATR) system has been designed to perform computer vision tasks with adequate proficiency in mimicking human vision. The system is able to detect, identify, and track targets of interest. Potential regions of interest (ROIs) are first identified by the detection stage using an Optimum Trade-off Maximum Average Correlation Height (OT-MACH) filter combined with a wavelet transform. False positives are then eliminated by the verification stage using feature extraction methods in conjunction with neural networks. Feature extraction transforms the ROIs using filtering and binning algorithms to create feature vectors. A feedforward back-propagation neural network (NN) is then trained to classify each feature vector and to remove false positives. A system parameter optimization process has been developed to adapt to various targets and datasets. The objective was to design an efficient computer vision system that can learn to detect multiple targets in large images with unknown backgrounds. Because the target size is small relative to the image size in this problem, there are many regions of the image that could potentially contain the target. A cursory analysis of every region can be computationally efficient, but may yield too many false positives. On the other hand, a detailed analysis of every region can yield better results, but may be computationally inefficient. The multi-stage ATR system was designed to achieve an optimal balance between accuracy and computational efficiency by incorporating both models. The detection stage first identifies potential ROIs where the target may be present by performing a fast Fourier domain OT-MACH filter-based correlation. Because the threshold for this stage is chosen with the goal of detecting all true positives, a number of false positives are also detected as ROIs. The verification stage then transforms the regions of interest into feature space, and eliminates false positives using an artificial neural network classifier. The multi-stage system allows tuning the detection sensitivity and the identification specificity individually in each stage. This makes it easier to achieve optimized ATR operation for a specific goal. The test results show that the system was successful in substantially reducing the false positive rate when tested on sonar and video image datasets.

  6. The automated multi-stage substructuring system for NASTRAN

    NASA Technical Reports Server (NTRS)

    Field, E. I.; Herting, D. N.; Herendeen, D. L.; Hoesly, R. L.

    1975-01-01

    The substructuring capability developed for eventual installation in Level 16 is now operational in a test version of NASTRAN. Its features are summarized. These include the user-oriented, Case Control type control language, the automated multi-stage matrix processing, the independent direct access data storage facilities, and the static and normal modes solution capabilities. A complete problem analysis sequence is presented with card-by-card description of the user input.

  7. Metric Evaluation Pipeline for 3d Modeling of Urban Scenes

    NASA Astrophysics Data System (ADS)

    Bosch, M.; Leichtman, A.; Chilcott, D.; Goldberg, H.; Brown, M.

    2017-05-01

    Publicly available benchmark data and metric evaluation approaches have been instrumental in enabling research to advance state-of-the-art methods for remote sensing applications in urban 3D modeling. Most publicly available benchmark datasets have consisted of high-resolution airborne imagery and lidar suitable for 3D modeling on a relatively modest scale. To enable research in larger scale 3D mapping, we have recently released a public benchmark dataset with multi-view commercial satellite imagery and metrics to compare 3D point clouds with lidar ground truth. We now define a more complete metric evaluation pipeline developed as publicly available open source software to assess semantically labeled 3D models of complex urban scenes derived from multi-view commercial satellite imagery. Evaluation metrics in our pipeline include horizontal and vertical accuracy and completeness, volumetric completeness and correctness, perceptual quality, and model simplicity. Sources of ground truth include airborne lidar and overhead imagery, and we demonstrate a semi-automated process for producing accurate ground truth shape files to characterize building footprints. We validate our current metric evaluation pipeline using 3D models produced using open source multi-view stereo methods. Data and software are made publicly available to enable further research and planned benchmarking activities.
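
    Point-cloud completeness and correctness against lidar ground truth can be sketched with nearest-neighbour distances and a distance tolerance. The sketch below is a simplified, generic version of such metrics (the tolerance and data are illustrative), not the released evaluation pipeline.

      import numpy as np
      from scipy.spatial import cKDTree

      def completeness_correctness(model_pts, truth_pts, tol=0.5):
          """Completeness: fraction of ground-truth points with a model point within
          tol; correctness: fraction of model points with a ground-truth point within
          tol. A simplified version of common 3D point-cloud evaluation metrics."""
          model_tree, truth_tree = cKDTree(model_pts), cKDTree(truth_pts)
          d_truth_to_model, _ = model_tree.query(truth_pts)
          d_model_to_truth, _ = truth_tree.query(model_pts)
          return (d_truth_to_model <= tol).mean(), (d_model_to_truth <= tol).mean()

      rng = np.random.default_rng(9)
      truth = rng.uniform(0, 100, size=(5000, 3))               # lidar-like ground truth
      model = truth[:4000] + rng.normal(0, 0.1, (4000, 3))      # reconstruction, 80% coverage
      model = np.vstack([model, rng.uniform(0, 100, (500, 3))]) # plus spurious points
      comp, corr = completeness_correctness(model, truth, tol=0.5)
      print(round(comp, 2), round(corr, 2))   # roughly 0.8 and 0.9 for this synthetic example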

  8. Resources and costs for microbial sequence analysis evaluated using virtual machines and cloud computing.

    PubMed

    Angiuoli, Samuel V; White, James R; Matalka, Malcolm; White, Owen; Fricke, W Florian

    2011-01-01

    The widespread popularity of genomic applications is threatened by the "bioinformatics bottleneck" resulting from uncertainty about the cost and infrastructure needed to meet increasing demands for next-generation sequence analysis. Cloud computing services have been discussed as potential new bioinformatics support systems but have not been evaluated thoroughly. We present benchmark costs and runtimes for common microbial genomics applications, including 16S rRNA analysis, microbial whole-genome shotgun (WGS) sequence assembly and annotation, WGS metagenomics and large-scale BLAST. Sequence dataset types and sizes were selected to correspond to outputs typically generated by small- to midsize facilities equipped with 454 and Illumina platforms, except for WGS metagenomics where sampling of Illumina data was used. Automated analysis pipelines, as implemented in the CloVR virtual machine, were used in order to guarantee transparency, reproducibility and portability across different operating systems, including the commercial Amazon Elastic Compute Cloud (EC2), which was used to attach real dollar costs to each analysis type. We found considerable differences in computational requirements, runtimes and costs associated with different microbial genomics applications. While all 16S analyses completed on a single-CPU desktop in under three hours, microbial genome and metagenome analyses utilized multi-CPU support of up to 120 CPUs on Amazon EC2, where each analysis completed in under 24 hours for less than $60. Representative datasets were used to estimate maximum data throughput on different cluster sizes and to compare costs between EC2 and comparable local grid servers. Although bioinformatics requirements for microbial genomics depend on dataset characteristics and the analysis protocols applied, our results suggest that smaller sequencing facilities (up to three Roche/454 or one Illumina GAIIx sequencer) invested in 16S rRNA amplicon sequencing, microbial single-genome and metagenomics WGS projects can achieve cost-efficient bioinformatics support using CloVR in combination with Amazon EC2 as an alternative to local computing centers.

  9. Resources and Costs for Microbial Sequence Analysis Evaluated Using Virtual Machines and Cloud Computing

    PubMed Central

    Angiuoli, Samuel V.; White, James R.; Matalka, Malcolm; White, Owen; Fricke, W. Florian

    2011-01-01

    Background The widespread popularity of genomic applications is threatened by the “bioinformatics bottleneck” resulting from uncertainty about the cost and infrastructure needed to meet increasing demands for next-generation sequence analysis. Cloud computing services have been discussed as potential new bioinformatics support systems but have not been evaluated thoroughly. Results We present benchmark costs and runtimes for common microbial genomics applications, including 16S rRNA analysis, microbial whole-genome shotgun (WGS) sequence assembly and annotation, WGS metagenomics and large-scale BLAST. Sequence dataset types and sizes were selected to correspond to outputs typically generated by small- to midsize facilities equipped with 454 and Illumina platforms, except for WGS metagenomics where sampling of Illumina data was used. Automated analysis pipelines, as implemented in the CloVR virtual machine, were used in order to guarantee transparency, reproducibility and portability across different operating systems, including the commercial Amazon Elastic Compute Cloud (EC2), which was used to attach real dollar costs to each analysis type. We found considerable differences in computational requirements, runtimes and costs associated with different microbial genomics applications. While all 16S analyses completed on a single-CPU desktop in under three hours, microbial genome and metagenome analyses utilized multi-CPU support of up to 120 CPUs on Amazon EC2, where each analysis completed in under 24 hours for less than $60. Representative datasets were used to estimate maximum data throughput on different cluster sizes and to compare costs between EC2 and comparable local grid servers. Conclusions Although bioinformatics requirements for microbial genomics depend on dataset characteristics and the analysis protocols applied, our results suggest that smaller sequencing facilities (up to three Roche/454 or one Illumina GAIIx sequencer) invested in 16S rRNA amplicon sequencing, microbial single-genome and metagenomics WGS projects can achieve cost-efficient bioinformatics support using CloVR in combination with Amazon EC2 as an alternative to local computing centers. PMID:22028928

  10. An Optimised System for Generating Multi-Resolution DTMs Using NASA MRO Datasets

    NASA Astrophysics Data System (ADS)

    Tao, Y.; Muller, J.-P.; Sidiropoulos, P.; Veitch-Michaelis, J.; Yershov, V.

    2016-06-01

    Within the EU FP-7 iMars project, a fully automated multi-resolution DTM processing chain, called Co-registration ASP-Gotcha Optimised (CASP-GO) has been developed, based on the open source NASA Ames Stereo Pipeline (ASP). CASP-GO includes tiepoint based multi-resolution image co-registration and an adaptive least squares correlation-based sub-pixel refinement method called Gotcha. The implemented system guarantees global geo-referencing compliance with respect to HRSC (and thence to MOLA), provides refined stereo matching completeness and accuracy based on the ASP normalised cross-correlation. We summarise issues discovered from experimenting with the use of the open-source ASP DTM processing chain and introduce our new working solutions. These issues include global co-registration accuracy, de-noising, dealing with failure in matching, matching confidence estimation, outlier definition and rejection scheme, various DTM artefacts, uncertainty estimation, and quality-efficiency trade-offs.

  11. Gaussian curvature analysis allows for automatic block placement in multi-block hexahedral meshing.

    PubMed

    Ramme, Austin J; Shivanna, Kiran H; Magnotta, Vincent A; Grosland, Nicole M

    2011-10-01

    Musculoskeletal finite element analysis (FEA) has been essential to research in orthopaedic biomechanics. The generation of a volumetric mesh is often the most challenging step in a FEA. Hexahedral meshing tools that are based on a multi-block approach rely on the manual placement of building blocks for their mesh generation scheme. We hypothesise that Gaussian curvature analysis could be used to automatically develop a building block structure for multi-block hexahedral mesh generation. The Automated Building Block Algorithm incorporates principles from differential geometry, combinatorics, statistical analysis and computer science to automatically generate a building block structure to represent a given surface without prior information. We have applied this algorithm to 29 bones of varying geometries and successfully generated a usable mesh in all cases. This work represents a significant advancement in automating the definition of building blocks.
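    The Automated Building Block Algorithm itself is not reproduced here, but the Gaussian curvature analysis it starts from can be sketched with the standard angle-deficit estimate on a triangle mesh; the function, the vertex-area weighting (one third of the incident triangle areas) and the octahedron test case are illustrative assumptions.

```python
import numpy as np

def gaussian_curvature(vertices, faces):
    """Per-vertex discrete Gaussian curvature of a triangle mesh via the
    angle-deficit formula K_i = (2*pi - sum of incident angles) / A_i,
    where A_i is one third of the area of the triangles touching vertex i."""
    V = np.asarray(vertices, dtype=float)
    angle_sum = np.zeros(len(V))
    area = np.zeros(len(V))
    for tri in faces:
        p = V[list(tri)]
        for k in range(3):
            u, v = p[(k + 1) % 3] - p[k], p[(k + 2) % 3] - p[k]
            cosang = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
            angle_sum[tri[k]] += np.arccos(np.clip(cosang, -1.0, 1.0))
        tri_area = 0.5 * np.linalg.norm(np.cross(p[1] - p[0], p[2] - p[0]))
        for k in range(3):
            area[tri[k]] += tri_area / 3.0
    return (2.0 * np.pi - angle_sum) / np.maximum(area, 1e-12)

# Illustrative use: a unit octahedron, whose vertices all have equal positive curvature.
verts = np.array([[1, 0, 0], [-1, 0, 0], [0, 1, 0], [0, -1, 0], [0, 0, 1], [0, 0, -1]], float)
tris = [(0, 2, 4), (2, 1, 4), (1, 3, 4), (3, 0, 4), (2, 0, 5), (1, 2, 5), (3, 1, 5), (0, 3, 5)]
print(gaussian_curvature(verts, tris))
```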

  12. Lake Michigan Diversion Accounting land cover change estimation by use of the National Land Cover Dataset and raingage network partitioning analysis

    USGS Publications Warehouse

    Sharpe, Jennifer B.; Soong, David T.

    2015-01-01

    This study used the National Land Cover Dataset (NLCD) and developed an automated process for determining the area of the three land cover types, thereby allowing faster updating of future models, and for evaluating land cover changes by use of historical NLCD datasets. The study also carried out a raingage partitioning analysis so that the segmentation of land cover and rainfall in each modeled unit is directly applicable to the HSPF modeling. Historical and existing impervious, grass, and forest land acreages partitioned by percentages covered by two sets of raingages for the Lake Michigan diversion SCAs, gaged basins, and ungaged basins are presented.

  13. Web-based interactive access, analysis and comparison of remotely sensed and in situ measured temperature data

    NASA Astrophysics Data System (ADS)

    Eberle, Jonas; Urban, Marcel; Hüttich, Christian; Schmullius, Christiane

    2014-05-01

    Numerous datasets providing temperature information from meteorological stations or remote sensing satellites are available. However, the challenging issue is to search the archives and process the time-series information for further analysis. These steps can be automated for each individual product if the preconditions are met, e.g., data access through web services (HTTP, FTP) or legal rights to redistribute the datasets. Therefore, a Python-based package was developed to provide data access and data processing tools for MODIS Land Surface Temperature (LST) data, which is provided by the NASA Land Processes Distributed Active Archive Center (LPDAAC), as well as the Global Surface Summary of the Day (GSOD) and the Global Historical Climatology Network (GHCN) daily datasets provided by the NOAA National Climatic Data Center (NCDC). The package to access and process the information is available as web services used by an interactive web portal for simple data access and analysis. Tools for time series analysis were linked to the system, e.g., time series plotting, decomposition, aggregation (monthly, seasonal, etc.), trend analyses, and breakpoint detection. Especially for temperature data, a plot was integrated for the comparison of two temperature datasets, based on the work by Urban et al. (2013). As a first result, a kernel density plot compares daily MODIS LST from the Aqua and Terra satellites with daily means from the GSOD and GHCN datasets. Without any data download and data processing, users can analyze different time series datasets in an easy-to-use web portal. As a first use case, we built this complementary system with remotely sensed MODIS data and in situ measurements from meteorological stations for Siberia within the Siberian Earth System Science Cluster (www.sibessc.uni-jena.de). References: Urban, Marcel; Eberle, Jonas; Hüttich, Christian; Schmullius, Christiane; Herold, Martin. 2013. "Comparison of Satellite-Derived Land Surface Temperature and Air Temperature from Meteorological Stations on the Pan-Arctic Scale." Remote Sens. 5, no. 5: 2348-2367. Further materials: Eberle, Jonas; Clausnitzer, Siegfried; Hüttich, Christian; Schmullius, Christiane. 2013. "Multi-Source Data Processing Middleware for Land Monitoring within a Web-Based Spatial Data Infrastructure for Siberia." ISPRS Int. J. Geo-Inf. 2, no. 3: 553-576.
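    The Python package described above is not reproduced here; the sketch below only illustrates the kind of comparison it automates, estimating a 2D kernel density, mean bias and correlation for paired daily satellite and station temperatures, in the spirit of the Urban et al. (2013) comparison. The function name, synthetic data and plotting choices are assumptions for illustration.

```python
import numpy as np
from scipy.stats import gaussian_kde
import matplotlib.pyplot as plt

def kde_compare(sat_temp, station_temp, grid_n=100):
    """2D kernel density of paired daily temperatures from two sources
    (e.g. satellite LST vs. station air temperature), plus simple agreement stats."""
    pairs = np.vstack([sat_temp, station_temp])
    kde = gaussian_kde(pairs)
    lo, hi = pairs.min(), pairs.max()
    xx, yy = np.meshgrid(np.linspace(lo, hi, grid_n), np.linspace(lo, hi, grid_n))
    density = kde(np.vstack([xx.ravel(), yy.ravel()])).reshape(xx.shape)
    bias = float(np.mean(np.asarray(sat_temp) - np.asarray(station_temp)))
    r = float(np.corrcoef(sat_temp, station_temp)[0, 1])
    return xx, yy, density, bias, r

# Illustrative use with synthetic daily values (degrees Celsius).
rng = np.random.default_rng(2)
station = rng.normal(0.0, 12.0, size=365)
satellite = station + rng.normal(1.5, 3.0, size=365)        # warm-biased, noisier
xx, yy, dens, bias, r = kde_compare(satellite, station)
plt.contourf(xx, yy, dens)
plt.plot([xx.min(), xx.max()], [xx.min(), xx.max()], "k--")  # 1:1 line
plt.xlabel("satellite LST (degC)")
plt.ylabel("station air temperature (degC)")
print(f"mean bias = {bias:.2f} degC, correlation r = {r:.2f}")
```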

  14. Nicotine replacement therapy decision based on fuzzy multi-criteria analysis

    NASA Astrophysics Data System (ADS)

    Tarmudi, Zamali; Matmali, Norfazillah; Abdullah, Mohd Lazim

    2017-08-01

    Nicotine Replacement Therapy (NRT) is one of the alternatives for controlling and reducing smoking addiction among smokers. Since the decision to choose the best NRT alternative involves uncertainty, ambiguity and diverse input datasets, this paper proposes a fuzzy multi-criteria analysis (FMA) to overcome these issues. It focuses on how the fuzzy approach can unify the diversity of datasets in the NRT decision-making problem. The analysis employed the cost-benefit criterion to unify the mixed dataset inputs, and a performance matrix was used to derive the performance scores. An empirical example of the NRT decision-making problem illustrates the proposed approach. Based on the calculations, the analytical approach was found to be highly usable, applicable and efficient in dealing with the mixture of input datasets. Hence, the decision-making process can easily be used by experts and patients who are interested in joining a therapy/cessation program.
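    A crisp, minimal sketch of the cost-benefit normalisation and performance-matrix scoring described above is given below; the fuzzy membership step is deliberately omitted, and the criteria, weights and alternative values are entirely hypothetical, so this is only a reduced illustration of the FMA idea, not the paper's method.

```python
import numpy as np

def performance_scores(matrix, weights, is_benefit):
    """Normalise a decision matrix (alternatives x criteria) so that benefit
    and cost criteria point the same way, then rank by weighted sum."""
    X = np.asarray(matrix, dtype=float)
    norm = np.empty_like(X)
    for j in range(X.shape[1]):
        col = X[:, j]
        span = col.ptp() + 1e-12
        if is_benefit[j]:                       # benefit criterion: larger is better
            norm[:, j] = (col - col.min()) / span
        else:                                   # cost criterion: smaller is better
            norm[:, j] = (col.max() - col) / span
    w = np.asarray(weights, dtype=float)
    return norm @ (w / w.sum())

# Hypothetical NRT alternatives scored on cost, efficacy and side effects.
alternatives = ["patch", "gum", "lozenge", "inhaler"]
matrix = [[30, 0.70, 2],
          [25, 0.60, 3],
          [28, 0.65, 2],
          [45, 0.75, 4]]
scores = performance_scores(matrix, weights=[0.4, 0.4, 0.2],
                            is_benefit=[False, True, False])
print(sorted(zip(alternatives, scores), key=lambda t: -t[1]))
```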

  15. Joint Blind Source Separation by Multi-set Canonical Correlation Analysis

    PubMed Central

    Li, Yi-Ou; Adalı, Tülay; Wang, Wei; Calhoun, Vince D

    2009-01-01

    In this work, we introduce a simple and effective scheme to achieve joint blind source separation (BSS) of multiple datasets using multi-set canonical correlation analysis (M-CCA) [1]. We first propose a generative model of joint BSS based on the correlation of latent sources within and between datasets. We specify source separability conditions, and show that, when the conditions are satisfied, the group of corresponding sources from each dataset can be jointly extracted by M-CCA through maximization of correlation among the extracted sources. We compare source separation performance of the M-CCA scheme with other joint BSS methods and demonstrate the superior performance of the M-CCA scheme in achieving joint BSS for a large number of datasets, group of corresponding sources with heterogeneous correlation values, and complex-valued sources with circular and non-circular distributions. We apply M-CCA to analysis of functional magnetic resonance imaging (fMRI) data from multiple subjects and show its utility in estimating meaningful brain activations from a visuomotor task. PMID:20221319
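    M-CCA extends canonical correlation analysis to more than two datasets; as a reduced, two-dataset illustration of the correlation-maximising extraction it performs, the sketch below applies scikit-learn's CCA to synthetically mixed shared sources. The mixing model and dimensions are assumptions, and this is not the authors' M-CCA implementation.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

# Two datasets share latent sources mixed through different (unknown) matrices.
rng = np.random.default_rng(3)
n_samples, n_sources = 1000, 3
sources = rng.laplace(size=(n_samples, n_sources))          # shared latent sources
X1 = sources @ rng.normal(size=(n_sources, 8)) + 0.1 * rng.normal(size=(n_samples, 8))
X2 = sources @ rng.normal(size=(n_sources, 6)) + 0.1 * rng.normal(size=(n_samples, 6))

# CCA extracts components from each dataset that are maximally correlated with
# each other; M-CCA generalises this maximisation across more than two datasets.
cca = CCA(n_components=n_sources)
S1, S2 = cca.fit_transform(X1, X2)

for k in range(n_sources):
    r = np.corrcoef(S1[:, k], S2[:, k])[0, 1]
    print(f"component {k}: correlation across datasets = {r:.3f}")
```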

  16. Similarity of markers identified from cancer gene expression studies: observations from GEO.

    PubMed

    Shi, Xingjie; Shen, Shihao; Liu, Jin; Huang, Jian; Zhou, Yong; Ma, Shuangge

    2014-09-01

    Gene expression profiling has been extensively conducted in cancer research. The analysis of multiple independent cancer gene expression datasets may provide additional information and complement single-dataset analysis. In this study, we conduct multi-dataset analysis and are interested in evaluating the similarity of cancer-associated genes identified from different datasets. The first objective of this study is to briefly review some statistical methods that can be used for such evaluation. Both marginal analysis and joint analysis methods are reviewed. The second objective is to apply those methods to 26 Gene Expression Omnibus (GEO) datasets on five types of cancers. Our analysis suggests that for the same cancer, the marker identification results may vary significantly across datasets, and different datasets share few common genes. In addition, datasets on different cancers share few common genes. The shared genetic basis of datasets on the same or different cancers, which has been suggested in the literature, is not observed in the analysis of GEO data.
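    A simple way to quantify the kind of cross-dataset marker overlap discussed above is the Jaccard similarity between identified gene sets; the gene symbols and dataset names below are purely hypothetical placeholders used to illustrate the computation.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity between two marker gene sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 0.0

# Hypothetical marker lists identified from three independent datasets of the
# same cancer type (gene symbols and dataset names are placeholders).
markers = {
    "dataset_A": {"TP53", "EGFR", "MYC", "KRAS"},
    "dataset_B": {"TP53", "BRCA1", "PTEN"},
    "dataset_C": {"EGFR", "PIK3CA", "RB1", "TP53"},
}
for (n1, s1), (n2, s2) in combinations(markers.items(), 2):
    print(f"{n1} vs {n2}: shared = {sorted(s1 & s2)}, Jaccard = {jaccard(s1, s2):.2f}")
```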

  17. Automated Spatiotemporal Analysis of Fibrils and Coronal Rain Using the Rolling Hough Transform

    NASA Astrophysics Data System (ADS)

    Schad, Thomas

    2017-09-01

    A technique is presented that automates the direction characterization of curvilinear features in multidimensional solar imaging datasets. It is an extension of the Rolling Hough Transform (RHT) technique presented by Clark, Peek, and Putman ( Astrophys. J. 789, 82, 2014), and it excels at rapid quantification of spatial and spatiotemporal feature orientation even for applications with a low signal-to-noise ratio. It operates on a pixel-by-pixel basis within a dataset and reliably quantifies orientation even for locations not centered on a feature ridge, which is used here to derive a quasi-continuous map of the chromospheric fine-structure projection angle. For time-series analysis, a procedure is developed that uses a hierarchical application of the RHT to automatically derive the apparent motion of coronal rain observed off-limb. Essential to the success of this technique is the formulation presented in this article for the RHT error analysis as it provides a means to properly filter results.

  18. Segmentation of mitochondria in electron microscopy images using algebraic curves.

    PubMed

    Seyedhosseini, Mojtaba; Ellisman, Mark H; Tasdizen, Tolga

    2013-01-01

    High-resolution microscopy techniques have been used to generate large volumes of data with enough details for understanding the complex structure of the nervous system. However, automatic techniques are required to segment cells and intracellular structures in these multi-terabyte datasets and make anatomical analysis possible on a large scale. We propose a fully automated method that exploits both shape information and regional statistics to segment irregularly shaped intracellular structures such as mitochondria in electron microscopy (EM) images. The main idea is to use algebraic curves to extract shape features together with texture features from image patches. Then, these powerful features are used to learn a random forest classifier, which can predict mitochondria locations precisely. Finally, the algebraic curves together with regional information are used to segment the mitochondria at the predicted locations. We demonstrate that our method outperforms the state-of-the-art algorithms in segmentation of mitochondria in EM images.
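    The algebraic-curve shape features of the paper are not reproduced here; the sketch below only illustrates the patch-classification step, training a random forest on generic intensity and texture statistics of synthetic patches. The feature choices, patch sizes and synthetic data are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def patch_features(patch):
    """Generic intensity/texture statistics of an image patch; the paper's
    algebraic-curve shape features would be appended to this vector."""
    gy, gx = np.gradient(patch.astype(float))
    return [patch.mean(), patch.std(),
            np.abs(gx).mean(), np.abs(gy).mean(),
            np.percentile(patch, 10), np.percentile(patch, 90)]

# Synthetic stand-in data: "mitochondria" patches are darker and noisier than background.
rng = np.random.default_rng(4)
pos = [rng.normal(80, 25, size=(15, 15)) for _ in range(300)]
neg = [rng.normal(140, 10, size=(15, 15)) for _ in range(300)]
X = np.array([patch_features(p) for p in pos + neg])
y = np.array([1] * len(pos) + [0] * len(neg))

Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(Xtr, ytr)
print(f"held-out accuracy: {clf.score(Xte, yte):.3f}")
```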

  19. Using soft-hard fusion for misinformation detection and pattern of life analysis in OSINT

    NASA Astrophysics Data System (ADS)

    Levchuk, Georgiy; Shabarekh, Charlotte

    2017-05-01

    Today's battlefields are shifting to "denied areas", where the use of U.S. military air and ground assets is limited. To succeed, U.S. intelligence analysts increasingly rely on available open-source intelligence (OSINT), which is fraught with inconsistencies, biased reporting and fake news. Analysts need automated tools for retrieval of information from OSINT sources, and these solutions must identify and resolve conflicting and deceptive information. In this paper, we present a misinformation detection model (MDM) which converts text to attributed knowledge graphs and runs graph-based analytics to identify misinformation. At the core of our solution is the identification of knowledge conflicts in the fused multi-source knowledge graph, and semi-supervised learning to compute locally consistent reliability and credibility scores for the documents and sources, respectively. We present validation of the proposed method using an open-source dataset constructed from the online investigations of the MH17 downing in Eastern Ukraine.

  20. Device and methods for "gold standard" registration of clinical 3D and 2D cerebral angiograms

    NASA Astrophysics Data System (ADS)

    Madan, Hennadii; Likar, Boštjan; Pernuš, Franjo; Špiclin, Žiga

    2015-03-01

    Translation of novel and existing 3D-2D image registration methods into clinical image-guidance systems is limited by the lack of objective validation on clinical image datasets. The main reason is that, besides the calibration of the 2D imaging system, a reference or "gold standard" registration is very difficult to obtain on clinical image datasets. In the context of cerebral endovascular image-guided interventions (EIGIs), we present a calibration device in the form of a headband with integrated fiducial markers and, secondly, propose an automated pipeline comprising 3D and 2D image processing, analysis and annotation steps, the result of which is a retrospective calibration of the 2D imaging system and an optimal, i.e., "gold standard" registration of 3D and 2D images. The device and methods were used to create the "gold standard" on 15 datasets of 3D and 2D cerebral angiograms, where each dataset was acquired from a patient undergoing EIGI for either aneurysm coiling or embolization of an arteriovenous malformation. The device integrated seamlessly into the clinical workflow of EIGI, while the automated pipeline eliminated all manual input and interactive image processing, analysis and annotation. In this way, the time to obtain the "gold standard" was reduced from 30 minutes to less than one minute, and the "gold standard" 3D-2D registration on all 15 datasets of cerebral angiograms was obtained with sub-0.1 mm accuracy.

  1. Multi-azimuth 3D Seismic Exploration and Processing in the Jeju Basin, the Northern East China Sea

    NASA Astrophysics Data System (ADS)

    Yoon, Youngho; Kang, Moohee; Kim, Jin-Ho; Kim, Kyong-O.

    2015-04-01

    Multi-azimuth (MAZ) 3D seismic exploration is one of the most advanced seismic survey methods for improving illumination and multiple attenuation to better image subsurface structures. 3D multi-channel seismic data were collected in two phases during 2012, 2013, and 2014 in the Jeju Basin, the northern part of the East China Sea Basin, where several oil and gas fields have been discovered. Phase 1 data, acquired at 135° and 315° azimuths in 2012 and 2013, comprised full 3D marine seismic coverage of 160 km2. In 2014, phase 2 data were acquired at azimuths of 45° and 225°, perpendicular to those of phase 1. These two datasets were processed through the same processing workflow prior to velocity analysis and merged into one MAZ dataset. We performed velocity analysis on the MAZ dataset as well as on the two phase datasets individually and then stacked these three datasets separately. We were able to pick more accurate velocities in the MAZ dataset compared to the phase 1 and 2 data. Consequently, the MAZ seismic volume provides better resolution and improved images, since different shooting directions illuminate different parts of the structures and stratigraphic features.

  2. Fully-automated segmentation of fluid regions in exudative age-related macular degeneration subjects: Kernel graph cut in neutrosophic domain

    PubMed Central

    Rashno, Abdolreza; Nazari, Behzad; Koozekanani, Dara D.; Drayna, Paul M.; Sadri, Saeed; Rabbani, Hossein

    2017-01-01

    A fully-automated method based on graph shortest path, graph cut and neutrosophic (NS) sets is presented for fluid segmentation in OCT volumes for exudative age-related macular degeneration (EAMD) subjects. The proposed method includes three main steps: 1) The inner limiting membrane (ILM) and the retinal pigment epithelium (RPE) layers are segmented using proposed methods based on graph shortest path in the NS domain. A flattened RPE boundary is calculated such that all three types of fluid regions, intra-retinal, sub-retinal and sub-RPE, are located above it. 2) Seed points for fluid (object) and tissue (background) are initialized for graph cut by the proposed automated method. 3) A new cost function is proposed in kernel space, and is minimized with max-flow/min-cut algorithms, leading to a binary segmentation. Important properties of the proposed steps are proven and the quantitative performance of each step is analyzed separately. The proposed method is evaluated using a publicly available dataset referred to as Optima and a local dataset from the UMN clinic. For fluid segmentation in 2D individual slices, the proposed method outperforms the previously proposed methods by 18% and 21% with respect to the dice coefficient and sensitivity, respectively, on the Optima dataset, and by 16%, 11% and 12% with respect to the dice coefficient, sensitivity and precision, respectively, on the local UMN dataset. Finally, for 3D fluid volume segmentation, the proposed method achieves a true positive rate (TPR) and false positive rate (FPR) of 90% and 0.74%, respectively, with a correlation of 95% between automated and expert manual segmentations using linear regression analysis. PMID:29059257

  3. An automated procedure for detection of IDP's dwellings using VHR satellite imagery

    NASA Astrophysics Data System (ADS)

    Jenerowicz, Malgorzata; Kemper, Thomas; Soille, Pierre

    2011-11-01

    This paper presents results for the estimation of dwelling structures in the Al Salam IDP camp, Southern Darfur, based on Very High Resolution (VHR) multispectral satellite images and Mathematical Morphology analysis. A series of image processing procedures, feature extraction methods and textural analyses have been applied in order to provide reliable information about dwelling structures. One issue in this context is the similarity of the spectral response of thatched dwelling roofs and their surroundings in IDP camps, which makes the exploitation of multispectral information crucial. This study shows the advantage of the automatic extraction approach and highlights the importance of detailed spatial and spectral information analysis based on a multi-temporal dataset. The additional fusion of the high-resolution panchromatic band with the lower-resolution multispectral bands of the WorldView-2 satellite has a positive influence on the results and can thereby be useful for humanitarian aid agencies, supporting decisions and population estimates, especially in situations when frequent revisits by spaceborne imaging systems are the only possibility for continued monitoring.
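    As a loose illustration of the kind of mathematical-morphology step used in such dwelling extraction, the sketch below applies a white top-hat to enhance small bright roof-like structures, thresholds the result and counts connected components; the structuring-element size, threshold rule and synthetic scene are assumptions, not the procedure used in the paper.

```python
import numpy as np
from scipy import ndimage

def count_bright_structures(band, size=7, k=3.0):
    """Enhance small bright structures with a white top-hat (image minus its
    morphological opening), threshold, and count connected components."""
    tophat = band - ndimage.grey_opening(band, size=(size, size))
    mask = tophat > (tophat.mean() + k * tophat.std())
    labels, n = ndimage.label(mask)
    return n, labels

# Synthetic panchromatic-like scene with a few small bright "roofs".
rng = np.random.default_rng(5)
scene = rng.normal(100.0, 5.0, size=(200, 200))
for r, c in [(40, 50), (120, 30), (160, 170)]:
    scene[r:r + 4, c:c + 4] += 60.0
n_structures, _ = count_bright_structures(scene)
print("bright structures detected:", n_structures)
```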

  4. Multi-Dimensional Signal Processing Research Program

    DTIC Science & Technology

    1981-09-30

    applications to real-time image processing and analysis. A specific long-range application is the automated processing of aerial reconnaissance imagery...Non-supervised image segmentation is a potentially important operation in the automated processing of aerial reconnaissance photographs since it

  5. RIEMS: a software pipeline for sensitive and comprehensive taxonomic classification of reads from metagenomics datasets.

    PubMed

    Scheuch, Matthias; Höper, Dirk; Beer, Martin

    2015-03-03

    Fuelled by the advent and subsequent development of next generation sequencing technologies, metagenomics became a powerful tool for the analysis of microbial communities both scientifically and diagnostically. The biggest challenge is the extraction of relevant information from the huge sequence datasets generated for metagenomics studies. Although a plethora of tools are available, data analysis is still a bottleneck. To overcome the bottleneck of data analysis, we developed an automated computational workflow called RIEMS - Reliable Information Extraction from Metagenomic Sequence datasets. RIEMS assigns every individual read sequence within a dataset taxonomically by cascading different sequence analyses with decreasing stringency of the assignments using various software applications. After completion of the analyses, the results are summarised in a clearly structured result protocol organised taxonomically. The high accuracy and performance of RIEMS analyses were proven in comparison with other tools for metagenomics data analysis using simulated sequencing read datasets. RIEMS has the potential to fill the gap that still exists with regard to data analysis for metagenomics studies. The usefulness and power of RIEMS for the analysis of genuine sequencing datasets was demonstrated with an early version of RIEMS in 2011 when it was used to detect the orthobunyavirus sequences leading to the discovery of Schmallenberg virus.

  6. GWATCH: a web platform for automated gene association discovery analysis.

    PubMed

    Svitin, Anton; Malov, Sergey; Cherkasov, Nikolay; Geerts, Paul; Rotkevich, Mikhail; Dobrynin, Pavel; Shevchenko, Andrey; Guan, Li; Troyer, Jennifer; Hendrickson, Sher; Dilks, Holli Hutcheson; Oleksyk, Taras K; Donfield, Sharyne; Gomperts, Edward; Jabs, Douglas A; Sezgin, Efe; Van Natta, Mark; Harrigan, P Richard; Brumme, Zabrina L; O'Brien, Stephen J

    2014-01-01

    As genome-wide sequence analyses for complex human disease determinants are expanding, it is increasingly necessary to develop strategies to promote discovery and validation of potential disease-gene associations. Here we present a dynamic web-based platform - GWATCH - that automates and facilitates four steps in genetic epidemiological discovery: 1) Rapid gene association search and discovery analysis of large genome-wide datasets; 2) Expanded visual display of gene associations for genome-wide variants (SNPs, indels, CNVs), including Manhattan plots, 2D and 3D snapshots of any gene region, and a dynamic genome browser illustrating gene association chromosomal regions; 3) Real-time validation/replication of candidate or putative genes suggested from other sources, limiting Bonferroni genome-wide association study (GWAS) penalties; 4) Open data release and sharing by eliminating privacy constraints (The National Human Genome Research Institute (NHGRI) Institutional Review Board (IRB), informed consent, The Health Insurance Portability and Accountability Act (HIPAA) of 1996 etc.) on unabridged results, which allows for open access comparative and meta-analysis. GWATCH is suitable for both GWAS and whole genome sequence association datasets. We illustrate the utility of GWATCH with three large genome-wide association studies for HIV-AIDS resistance genes screened in large multicenter cohorts; however, association datasets from any study can be uploaded and analyzed by GWATCH.

  7. Application of Huang-Hilbert Transforms to Geophysical Datasets

    NASA Technical Reports Server (NTRS)

    Duffy, Dean G.

    2003-01-01

    The Huang-Hilbert transform is a promising new method for analyzing nonstationary and nonlinear datasets. In this talk I will apply this technique to several important geophysical datasets. To understand the strengths and weaknesses of this method, multi-year, hourly datasets of the sea level heights and solar radiation will be analyzed. Then we will apply this transform to the analysis of gravity waves observed in a mesoscale observational net.
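    A minimal sketch of the Hilbert spectral half of the Huang-Hilbert transform is shown below on a synthetic hourly tide-like record; the empirical mode decomposition step that would normally split a real record into intrinsic mode functions is omitted (it would typically come from a dedicated EMD package), and the signal parameters are illustrative.

```python
import numpy as np
from scipy.signal import hilbert

# Hilbert spectral analysis of a single narrow-band component; the EMD step of
# the Huang-Hilbert transform (splitting a record into intrinsic mode functions)
# is omitted here and would typically come from a dedicated EMD package.
dt = 3600.0                                    # hourly sampling (seconds)
t = np.arange(0, 30 * 24 * 3600, dt)           # 30 days of hourly samples
signal = (1.0 + 0.3 * np.sin(2 * np.pi * t / (14.8 * 24 * 3600))) \
         * np.cos(2 * np.pi * t / (12.42 * 3600))      # amplitude-modulated M2-like tide

analytic = hilbert(signal)
amplitude = np.abs(analytic)                             # instantaneous amplitude
phase = np.unwrap(np.angle(analytic))
inst_freq_hz = np.diff(phase) / (2 * np.pi * dt)         # instantaneous frequency

mean_period_hours = 1.0 / (inst_freq_hz.mean() * 3600.0)
print(f"mean instantaneous period ~ {mean_period_hours:.2f} h")   # close to 12.42 h
```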

  8. The coming paradigm shift: A transition from manual to automated microscopy.

    PubMed

    Farahani, Navid; Monteith, Corey E

    2016-01-01

    The field of pathology has used light microscopy (LM) extensively since the mid-19th century for examination of histological tissue preparations. This technology has remained the foremost tool in use by pathologists even as other fields have undergone great change in recent years through new technologies. However, as new microscopy techniques are perfected and made available, this reliance on the standard LM will likely begin to change. Advanced imaging involving both diffraction-limited and subdiffraction techniques is bringing nondestructive, high-resolution, molecular-level imaging to pathology. Some of these technologies can produce three-dimensional (3D) datasets from sampled tissues. In addition, block-face/tissue-sectioning techniques are already providing automated, large-scale 3D datasets of whole specimens. These datasets allow pathologists to see an entire sample with all of its spatial information intact, and furthermore allow image analysis such as detection, segmentation, and classification, which are impossible in standard LM. It is likely that these technologies herald a major paradigm shift in the field of pathology.

  9. Integrated approach using multi-platform sensors for enhanced high-resolution daily ice cover product

    NASA Astrophysics Data System (ADS)

    Bonev, George; Gladkova, Irina; Grossberg, Michael; Romanov, Peter; Helfrich, Sean

    2016-09-01

    The ultimate objective of this work is to improve characterization of the ice cover distribution in the polar areas, to improve sea ice mapping, and to develop a new automated, real-time, high-spatial-resolution multi-sensor ice extent and ice edge product for use in operational applications. Despite the large number of currently available automated satellite-based sea ice extent datasets, analysts at the National Ice Center tend to rely on original satellite imagery (from optical, passive microwave and active microwave sensors), mainly because the automated products derived from satellite optical data have gaps in coverage due to clouds and darkness, passive microwave products have poor spatial resolution, and automated ice identification based on radar data is not fully reliable owing to the difficulty of discriminating between ice cover and wind-roughened, ice-free ocean surface. We have developed a multisensor algorithm that first extracts the maximum information on the sea ice cover from the imaging instruments VIIRS and MODIS, including regions covered by thin, semitransparent clouds, then supplements the output with microwave measurements, and finally aggregates the results into a cloud-gap-free daily product. The ability to identify ice cover underneath thin clouds, which is usually masked out by traditional cloud detection algorithms, expands the effective coverage of the sea ice maps and thus allows more accurate and detailed delineation of the ice edge. We have also developed a web-based monitoring system that allows comparison of our daily ice extent product with several other independent operational daily products.

  10. The MATISSE analysis of large spectral datasets from the ESO Archive

    NASA Astrophysics Data System (ADS)

    Worley, C.; de Laverny, P.; Recio-Blanco, A.; Hill, V.; Vernisse, Y.; Ordenovic, C.; Bijaoui, A.

    2010-12-01

    The automated stellar classification algorithm, MATISSE, has been developed at the Observatoire de la Côte d'Azur (OCA) in order to determine stellar temperatures, gravities and chemical abundances for large datasets of stellar spectra. The Gaia Data Processing and Analysis Consortium (DPAC) has selected MATISSE as one of the key programmes to be used in the analysis of the Gaia Radial Velocity Spectrometer (RVS) spectra. MATISSE is currently being used to analyse large datasets of spectra from the ESO archive with the primary goal of producing advanced data products to be made available in the ESO database via the Virtual Observatory. This is also an invaluable opportunity to identify and address issues that can be encountered in the analysis of large samples of real spectra prior to the launch of Gaia in 2012. The analysis of the archived spectra of the FEROS spectrograph is currently underway and preliminary results are presented.

  11. SeqFIRE: a web application for automated extraction of indel regions and conserved blocks from protein multiple sequence alignments.

    PubMed

    Ajawatanawong, Pravech; Atkinson, Gemma C; Watson-Haigh, Nathan S; Mackenzie, Bryony; Baldauf, Sandra L

    2012-07-01

    Analyses of multiple sequence alignments generally focus on well-defined conserved sequence blocks, while the rest of the alignment is largely ignored or discarded. This is especially true in phylogenomics, where large multigene datasets are produced through automated pipelines. However, some of the most powerful phylogenetic markers have been found in the variable length regions of multiple alignments, particularly insertions/deletions (indels) in protein sequences. We have developed Sequence Feature and Indel Region Extractor (SeqFIRE) to enable the automated identification and extraction of indels from protein sequence alignments. The program can also extract conserved blocks and identify fast evolving sites using a combination of conservation and entropy. All major variables can be adjusted by the user, allowing them to identify the sets of variables most suited to a particular analysis or dataset. Thus, all major tasks in preparing an alignment for further analysis are combined in a single flexible and user-friendly program. The output includes a numbered list of indels, alignments in NEXUS format with indels annotated or removed and indel-only matrices. SeqFIRE is a user-friendly web application, freely available online at www.seqfire.org/.
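    The sketch below illustrates the conservation-plus-entropy idea that SeqFIRE uses for identifying conserved blocks and fast-evolving sites, computed per alignment column on a toy alignment; the statistics and the toy sequences are illustrative and are not SeqFIRE's actual defaults or code.

```python
import math
from collections import Counter

def column_stats(alignment):
    """Per-column conservation (fraction of the most frequent residue, gaps
    excluded) and Shannon entropy in bits for equal-length aligned sequences."""
    stats = []
    for j in range(len(alignment[0])):
        residues = [seq[j] for seq in alignment if seq[j] != "-"]
        if not residues:
            stats.append((0.0, 0.0))
            continue
        counts = Counter(residues)
        total = len(residues)
        conservation = counts.most_common(1)[0][1] / total
        entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
        stats.append((conservation, entropy))
    return stats

# Toy alignment: columns 0-3 are fully conserved, column 4 contains an indel
# plus variable residues, column 5 is partly variable.
aln = ["MKLV-A",
       "MKLVQA",
       "MKLVRA",
       "MKLV-S"]
for j, (cons, ent) in enumerate(column_stats(aln)):
    print(f"column {j}: conservation = {cons:.2f}, entropy = {ent:.2f} bits")
```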

  12. The RABiT: a rapid automated biodosimetry tool for radiological triage. II. Technological developments.

    PubMed

    Garty, Guy; Chen, Youhua; Turner, Helen C; Zhang, Jian; Lyulko, Oleksandra V; Bertucci, Antonella; Xu, Yanping; Wang, Hongliang; Simaan, Nabil; Randers-Pehrson, Gerhard; Lawrence Yao, Y; Brenner, David J

    2011-08-01

    Over the past five years the Center for Minimally Invasive Radiation Biodosimetry at Columbia University has developed the Rapid Automated Biodosimetry Tool (RABiT), a completely automated, ultra-high throughput biodosimetry workstation. This paper describes recent upgrades and reliability testing of the RABiT. The RABiT analyses fingerstick-derived blood samples to estimate past radiation exposure or to identify individuals exposed above or below a cut-off dose. Through automated robotics, lymphocytes are extracted from fingerstick blood samples into filter-bottomed multi-well plates. Depending on the time since exposure, the RABiT scores either micronuclei or phosphorylation of the histone H2AX, in an automated robotic system, using filter-bottomed multi-well plates. Following lymphocyte culturing, fixation and staining, the filter bottoms are removed from the multi-well plates and sealed prior to automated high-speed imaging. Image analysis is performed online using dedicated image processing hardware. Both the sealed filters and the images are archived. We have developed a new robotic system for lymphocyte processing, making use of an upgraded laser power and parallel processing of four capillaries at once. This system has allowed acceleration of lymphocyte isolation, the main bottleneck of the RABiT operation, from 12 to 2 sec/sample. Reliability tests have been performed on all robotic subsystems. Parallel handling of multiple samples through the use of dedicated, purpose-built, robotics and high speed imaging allows analysis of up to 30,000 samples per day.

  13. The RABiT: A Rapid Automated Biodosimetry Tool For Radiological Triage. II. Technological Developments

    PubMed Central

    Garty, Guy; Chen, Youhua; Turner, Helen; Zhang, Jian; Lyulko, Oleksandra; Bertucci, Antonella; Xu, Yanping; Wang, Hongliang; Simaan, Nabil; Randers-Pehrson, Gerhard; Yao, Y. Lawrence; Brenner, David J.

    2011-01-01

    Purpose Over the past five years the Center for Minimally Invasive Radiation Biodosimetry at Columbia University has developed the Rapid Automated Biodosimetry Tool (RABiT), a completely automated, ultra-high throughput biodosimetry workstation. This paper describes recent upgrades and reliability testing of the RABiT. Materials and methods The RABiT analyzes fingerstick-derived blood samples to estimate past radiation exposure or to identify individuals exposed above or below a cutoff dose. Through automated robotics, lymphocytes are extracted from fingerstick blood samples into filter-bottomed multi-well plates. Depending on the time since exposure, the RABiT scores either micronuclei or phosphorylation of the histone H2AX, in an automated robotic system, using filter-bottomed multi-well plates. Following lymphocyte culturing, fixation and staining, the filter bottoms are removed from the multi-well plates and sealed prior to automated high-speed imaging. Image analysis is performed online using dedicated image processing hardware. Both the sealed filters and the images are archived. Results We have developed a new robotic system for lymphocyte processing, making use of an upgraded laser power and parallel processing of four capillaries at once. This system has allowed acceleration of lymphocyte isolation, the main bottleneck of the RABiT operation, from 12 to 2 sec/sample. Reliability tests have been performed on all robotic subsystems. Conclusions Parallel handling of multiple samples through the use of dedicated, purpose-built, robotics and high speed imaging allows analysis of up to 30,000 samples per day. PMID:21557703

  14. A large-scale dataset of solar event reports from automated feature recognition modules

    NASA Astrophysics Data System (ADS)

    Schuh, Michael A.; Angryk, Rafal A.; Martens, Petrus C.

    2016-05-01

    The massive repository of images of the Sun captured by the Solar Dynamics Observatory (SDO) mission has ushered in the era of Big Data for Solar Physics. In this work, we investigate the entire public collection of events reported to the Heliophysics Event Knowledgebase (HEK) from automated solar feature recognition modules operated by the SDO Feature Finding Team (FFT). With the SDO mission recently surpassing five years of operations, and over 280,000 event reports for seven types of solar phenomena, we present the broadest and most comprehensive large-scale dataset of the SDO FFT modules to date. We also present numerous statistics on these modules, providing valuable contextual information for better understanding and validating of the individual event reports and the entire dataset as a whole. After extensive data cleaning through exploratory data analysis, we highlight several opportunities for knowledge discovery from data (KDD). Through these important prerequisite analyses presented here, the results of KDD from Solar Big Data will be overall more reliable and better understood. As the SDO mission remains operational over the coming years, these datasets will continue to grow in size and value. Future versions of this dataset will be analyzed in the general framework established in this work and maintained publicly online for easy access by the community.

  15. Physiologic Waveform Analysis for Early Detection of Hemorrhage during Transport and Higher Echelon Medical Care of Combat Casualties

    DTIC Science & Technology

    2014-03-01

    waveforms that are easier to measure than ABP (e.g., pulse oximeter waveforms); (3) a NIH SBIR Phase I proposal with Retia Medical to develop automated...the training dataset. Integrating the technique with non-invasive pulse transit time (PTT) was most effective. The integrated technique specifically...the peripheral ABP waveforms in the training dataset. These techniques included the rudimentary mean ABP technique, the classic pulse pressure times

  16. Quantification of source impact to PM using three-dimensional weighted factor model analysis on multi-site data

    NASA Astrophysics Data System (ADS)

    Shi, Guoliang; Peng, Xing; Huangfu, Yanqi; Wang, Wei; Xu, Jiao; Tian, Yingze; Feng, Yinchang; Ivey, Cesunica E.; Russell, Armistead G.

    2017-07-01

    Source apportionment technologies are used to understand the impacts of important sources on particulate matter (PM) air quality, and are widely used for both scientific studies and air quality management. Generally, receptor models apportion speciated PM data from a single sampling site. With the development of large-scale monitoring networks, PM speciation is observed at multiple sites in an urban area. For these situations, the models should account for three factors, or dimensions, of the PM data: the chemical species concentrations, the sampling periods and the sampling sites, suggesting the potential power of a three-dimensional source apportionment approach. However, the ordinary three-dimensional Parallel Factor Analysis (PARAFAC) model does not always work well in real environmental situations for multi-site receptor datasets. In this work, a new three-way receptor model, called the "multi-site three-way factor analysis" (multi-site WFA3) model, is proposed to deal with multi-site receptor datasets. Synthetic datasets were developed and introduced into the new model to test its performance; the average absolute error (AAE, between estimated and true contributions) for the extracted sources was less than 50% in all cases. Additionally, three-dimensional ambient datasets from the Chinese mega-city Chengdu were analyzed using the new model to assess its application. Four factors are extracted by the multi-site WFA3 model: secondary sources have the highest contributions (64.73 and 56.24 μg/m3), followed by vehicular exhaust (30.13 and 33.60 μg/m3), crustal dust (26.12 and 29.99 μg/m3) and coal combustion (10.73 and 14.83 μg/m3). The model was also compared to PMF, with general agreement, though PMF suggested a lower crustal contribution.
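    For reference, a minimal sketch of one common form of the AAE metric quoted above (relative absolute error between estimated and true contributions, averaged over sources) is given below; the exact definition used by the authors may differ, and the numbers are illustrative only.

```python
import numpy as np

def average_absolute_error(estimated, true):
    """Average absolute error (%) between estimated and true source
    contributions: AAE = mean(|estimated - true| / true) * 100."""
    est, tru = np.asarray(estimated, float), np.asarray(true, float)
    return float(np.mean(np.abs(est - tru) / tru) * 100.0)

# Illustrative contributions (ug/m3) for one synthetic test case.
true_contrib = [60.0, 30.0, 25.0, 12.0]   # e.g. secondary, vehicle, dust, coal
estimated    = [64.7, 30.1, 26.1, 10.7]
print(f"AAE = {average_absolute_error(estimated, true_contrib):.1f}%")
```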

  17. Linking Automated Data Analysis and Visualization with Applications in Developmental Biology and High-Energy Physics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ruebel, Oliver

    2009-11-20

    Knowledge discovery from large and complex collections of today's scientific datasets is a challenging task. With the ability to measure and simulate more processes at increasingly finer spatial and temporal scales, the increasing number of data dimensions and data objects is presenting tremendous challenges for data analysis and effective data exploration methods and tools. Researchers are overwhelmed with data and standard tools are often insufficient to enable effective data analysis and knowledge discovery. The main objective of this thesis is to provide important new capabilities to accelerate scientific knowledge discovery from large, complex, and multivariate scientific data. The research covered in this thesis addresses these scientific challenges using a combination of scientific visualization, information visualization, automated data analysis, and other enabling technologies, such as efficient data management. The effectiveness of the proposed analysis methods is demonstrated via applications in two distinct scientific research fields, namely developmental biology and high-energy physics. Advances in microscopy, image analysis, and embryo registration enable for the first time measurement of gene expression at cellular resolution for entire organisms. Analysis of high-dimensional spatial gene expression datasets is a challenging task. By integrating data clustering and visualization, analysis of complex, time-varying, spatial gene expression patterns and their formation becomes possible. The MATLAB analysis framework and the visualization have been integrated, making advanced analysis tools accessible to biologists and enabling bioinformatics researchers to directly integrate their analysis with the visualization. Laser wakefield particle accelerators (LWFAs) promise to be a new compact source of high-energy particles and radiation, with wide applications ranging from medicine to physics. To gain insight into the complex physical processes of particle acceleration, physicists model LWFAs computationally. The datasets produced by LWFA simulations are (i) extremely large, (ii) of varying spatial and temporal resolution, (iii) heterogeneous, and (iv) high-dimensional, making analysis and knowledge discovery from complex LWFA simulation data a challenging task. To address these challenges, this thesis describes the integration of the visualization system VisIt and the state-of-the-art index/query system FastBit, enabling interactive visual exploration of extremely large three-dimensional particle datasets. Researchers are especially interested in beams of high-energy particles formed during the course of a simulation. This thesis describes novel methods for automatic detection and analysis of particle beams, enabling a more accurate and efficient data analysis process. By integrating these automated analysis methods with visualization, this research enables more accurate, efficient, and effective analysis of LWFA simulation data than previously possible.

  18. Global, Persistent, Real-time Multi-sensor Automated Satellite Image Analysis and Crop Forecasting in Commercial Cloud

    NASA Astrophysics Data System (ADS)

    Brumby, S. P.; Warren, M. S.; Keisler, R.; Chartrand, R.; Skillman, S.; Franco, E.; Kontgis, C.; Moody, D.; Kelton, T.; Mathis, M.

    2016-12-01

    Cloud computing, combined with recent advances in machine learning for computer vision, is enabling understanding of the world at a scale and at a level of space and time granularity never before feasible. Multi-decadal Earth remote sensing datasets at the petabyte scale (8×10^15 bits) are now available in commercial cloud, and new satellite constellations will generate daily global coverage at a few meters per pixel. Public and commercial satellite observations now provide a wide range of sensor modalities, from traditional visible/infrared to dual-polarity synthetic aperture radar (SAR). This provides the opportunity to build a continuously updated map of the world supporting the academic community and decision-makers in government, finance and industry. We report on work demonstrating country-scale agricultural forecasting and global-scale land cover/land use mapping using a range of public and commercial satellite imagery. We describe processing over a petabyte of compressed raw data from 2.8 quadrillion pixels (2.8 petapixels) acquired by the US Landsat and MODIS programs over the past 40 years. Using commodity cloud computing resources, we convert the imagery to a calibrated, georeferenced, multiresolution tiled format suited for machine-learning analysis. We believe ours is the first application to process, in less than a day, on generally available resources, over a petabyte of scientific image data. We report on work combining this imagery with time-series SAR collected by ESA Sentinel 1. We report on work using this reprocessed dataset for experiments demonstrating country-scale food production monitoring, an indicator for famine early warning. We apply remote sensing science and machine learning algorithms to detect and classify agricultural crops and then estimate crop yields and detect threats to food security (e.g., flooding, drought). The software platform and analysis methodology also support monitoring water resources, forests and other general indicators of environmental health, and can detect growth and changes in cities that are displacing historical agricultural zones.

  19. The Challenges of Data Rate and Data Accuracy in the Analysis of Volcanic Systems: An Assessment Using Multi-Parameter Data from the 2012-2013 Eruption Sequence at White Island, New Zealand

    NASA Astrophysics Data System (ADS)

    Jolly, A. D.; Christenson, B. W.; Neuberg, J. W.; Fournier, N.; Mazot, A.; Kilgour, G.; Jolly, G. E.

    2014-12-01

    Volcano monitoring is usually undertaken with the collection of both automated and manual data that form a multi-parameter time-series having a wide range of sampling rates and measurement accuracies. Assessments of hazards and risks ultimately rely on incorporating this information into usable form, first for the scientists to interpret, and then for the public and relevant stakeholders. One important challenge is in building appropriate and efficient strategies to compare and interpret data from these exceptionally different datasets. The White Island volcanic system entered a new eruptive state beginning in mid-2012 and continuing through the present time. Eruptive activity during this period comprised small phreatic and phreato-magmatic events in August 2012, August 2013 and October 2013 and the intrusion of a small dome that was first observed in November 2012. We examine the chemical and geophysical dataset to assess the effects of small magma batches on the shallow hydrothermal system. The analysis incorporates high-data-rate (100 Hz) seismic and infrasound data, lower-data-rate (1 Hz to 5 min sampling interval) GPS, tilt-meter, and gravity data, and very-low-data-rate geochemical time series (sampling intervals from days to months). The analysis is further informed by visual observations of lake level changes, geysering activity through crater lake vents, and changes in fumarolic discharges. We first focus on the problems of incorporating the range of observables into coherent, time-frame-dependent conceptual models. We then show examples where high data rate information may be improved through new processing methods and where low data rate information may be collected more frequently without loss of fidelity. By this approach we hope to improve the accuracy and efficiency of interpretations of volcano unrest and thereby improve hazard assessments.

  20. Multi-body Dynamic Contact Analysis Tool for Transmission Design

    DTIC Science & Technology

    2003-04-01

    frequencies were computed in COSMIC NASTRAN, and were validated against the published experimental modal analysis [17]. • Using assumed time domain... modal superposition. • Results from the structural analysis (mode shapes or forced response) were converted into IDEAS universal format (dataset 55...ARMY RESEARCH LABORATORY Multi-body Dynamic Contact Analysis Tool for Transmission Design SBIR Phase II Final Report by

  1. An Automated Method to Identify Mesoscale Convective Complexes in the Regional Climate Model Evaluation System

    NASA Astrophysics Data System (ADS)

    Whitehall, K. D.; Jenkins, G. S.; Mattmann, C. A.; Waliser, D. E.; Kim, J.; Goodale, C. E.; Hart, A. F.; Ramirez, P.; Whittell, J.; Zimdars, P. A.

    2012-12-01

    Mesoscale convective complexes (MCCs) are large (2-3 x 10^5 km2) nocturnal, convectively driven weather systems that are generally associated with high-precipitation events of short duration (less than 12 hours) at various locations throughout the tropics and midlatitudes (Maddox 1980). These systems are particularly important for climate in the West Sahel region, where the precipitation associated with them is a principal component of the rainfall season (Laing and Fritsch 1993). They occur on weather timescales and have historically been identified from weather data analysis via manual and, more recently, automated processes (Miller and Fritsch 1991, Nesbett 2006, Balmey and Reason 2012). The Regional Climate Model Evaluation System (RCMES) is an open source tool designed for easy evaluation of climate and Earth system data through access to standardized datasets and intrinsic tools that perform common analysis and visualization tasks (Hart et al. 2011). The RCMES toolkit also provides the flexibility of user-defined subroutines for further metrics, visualization and even dataset manipulation. The purpose of this study is to present a methodology for identifying MCCs in observational datasets using the RCMES framework; TRMM 3-hourly datasets are used to demonstrate the methodology for the 2005 boreal summer. This method promotes the use of open source software for scientific data systems, a concern shared by multiple stakeholders in the Earth sciences. A historical MCC dataset provides a platform for further studies of the variability of MCC frequency on various timescales, which is important to many users, including climate scientists, meteorologists, water resource managers, and agriculturalists. Using RCMES for searching and clipping datasets will open a new realm of studies, as users of the system will no longer be restricted to the datasets residing on their own local systems; instead, they will be afforded rapid, effective, and transparent access, processing and visualization of the wealth of available remote sensing datasets and climate model outputs.
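    As an informal sketch of the automated identification step, the code below thresholds a gridded rain-rate field, labels contiguous heavy-rain regions and keeps those exceeding the MCC area criterion of roughly 2-3 x 10^5 km2; the grid spacing, rain-rate threshold and synthetic field are assumptions, and a full MCC detection would also apply the shape (eccentricity) and duration criteria of Maddox (1980).

```python
import numpy as np
from scipy import ndimage

def mcc_candidates(precip_rate, cell_km=27.75, rate_thresh=2.0, min_area_km2=2.0e5):
    """Label contiguous regions of heavy precipitation and keep those whose
    area exceeds the MCC size criterion (roughly 2-3 x 10^5 km2). A full MCC
    detection would also test shape (eccentricity) and duration."""
    mask = precip_rate >= rate_thresh                  # rain-rate threshold (mm/h)
    labels, n = ndimage.label(mask)
    areas = ndimage.sum(mask, labels, index=np.arange(1, n + 1)) * cell_km ** 2
    keep = np.flatnonzero(areas >= min_area_km2) + 1   # label ids of candidates
    return labels, keep, areas

# Synthetic ~0.25-degree rain-rate field with one large convective system.
rng = np.random.default_rng(6)
field = rng.exponential(0.3, size=(80, 120))
field[20:50, 40:90] += 5.0                             # a ~30x50-cell heavy-rain region
labels, keep, areas = mcc_candidates(field)
print("candidate systems (label, area km2):",
      [(int(k), round(float(areas[k - 1]))) for k in keep])
```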

  2. Automated synthesis, insertion and detection of polyps for CT colonography

    NASA Astrophysics Data System (ADS)

    Sezille, Nicolas; Sadleir, Robert J. T.; Whelan, Paul F.

    2003-03-01

    CT Colonography (CTC) is a new non-invasive colon imaging technique which has the potential to replace conventional colonoscopy for colorectal cancer screening. A novel system which facilitates automated detection of colorectal polyps at CTC is introduced. As exhaustive testing of such a system using real patient data is not feasible, more complete testing is achieved through the synthesis of artificial polyps and their insertion into real datasets. The polyp insertion is semi-automatic: candidate points are manually selected using a custom GUI, and suitable points are then determined automatically from an analysis of the local neighborhood surrounding each candidate point. Local density and orientation information are used to generate polyps based on an elliptical model. Anomalies are identified from the modified dataset by analyzing the axial images. Detected anomalies are classified as potential polyps or natural features using 3D morphological techniques. The final results are flagged for review. The system was evaluated using 15 scenarios. The sensitivity of the system was found to be 65%, with 34% false positive detections. Automated diagnosis at CTC is possible, and thorough testing is facilitated by augmenting real patient data with computer-generated polyps. Ultimately, automated diagnosis will enhance standard CTC and increase performance.

  3. Assessment of Bias in the National Mosaic and Multi-Sensor QPE (NMQ/Q2) Reanalysis Radar-Only Estimate

    NASA Astrophysics Data System (ADS)

    Nelson, B. R.; Prat, O. P.; Stevens, S. E.; Seo, D. J.; Zhang, J.; Howard, K.

    2014-12-01

    The processing of radar-only precipitation via the reanalysis of the National Mosaic and Multi-Sensor QPE (NMQ/Q2), based on the WSR-88D Next-generation Radar (NEXRAD) network over the continental United States (CONUS), is nearly complete for the period from 2001 to 2012. Reanalysis data are available at 1-km and 5-minute resolution. An important step in generating the best possible precipitation data is to assess the bias in the radar-only product. In this work, we use data from a combination of rain gauge networks to assess the bias in the NMQ reanalysis. Rain gauge networks such as the Hydrometeorological Automated Data System (HADS), the Automated Surface Observing Systems (ASOS), the Climate Reference Network (CRN), and the Global Historical Climatology Network Daily (GHCN-D) are combined for use in the assessment. These rain gauge networks vary in spatial density and temporal resolution; the challenge is therefore to utilize them optimally to assess the bias at the finest possible resolution. For an initial assessment, we propose to subset the CONUS data into climatologically representative domains and to perform bias assessment using information in the Q2 dataset on precipitation type and phase.
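    A minimal sketch of one simple bias statistic for such an assessment, the bulk multiplicative bias of radar accumulations against collocated gauge accumulations, is shown below; the convention of excluding zero or missing gauge values and the example numbers are assumptions for illustration, not the NMQ/Q2 evaluation code.

```python
import numpy as np

def multiplicative_bias(radar_mm, gauge_mm):
    """Bulk multiplicative bias of radar QPE against collocated gauges:
    sum(radar) / sum(gauge) over valid pairs (>1 means radar over-estimates)."""
    radar, gauge = np.asarray(radar_mm, float), np.asarray(gauge_mm, float)
    valid = np.isfinite(radar) & np.isfinite(gauge) & (gauge > 0)
    return float(radar[valid].sum() / gauge[valid].sum())

# Illustrative daily accumulations (mm) at a handful of gauge locations.
gauge = [12.0, 5.5, 0.0, 20.1, 8.3, np.nan]
radar = [10.4, 6.0, 0.2, 17.8, 7.9, 4.1]
print(f"radar/gauge multiplicative bias = {multiplicative_bias(radar, gauge):.2f}")
```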

  4. A DICOM-based 2nd generation Molecular Imaging Data Grid implementing the IHE XDS-i integration profile.

    PubMed

    Lee, Jasper; Zhang, Jianguo; Park, Ryan; Dagliyan, Grant; Liu, Brent; Huang, H K

    2012-07-01

    A Molecular Imaging Data Grid (MIDG) was developed to address current informatics challenges in the archival, sharing, search, and distribution of preclinical imaging studies between animal imaging facilities and investigator sites. This manuscript presents a 2nd generation MIDG replacing the Globus Toolkit with a new system architecture that implements the IHE XDS-i integration profile. Implementation and evaluation were conducted using a 3-site interdisciplinary test-bed at the University of Southern California. The 2nd generation MIDG design architecture replaces the initial design's Globus Toolkit with dedicated web services and XML-based messaging for dedicated management and delivery of multi-modality DICOM imaging datasets. The Cross-enterprise Document Sharing for Imaging (XDS-i) integration profile from the field of enterprise radiology informatics was adopted into the MIDG design because streamlined image registration, management, and distribution dataflows are needed in preclinical imaging informatics systems just as they are in enterprise PACS applications. Implementation of the MIDG is demonstrated at the University of Southern California Molecular Imaging Center (MIC) and two other sites with specified hardware, software, and network bandwidth. Evaluation of the MIDG involves data upload, download, and fault-tolerance testing scenarios using multi-modality animal imaging datasets collected at the USC Molecular Imaging Center. The upload, download, and fault-tolerance tests of the MIDG were performed multiple times using 12 collected animal study datasets. Upload and download times demonstrated reproducibility and improved real-world performance. Fault-tolerance tests showed that automated failover between Grid Node Servers has minimal impact on normal download times. Building upon the 1st generation concepts and experiences, the 2nd generation MIDG system improves accessibility of disparate animal-model molecular imaging datasets to users outside a molecular imaging facility's LAN using a new architecture, dataflow, and dedicated DICOM-based management web services. The registration, management, and distribution of multi-center study data has thereby been streamlined, improving the productivity and efficiency of preclinical research for translational science investigators.

  5. Comparison of unsupervised classification methods for brain tumor segmentation using multi-parametric MRI.

    PubMed

    Sauwen, N; Acou, M; Van Cauter, S; Sima, D M; Veraart, J; Maes, F; Himmelreich, U; Achten, E; Van Huffel, S

    2016-01-01

    Tumor segmentation is a particularly challenging task in high-grade gliomas (HGGs), as they are among the most heterogeneous tumors in oncology. An accurate delineation of the lesion and its main subcomponents contributes to optimal treatment planning, prognosis and follow-up. Conventional MRI (cMRI) is the imaging modality of choice for manual segmentation, and is also considered in the vast majority of automated segmentation studies. Advanced MRI modalities such as perfusion-weighted imaging (PWI), diffusion-weighted imaging (DWI) and magnetic resonance spectroscopic imaging (MRSI) have already shown their added value in tumor tissue characterization, hence there have been recent suggestions of combining different MRI modalities into a multi-parametric MRI (MP-MRI) approach for brain tumor segmentation. In this paper, we compare the performance of several unsupervised classification methods for HGG segmentation based on MP-MRI data including cMRI, DWI, MRSI and PWI. Two independent MP-MRI datasets with a different acquisition protocol were available from different hospitals. We demonstrate that a hierarchical non-negative matrix factorization variant which was previously introduced for MP-MRI tumor segmentation gives the best performance in terms of mean Dice-scores for the pathologic tissue classes on both datasets.
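    To illustrate the general idea of hierarchical NMF segmentation (not the specific hNMF variant evaluated in the paper), a minimal two-level sketch might first separate broad tissue classes and then factorise the voxels dominated by the abnormal source; the component counts and the assignment rule are assumptions.

    ```python
    # Minimal sketch of hierarchical NMF tissue labelling: a first-level
    # factorisation separates two broad classes, then the "abnormal" voxels are
    # factorised again into sub-compartments. Illustrative only.
    import numpy as np
    from sklearn.decomposition import NMF

    def hierarchical_nmf(X, n_sub=3, random_state=0):
        """X: (n_features, n_voxels) non-negative multi-parametric feature matrix."""
        lvl1 = NMF(n_components=2, init="nndsvda", random_state=random_state, max_iter=500)
        W1 = lvl1.fit_transform(X.T)                    # (n_voxels, 2) abundances
        abnormal = W1[:, 0] > W1[:, 1]                  # voxels dominated by source 0 (assumed abnormal)
        lvl2 = NMF(n_components=n_sub, init="nndsvda", random_state=random_state, max_iter=500)
        W2 = lvl2.fit_transform(X[:, abnormal].T)       # sub-classes within the abnormal region
        labels = np.full(X.shape[1], -1)
        labels[abnormal] = W2.argmax(axis=1)            # hard assignment to the dominant source
        return labels
    ```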

  6. Automatic nipple detection on 3D images of an automated breast ultrasound system (ABUS)

    NASA Astrophysics Data System (ADS)

    Javanshir Moghaddam, Mandana; Tan, Tao; Karssemeijer, Nico; Platel, Bram

    2014-03-01

    Recent studies have demonstrated that applying Automated Breast Ultrasound in addition to mammography in women with dense breasts can lead to additional detection of small, early stage breast cancers which are occult in corresponding mammograms. In this paper, we propose a fully automatic method for detecting the nipple location in 3D ultrasound breast images acquired from Automated Breast Ultrasound Systems. The nipple location is a valuable landmark for reporting the position of possible abnormalities in a breast or for guiding image registration. To detect the nipple location, all images were normalized. Subsequently, features were extracted in a multi-scale approach and classification experiments were performed using a gentle boost classifier to identify the nipple location. The method was applied to a dataset of 100 patients with 294 different 3D ultrasound views from Siemens and U-systems acquisition systems. Our database is a representative sample of cases obtained in clinical practice by four medical centers. The automatic method could accurately locate the nipple in 90% of AP (Anterior-Posterior) views and in 79% of the other views.

  7. Igloo-Plot: a tool for visualization of multidimensional datasets.

    PubMed

    Kuntal, Bhusan K; Ghosh, Tarini Shankar; Mande, Sharmila S

    2014-01-01

    Advances in science and technology have resulted in an exponential growth of multivariate (or multi-dimensional) datasets which are being generated from various research areas, especially in the domain of biological sciences. Visualization and analysis of such data (with the objective of uncovering the hidden patterns therein) is an important and challenging task. We present a tool, called Igloo-Plot, for efficient visualization of multidimensional datasets. The tool addresses some of the key limitations of contemporary multivariate visualization and analysis tools. The visualization layout not only facilitates an easy identification of clusters of data-points having similar feature compositions, but also highlights the 'marker features' specific to each of these clusters. The applicability of the various functionalities implemented herein is demonstrated using several well studied multi-dimensional datasets. Igloo-Plot is expected to be a valuable resource for researchers working in multivariate data mining studies. Igloo-Plot is available for download from: http://metagenomics.atc.tcs.com/IglooPlot/. Copyright © 2014 Elsevier Inc. All rights reserved.

  8. Detection of QT prolongation using a novel ECG analysis algorithm applying intelligent automation: Prospective blinded evaluation using the Cardiac Safety Research Consortium ECG database

    PubMed Central

    Green, Cynthia L.; Kligfield, Paul; George, Samuel; Gussak, Ihor; Vajdic, Branislav; Sager, Philip; Krucoff, Mitchell W.

    2013-01-01

    Background The Cardiac Safety Research Consortium (CSRC) provides both “learning” and blinded “testing” digital ECG datasets from thorough QT (TQT) studies annotated for submission to the US Food and Drug Administration (FDA) to developers of ECG analysis technologies. This manuscript reports the first results from a blinded “testing” dataset that examines Developer re-analysis of original Sponsor-reported core laboratory data. Methods 11,925 anonymized ECGs including both moxifloxacin and placebo arms of a parallel-group TQT in 191 subjects were blindly analyzed using a novel ECG analysis algorithm applying intelligent automation. Developer measured ECG intervals were submitted to CSRC for unblinding, temporal reconstruction of the TQT exposures, and statistical comparison to core laboratory findings previously submitted to FDA by the pharmaceutical sponsor. Primary comparisons included baseline-adjusted interval measurements, baseline- and placebo-adjusted moxifloxacin QTcF changes (ddQTcF), and associated variability measures. Results Developer and Sponsor-reported baseline-adjusted data were similar with average differences less than 1 millisecond (ms) for all intervals. Both Developer and Sponsor-reported data demonstrated assay sensitivity with similar ddQTcF changes. Average within-subject standard deviation for triplicate QTcF measurements was significantly lower for Developer than Sponsor-reported data (5.4 ms and 7.2 ms, respectively; p<0.001). Conclusion The virtually automated ECG algorithm used for this analysis produced similar yet less variable TQT results compared to the Sponsor-reported study, without the use of a manual core laboratory. These findings indicate CSRC ECG datasets can be useful for evaluating novel methods and algorithms for determining QT/QTc prolongation by drugs. While the results should not constitute endorsement of specific algorithms by either CSRC or FDA, the value of a public domain digital ECG warehouse to provide prospective, blinded comparisons of ECG technologies applied for QT/QTc measurement is illustrated. PMID:22424006
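    For reference, the Fridericia-corrected QT interval and the baseline- and placebo-adjusted change (ddQTcF) follow standard definitions; the sketch below states them in code and is illustrative only, not the Developer's algorithm.

    ```python
    # Minimal sketch of the standard definitions used in TQT analyses.
    import numpy as np

    def qtcf(qt_ms, rr_ms):
        """Fridericia correction: QTcF = QT / (RR in seconds)**(1/3)."""
        return np.asarray(qt_ms) / (np.asarray(rr_ms) / 1000.0) ** (1.0 / 3.0)

    def ddqtcf(drug_post, drug_base, placebo_post, placebo_base):
        """Baseline-adjusted change on drug minus baseline-adjusted change on placebo (ms)."""
        return (np.mean(drug_post) - np.mean(drug_base)) - (np.mean(placebo_post) - np.mean(placebo_base))
    ```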

  9. ANTONIA perfusion and stroke. A software tool for the multi-purpose analysis of MR perfusion-weighted datasets and quantitative ischemic stroke assessment.

    PubMed

    Forkert, N D; Cheng, B; Kemmling, A; Thomalla, G; Fiehler, J

    2014-01-01

    The objective of this work is to present the software tool ANTONIA, which has been developed to facilitate a quantitative analysis of perfusion-weighted MRI (PWI) datasets in general as well as the subsequent multi-parametric analysis of additional datasets for the specific purpose of acute ischemic stroke patient dataset evaluation. Three different methods for the analysis of DSC or DCE PWI datasets are currently implemented in ANTONIA, which can be case-specifically selected based on the study protocol. These methods comprise a curve fitting method as well as a deconvolution-based and deconvolution-free method integrating a previously defined arterial input function. The perfusion analysis is extended for the purpose of acute ischemic stroke analysis by additional methods that enable an automatic atlas-based selection of the arterial input function, an analysis of the perfusion-diffusion and DWI-FLAIR mismatch as well as segmentation-based volumetric analyses. For reliability evaluation, the described software tool was used by two observers for quantitative analysis of 15 datasets from acute ischemic stroke patients to extract the acute lesion core volume, FLAIR ratio, perfusion-diffusion mismatch volume with manually as well as automatically selected arterial input functions, and follow-up lesion volume. The results of this evaluation revealed that the described software tool leads to highly reproducible results for all parameters if the automatic arterial input function selection method is used. Due to the broad selection of processing methods that are available in the software tool, ANTONIA is especially helpful to support image-based perfusion and acute ischemic stroke research projects.
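    As a loose illustration of the deconvolution-based class of methods mentioned above (not the ANTONIA implementation), a textbook truncated-SVD deconvolution of a DSC tissue curve against an arterial input function might look like this; the truncation threshold is an assumed parameter.

    ```python
    # Minimal sketch: truncated-SVD deconvolution of a DSC concentration-time
    # curve with an arterial input function (AIF). Illustrative only.
    import numpy as np

    def svd_deconvolution(tissue_curve, aif, dt, sv_threshold=0.2):
        """tissue_curve, aif: 1-D concentration-time curves; dt: sampling interval (s)."""
        n = len(aif)
        # lower-triangular convolution matrix built from the AIF
        A = dt * np.array([[aif[i - j] if i >= j else 0.0 for j in range(n)]
                           for i in range(n)])
        U, s, Vt = np.linalg.svd(A)
        s_inv = np.where(s > sv_threshold * s.max(), 1.0 / s, 0.0)   # truncate small singular values
        residue = Vt.T @ np.diag(s_inv) @ U.T @ np.asarray(tissue_curve)
        cbf = residue.max()     # peak of the flow-scaled residue function, proportional to CBF
        return cbf, residue
    ```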

  10. Computer-aided liver volumetry: performance of a fully-automated, prototype post-processing solution for whole-organ and lobar segmentation based on MDCT imaging.

    PubMed

    Fananapazir, Ghaneh; Bashir, Mustafa R; Marin, Daniele; Boll, Daniel T

    2015-06-01

    To evaluate the performance of a prototype, fully-automated post-processing solution for whole-liver and lobar segmentation based on MDCT datasets. A polymer liver phantom was used to assess the accuracy of the post-processing application by comparing phantom volumes determined via Archimedes' principle with MDCT segmented datasets. For the IRB-approved, HIPAA-compliant study, 25 patients were enrolled. Volumetry performance compared the manual approach with the automated prototype, assessing intraobserver variability and interclass correlation for whole-organ and lobar segmentation using ANOVA comparison. Fidelity of segmentation was evaluated qualitatively. Phantom volume was 1581.0 ± 44.7 mL; manually segmented datasets estimated 1628.0 ± 47.8 mL, representing a mean overestimation of 3.0%, while automatically segmented datasets estimated 1601.9 ± 0 mL, representing a mean overestimation of 1.3%. Whole-liver and segmental volumetry demonstrated no significant intraobserver variability for either manual or automated measurements. For whole-liver volumetry, automated measurement repetitions resulted in identical values; reproducible whole-organ volumetry was also achieved with manual segmentation, p(ANOVA) 0.98. For lobar volumetry, automated segmentation improved reproducibility over the manual approach, without significant measurement differences for either methodology, p(ANOVA) 0.95-0.99. Whole-organ and lobar segmentation results from manual and automated segmentation showed no significant differences, p(ANOVA) 0.96-1.00. Assessment of segmentation fidelity found that segments I-IV/VI showed greater segmentation inaccuracies compared to the remaining right hepatic lobe segments. Fully-automated whole-liver segmentation was non-inferior to the manual approach, with improved reproducibility and post-processing duration; automated dual-seed lobar segmentation showed slight tendencies toward underestimating the right hepatic lobe volume and greater variability in edge detection for the left hepatic lobe compared to manual segmentation.

  11. Object classification and outliers analysis in the forthcoming Gaia mission

    NASA Astrophysics Data System (ADS)

    Ordóñez-Blanco, D.; Arcay, B.; Dafonte, C.; Manteiga, M.; Ulla, A.

    2010-12-01

    Astrophysics is evolving towards the rational optimization of costly observational material through the intelligent exploitation of large astronomical databases from both terrestrial telescopes and space mission archives. However, there has been relatively little advance in the development of the highly scalable data exploitation and analysis tools needed to generate scientific returns from these large and expensively obtained datasets. Among the upcoming projects of astronomical instrumentation, Gaia is the next cornerstone ESA mission. The Gaia survey foresees the creation of a data archive and its future exploitation with automated or semi-automated analysis tools. This contribution reviews some of the work being carried out by the Gaia Data Processing and Analysis Consortium for object classification and the analysis of outliers in the forthcoming mission.

  12. Software for the Integration of Multiomics Experiments in Bioconductor.

    PubMed

    Ramos, Marcel; Schiffer, Lucas; Re, Angela; Azhar, Rimsha; Basunia, Azfar; Rodriguez, Carmen; Chan, Tiffany; Chapman, Phil; Davis, Sean R; Gomez-Cabrero, David; Culhane, Aedin C; Haibe-Kains, Benjamin; Hansen, Kasper D; Kodali, Hanish; Louis, Marie S; Mer, Arvind S; Riester, Markus; Morgan, Martin; Carey, Vince; Waldron, Levi

    2017-11-01

    Multiomics experiments are increasingly commonplace in biomedical research and add layers of complexity to experimental design, data integration, and analysis. R and Bioconductor provide a generic framework for statistical analysis and visualization, as well as specialized data classes for a variety of high-throughput data types, but methods are lacking for integrative analysis of multiomics experiments. The MultiAssayExperiment software package, implemented in R and leveraging Bioconductor software and design principles, provides for the coordinated representation of, storage of, and operation on multiple diverse genomics data. We provide the unrestricted multiple 'omics data for each cancer tissue in The Cancer Genome Atlas as ready-to-analyze MultiAssayExperiment objects and demonstrate in these and other datasets how the software simplifies data representation, statistical analysis, and visualization. The MultiAssayExperiment Bioconductor package reduces major obstacles to efficient, scalable, and reproducible statistical analysis of multiomics data and enhances data science applications of multiple omics datasets. Cancer Res; 77(21); e39-42. ©2017 AACR.

  13. The MIND PALACE: A Multi-Spectral Imaging and Spectroscopy Database for Planetary Science

    NASA Astrophysics Data System (ADS)

    Eshelman, E.; Doloboff, I.; Hara, E. K.; Uckert, K.; Sapers, H. M.; Abbey, W.; Beegle, L. W.; Bhartia, R.

    2017-12-01

    The Multi-Instrument Database (MIND) is the web-based home of a well-characterized set of analytical data collected by a suite of deep-UV fluorescence/Raman instruments built at the Jet Propulsion Laboratory (JPL). Samples derive from a growing body of planetary surface analogs, mineral and microbial standards, meteorites, spacecraft materials, and other astrobiologically relevant materials. In addition to deep-UV spectroscopy, the datasets stored in MIND are obtained from a variety of analytical techniques spanning multiple spatial and spectral scales, including electron microscopy, optical microscopy, infrared spectroscopy, X-ray fluorescence, and direct fluorescence imaging. Multivariate statistical analysis techniques, primarily Principal Component Analysis (PCA), are used to guide interpretation of these large multi-analytical spectral datasets. Spatial co-referencing of integrated spectral/visual maps is performed using QGIS (geographic information system software). Georeferencing techniques transform individual instrument data maps into a layered, co-registered data cube for analysis across spectral and spatial scales. The body of data in MIND is intended to serve as a permanent, reliable, and expanding database of deep-UV spectroscopy datasets generated by this unique suite of JPL-based instruments on samples of broad planetary science interest.

  14. A multi-strategy approach to informative gene identification from gene expression data.

    PubMed

    Liu, Ziying; Phan, Sieu; Famili, Fazel; Pan, Youlian; Lenferink, Anne E G; Cantin, Christiane; Collins, Catherine; O'Connor-McCourt, Maureen D

    2010-02-01

    An unsupervised multi-strategy approach has been developed to identify informative genes from high throughput genomic data. Several statistical methods have been used in the field to identify differentially expressed genes. Since different methods generate different lists of genes, it is very challenging to determine the most reliable gene list and the appropriate method. This paper presents a multi-strategy method in which a combination of several data analysis techniques is applied to a given dataset and a confidence measure is established to select genes from the gene lists generated by these techniques to form the core of our final selection. The remaining genes, which form the peripheral region, are subject to exclusion from or inclusion in the final selection. This paper demonstrates the methodology through its application to an in-house cancer genomics dataset and a public dataset. The results indicate that our method provides a more reliable list of genes, which is validated using biological knowledge, biological experiments, and literature search. We further evaluated our multi-strategy method by consolidating two pairs of independent datasets, each pair relating to the same disease but generated by different labs using different platforms. This evaluation showed that our method produced far better results.
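    A minimal sketch of the consensus idea (not the authors' confidence measure): genes reported by at least a chosen number of methods form the core, and the remaining reported genes form the peripheral set.

    ```python
    # Minimal sketch: core/peripheral gene selection by vote counting across
    # the gene lists produced by different analysis techniques.
    from collections import Counter

    def consensus_genes(gene_lists, min_votes=2):
        """gene_lists: list of gene-name lists, one per analysis technique."""
        votes = Counter(g for genes in gene_lists for g in set(genes))
        core = {g for g, v in votes.items() if v >= min_votes}
        peripheral = set(votes) - core
        return core, peripheral

    # Usage (hypothetical lists): core, peri = consensus_genes([t_test_hits, sam_hits, limma_hits])
    ```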

  15. The Earth Observation Monitor - Automated monitoring and alerting for spatial time-series data based on OGC web services

    NASA Astrophysics Data System (ADS)

    Eberle, J.; Hüttich, C.; Schmullius, C.

    2014-12-01

    Spatial time series data from earth observation satellites and meteorological stations have been freely available around the globe for many years. They provide useful and important information for detecting ongoing changes of the environment, but for end users it is often too complex to extract this information from the original time series datasets. This issue led to the development of the Earth Observation Monitor (EOM), an operational framework and research project to provide simple access, analysis and monitoring tools for global spatial time series data. A multi-source data processing middleware in the backend is linked to MODIS data from the Land Processes Distributed Archive Center (LP DAAC) and Google Earth Engine as well as daily climate station data from the NOAA National Climatic Data Center. OGC Web Processing Services are used to integrate datasets from linked data providers or external OGC-compliant interfaces into the EOM. Users can either use the web portal (webEOM) or the mobile application (mobileEOM) to execute these processing services and to retrieve the requested data for a given point or polygon in user-friendly file formats (CSV, GeoTIFF). Besides simple data access, users can also perform further time series analyses such as trend calculation, breakpoint detection or the derivation of phenological parameters from vegetation time series data. Furthermore, data from climate stations can be aggregated over a given time interval. Calculated results can be visualized in the client and downloaded for offline usage. Automated monitoring and alerting for the time series data integrated by the user is provided by an OGC Sensor Observation Service with a coupled OGC Web Notification Service. Users can decide which datasets and parameters are monitored with a given filter expression (e.g., precipitation value higher than x millimetres per day, occurrence of a MODIS fire point, detection of a time series anomaly). Datasets integrated in the SOS service are updated in near-real time based on the linked data providers mentioned above. An alert is automatically pushed to the user if the new data meets the conditions of the registered filter expression. This monitoring service is available on the web portal with alerting by email, and within the mobile app with alerting by email and push notification.
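    As a toy illustration of the registered-filter alerting concept (not the EOM's SOS/WNS services), newly ingested observations can be checked against a user-defined condition; the parameter names and threshold are assumptions.

    ```python
    # Minimal sketch: evaluate a user-registered filter against newly ingested
    # observations and emit an alert for every value that meets the condition.
    from dataclasses import dataclass
    from typing import Callable, List, Tuple

    @dataclass
    class Filter:
        parameter: str                      # e.g. "precipitation"
        condition: Callable[[float], bool]  # e.g. lambda v: v > 50.0  (mm/day, assumed)

    def check_new_data(new_obs: List[Tuple[str, str, float]], flt: Filter) -> List[str]:
        """new_obs: (timestamp, parameter, value) tuples from the latest update."""
        return [f"ALERT {ts}: {flt.parameter}={val}"
                for ts, param, val in new_obs
                if param == flt.parameter and flt.condition(val)]

    # Usage: alerts = check_new_data(latest, Filter("precipitation", lambda v: v > 50.0))
    ```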

  16. Multi-Scale Drought Analysis using Thermal Remote Sensing: A Case Study in Georgia’s Altamaha River Watershed

    NASA Astrophysics Data System (ADS)

    Jacobs, J. M.; Bhat, S.; Choi, M.; Mecikalski, J. R.; Anderson, M. C.

    2009-12-01

    The unprecedented recent droughts in the Southeast US caused reservoir levels to drop dangerously low, elevated wildfire hazard risks, reduced hydropower generation and caused severe economic hardships. Most drought indices are based on recent rainfall or changes in vegetation condition. However, in heterogeneous landscapes, soils and vegetation (type and cover) combine to differentially stress regions even under similar weather conditions. This is particularly true for the heterogeneous landscapes and highly variable rainfall of the Southeastern United States. This research examines the spatiotemporal evolution of watershed-scale drought using a remotely sensed stress index. Using thermal-infrared imagery, a fully automated inverse model of Atmosphere-Land Exchange (ALEXI), GIS datasets and analysis tools, modeled daily surface moisture stress is examined on a 10-km resolution grid covering central to southern Georgia. Regional results are presented for the 2000-2008 period. The ALEXI evaporative stress index (ESI) is compared to existing regional drought products and validated using local hydrologic measurements in Georgia's Altamaha River watershed at scales from 10 to 10,000 km2.

  17. Automated matching of multiple terrestrial laser scans for stem mapping without the use of artificial references

    NASA Astrophysics Data System (ADS)

    Liu, Jingbin; Liang, Xinlian; Hyyppä, Juha; Yu, Xiaowei; Lehtomäki, Matti; Pyörälä, Jiri; Zhu, Lingli; Wang, Yunsheng; Chen, Ruizhi

    2017-04-01

    Terrestrial laser scanning has been widely used to analyze the 3D structure of a forest in detail and to generate data at the level of a reference plot for forest inventories without destructive measurements. Multi-scan terrestrial laser scanning is more commonly applied to collect plot-level data so that all of the stems can be detected and analyzed. However, it is necessary to match the point clouds of multiple scans into a unified point cloud for automated processing. Mismatches between datasets will lead to errors during the processing of multi-scan data. Classic registration methods based on flat surfaces cannot be directly applied in forest environments; therefore, artificial reference objects have conventionally been used to assist with scan matching. The use of artificial references requires additional labor and expertise, and greatly increases the cost. In this study, we present an automated processing method for plot-level stem mapping that matches multiple scans without artificial references. In contrast to previous studies, the registration method developed in this study exploits the natural geometric characteristics among a set of tree stems in a plot and combines the point clouds of multiple scans into a unified coordinate system. Integrating multiple scans improves the overall performance of stem mapping in terms of the correctness of tree detection, as well as the bias and the root-mean-square errors of forest attributes such as diameter at breast height and tree height. In addition, the automated processing method makes stem mapping more reliable and consistent among plots, reduces the costs associated with plot-based stem mapping, and enhances efficiency.
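    Once stems have been detected and matched between two scans, the rigid alignment itself reduces to a standard least-squares problem; the sketch below shows the classic Kabsch/SVD solution in 2-D under an assumed stem correspondence, and is not the authors' matching algorithm.

    ```python
    # Minimal sketch: least-squares rigid alignment of matched stem positions
    # from two scans (2-D Kabsch solution).
    import numpy as np

    def rigid_transform_2d(src, dst):
        """src, dst: (n, 2) arrays of matched stem positions (x, y)."""
        src, dst = np.asarray(src, float), np.asarray(dst, float)
        src_c, dst_c = src - src.mean(axis=0), dst - dst.mean(axis=0)
        U, _, Vt = np.linalg.svd(src_c.T @ dst_c)          # cross-covariance SVD (Kabsch)
        d = np.sign(np.linalg.det(Vt.T @ U.T))             # guard against reflections
        R = Vt.T @ np.diag([1.0, d]) @ U.T
        t = dst.mean(axis=0) - R @ src.mean(axis=0)
        return R, t

    # Usage: R, t = rigid_transform_2d(stems_scan_b, stems_scan_a)
    #        aligned = stems_scan_b @ R.T + t
    ```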

  18. Automated three-dimensional quantification of myocardial perfusion and brain SPECT.

    PubMed

    Slomka, P J; Radau, P; Hurwitz, G A; Dey, D

    2001-01-01

    To allow automated and objective reading of nuclear medicine tomography, we have developed a set of tools for the clinical analysis of myocardial perfusion tomography (PERFIT) and brain SPECT/PET (BRASS). We exploit algorithms for image registration and use three-dimensional (3D) "normal models" to compare individual patients with composite datasets on a voxel-by-voxel basis in order to automatically determine statistically significant abnormalities. A multistage, 3D iterative inter-subject registration of patient images to normal templates is applied, including automated masking of external activity before the final fit. In separate projects, the software has been applied to the analysis of myocardial perfusion SPECT, as well as brain SPECT and PET data. Automatic reading was consistent with visual analysis; it can be applied to the whole spectrum of clinical images and can aid physicians in the daily interpretation of tomographic nuclear medicine images.
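    A minimal sketch of the voxel-by-voxel normal-database comparison idea (not the PERFIT/BRASS algorithms): after registration, voxels whose uptake deviates from the normal composite by more than a chosen z-threshold are flagged.

    ```python
    # Minimal sketch: voxel-wise z-score comparison against a normal composite.
    import numpy as np

    def abnormal_voxels(patient, normal_mean, normal_std, z_threshold=2.5):
        """All inputs are co-registered 3-D arrays of the same shape."""
        z = (patient - normal_mean) / np.maximum(normal_std, 1e-6)   # avoid divide-by-zero
        return z, np.abs(z) > z_threshold                            # z-map and abnormality mask
    ```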

  19. Scalable Earth-observation Analytics for Geoscientists: Spacetime Extensions to the Array Database SciDB

    NASA Astrophysics Data System (ADS)

    Appel, Marius; Lahn, Florian; Pebesma, Edzer; Buytaert, Wouter; Moulds, Simon

    2016-04-01

    Today's amount of freely available data requires scientists to spend large parts of their work on data management. This is especially true in the environmental sciences when working with large remote sensing datasets, such as those obtained from earth-observation satellites like the Sentinel fleet. Many frameworks like SpatialHadoop or Apache Spark address scalability but target programmers rather than data analysts, and are not dedicated to imagery or array data. In this work, we use the open-source data management and analytics system SciDB to bring large earth-observation datasets closer to analysts. Its underlying data representation as multidimensional arrays fits earth-observation datasets naturally, distributes storage and computational load over multiple instances by multidimensional chunking, and also enables efficient time-series-based analyses, which are usually difficult with file- or tile-based approaches. Existing interfaces to R and Python furthermore allow for scalable analytics with relatively little learning effort. However, interfacing SciDB with file-based earth-observation datasets that come as tiled temporal snapshots requires a lot of manual bookkeeping during ingestion, and SciDB natively only supports loading data from CSV-like and custom binary formatted files, which currently limits its practical use in earth-observation analytics. To make it easier to work with large multi-temporal datasets in SciDB, we developed software tools that enrich SciDB with earth observation metadata and allow working with commonly used file formats: (i) the SciDB extension library scidb4geo simplifies working with spatiotemporal arrays by adding relevant metadata to the database, and (ii) the Geospatial Data Abstraction Library (GDAL) driver implementation scidb4gdal allows ingesting and exporting remote sensing imagery from and to a large number of file formats. Using added metadata on temporal resolution and coverage, the GDAL driver supports time-based ingestion of imagery into existing multi-temporal SciDB arrays. While our SciDB plugin works directly in the database, the GDAL driver has been developed with a minimum of external dependencies (i.e., cURL). Source code for both tools is available from GitHub [1]. We present these tools in a case study that demonstrates the ingestion of multi-temporal tiled earth-observation data into SciDB, followed by a time-series analysis using R and SciDBR. Through the exclusive use of open-source software, our approach supports reproducibility in scalable large-scale earth-observation analytics. In the future, these tools can be used in an automated way to let scientists work only on ready-to-use SciDB arrays, significantly reducing the data management workload for domain scientists. [1] https://github.com/mappl/scidb4geo and https://github.com/mappl/scidb4gdal

  20. Automated Morphological Analysis of Microglia After Stroke.

    PubMed

    Heindl, Steffanie; Gesierich, Benno; Benakis, Corinne; Llovera, Gemma; Duering, Marco; Liesz, Arthur

    2018-01-01

    Microglia are the resident immune cells of the brain and react quickly to changes in their environment with transcriptional regulation and morphological changes. Brain tissue injury such as ischemic stroke induces a local inflammatory response encompassing microglial activation. The change in activation status of microglia is reflected in their gradual morphological transformation from a highly ramified into a less ramified or amoeboid cell shape. For this reason, the morphological changes of microglia are widely utilized to quantify microglial activation and to study their involvement in virtually all brain diseases. However, the currently available methods, which are mainly based on manual rating of immunofluorescent microscopic images, are often inaccurate, rater biased, and highly time consuming. To address these issues, we created a fully automated image analysis tool, which enables the analysis of microglia morphology from a confocal Z-stack and provides up to 59 morphological features. We developed the algorithm on an exploratory dataset of microglial cells from a stroke mouse model and validated the findings on an independent dataset. In both datasets, we could demonstrate the ability of the algorithm to sensitively discriminate between the microglia morphology in the peri-infarct and the contralateral, unaffected cortex. Dimensionality reduction by principal component analysis allowed us to generate a highly sensitive compound score for microglial shape analysis. Finally, we tested for concordance of results between the novel automated analysis tool and the conventional manual analysis and found a high degree of correlation. In conclusion, our novel method for the fully automated analysis of microglia morphology shows excellent accuracy and time efficiency compared to traditional analysis methods. This tool, which we make openly available, could find application in studying microglia morphology using fluorescence imaging in a wide range of brain disease models.
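    The compound-score idea can be sketched generically (this is not the published tool): standardise the per-cell feature table and take the first principal component as a single morphology score.

    ```python
    # Minimal sketch: one compound morphology score per cell from the first
    # principal component of the standardised feature matrix.
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    def compound_morphology_score(features):
        """features: (n_cells, n_features) array, e.g. 59 morphology measures per cell."""
        scaled = StandardScaler().fit_transform(features)
        return PCA(n_components=1).fit_transform(scaled).ravel()   # one score per cell

    # Usage: scores = compound_morphology_score(feature_table)
    # e.g. compare score distributions between peri-infarct and contralateral cortex.
    ```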

  1. Wide-Open: Accelerating public data release by automating detection of overdue datasets

    PubMed Central

    Poon, Hoifung; Howe, Bill

    2017-01-01

    Open data is a vital pillar of open science and a key enabler for reproducibility, data reuse, and novel discoveries. Enforcement of open-data policies, however, largely relies on manual efforts, which invariably lag behind the increasingly automated generation of biological data. To address this problem, we developed a general approach to automatically identify datasets overdue for public release by applying text mining to identify dataset references in published articles and parse query results from repositories to determine if the datasets remain private. We demonstrate the effectiveness of this approach on 2 popular National Center for Biotechnology Information (NCBI) repositories: Gene Expression Omnibus (GEO) and Sequence Read Archive (SRA). Our Wide-Open system identified a large number of overdue datasets, which spurred administrators to respond directly by releasing 400 datasets in one week. PMID:28594819

  2. Wide-Open: Accelerating public data release by automating detection of overdue datasets.

    PubMed

    Grechkin, Maxim; Poon, Hoifung; Howe, Bill

    2017-06-01

    Open data is a vital pillar of open science and a key enabler for reproducibility, data reuse, and novel discoveries. Enforcement of open-data policies, however, largely relies on manual efforts, which invariably lag behind the increasingly automated generation of biological data. To address this problem, we developed a general approach to automatically identify datasets overdue for public release by applying text mining to identify dataset references in published articles and parse query results from repositories to determine if the datasets remain private. We demonstrate the effectiveness of this approach on 2 popular National Center for Biotechnology Information (NCBI) repositories: Gene Expression Omnibus (GEO) and Sequence Read Archive (SRA). Our Wide-Open system identified a large number of overdue datasets, which spurred administrators to respond directly by releasing 400 datasets in one week.
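    A minimal sketch of the underlying idea (not the Wide-Open implementation): extract GEO accession numbers from article text with a regular expression and flag those that a repository-status lookup reports as still private; the `is_public` callable is a hypothetical placeholder for the repository query.

    ```python
    # Minimal sketch: find GEO accessions mentioned in articles and flag those
    # that are not yet public according to a user-supplied status lookup.
    import re

    GEO_ACCESSION = re.compile(r"\bGSE\d{3,8}\b")

    def find_overdue(article_texts, is_public):
        """article_texts: iterable of strings; is_public: callable GSE id -> bool (assumed)."""
        mentioned = {acc for text in article_texts for acc in GEO_ACCESSION.findall(text)}
        return sorted(acc for acc in mentioned if not is_public(acc))

    # Usage (hypothetical): overdue = find_overdue(fulltexts, is_public=query_geo_status)
    ```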

  3. Improving long-term global precipitation dataset using multi-sensor surface soil moisture retrievals and the soil moisture analysis rainfall tool (SMART)

    USDA-ARS?s Scientific Manuscript database

    Using multiple historical satellite surface soil moisture products, the Kalman Filtering-based Soil Moisture Analysis Rainfall Tool (SMART) is applied to improve the accuracy of a multi-decadal global daily rainfall product that has been bias-corrected to match the monthly totals of available rain g...

  4. Fully automated analysis of multi-resolution four-channel micro-array genotyping data

    NASA Astrophysics Data System (ADS)

    Abbaspour, Mohsen; Abugharbieh, Rafeef; Podder, Mohua; Tebbutt, Scott J.

    2006-03-01

    We present a fully-automated and robust microarray image analysis system for handling multi-resolution images (down to 3-micron resolution, with sizes up to 80 MB per channel). The system is developed to provide rapid and accurate data extraction for our recently developed microarray analysis and quality control tool (SNP Chart). Currently available commercial microarray image analysis applications are inefficient, due to the considerable user interaction typically required. Four-channel DNA microarray technology is a robust and accurate tool for determining genotypes of multiple genetic markers in individuals. It plays an important role in the ongoing shift from traditional medical treatments towards personalized genetic medicine, i.e., individualized therapy based on the patient's genetic heritage. However, fast, robust, and precise image processing tools are required for the prospective practical use of microarray-based genetic testing for predicting disease susceptibilities and drug effects in clinical practice, which requires a turn-around time compatible with clinical decision-making. In this paper we have developed a fully-automated image analysis platform for the rapid investigation of hundreds of genetic variations across multiple genes. Validation tests indicate very high accuracy levels for genotyping results. Our method achieves a significant reduction in analysis time, from several hours to just a few minutes, and is completely automated, requiring no manual interaction or guidance.

  5. BioBlend: automating pipeline analyses within Galaxy and CloudMan.

    PubMed

    Sloggett, Clare; Goonasekera, Nuwan; Afgan, Enis

    2013-07-01

    We present BioBlend, a unified API in a high-level language (python) that wraps the functionality of Galaxy and CloudMan APIs. BioBlend makes it easy for bioinformaticians to automate end-to-end large data analysis, from scratch, in a way that is highly accessible to collaborators, by allowing them to both provide the required infrastructure and automate complex analyses over large datasets within the familiar Galaxy environment. http://bioblend.readthedocs.org/. Automated installation of BioBlend is available via PyPI (e.g. pip install bioblend). Alternatively, the source code is available from the GitHub repository (https://github.com/afgane/bioblend) under the MIT open source license. The library has been tested and is working on Linux, Macintosh and Windows-based systems.
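    A minimal usage sketch based on BioBlend's documented client classes; the server URL, API key, and history name are placeholders.

    ```python
    # Minimal sketch: connect to a Galaxy server with BioBlend and list
    # histories and workflows, then create a history for an automated analysis.
    from bioblend.galaxy import GalaxyInstance

    gi = GalaxyInstance(url="https://usegalaxy.org", key="YOUR_API_KEY")  # placeholders

    histories = gi.histories.get_histories()       # existing histories for this account
    workflows = gi.workflows.get_workflows()       # workflows available to the account
    print([h["name"] for h in histories])
    print([w["name"] for w in workflows])

    new_hist = gi.histories.create_history(name="automated-analysis")  # name is illustrative
    ```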

  6. Multi-observation PET image analysis for patient follow-up quantitation and therapy assessment

    NASA Astrophysics Data System (ADS)

    David, S.; Visvikis, D.; Roux, C.; Hatt, M.

    2011-09-01

    In positron emission tomography (PET) imaging, an early therapeutic response is usually characterized by variations of semi-quantitative parameters restricted to the maximum SUV measured in PET scans during the treatment. Such measurements do not reflect overall tumor volume and radiotracer uptake variations. The proposed approach is based on multi-observation image analysis, merging several PET acquisitions to assess tumor metabolic volume and uptake variations. The fusion algorithm is based on iterative estimation using a stochastic expectation maximization (SEM) algorithm. The proposed method was applied to simulated and clinical follow-up PET images. We compared the multi-observation fusion performance to threshold-based methods proposed for the assessment of the therapeutic response based on functional volumes. On simulated datasets, the adaptive threshold applied independently to both images led to higher errors than the ASEM fusion, and on clinical datasets it failed to provide coherent measurements for four patients out of seven due to aberrant delineations. The ASEM method demonstrated improved and more robust estimation, leading to more pertinent measurements. Future work will consist of extending the methodology and applying it to clinical multi-tracer datasets in order to evaluate its potential impact on biological tumor volume definition for radiotherapy applications.

  7. Automated X-ray image analysis for cargo security: Critical review and future promise.

    PubMed

    Rogers, Thomas W; Jaccard, Nicolas; Morton, Edward J; Griffin, Lewis D

    2017-01-01

    We review the relatively immature field of automated image analysis for X-ray cargo imagery. There is increasing demand for automated analysis methods that can assist in the inspection and selection of containers, due to the ever-growing volumes of traded cargo and the increasing concerns that customs- and security-related threats are being smuggled across borders by organised crime and terrorist networks. We split the field into the classical pipeline of image preprocessing and image understanding. Preprocessing includes: image manipulation; quality improvement; Threat Image Projection (TIP); and material discrimination and segmentation. Image understanding includes: Automated Threat Detection (ATD); and Automated Contents Verification (ACV). We identify several gaps in the literature that need to be addressed and propose ideas for future research. Where the current literature is sparse we borrow from the single-view, multi-view, and CT X-ray baggage domains, which have some characteristics in common with X-ray cargo.

  8. Increasing conclusiveness of clinical breath analysis by improved baseline correction of multi capillary column - ion mobility spectrometry (MCC-IMS) data.

    PubMed

    Szymańska, Ewa; Tinnevelt, Gerjen H; Brodrick, Emma; Williams, Mark; Davies, Antony N; van Manen, Henk-Jan; Buydens, Lutgarde M C

    2016-08-05

    Current challenges of clinical breath analysis include large data size and non-clinically relevant variations observed in exhaled breath measurements, which should be urgently addressed with competent scientific data tools. In this study, three different baseline correction methods are evaluated within a previously developed data size reduction strategy for multi capillary column - ion mobility spectrometry (MCC-IMS) datasets. Introduced for the first time in breath data analysis, the Top-hat method is presented as the optimum baseline correction method. A refined data size reduction strategy is employed in the analysis of a large breathomic dataset on a healthy and respiratory disease population. New insights into MCC-IMS spectra differences associated with respiratory diseases are provided, demonstrating the additional value of the refined data analysis strategy in clinical breath analysis. Copyright © 2016 Elsevier B.V. All rights reserved.
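    A generic morphological top-hat baseline correction (not the published MCC-IMS pipeline) can be sketched as follows; the structuring-element width is an assumed parameter that would need tuning to the peak widths.

    ```python
    # Minimal sketch: estimate the baseline of each spectrum with a grey-scale
    # opening and subtract it, keeping only the narrow peaks (white top-hat).
    import numpy as np
    from scipy.ndimage import grey_opening

    def tophat_baseline_correction(spectra, structure_size=51):
        """spectra: (n_spectra, n_points) array; structure_size: width of the structuring element."""
        spectra = np.asarray(spectra, float)
        baseline = grey_opening(spectra, size=(1, structure_size))   # per-row baseline estimate
        return spectra - baseline
    ```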

  9. Statistical Projections for Multi-resolution, Multi-dimensional Visual Data Exploration and Analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hoa T. Nguyen; Stone, Daithi; E. Wes Bethel

    2016-01-01

    An ongoing challenge in the visual exploration and analysis of large, multi-dimensional datasets is how to present useful, concise information to a user for specific visualization tasks. Typical approaches to this problem have proposed either reduced-resolution versions of the data, or projections of the data, or both. These approaches still have limitations, such as high computational cost or susceptibility to errors. In this work, we explore the use of a statistical metric as the basis for both projections and reduced-resolution versions of data, with a particular focus on preserving one key trait in the data, namely variation. We use two different case studies to explore this idea: one that uses a synthetic dataset, and another that uses a large ensemble collection produced by an atmospheric modeling code to study long-term changes in global precipitation. The primary finding of our work is that, in terms of preserving the variation signal inherent in the data, a statistical measure more faithfully preserves this key characteristic across both multi-dimensional projections and multi-resolution representations than a methodology based upon averaging.

  10. Eco-hydrological Wireless Sensor Network and upscaling method research in the Heihe River Basin, China

    NASA Astrophysics Data System (ADS)

    Jin, Rui; kang, Jian

    2017-04-01

    Wireless Sensor Networks are recognized as one of the most important near-surface components of GEOSS (Global Earth Observation System of Systems), thanks to the flourishing development of low-cost, robust and integrated data loggers and sensors. A nested eco-hydrological wireless sensor network (EHWSN) was installed in the upper and middle reaches of the Heihe River Basin and has been operated to obtain multi-scale observations of soil moisture, soil temperature and land surface temperature from 2012 to the present. The spatial distribution of the EHWSN was optimally designed based on geo-statistical theory, with the aim of capturing the spatial variations and temporal dynamics of soil moisture and soil temperature, and of producing ground truth at grid scale for validating related remote sensing products and model simulations over the heterogeneous land surface. In terms of upscaling research, we have developed a set of methods to aggregate multi-point WSN observations to grid scale (about 1 km), including regression kriging estimation that utilizes multi-source remote sensing auxiliary information, block kriging with homogeneous measurement errors, and a Bayesian upscaling algorithm that utilizes MODIS-derived apparent thermal inertia. All EHWSN observations are organized as datasets and freely published at http://westdc.westgis.ac.cn/hiwater. The EHWSN integrates distributed observation nodes into an automated, intelligent and remotely controllable network that provides superior integrated, standardized and automated observation capabilities for research on hydrological and ecological processes at the basin scale.

  11. Automated liver elasticity calculation for 3D MRE

    NASA Astrophysics Data System (ADS)

    Dzyubak, Bogdan; Glaser, Kevin J.; Manduca, Armando; Ehman, Richard L.

    2017-03-01

    Magnetic Resonance Elastography (MRE) is a phase-contrast MRI technique which calculates quantitative stiffness images, called elastograms, by imaging the propagation of acoustic waves in tissues. It is used clinically to diagnose liver fibrosis. Automated analysis of MRE is difficult as the corresponding MRI magnitude images (which contain anatomical information) are affected by intensity inhomogeneity, motion artifact, and poor tissue- and edge-contrast. Additionally, areas with low wave amplitude must be excluded. An automated algorithm has already been successfully developed and validated for clinical 2D MRE. 3D MRE acquires substantially more data and, due to accelerated acquisition, has exacerbated image artifacts. Also, the current 3D MRE processing does not yield a confidence map to indicate MRE wave quality and guide ROI selection, as is the case in 2D. In this study, an extension of the 2D automated method, with a simple wave-amplitude metric, was developed and validated against an expert reader in a set of 57 patient exams with both 2D and 3D MRE. The stiffness discrepancy with the expert for 3D MRE was -0.8% +/- 9.45%, which was better than the discrepancy with the same reader for 2D MRE (-3.2% +/- 10.43%) and better than the inter-reader discrepancy observed in previous studies. There were no automated processing failures in this dataset. Thus, the automated liver elasticity calculation (ALEC) algorithm is able to calculate stiffness from 3D MRE data with minimal bias and good precision, while enabling stiffness measurements to be fully reproducible and easily performed on large 3D MRE datasets.

  12. Astronaut Photography of the Earth: A Long-Term Dataset for Earth Systems Research, Applications, and Education

    NASA Technical Reports Server (NTRS)

    Stefanov, William L.

    2017-01-01

    The NASA Earth observations dataset obtained by humans in orbit using handheld film and digital cameras is freely accessible to the global community through the online searchable database at https://eol.jsc.nasa.gov, and offers a useful complement to traditional ground-commanded sensor data. The dataset includes imagery from the NASA Mercury (1961) through present-day International Space Station (ISS) programs, and currently totals over 2.6 million individual frames. Geographic coverage of the dataset includes land and ocean areas between approximately 52 degrees North and South latitude, but is spatially and temporally discontinuous. The photographic dataset presents some significant impediments to immediate research, applied, and educational use: commercial RGB films and camera systems with overlapping bandpasses; use of different focal length lenses, unconstrained look angles, and variable spacecraft altitudes; and no native geolocation information. Such factors led to this dataset being underutilized by the community, but recent advances in automated and semi-automated image geolocation, image feature classification, and web-based services are adding new value to the astronaut-acquired imagery. A coupled ground software and on-orbit hardware system for the ISS is in development for planned deployment in mid-2017; this system will capture camera pose information for each astronaut photograph to allow automated, full georegistration of the data. The ground system component is currently in use to fully georeference imagery collected in response to International Disaster Charter activations, and the auto-registration procedures are being applied to the extensive historical database of imagery to add value for research and educational purposes. In parallel, machine learning techniques are being applied to automate feature identification and classification throughout the dataset, in order to build descriptive metadata that will improve search capabilities. It is expected that these value additions will increase interest in and use of the dataset by the global community.

  13. Automating Geospatial Visualizations with Smart Default Renderers for Data Exploration Web Applications

    NASA Astrophysics Data System (ADS)

    Ekenes, K.

    2017-12-01

    This presentation will outline the process of creating a web application for exploring large amounts of scientific geospatial data using modern automated cartographic techniques. Traditional cartographic methods, including data classification, may inadvertently hide geospatial and statistical patterns in the underlying data. This presentation demonstrates how to use smart web APIs that quickly analyze the data when it loads and provide suggestions for the most appropriate visualizations based on the statistics of the data. Since there are just a few ways to visualize any given dataset well, and since many users never go beyond default values, it is imperative to provide smart default color schemes tailored to the dataset as opposed to static defaults. Multiple functions for automating visualizations are available in the smart APIs, along with UI elements that allow users to create more than one visualization for a dataset, since there isn't a single best way to visualize a given dataset. This allows the user to choose which visualization is most appropriate for their presentation. Since bivariate and multivariate visualizations are particularly difficult to create effectively, this automated approach removes the guesswork from the process and provides a number of ways to generate multivariate visualizations for the same variables. The methods used in these APIs and the renderers generated by them are not available elsewhere. The presentation will show how statistics can be used as the basis for automating default visualizations of data along continuous ramps, creating more refined visualizations while revealing the spread and outliers of the data. Adding interactive components to instantaneously alter visualizations allows users to unearth spatial patterns previously unknown among one or more variables. These applications may focus on a single dataset that is frequently updated, or be configurable for a variety of datasets from multiple sources.

  14. Using iterative cluster merging with improved gap statistics to perform online phenotype discovery in the context of high-throughput RNAi screens

    PubMed Central

    Yin, Zheng; Zhou, Xiaobo; Bakal, Chris; Li, Fuhai; Sun, Youxian; Perrimon, Norbert; Wong, Stephen TC

    2008-01-01

    Background The recent emergence of high-throughput automated image acquisition technologies has forever changed how cell biologists collect and analyze data. Historically, the interpretation of cellular phenotypes in different experimental conditions has been dependent upon the expert opinions of well-trained biologists. Such qualitative analysis is particularly effective in detecting subtle, but important, deviations in phenotypes. However, while the rapid and continuing development of automated microscope-based technologies now facilitates the acquisition of trillions of cells in thousands of diverse experimental conditions, such as in the context of RNA interference (RNAi) or small-molecule screens, the massive size of these datasets precludes human analysis. Thus, the development of automated methods which aim to identify novel and biologically relevant phenotypes online is one of the major challenges in high-throughput image-based screening. Ideally, phenotype discovery methods should be designed to utilize prior/existing information and tackle three challenging tasks, i.e. restoring pre-defined biologically meaningful phenotypes, differentiating novel phenotypes from known ones and distinguishing novel phenotypes from each other. Arbitrarily extracted information causes biased analysis, while combining the complete existing datasets with each new image is intractable in high-throughput screens. Results Here we present the design and implementation of a novel and robust online phenotype discovery method with broad applicability that can be used in diverse experimental contexts, especially high-throughput RNAi screens. This method features phenotype modelling and iterative cluster merging using improved gap statistics. A Gaussian Mixture Model (GMM) is employed to estimate the distribution of each existing phenotype, and is then used as the reference distribution in the gap statistics. This method is broadly applicable to a number of different types of image-based datasets derived from a wide spectrum of experimental conditions and is suitable for adaptively processing new images which are continuously added to existing datasets. Validations were carried out on different datasets, including a published RNAi screen using Drosophila embryos [Additional files 1, 2], a dataset for cell cycle phase identification using HeLa cells [Additional files 1, 3, 4] and a synthetic dataset using polygons; our method tackled the three aforementioned tasks effectively with an accuracy range of 85%–90%. When our method is implemented in the context of a Drosophila genome-scale RNAi image-based screen of cultured cells aimed at identifying the contribution of individual genes towards the regulation of cell shape, it efficiently discovers meaningful new phenotypes and provides novel biological insight. We also propose a two-step procedure to modify the novelty detection method based on one-class SVM, so that it can be used for online phenotype discovery. Under different conditions, we compared the SVM-based method with our method using various datasets, and our method consistently outperformed the SVM-based method in at least two of the three tasks by 2% to 5%. These results demonstrate that our method can be used to better identify novel phenotypes in image-based datasets from a wide range of conditions and organisms. Conclusion We demonstrate that our method can detect various novel phenotypes effectively in complex datasets. Experimental results also validate that our method performs consistently under different orders of image input, variations in starting conditions (including the number and composition of existing phenotypes), and datasets from different screens. Our findings indicate that the proposed method is suitable for online phenotype discovery in diverse high-throughput image-based genetic and chemical screens. PMID:18534020
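    To illustrate the GMM-based phenotype modelling component in a generic way (the merge criterion below uses a symmetrised KL estimate rather than the paper's improved gap statistic), a minimal sketch might look like this:

    ```python
    # Minimal sketch: fit a Gaussian Mixture Model per (candidate) phenotype and
    # merge two clusters when the symmetrised KL divergence between their fitted
    # models, estimated from the observed samples, is small.
    import numpy as np
    from sklearn.mixture import GaussianMixture

    def fit_phenotype_model(features, n_components=1, random_state=0):
        """features: (n_cells, n_features) array for one candidate phenotype."""
        return GaussianMixture(n_components=n_components, covariance_type="full",
                               random_state=random_state).fit(features)

    def should_merge(model_a, feats_a, model_b, feats_b, threshold=1.0):
        """Monte-Carlo symmetrised KL estimate between two fitted mixtures (threshold assumed)."""
        kl_ab = np.mean(model_a.score_samples(feats_a) - model_b.score_samples(feats_a))
        kl_ba = np.mean(model_b.score_samples(feats_b) - model_a.score_samples(feats_b))
        return 0.5 * (kl_ab + kl_ba) < threshold
    ```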

  15. Multi-Task Convolutional Neural Network for Pose-Invariant Face Recognition

    NASA Astrophysics Data System (ADS)

    Yin, Xi; Liu, Xiaoming

    2018-02-01

    This paper explores multi-task learning (MTL) for face recognition. We answer the questions of how and why MTL can improve the face recognition performance. First, we propose a multi-task Convolutional Neural Network (CNN) for face recognition where identity classification is the main task and pose, illumination, and expression estimations are the side tasks. Second, we develop a dynamic-weighting scheme to automatically assign the loss weight to each side task, which is a crucial problem in MTL. Third, we propose a pose-directed multi-task CNN by grouping different poses to learn pose-specific identity features, simultaneously across all poses. Last but not least, we propose an energy-based weight analysis method to explore how CNN-based MTL works. We observe that the side tasks serve as regularizations to disentangle the variations from the learnt identity features. Extensive experiments on the entire Multi-PIE dataset demonstrate the effectiveness of the proposed approach. To the best of our knowledge, this is the first work using all data in Multi-PIE for face recognition. Our approach is also applicable to in-the-wild datasets for pose-invariant face recognition and achieves comparable or better performance than state of the art on LFW, CFP, and IJB-A datasets.

  16. Joint Facial Action Unit Detection and Feature Fusion: A Multi-conditional Learning Approach.

    PubMed

    Eleftheriadis, Stefanos; Rudovic, Ognjen; Pantic, Maja

    2016-10-05

    Automated analysis of facial expressions can benefit many domains, from marketing to clinical diagnosis of neurodevelopmental disorders. Facial expressions are typically encoded as a combination of facial muscle activations, i.e., action units. Depending on context, these action units co-occur in specific patterns and rarely in isolation. Yet, most existing methods for automatic action unit detection fail to exploit dependencies among the action units and the corresponding facial features. To address this, we propose a novel multi-conditional latent variable model for simultaneous fusion of facial features and joint action unit detection. Specifically, the proposed model performs feature fusion in a generative fashion via a low-dimensional shared subspace, while simultaneously performing action unit detection using a discriminative classification approach. We show that by combining the merits of both approaches, the proposed methodology outperforms existing purely discriminative/generative methods for the target task. To reduce the number of parameters and avoid overfitting, a novel Bayesian learning approach based on Monte Carlo sampling is proposed to integrate out the shared subspace. We validate the proposed method on posed and spontaneous data from three publicly available datasets (CK+, DISFA and Shoulder-pain), and show that both feature fusion and joint learning of action units lead to improved performance compared to the state-of-the-art methods for the task.

  17. The role of 3-D interactive visualization in blind surveys of H I in galaxies

    NASA Astrophysics Data System (ADS)

    Punzo, D.; van der Hulst, J. M.; Roerdink, J. B. T. M.; Oosterloo, T. A.; Ramatsoku, M.; Verheijen, M. A. W.

    2015-09-01

    Upcoming H I surveys will deliver large datasets, and automated processing using the full 3-D information (two positional dimensions and one spectral dimension) to find and characterize H I objects is imperative. In this context, visualization is an essential tool for enabling qualitative and quantitative human control of an automated source finding and analysis pipeline. We discuss how Visual Analytics, the combination of automated data processing with human reasoning, creativity and intuition, supported by interactive visualization, enables flexible and fast interaction with the 3-D data, helping the astronomer to deal with the analysis of complex sources. 3-D visualization, coupled to modeling, provides additional capabilities that help with the discovery and analysis of subtle structures in the 3-D domain. The requirements for a fully interactive visualization tool are: coupled 1-D/2-D/3-D visualization, quantitative and comparative capabilities, combined with supervised semi-automated analysis. Moreover, the source code must have the following characteristics to enable collaborative work: open, modular, well documented, and well maintained. We review four state-of-the-art 3-D visualization packages, assessing their capabilities and feasibility for use with 3-D astronomical data.

  18. Automated detection of retinal nerve fiber layer defects on fundus images: false positive reduction based on vessel likelihood

    NASA Astrophysics Data System (ADS)

    Muramatsu, Chisako; Ishida, Kyoko; Sawada, Akira; Hatanaka, Yuji; Yamamoto, Tetsuya; Fujita, Hiroshi

    2016-03-01

    Early detection of glaucoma is important for slowing or stopping progression of the disease and for preventing total blindness. We have previously proposed an automated scheme for detection of retinal nerve fiber layer defect (NFLD), which is one of the early signs of glaucoma observed on retinal fundus images. In this study, a new multi-step detection scheme was included to improve detection of subtle and narrow NFLDs. In addition, new features were added to distinguish between NFLDs and blood vessels, which are frequent sites of false positives (FPs). The result was evaluated with a new test dataset consisting of 261 cases, including 130 cases with NFLDs. Using the proposed method, the initial detection rate was improved from 82% to 98%. At a sensitivity of 80%, the number of FPs per image was reduced from 4.25 to 1.36. The result indicates the potential usefulness of the proposed method for early detection of glaucoma.

  19. Iterative dataset optimization in automated planning: Implementation for breast and rectal cancer radiotherapy.

    PubMed

    Fan, Jiawei; Wang, Jiazhou; Zhang, Zhen; Hu, Weigang

    2017-06-01

    To develop a new automated treatment planning solution for breast and rectal cancer radiotherapy. The automated treatment planning solution developed in this study includes selection of the iteratively optimized training dataset, dose volume histogram (DVH) prediction for the organs at risk (OARs), and automatic generation of clinically acceptable treatment plans. The iteratively optimized training dataset is selected by iterative optimization from 40 treatment plans for left-breast and rectal cancer patients who received radiation therapy. A two-dimensional kernel density estimation algorithm (denoted two-parameter KDE), which incorporates two predictive features, was implemented to produce the predicted DVHs. Finally, 10 additional new left-breast treatment plans were re-planned using the Pinnacle 3 Auto-Planning (AP) module (version 9.10, Philips Medical Systems) with objective functions derived from the predicted DVH curves. The automatically generated re-optimized treatment plans were compared with the original manually optimized plans. By combining the iteratively optimized training dataset methodology with the two-parameter KDE prediction algorithm, our proposed automated planning strategy improves the accuracy of DVH prediction. The automatically generated treatment plans using the doses derived from the predicted DVHs can achieve better dose sparing for some OARs without compromising other metrics of plan quality. The proposed new automated treatment planning solution can be used to efficiently evaluate and improve the quality and consistency of treatment plans for intensity-modulated breast and rectal cancer radiation therapy. © 2017 American Association of Physicists in Medicine.
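
    The DVH-prediction idea can be sketched as follows, assuming SciPy/NumPy: fit a kernel density estimate over (predictive feature, dose) pairs collected from prior plans, then turn the conditional dose distribution for a new patient's OAR voxels into a predicted DVH. The single distance-like feature and the aggregation used here are illustrative simplifications of the two-parameter KDE described in the abstract, not the authors' implementation.

```python
# KDE-based DVH prediction sketch (assumptions: SciPy, NumPy; synthetic data).
import numpy as np
from scipy.stats import gaussian_kde

# Training data: per-voxel (feature, dose) pairs from prior plans; the feature here
# is a placeholder distance from the target volume.
rng = np.random.default_rng(0)
train_feature = rng.uniform(0.0, 5.0, 2000)                  # cm from PTV surface
train_dose = 50.0 * np.exp(-0.6 * train_feature) + rng.normal(0.0, 2.0, 2000)

kde = gaussian_kde(np.vstack([train_feature, train_dose]))   # joint density p(f, d)

def predict_dvh(new_features, dose_grid):
    """Predicted DVH: for each voxel feature take the conditional dose density
    p(d | f) on a grid, then average the per-voxel survival functions."""
    dvh = np.zeros_like(dose_grid)
    for f in new_features:
        pts = np.vstack([np.full_like(dose_grid, f), dose_grid])
        p = kde(pts)
        p /= p.sum()                       # normalise to a conditional distribution
        dvh += 1.0 - np.cumsum(p)          # per-voxel probability of dose >= d
    return dvh / len(new_features)         # volume fraction receiving >= dose

dose_grid = np.linspace(0.0, 60.0, 121)
new_oar_features = rng.uniform(0.0, 5.0, 100)    # new patient's OAR voxels
predicted = predict_dvh(new_oar_features, dose_grid)
print("predicted V20 (fraction of OAR >= 20 Gy):",
      round(predicted[np.searchsorted(dose_grid, 20.0)], 3))
```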

  20. SPICODYN: A Toolbox for the Analysis of Neuronal Network Dynamics and Connectivity from Multi-Site Spike Signal Recordings.

    PubMed

    Pastore, Vito Paolo; Godjoski, Aleksandar; Martinoia, Sergio; Massobrio, Paolo

    2018-01-01

    We implemented an automated and efficient open-source software package for the analysis of multi-site neuronal spike signals. The software package, named SPICODYN, has been developed as a standalone Windows GUI application in the C# programming language, using Microsoft Visual Studio and the .NET Framework 4.5 development environment. Accepted input data formats are HDF5, level 5 MAT, and text files containing recorded or generated time-series spike data. SPICODYN processes such electrophysiological signals focusing on spiking and bursting dynamics and functional-effective connectivity analysis. In particular, for inferring network connectivity, a new implementation of the transfer entropy method is presented that deals with multiple time delays (temporal extension) and with multiple binary patterns (high-order extension). SPICODYN is specifically tailored to process data coming from different Multi-Electrode Array setups, guaranteeing, in those specific cases, automated processing. The optimized implementation of the Delayed Transfer Entropy and High-Order Transfer Entropy algorithms allows accurate and rapid analysis of multiple spike trains from thousands of electrodes.
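
    A toy version of the delayed transfer entropy computed by the toolbox is sketched below in NumPy for two binary spike trains; SPICODYN's actual estimator, its high-order (multi-pattern) extension, and its optimizations are more elaborate.

```python
# Toy delayed transfer entropy between binary spike trains (assumptions: NumPy only).
import numpy as np

def delayed_transfer_entropy(x, y, delay=1):
    """TE_{X->Y}(delay): how much a lagged sample of x improves the prediction of
    the next sample of y beyond y's own past, for binary trains x and y."""
    x = np.asarray(x, dtype=int)
    y = np.asarray(y, dtype=int)
    y_next = y[delay + 1:]
    y_past = y[delay:-1]
    x_past = x[1:len(x) - delay]        # x lagged by `delay` relative to y_next
    counts = np.zeros((2, 2, 2))
    for a, b, c in zip(y_next, y_past, x_past):
        counts[a, b, c] += 1
    p = counts / counts.sum()
    te = 0.0
    for a in (0, 1):
        for b in (0, 1):
            for c in (0, 1):
                if p[a, b, c] == 0:
                    continue
                p_cond_xy = p[a, b, c] / p[:, b, c].sum()
                p_cond_y = p[a, b, :].sum() / p[:, b, :].sum()
                te += p[a, b, c] * np.log2(p_cond_xy / p_cond_y)
    return te

# Example: y weakly follows x with a lag of 2 samples, so TE peaks at delay 2.
rng = np.random.default_rng(1)
x = (rng.random(20000) < 0.1).astype(int)
y = np.roll(x, 2) | (rng.random(20000) < 0.02).astype(int)
print({d: round(delayed_transfer_entropy(x, y, d), 4) for d in (1, 2, 3)})
```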

  1. High-throughput sample processing and sample management; the functional evolution of classical cytogenetic assay towards automation.

    PubMed

    Ramakumar, Adarsh; Subramanian, Uma; Prasanna, Pataje G S

    2015-11-01

    High-throughput individual diagnostic dose assessment is essential for medical management of radiation-exposed subjects after a mass casualty. Cytogenetic assays such as the Dicentric Chromosome Assay (DCA) are recognized as the gold standard by international regulatory authorities. DCA is a multi-step and multi-day bioassay. DCA, as described in the IAEA manual, can be used to assess dose up to 4-6 weeks post-exposure quite accurately, but throughput is still a major issue and automation is essential. Throughput is limited both in terms of sample preparation and in terms of analysis of chromosome aberrations. Thus, there is a need to design and develop novel solutions that utilize extensive laboratory automation for sample preparation and bioinformatics approaches for chromosome-aberration analysis to overcome throughput issues. We have transitioned the bench-based cytogenetic DCA to a coherent process performing high-throughput automated biodosimetry for individual dose assessment, ensuring quality control (QC) and quality assurance (QA) aspects in accordance with international harmonized protocols. A Laboratory Information Management System (LIMS) was designed, implemented and adapted to manage increased sample processing capacity, develop and maintain standard operating procedures (SOPs) for robotic instruments, avoid data transcription errors during processing, and automate analysis of chromosome aberrations using an image analysis platform. Our efforts described in this paper intend to bridge the current technological gaps and enhance the potential application of DCA for dose-based stratification of subjects following a mass casualty. This paper describes one such potential integrated automated laboratory system and the functional evolution of the classical DCA towards the critically needed increase in throughput. Published by Elsevier B.V.

  2. SEGMA: An Automatic SEGMentation Approach for Human Brain MRI Using Sliding Window and Random Forests

    PubMed Central

    Serag, Ahmed; Wilkinson, Alastair G.; Telford, Emma J.; Pataky, Rozalia; Sparrow, Sarah A.; Anblagan, Devasuda; Macnaught, Gillian; Semple, Scott I.; Boardman, James P.

    2017-01-01

    Quantitative volumes from brain magnetic resonance imaging (MRI) acquired across the life course may be useful for investigating long term effects of risk and resilience factors for brain development and healthy aging, and for understanding early life determinants of adult brain structure. Therefore, there is an increasing need for automated segmentation tools that can be applied to images acquired at different life stages. We developed an automatic segmentation method for human brain MRI, where a sliding window approach and a multi-class random forest classifier were applied to high-dimensional feature vectors for accurate segmentation. The method performed well on brain MRI data acquired from 179 individuals, analyzed in three age groups: newborns (38–42 weeks gestational age), children and adolescents (4–17 years) and adults (35–71 years). As the method can learn from partially labeled datasets, it can be used to segment large-scale datasets efficiently. It could also be applied to different populations and imaging modalities across the life course. PMID:28163680
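
    The sliding-window/random-forest idea can be illustrated with a small scikit-learn sketch that classifies each pixel from the intensities in a window around it; SEGMA's feature vectors, 3-D windows, and use of partially labelled atlases are considerably richer than this placeholder.

```python
# Sliding-window features + random-forest voxel classification sketch
# (assumptions: NumPy, scikit-learn; 2-D synthetic "slice" instead of 3-D MRI).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def window_features(image, half=2):
    """Stack the (2*half+1)^2 neighbourhood intensities of every pixel."""
    padded = np.pad(image, half, mode="reflect")
    feats = []
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            feats.append(padded[half + dy: half + dy + image.shape[0],
                                half + dx: half + dx + image.shape[1]])
    return np.stack(feats, axis=-1).reshape(-1, (2 * half + 1) ** 2)

# Synthetic slice: two "tissue" classes with different mean intensity plus noise.
rng = np.random.default_rng(0)
labels_img = (np.arange(64 * 64).reshape(64, 64) % 64 > 32).astype(int)
image = labels_img * 0.8 + rng.normal(0.0, 0.2, (64, 64))

X = window_features(image)
y = labels_img.ravel()
train = rng.random(len(y)) < 0.3            # partially labelled training pixels
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X[train], y[train])
pred = clf.predict(X).reshape(64, 64)
print("pixel accuracy:", round((pred == labels_img).mean(), 3))
```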

  3. Neural Network Target Identification System for False Alarm Reduction

    NASA Technical Reports Server (NTRS)

    Ye, David; Edens, Weston; Lu, Thomas T.; Chao, Tien-Hsin

    2009-01-01

    A multi-stage automated target recognition (ATR) system has been designed to perform computer vision tasks with adequate proficiency in mimicking human vision. The system is able to detect, identify, and track targets of interest. Potential regions of interest (ROIs) are first identified by the detection stage using an Optimum Trade-off Maximum Average Correlation Height (OT-MACH) filter combined with a wavelet transform. False positives are then eliminated by the verification stage using feature extraction methods in conjunction with neural networks. Feature extraction transforms the ROIs using filtering and binning algorithms to create feature vectors. A feed-forward back-propagation neural network (NN) is then trained to classify each feature vector and remove false positives. This paper discusses testing of system performance and the parameter optimization process that adapts the system to various targets and datasets. The test results show that the system was successful in substantially reducing the false positive rate when tested on a sonar image dataset.
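
    The verification stage can be approximated with a small scikit-learn sketch in which a feed-forward network is trained on ROI feature vectors to reject false positives; the random features below stand in for the wavelet/binning features used by the actual system.

```python
# False-positive reduction sketch with a small feed-forward network
# (assumptions: NumPy, scikit-learn; synthetic ROI feature vectors).
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_rois, n_features = 1000, 32
X = rng.normal(size=(n_rois, n_features))            # stand-in ROI feature vectors
y = (X[:, :4].sum(axis=1) > 0).astype(int)           # 1 = true target, 0 = clutter

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
clf.fit(X_tr, y_tr)

# Keep only ROIs the network accepts; everything else is discarded as a false positive.
accepted = clf.predict(X_te) == 1
fp_rate = (accepted & (y_te == 0)).sum() / max((y_te == 0).sum(), 1)
print("false-positive rate after verification:", round(fp_rate, 3))
```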

  4. The Regional Hydrologic Extremes Assessment System: A software framework for hydrologic modeling and data assimilation

    PubMed Central

    Das, Narendra; Stampoulis, Dimitrios; Ines, Amor; Fisher, Joshua B.; Granger, Stephanie; Kawata, Jessie; Han, Eunjin; Behrangi, Ali

    2017-01-01

    The Regional Hydrologic Extremes Assessment System (RHEAS) is a prototype software framework for hydrologic modeling and data assimilation that automates the deployment of water resources nowcasting and forecasting applications. A spatially-enabled database is a key component of the software that can ingest a suite of satellite and model datasets while facilitating the interfacing with Geographic Information System (GIS) applications. The datasets ingested are obtained from numerous space-borne sensors and represent multiple components of the water cycle. The object-oriented design of the software allows for modularity and extensibility, showcased here with the coupling of the core hydrologic model with a crop growth model. RHEAS can exploit multi-threading to scale with increasing number of processors, while the database allows delivery of data products and associated uncertainty through a variety of GIS platforms. A set of three example implementations of RHEAS in the United States and Kenya are described to demonstrate the different features of the system in real-world applications. PMID:28545077

  5. The Regional Hydrologic Extremes Assessment System: A software framework for hydrologic modeling and data assimilation.

    PubMed

    Andreadis, Konstantinos M; Das, Narendra; Stampoulis, Dimitrios; Ines, Amor; Fisher, Joshua B; Granger, Stephanie; Kawata, Jessie; Han, Eunjin; Behrangi, Ali

    2017-01-01

    The Regional Hydrologic Extremes Assessment System (RHEAS) is a prototype software framework for hydrologic modeling and data assimilation that automates the deployment of water resources nowcasting and forecasting applications. A spatially-enabled database is a key component of the software that can ingest a suite of satellite and model datasets while facilitating the interfacing with Geographic Information System (GIS) applications. The datasets ingested are obtained from numerous space-borne sensors and represent multiple components of the water cycle. The object-oriented design of the software allows for modularity and extensibility, showcased here with the coupling of the core hydrologic model with a crop growth model. RHEAS can exploit multi-threading to scale with increasing number of processors, while the database allows delivery of data products and associated uncertainty through a variety of GIS platforms. A set of three example implementations of RHEAS in the United States and Kenya are described to demonstrate the different features of the system in real-world applications.

  6. Development of Super-Ensemble techniques for ocean analyses: the Mediterranean Sea case

    NASA Astrophysics Data System (ADS)

    Pistoia, Jenny; Pinardi, Nadia; Oddo, Paolo; Collins, Matthew; Korres, Gerasimos; Drillet, Yann

    2017-04-01

    Short-term ocean analyses for Sea Surface Temperature (SST) in the Mediterranean Sea can be improved by a statistical post-processing technique called super-ensemble. This technique consists of a multi-linear regression algorithm applied to a Multi-Physics Multi-Model Super-Ensemble (MMSE) dataset, a collection of different operational forecasting analyses together with ad-hoc simulations produced by modifying selected numerical model parameterizations. A new linear regression algorithm based on Empirical Orthogonal Function filtering techniques is capable of preventing overfitting problems, although the best performance is achieved when we add correlation to the super-ensemble structure using a simple spatial filter applied after the linear regression. Our results show that super-ensemble performance depends on the selection of an unbiased operator and on the length of the learning period, but the quality of the generating MMSE dataset has the largest impact on the MMSE analysis Root Mean Square Error (RMSE) evaluated with respect to observed satellite SST. Lower RMSE analysis estimates result from the following choices: a 15-day training period, an overconfident MMSE dataset (a subset with the higher-quality ensemble members), and the least-squares algorithm filtered a posteriori.
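
    A bare-bones version of the multi-linear-regression super-ensemble, assuming scikit-learn/NumPy, is sketched below: regression weights for the ensemble members are learned against observations over a short training window and then applied to new member fields. The EOF filtering and the a posteriori spatial filter described above are omitted, and all fields are synthetic placeholders.

```python
# Multi-linear-regression super-ensemble sketch (assumptions: NumPy, scikit-learn).
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n_days, n_members, n_points = 15, 5, 400          # 15-day training period

truth = 20.0 + rng.normal(0.0, 1.0, (n_days, n_points))               # "observed" SST
bias = rng.normal(0.0, 0.5, (n_members, 1))                            # per-member bias
members = truth[:, None, :] + bias[None, :, :] \
          + rng.normal(0.0, 0.4, (n_days, n_members, n_points))

# One regression over all (day, grid point) samples; predictors are the member values.
X_train = members.transpose(0, 2, 1).reshape(-1, n_members)
y_train = truth.reshape(-1)
sse = LinearRegression().fit(X_train, y_train)

# Apply the learned weights to a new day's member fields.
new_truth = 20.0 + rng.normal(0.0, 1.0, n_points)
new_members = new_truth[None, :] + bias + rng.normal(0.0, 0.4, (n_members, n_points))
analysis = sse.predict(new_members.T)

rmse = lambda field: np.sqrt(((field - new_truth) ** 2).mean())
print("ensemble-mean RMSE:", round(rmse(new_members.mean(axis=0)), 3),
      " super-ensemble RMSE:", round(rmse(analysis), 3))
```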

  7. Integrative analysis of gene expression and DNA methylation using unsupervised feature extraction for detecting candidate cancer biomarkers.

    PubMed

    Moon, Myungjin; Nakai, Kenta

    2018-04-01

    Currently, cancer biomarker discovery is one of the important research topics worldwide. In particular, detecting significant genes related to cancer is an important task for early diagnosis and treatment of cancer. Conventional studies mostly focus on genes that are differentially expressed in different states of cancer; however, noise in gene expression datasets and insufficient information in limited datasets impede precise analysis of novel candidate biomarkers. In this study, we propose an integrative analysis of gene expression and DNA methylation using normalization and unsupervised feature extractions to identify candidate biomarkers of cancer using renal cell carcinoma RNA-seq datasets. Gene expression and DNA methylation datasets are normalized by Box-Cox transformation and integrated into a one-dimensional dataset that retains the major characteristics of the original datasets by unsupervised feature extraction methods, and differentially expressed genes are selected from the integrated dataset. Use of the integrated dataset demonstrated improved performance as compared with conventional approaches that utilize gene expression or DNA methylation datasets alone. Validation based on the literature showed that a considerable number of top-ranked genes from the integrated dataset have known relationships with cancer, implying that novel candidate biomarkers can also be acquired from the proposed analysis method. Furthermore, we expect that the proposed method can be expanded for applications involving various types of multi-omics datasets.
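
    The integration step can be sketched, under strong simplifications, with SciPy and scikit-learn: Box-Cox-normalise each gene's expression and methylation profiles, stack them, and project the genes onto a single principal component to obtain a one-dimensional integrated score used to rank candidate genes. This illustrates the idea only and is not the published unsupervised feature-extraction procedure.

```python
# Box-Cox normalisation + one-dimensional integration sketch
# (assumptions: SciPy, scikit-learn; synthetic omics matrices).
import numpy as np
from scipy.stats import boxcox
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
n_genes, n_samples = 500, 40
expression = rng.lognormal(3.0, 1.0, (n_genes, n_samples))      # RNA-seq-like, positive
methylation = rng.beta(2.0, 5.0, (n_genes, n_samples)) + 1e-3   # beta values, positive

def boxcox_rows(matrix):
    """Apply a per-gene Box-Cox transform (requires strictly positive values)."""
    return np.vstack([boxcox(row)[0] for row in matrix])

combined = np.hstack([boxcox_rows(expression), boxcox_rows(methylation)])
combined = (combined - combined.mean(axis=1, keepdims=True)) \
           / combined.std(axis=1, keepdims=True)

# One-dimensional integrated dataset: each gene gets a single PC1 score; genes with
# extreme scores are retained as candidate biomarkers.
gene_scores = PCA(n_components=1).fit_transform(combined).ravel()
candidates = np.argsort(np.abs(gene_scores))[::-1][:20]
print("top candidate gene indices:", candidates[:10])
```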

  8. Improved cancer detection in automated breast ultrasound by radiologists using Computer Aided Detection.

    PubMed

    van Zelst, J C M; Tan, T; Platel, B; de Jong, M; Steenbakkers, A; Mourits, M; Grivegnee, A; Borelli, C; Karssemeijer, N; Mann, R M

    2017-04-01

    To investigate the effect of dedicated Computer Aided Detection (CAD) software for automated breast ultrasound (ABUS) on the performance of radiologists screening for breast cancer. 90 ABUS views of 90 patients were randomly selected from a multi-institutional archive of cases collected between 2010 and 2013. This dataset included normal cases (n=40) with >1year of follow up, benign (n=30) lesions that were either biopsied or remained stable, and malignant lesions (n=20). Six readers evaluated all cases with and without CAD in two sessions. CAD-software included conventional CAD-marks and an intelligent minimum intensity projection of the breast tissue. Readers reported using a likelihood-of-malignancy scale from 0 to 100. Alternative free-response ROC analysis was used to measure the performance. Without CAD, the average area-under-the-curve (AUC) of the readers was 0.77 and significantly improved with CAD to 0.84 (p=0.001). Sensitivity of all readers improved (range 5.2-10.6%) by using CAD but specificity decreased in four out of six readers (range 1.4-5.7%). No significant difference was observed in the AUC between experienced radiologists and residents both with and without CAD. Dedicated CAD-software for ABUS has the potential to improve the cancer detection rates of radiologists screening for breast cancer. Copyright © 2017 Elsevier B.V. All rights reserved.

  9. XML-based data model and architecture for a knowledge-based grid-enabled problem-solving environment for high-throughput biological imaging.

    PubMed

    Ahmed, Wamiq M; Lenz, Dominik; Liu, Jia; Paul Robinson, J; Ghafoor, Arif

    2008-03-01

    High-throughput biological imaging uses automated imaging devices to collect a large number of microscopic images for analysis of biological systems and validation of scientific hypotheses. Efficient manipulation of these datasets for knowledge discovery requires high-performance computational resources, efficient storage, and automated tools for extracting and sharing such knowledge among different research sites. Newly emerging grid technologies provide powerful means for exploiting the full potential of these imaging techniques. Efficient utilization of grid resources requires the development of knowledge-based tools and services that combine domain knowledge with analysis algorithms. In this paper, we first investigate how grid infrastructure can facilitate high-throughput biological imaging research, and present an architecture for providing knowledge-based grid services for this field. We identify two levels of knowledge-based services. The first level provides tools for extracting spatiotemporal knowledge from image sets and the second level provides high-level knowledge management and reasoning services. We then present cellular imaging markup language, an extensible markup language-based language for modeling of biological images and representation of spatiotemporal knowledge. This scheme can be used for spatiotemporal event composition, matching, and automated knowledge extraction and representation for large biological imaging datasets. We demonstrate the expressive power of this formalism by means of different examples and extensive experimental results.

  10. The Role of Thermodynamic Processes in the Evolution of Single and Multi-banding within Winter Storms

    NASA Astrophysics Data System (ADS)

    Ganetis, Sara Anne

    Mesoscale precipitation bands within Northeast U.S. (NEUS) winter storms result in heterogeneous spatial and temporal snowfall. Several studies have provided analyses of snowbands focusing on larger, meso-beta-scale bands with lengths (L) > 200 km, known as single bands. NEUS winter storms can also exhibit multiple bands at meso-beta scale (L < 200 km) with similar spatial orientation; when three or more occur they are termed multi-bands. However, the genesis and evolution of multi-bands are less well understood. Unlike for single bands, there is no climatological study of multi-bands. In addition, there has been little detailed thermodynamic analysis of snowbands. This dissertation utilizes radar observations, reanalyses, and high-resolution model simulations to explore the thermodynamic evolution of single and multi-bands. Bands are identified within 20 cool-season (October-April) NEUS storms. The 110-case dataset was classified, using a combination of automated and manual methods, into single band only (SINGLE), multi-bands only (MULTI), both single and multi-bands (BOTH), and non-banded (NONE) categories. Multi-bands occur in the presence of a single band in 55.4% of the times used in this study and without a single band 18.1% of the time, while precipitation exhibits no banded characteristics 23.8% of the time. Most MULTI events occur in the northeast quadrant of a developing cyclone poleward of weak mid-level forcing along a warm front, whereas multi-bands associated with BOTH events mostly occur in the northwest quadrant of mature cyclones associated with strong mid-level frontogenesis and conditional symmetric instability. The non-banded precipitation associated with NONE events occurs in the eastern quadrants of developing and mature cyclones lacking the mid-level forcing needed to concentrate the precipitation into bands. A high-resolution mesoscale model is used to explore the evolution of single and multi-bands based on two case studies, one of a single band and one of multi-bands. The multi-bands form in response to intermittent mid-level frontogenetical forcing in a conditionally unstable environment. The bands, which form in a genesis region southeast of the single band, move northwest towards the single band with the 700-hPa steering flow. This allows for the formation of new multi-bands within the genesis region, unlike the single band, which remains fixed to a 700-hPa frontogenesis maximum. Latent heating within the band is shown to increase the intensity and duration of single and multi-bands through decreased geopotential height below the heating maximum, which leads to increased convergence within the band.

  11. Human-machine interaction to disambiguate entities in unstructured text and structured datasets

    NASA Astrophysics Data System (ADS)

    Ward, Kevin; Davenport, Jack

    2017-05-01

    Creating entity network graphs is a manual, time-consuming process for an intelligence analyst. Beyond the traditional big data problems of information overload, individuals are often referred to by multiple names and shifting titles as they advance in their organizations over time, which quickly makes simple string or phonetic alignment methods for entities insufficient. Conversely, automated methods for relationship extraction and entity disambiguation typically produce questionable results with no way for users to vet results, correct mistakes or influence the algorithm's future results. We present an entity disambiguation tool, DRADIS, which aims to bridge the gap between human-centric and machine-centric methods. DRADIS automatically extracts entities from multi-source datasets and models them as a complex set of attributes and relationships. Entities are disambiguated across the corpus using a hierarchical model executed in Spark, allowing it to scale to operational-sized data. Resolution results are presented to the analyst complete with sourcing information for each mention and relationship, allowing analysts to quickly vet the correctness of results as well as correct mistakes. Corrected results are used by the system to refine the underlying model, allowing analysts to optimize the general model to better deal with their operational data. Providing analysts with the ability to validate and correct the model to produce a system they can trust enables them to better focus their time on producing higher-quality analysis products.

  12. SHARP: Automated monitoring of spacecraft health and status

    NASA Technical Reports Server (NTRS)

    Atkinson, David J.; James, Mark L.; Martin, R. Gaius

    1991-01-01

    Briefly discussed here are the spacecraft and ground systems monitoring process at the Jet Propulsion Laboratory (JPL). Some of the difficulties associated with the existing technology used in mission operations are highlighted. A new automated system based on artificial intelligence technology is described which seeks to overcome many of these limitations. The system, called the Spacecraft Health Automated Reasoning Prototype (SHARP), is designed to automate health and status analysis for multi-mission spacecraft and ground data systems operations. The system has proved to be effective for detecting and analyzing potential spacecraft and ground systems problems by performing real-time analysis of spacecraft and ground data systems engineering telemetry. Telecommunications link analysis of the Voyager 2 spacecraft was the initial focus for evaluation of the system in real-time operations during the Voyager spacecraft encounter with Neptune in August 1989.

  13. SHARP - Automated monitoring of spacecraft health and status

    NASA Technical Reports Server (NTRS)

    Atkinson, David J.; James, Mark L.; Martin, R. G.

    1990-01-01

    Briefly discussed here are the spacecraft and ground systems monitoring process at the Jet Propulsion Laboratory (JPL). Some of the difficulties associated with the existing technology used in mission operations are highlighted. A new automated system based on artificial intelligence technology is described which seeks to overcome many of these limitations. The system, called the Spacecraft Health Automated Reasoning Prototype (SHARP), is designed to automate health and status analysis for multi-mission spacecraft and ground data systems operations. The system has proved to be effective for detecting and analyzing potential spacecraft and ground systems problems by performing real-time analysis of spacecraft and ground data systems engineering telemetry. Telecommunications link analysis of the Voyager 2 spacecraft was the initial focus for evaluation of the system in real-time operations during the Voyager spacecraft encounter with Neptune in August 1989.

  14. Accuracy of patient specific organ-dose estimates obtained using an automated image segmentation algorithm

    NASA Astrophysics Data System (ADS)

    Gilat-Schmidt, Taly; Wang, Adam; Coradi, Thomas; Haas, Benjamin; Star-Lack, Josh

    2016-03-01

    The overall goal of this work is to develop a rapid, accurate and fully automated software tool to estimate patient-specific organ doses from computed tomography (CT) scans using a deterministic Boltzmann Transport Equation solver and automated CT segmentation algorithms. This work quantified the accuracy of organ dose estimates obtained by an automated segmentation algorithm. The investigated algorithm uses a combination of feature-based and atlas-based methods. A multiatlas approach was also investigated. We hypothesize that the auto-segmentation algorithm is sufficiently accurate to provide organ dose estimates since random errors at the organ boundaries will average out when computing the total organ dose. To test this hypothesis, twenty head-neck CT scans were expertly segmented into nine regions. A leave-one-out validation study was performed, where every case was automatically segmented with each of the remaining cases used as the expert atlas, resulting in nineteen automated segmentations for each of the twenty datasets. The segmented regions were applied to gold-standard Monte Carlo dose maps to estimate mean and peak organ doses. The results demonstrated that the fully automated segmentation algorithm estimated the mean organ dose to within 10% of the expert segmentation for regions other than the spinal canal, with median error for each organ region below 2%. In the spinal canal region, the median error was 7% across all data sets and atlases, with a maximum error of 20%. The error in peak organ dose was below 10% for all regions, with a median error below 4% for all organ regions. The multiple-case atlas reduced the variation in the dose estimates and additional improvements may be possible with more robust multi-atlas approaches. Overall, the results support potential feasibility of an automated segmentation algorithm to provide accurate organ dose estimates.
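
    The dose-accumulation step lends itself to a short NumPy sketch: given a dose map and a label volume from either the automated or the expert segmentation, the mean and peak dose per organ region follow directly from masked statistics. The dose values and labels below are synthetic placeholders.

```python
# Organ-dose statistics from a segmentation mask (assumptions: NumPy; synthetic data).
import numpy as np

rng = np.random.default_rng(0)
dose = rng.gamma(shape=2.0, scale=5.0, size=(40, 64, 64))    # synthetic dose map (mGy)
labels = rng.integers(0, 4, size=(40, 64, 64))               # 0 = background, 1..3 = organs

for organ in (1, 2, 3):
    mask = labels == organ
    print(f"organ {organ}: mean = {dose[mask].mean():.2f} mGy, "
          f"peak = {dose[mask].max():.2f} mGy, voxels = {mask.sum()}")

# Comparing these statistics between automated and expert masks gives the percent
# errors reported in the study, e.g. |mean_auto - mean_expert| / mean_expert.
```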

  15. Identification of suitable fundus images using automated quality assessment methods.

    PubMed

    Şevik, Uğur; Köse, Cemal; Berber, Tolga; Erdöl, Hidayet

    2014-04-01

    Retinal image quality assessment (IQA) is a crucial process for automated retinal image analysis systems to obtain an accurate and successful diagnosis of retinal diseases. Consequently, the first step in a good retinal image analysis system is measuring the quality of the input image. We present an approach for finding medically suitable retinal images for retinal diagnosis. We used a three-class grading system that consists of good, bad, and outlier classes. We created a retinal image quality dataset, called the Diabetic Retinopathy Image Database, with a total of 216 consecutive images. Using a novel method, we then identified, among the good images, those suitable for automatic retinal image analysis systems. Subsequently, we evaluated our retinal image suitability approach using the Digital Retinal Images for Vessel Extraction and Standard Diabetic Retinopathy Database Calibration level 1 public datasets. The results were measured with the F1 metric, the harmonic mean of the precision and recall metrics. The highest F1 scores of the IQA tests were 99.60%, 96.50%, and 85.00% for the good, bad, and outlier classes, respectively. Additionally, the accuracy of our suitable-image detection approach was 98.08%. Our approach can be integrated into any automatic retinal analysis system with sufficient performance scores.
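
    For reference, the F1 metric quoted above combines precision and recall as their harmonic mean; in terms of true positives (TP), false positives (FP) and false negatives (FN) it can be written as:

```latex
\mathrm{precision} = \frac{TP}{TP + FP}, \qquad
\mathrm{recall} = \frac{TP}{TP + FN}, \qquad
F_1 = 2\,\frac{\mathrm{precision}\cdot\mathrm{recall}}{\mathrm{precision}+\mathrm{recall}}
```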

  16. Automated analysis of time-lapse fluorescence microscopy images: from live cell images to intracellular foci.

    PubMed

    Dzyubachyk, Oleh; Essers, Jeroen; van Cappellen, Wiggert A; Baldeyron, Céline; Inagaki, Akiko; Niessen, Wiro J; Meijering, Erik

    2010-10-01

    Complete, accurate and reproducible analysis of intracellular foci from fluorescence microscopy image sequences of live cells requires full automation of all processing steps involved: cell segmentation and tracking followed by foci segmentation and pattern analysis. Integrated systems for this purpose are lacking. Extending our previous work in cell segmentation and tracking, we developed a new system for performing fully automated analysis of fluorescent foci in single cells. The system was validated by applying it to two common tasks: intracellular foci counting (in DNA damage repair experiments) and cell-phase identification based on foci pattern analysis (in DNA replication experiments). Experimental results show that the system performs comparably to expert human observers. Thus, it may replace tedious manual analyses for the considered tasks, and enables high-content screening. The described system was implemented in MATLAB (The MathWorks, Inc., USA) and compiled to run within the MATLAB environment. The routines together with four sample datasets are available at http://celmia.bigr.nl/. The software is planned for public release, free of charge for non-commercial use, after publication of this article.

  17. Integrative analysis of multi-omics data for identifying multi-markers for diagnosing pancreatic cancer

    PubMed Central

    2015-01-01

    Background microRNA (miRNA) expression plays an influential role in cancer classification and malignancy, and miRNAs are feasible as alternative diagnostic markers for pancreatic cancer, a highly aggressive neoplasm with silent early symptoms, high metastatic potential, and resistance to conventional therapies. Methods In this study, we evaluated the benefits of multi-omics data analysis by integrating miRNA and mRNA expression data in pancreatic cancer. Using support vector machine (SVM) modelling and leave-one-out cross-validation (LOOCV), we evaluated the diagnostic performance of single or multi-markers based on miRNA and mRNA expression profiles from 104 PDAC tissues and 17 benign pancreatic tissues. For selecting even more reliable and robust markers, we performed validation using independent datasets from the Gene Expression Omnibus (GEO) and The Cancer Genome Atlas (TCGA) data depositories. For validation, miRNA activity was estimated from miRNA-target gene interactions and mRNA expression datasets in pancreatic cancer. Results Using a comprehensive identification approach, we successfully identified 705 multi-markers with powerful diagnostic performance for PDAC. In addition, these marker candidates were annotated with cancer pathways using gene ontology analysis. Conclusions Our prediction models have strong potential for the diagnosis of pancreatic cancer. PMID:26328610
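
    The evaluation scheme (an SVM assessed with leave-one-out cross-validation) can be reproduced in outline with scikit-learn; the marker matrix below is a random placeholder with the sample sizes quoted in the abstract.

```python
# SVM + leave-one-out cross-validation sketch (assumptions: NumPy, scikit-learn).
import numpy as np
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(0)
n_tumour, n_benign, n_markers = 104, 17, 10        # sample sizes from the abstract
X = np.vstack([rng.normal(1.0, 1.0, (n_tumour, n_markers)),
               rng.normal(0.0, 1.0, (n_benign, n_markers))])
y = np.array([1] * n_tumour + [0] * n_benign)      # 1 = PDAC, 0 = benign tissue

clf = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0))
scores = cross_val_score(clf, X, y, cv=LeaveOneOut())
print(f"LOOCV accuracy for this marker set: {scores.mean():.3f}")
```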

  18. Heterogeneous access and processing of EO-Data on a Cloud based Infrastructure delivering operational Products

    NASA Astrophysics Data System (ADS)

    Niggemann, F.; Appel, F.; Bach, H.; de la Mar, J.; Schirpke, B.; Dutting, K.; Rucker, G.; Leimbach, D.

    2015-04-01

    To address the challenges of effective data handling faced by Small and Medium-Sized Enterprises (SMEs), a cloud-based infrastructure for accessing and processing Earth Observation (EO) data has been developed within the project APPS4GMES (www.apps4gmes.de). To provide homogeneous multi-mission data access, an Input Data Portal (IDP) has been implemented on this infrastructure. The IDP consists of an Open Geospatial Consortium (OGC) conformant catalogue, a consolidation module for format conversion and an OGC-conformant ordering framework. Metadata from various EO sources and in different standards are harvested, transferred to the OGC-conformant Earth Observation Product standard, and inserted into the catalogue by a Metadata Harvester. The IDP can be accessed for search and ordering of the harvested datasets by the services implemented on the cloud infrastructure. Different land-surface services have been realised by the project partners using the implemented IDP and cloud infrastructure. The results are customer-ready products, as well as pre-products (e.g. atmospherically corrected EO data) that serve as a basis for other services. Within the IDP, automated access to ESA's Sentinel-1 Scientific Data Hub has been implemented, so searching and downloading of the SAR data can be performed in an automated way. With the implementation of the Sentinel-1 Toolbox and in-house software, processing of the datasets for further use (for example, for Vista's snow monitoring, which delivers input for the flood forecast services) can also be performed in an automated way. For performance tests of the cloud environment, a sophisticated model-based atmospheric correction and pre-classification service has been implemented. The tests comprised automated, synchronised processing of one entire Landsat 8 (LS-8) coverage of Germany and performance comparisons with standard desktop systems. The results of these tests, showing a performance improvement by a factor of six, proved the high flexibility and computing power of the cloud environment. To make full use of the cloud capabilities, a possibility for automated upscaling of the hardware resources has been implemented. Together with the IDP infrastructure, fast and automated processing of various satellite sources into market-ready products can be realised, so that increasing customer needs and numbers can be satisfied without loss of accuracy or quality.

  19. The automated data processing architecture for the GPI Exoplanet Survey

    NASA Astrophysics Data System (ADS)

    Wang, Jason J.; Perrin, Marshall D.; Savransky, Dmitry; Arriaga, Pauline; Chilcote, Jeffrey K.; De Rosa, Robert J.; Millar-Blanchaer, Maxwell A.; Marois, Christian; Rameau, Julien; Wolff, Schuyler G.; Shapiro, Jacob; Ruffio, Jean-Baptiste; Graham, James R.; Macintosh, Bruce

    2017-09-01

    The Gemini Planet Imager Exoplanet Survey (GPIES) is a multi-year direct imaging survey of 600 stars to discover and characterize young Jovian exoplanets and their environments. We have developed an automated data architecture to process and index all data related to the survey uniformly. An automated and flexible data processing framework, which we term the GPIES Data Cruncher, combines multiple data reduction pipelines together to intelligently process all spectroscopic, polarimetric, and calibration data taken with GPIES. With no human intervention, fully reduced and calibrated data products are available less than an hour after the data are taken to expedite follow-up on potential objects of interest. The Data Cruncher can run on a supercomputer to reprocess all GPIES data in a single day as improvements are made to our data reduction pipelines. A backend MySQL database indexes all files, which are synced to the cloud, and a front-end web server allows for easy browsing of all files associated with GPIES. To help observers, quicklook displays show reduced data as they are processed in real-time, and chatbots on Slack post observing information as well as reduced data products. Together, the GPIES automated data processing architecture reduces our workload, provides real-time data reduction, optimizes our observing strategy, and maintains a homogeneously reduced dataset to study planet occurrence and instrument performance.

  20. Parallel Index and Query for Large Scale Data Analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chou, Jerry; Wu, Kesheng; Ruebel, Oliver

    2011-07-18

    Modern scientific datasets present numerous data management and analysis challenges. State-of-the-art index and query technologies are critical for facilitating interactive exploration of large datasets, but numerous challenges remain in terms of designing a system for processing general scientific datasets. The system needs to be able to run on distributed multi-core platforms, efficiently utilize underlying I/O infrastructure, and scale to massive datasets. We present FastQuery, a novel software framework that addresses these challenges. FastQuery utilizes a state-of-the-art index and query technology (FastBit) and is designed to process massive datasets on modern supercomputing platforms. We apply FastQuery to processing of a massive 50TB dataset generated by a large-scale accelerator modeling code. We demonstrate the scalability of the tool to 11,520 cores. Motivated by the scientific need to search for interesting particles in this dataset, we use our framework to reduce search time from hours to tens of seconds.
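
    The bitmap-indexing idea underlying FastBit/FastQuery can be illustrated with a NumPy toy: pre-compute one bitmap per value bin so that a range query reduces to bitwise operations over a few bitmaps plus a check of the two edge bins. This is a conceptual sketch only, not the FastQuery API.

```python
# Toy bitmap-index range query (assumptions: NumPy; synthetic particle attribute).
import numpy as np

rng = np.random.default_rng(0)
energy = rng.exponential(scale=10.0, size=100_000)        # one particle attribute

# Binned bitmap index: bitmaps[i, j] is True if particle j falls in bin i.
edges = np.linspace(0.0, 100.0, 51)
bin_of = np.clip(np.digitize(energy, edges) - 1, 0, len(edges) - 2)
bitmaps = np.zeros((len(edges) - 1, energy.size), dtype=bool)
bitmaps[bin_of, np.arange(energy.size)] = True

def query_range(lo, hi):
    """Return indices of particles with lo <= energy < hi using the bitmaps."""
    lo_bin = max(np.searchsorted(edges, lo, side="right") - 1, 0)
    hi_bin = np.searchsorted(edges, hi, side="left")
    candidates = np.flatnonzero(bitmaps[lo_bin:hi_bin].any(axis=0))  # OR of covered bins
    # The two edge bins may over-select, so verify candidates against the raw values.
    return candidates[(energy[candidates] >= lo) & (energy[candidates] < hi)]

hits = query_range(30.0, 60.0)
print("particles selected by the range query:", hits.size)
```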

  1. An Automated Algorithm for Producing Land Cover Information from Landsat Surface Reflectance Data Acquired Between 1984 and Present

    NASA Astrophysics Data System (ADS)

    Rover, J.; Goldhaber, M. B.; Holen, C.; Dittmeier, R.; Wika, S.; Steinwand, D.; Dahal, D.; Tolk, B.; Quenzer, R.; Nelson, K.; Wylie, B. K.; Coan, M.

    2015-12-01

    Multi-year land cover mapping from remotely sensed data poses challenges. Producing land cover products at the spatial and temporal scales required for assessing longer-term trends in land cover change is typically a resource-limited process. A recently developed approach utilizes open-source software libraries to automatically generate datasets, decision tree classifications, and data products while requiring minimal user interaction. Users are only required to supply coordinates for an area of interest, land cover from an existing source such as the National Land Cover Database, percent slope from a digital terrain model for the same area of interest, two target acquisition year-day windows, and the years of interest between 1984 and present. The algorithm queries the Landsat archive for Landsat data intersecting the area and dates of interest. Cloud-free pixels meeting the user's criteria are mosaicked to create composite images for training and applying the classifiers. Stratification of the training data is determined by the user and redefined during an iterative process of reviewing classifiers and the resulting predictions. The algorithm outputs include yearly land cover data in raster format, graphics, and supporting databases for further analysis. Additional analytical tools are also incorporated into the automated land cover system and enable statistical analysis after the data are generated. Applications tested include the impact of land cover change and water permanence. For example, land cover conversions in areas where shrubland and grassland were replaced by shale oil pads during hydrofracking of the Bakken Formation were quantified. Analysis of spatial and temporal changes in surface water included identifying wetlands in the Prairie Pothole Region of North Dakota with potential connectivity to groundwater, indicating subsurface permeability and geochemistry.
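
    The classification core of such a system can be sketched with scikit-learn: a decision tree trained on composite band values and slope at pixels labelled from an existing land cover product, then applied to each yearly composite. All inputs below are synthetic placeholders for Landsat surface reflectance composites.

```python
# Decision-tree land cover classification sketch (assumptions: NumPy, scikit-learn).
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n_pixels = 20_000
bands = rng.uniform(0.0, 0.6, (n_pixels, 6))            # surface reflectance, 6 bands
slope = rng.uniform(0.0, 30.0, (n_pixels, 1))           # percent slope
X = np.hstack([bands, slope])
# Stand-in NLCD-style labels: 1 = water, 2 = grassland, 3 = shrubland.
labels = np.where(bands[:, 4] < 0.1, 1, np.where(slope[:, 0] > 15.0, 3, 2))

train = rng.random(n_pixels) < 0.2                      # user-defined stratification omitted
tree = DecisionTreeClassifier(max_depth=8, random_state=0).fit(X[train], labels[train])
yearly_land_cover = tree.predict(X)                     # repeat per yearly composite
print("agreement with training source:", round((yearly_land_cover == labels).mean(), 3))
```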

  2. Automated high-throughput flow-through real-time diagnostic system

    DOEpatents

    Regan, John Frederick

    2012-10-30

    An automated real-time flow-through system capable of processing multiple samples in an asynchronous, simultaneous, and parallel fashion for nucleic acid extraction and purification, followed by assay assembly, genetic amplification, multiplex detection, analysis, and decontamination. The system is able to hold and access an unlimited number of fluorescent reagents that may be used to screen samples for the presence of specific sequences. The apparatus works by associating extracted and purified sample with a series of reagent plugs that have been formed in a flow channel and delivered to a flow-through real-time amplification detector that has a multiplicity of optical windows, to which the sample-reagent plugs are placed in an operative position. The diagnostic apparatus includes sample multi-position valves, a master sample multi-position valve, a master reagent multi-position valve, reagent multi-position valves, and an optical amplification/detection system.

  3. Automated analysis and classification of melanocytic tumor on skin whole slide images.

    PubMed

    Xu, Hongming; Lu, Cheng; Berendt, Richard; Jha, Naresh; Mandal, Mrinal

    2018-06-01

    This paper presents a computer-aided technique for automated analysis and classification of melanocytic tumor on skin whole slide biopsy images. The proposed technique consists of four main modules. First, skin epidermis and dermis regions are segmented by a multi-resolution framework. Next, epidermis analysis is performed, where a set of epidermis features reflecting nuclear morphologies and spatial distributions is computed. In parallel with epidermis analysis, dermis analysis is also performed, where dermal cell nuclei are segmented and a set of textural and cytological features are computed. Finally, the skin melanocytic image is classified into different categories such as melanoma, nevus or normal tissue by using a multi-class support vector machine (mSVM) with extracted epidermis and dermis features. Experimental results on 66 skin whole slide images indicate that the proposed technique achieves more than 95% classification accuracy, which suggests that the technique has the potential to be used for assisting pathologists on skin biopsy image analysis and classification. Copyright © 2018 Elsevier Ltd. All rights reserved.

  4. Automated Liver Elasticity Calculation for 3D MRE

    PubMed Central

    Dzyubak, Bogdan; Glaser, Kevin J.; Manduca, Armando; Ehman, Richard L.

    2017-01-01

    Magnetic Resonance Elastography (MRE) is a phase-contrast MRI technique which calculates quantitative stiffness images, called elastograms, by imaging the propagation of acoustic waves in tissues. It is used clinically to diagnose liver fibrosis. Automated analysis of MRE is difficult as the corresponding MRI magnitude images (which contain anatomical information) are affected by intensity inhomogeneity, motion artifact, and poor tissue- and edge-contrast. Additionally, areas with low wave amplitude must be excluded. An automated algorithm has already been successfully developed and validated for clinical 2D MRE. 3D MRE acquires substantially more data and, due to accelerated acquisition, has exacerbated image artifacts. Also, the current 3D MRE processing does not yield a confidence map to indicate MRE wave quality and guide ROI selection, as is the case in 2D. In this study, extension of the 2D automated method, with a simple wave-amplitude metric, was developed and validated against an expert reader in a set of 57 patient exams with both 2D and 3D MRE. The stiffness discrepancy with the expert for 3D MRE was −0.8% ± 9.45% and was better than discrepancy with the same reader for 2D MRE (−3.2% ± 10.43%), and better than the inter-reader discrepancy observed in previous studies. There were no automated processing failures in this dataset. Thus, the automated liver elasticity calculation (ALEC) algorithm is able to calculate stiffness from 3D MRE data with minimal bias and good precision, while enabling stiffness measurements to be fully reproducible and to be easily performed on the large 3D MRE datasets. PMID:29033488

  5. Design and Execution of make-like, distributed Analyses based on Spotify’s Pipelining Package Luigi

    NASA Astrophysics Data System (ADS)

    Erdmann, M.; Fischer, B.; Fischer, R.; Rieger, M.

    2017-10-01

    In high-energy particle physics, workflow management systems are primarily used as tailored solutions in dedicated areas such as Monte Carlo production. However, physicists performing data analyses are usually required to steer their individual workflows manually which is time-consuming and often leads to undocumented relations between particular workloads. We present a generic analysis design pattern that copes with the sophisticated demands of end-to-end HEP analyses and provides a make-like execution system. It is based on the open-source pipelining package Luigi which was developed at Spotify and enables the definition of arbitrary workloads, so-called Tasks, and the dependencies between them in a lightweight and scalable structure. Further features are multi-user support, automated dependency resolution and error handling, central scheduling, and status visualization in the web. In addition to already built-in features for remote jobs and file systems like Hadoop and HDFS, we added support for WLCG infrastructure such as LSF and CREAM job submission, as well as remote file access through the Grid File Access Library. Furthermore, we implemented automated resubmission functionality, software sandboxing, and a command line interface with auto-completion for a convenient working environment. For the implementation of a $t\bar{t}H$ cross section measurement, we created a generic Python interface that provides programmatic access to all external information such as datasets, physics processes, statistical models, and additional files and values. In summary, the setup enables the execution of the entire analysis in a parallelized and distributed fashion with a single command.
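
    A minimal Luigi workflow in the spirit described above might look as follows; the two Tasks, file names, and parameters are illustrative, and the WLCG submission, sandboxing, and analysis-specific interfaces are not shown.

```python
# Make-like two-Task Luigi workflow sketch (assumption: the open-source luigi package).
import json
import luigi

class SelectEvents(luigi.Task):
    dataset = luigi.Parameter()

    def output(self):
        return luigi.LocalTarget(f"selected_{self.dataset}.json")

    def run(self):
        # Placeholder for the real event selection on the named dataset.
        with self.output().open("w") as f:
            json.dump({"dataset": self.dataset, "n_selected": 4200}, f)

class MakeHistogram(luigi.Task):
    dataset = luigi.Parameter()

    def requires(self):
        return SelectEvents(dataset=self.dataset)   # dependency resolved automatically

    def output(self):
        return luigi.LocalTarget(f"histogram_{self.dataset}.json")

    def run(self):
        with self.input().open() as f:
            selected = json.load(f)
        with self.output().open("w") as f:
            json.dump({"entries": selected["n_selected"]}, f)

if __name__ == "__main__":
    # Runs only the Tasks whose outputs are missing, in dependency order.
    luigi.build([MakeHistogram(dataset="data_2017")], local_scheduler=True)
```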

  6. Effective Materials Property Information Management for the 21st Century

    NASA Technical Reports Server (NTRS)

    Ren, Weiju; Cebon, David; Arnold, Steve

    2009-01-01

    This paper discusses key principles for the development of materials property information management software systems. There are growing needs for automated materials information management in various organizations. In part these are fueled by the demands for higher efficiency in material testing, product design and engineering analysis. But equally important, organizations are being driven by the need for consistency, quality and traceability of data, as well as control of access to sensitive information such as proprietary data. Further, the use of increasingly sophisticated nonlinear, anisotropic and multi-scale engineering analyses requires both processing of large volumes of test data for development of constitutive models and complex materials data input for Computer-Aided Engineering (CAE) software. And finally, the globalization of economy often generates great needs for sharing a single "gold source" of materials information between members of global engineering teams in extended supply chains. Fortunately, material property management systems have kept pace with the growing user demands and evolved to versatile data management systems that can be customized to specific user needs. The more sophisticated of these provide facilities for: (i) data management functions such as access, version, and quality controls; (ii) a wide range of data import, export and analysis capabilities; (iii) data "pedigree" traceability mechanisms; (iv) data searching, reporting and viewing tools; and (v) access to the information via a wide range of interfaces. In this paper the important requirements for advanced material data management systems, future challenges and opportunities such as automated error checking, data quality characterization, identification of gaps in datasets, as well as functionalities and business models to fuel database growth and maintenance are discussed.

  7. SciSpark's SRDD : A Scientific Resilient Distributed Dataset for Multidimensional Data

    NASA Astrophysics Data System (ADS)

    Palamuttam, R. S.; Wilson, B. D.; Mogrovejo, R. M.; Whitehall, K. D.; Mattmann, C. A.; McGibbney, L. J.; Ramirez, P.

    2015-12-01

    Remote sensing data and climate model output are multi-dimensional arrays of massive sizes locked away in heterogeneous file formats (HDF5/4, NetCDF 3/4) and metadata models (HDF-EOS, CF) making it difficult to perform multi-stage, iterative science processing since each stage requires writing and reading data to and from disk. We have developed SciSpark, a robust Big Data framework, that extends Apache Spark for scaling scientific computations. Apache Spark improves the map-reduce implementation in Apache Hadoop for parallel computing on a cluster, by emphasizing in-memory computation, "spilling" to disk only as needed, and relying on lazy evaluation. Central to Spark is the Resilient Distributed Dataset (RDD), an in-memory distributed data structure that extends the functional paradigm provided by the Scala programming language. However, RDDs are ideal for tabular or unstructured data, and not for highly dimensional data. The SciSpark project introduces the Scientific Resilient Distributed Dataset (sRDD), a distributed-computing array structure which supports iterative scientific algorithms for multidimensional data. SciSpark processes data stored in NetCDF and HDF files by partitioning them across time or space and distributing the partitions among a cluster of compute nodes. We show usability and extensibility of SciSpark by implementing distributed algorithms for geospatial operations on large collections of multi-dimensional grids. In particular we address the problem of scaling an automated method for finding Mesoscale Convective Complexes. SciSpark provides a tensor interface to support the pluggability of different matrix libraries. We evaluate performance of the various matrix libraries in distributed pipelines, such as Nd4j and Breeze. We detail the architecture and design of SciSpark, our efforts to integrate climate science algorithms, parallel ingest and partitioning (sharding) of A-Train satellite observations from model grids. These solutions are encompassed in SciSpark, an open-source software framework for distributed computing on scientific data.

  8. Automating expert role to determine design concept in Kansei Engineering

    NASA Astrophysics Data System (ADS)

    Lokman, Anitawati Mohd; Haron, Mohammad Bakri Che; Abidin, Siti Zaleha Zainal; Khalid, Noor Elaiza Abd

    2016-02-01

    Affect has become imperative in product quality. In the field of affective design, Kansei Engineering (KE) has been recognized as a technology that enables discovery of consumers' emotions and the formulation of guidelines to design products that win consumers in the competitive market. Albeit a powerful technology, there is no rule of thumb for its analysis and interpretation process. KE expertise is required to determine sets of related Kansei and the significant concept of emotion. Many research endeavors are handicapped by the limited number of available and accessible KE experts. This work simulates the role of experts with the use of the Natphoric algorithm, thus providing a sound solution to the complexity and flexibility of KE. The algorithm is designed to learn the process from training datasets taken from previous KE research works. A framework for automated KE is then designed to realize the development of an automated KE system. A comparative analysis is performed to determine the feasibility of the developed prototype for automating the process. The result shows that the significant Kansei determined by manual KE implementation and by the automated process are highly similar. KE research advocates will benefit from this system for automatically determining significant design concepts.

  9. Segmentation of 3D ultrasound computer tomography reflection images using edge detection and surface fitting

    NASA Astrophysics Data System (ADS)

    Hopp, T.; Zapf, M.; Ruiter, N. V.

    2014-03-01

    An essential processing step for comparison of Ultrasound Computer Tomography images to other modalities, as well as for the use in further image processing, is to segment the breast from the background. In this work we present a (semi-) automated 3D segmentation method which is based on the detection of the breast boundary in coronal slice images and a subsequent surface fitting. The method was evaluated using a software phantom and in-vivo data. The fully automatically processed phantom results showed that a segmentation of approx. 10% of the slices of a dataset is sufficient to recover the overall breast shape. Application to 16 in-vivo datasets was performed successfully using semi-automated processing, i.e. using a graphical user interface for manual corrections of the automated breast boundary detection. The processing time for the segmentation of an in-vivo dataset could be significantly reduced by a factor of four compared to a fully manual segmentation. Comparison to manually segmented images identified a smoother surface for the semi-automated segmentation with an average of 11% of differing voxels and an average surface deviation of 2mm. Limitations of the edge detection may be overcome by future updates of the KIT USCT system, allowing a fully-automated usage of our segmentation approach.

  10. Quantitative monitoring of Arabidopsis thaliana growth and development using high-throughput plant phenotyping

    PubMed Central

    Arend, Daniel; Lange, Matthias; Pape, Jean-Michel; Weigelt-Fischer, Kathleen; Arana-Ceballos, Fernando; Mücke, Ingo; Klukas, Christian; Altmann, Thomas; Scholz, Uwe; Junker, Astrid

    2016-01-01

    With the implementation of novel automated, high-throughput methods and facilities in recent years, plant phenomics has developed into a highly interdisciplinary research domain integrating biology, engineering and bioinformatics. Here we present a dataset from a non-invasive, high-throughput plant phenotyping experiment, which uses image- and image-analysis-based approaches to monitor the growth and development of 484 Arabidopsis thaliana (thale cress) plants. The result is a comprehensive dataset of images and extracted phenotypical features. Such datasets require detailed documentation, standardized description of experimental metadata as well as sustainable data storage and publication in order to ensure the reproducibility of experiments, data reuse and comparability within the scientific community. The dataset presented here has therefore been annotated using the standardized ISA-Tab format, following the recently published recommendations for the semantic description of plant phenotyping experiments. PMID:27529152

  11. Quantitative monitoring of Arabidopsis thaliana growth and development using high-throughput plant phenotyping.

    PubMed

    Arend, Daniel; Lange, Matthias; Pape, Jean-Michel; Weigelt-Fischer, Kathleen; Arana-Ceballos, Fernando; Mücke, Ingo; Klukas, Christian; Altmann, Thomas; Scholz, Uwe; Junker, Astrid

    2016-08-16

    With the implementation of novel automated, high-throughput methods and facilities in recent years, plant phenomics has developed into a highly interdisciplinary research domain integrating biology, engineering and bioinformatics. Here we present a dataset from a non-invasive, high-throughput plant phenotyping experiment, which uses image- and image-analysis-based approaches to monitor the growth and development of 484 Arabidopsis thaliana (thale cress) plants. The result is a comprehensive dataset of images and extracted phenotypical features. Such datasets require detailed documentation, standardized description of experimental metadata as well as sustainable data storage and publication in order to ensure the reproducibility of experiments, data reuse and comparability within the scientific community. The dataset presented here has therefore been annotated using the standardized ISA-Tab format, following the recently published recommendations for the semantic description of plant phenotyping experiments.

  12. Polinsar Experiments of Multi-Mode X-Band Data Over South Area of China

    NASA Astrophysics Data System (ADS)

    Lu, L.; Yan, Q.; Duan, M.; Zhang, Y.

    2012-08-01

    This paper presents polarimetric and polarimetric interferometric synthetic aperture radar (PolInSAR) experiments with high-resolution X-band data acquired by the Multi-mode airborne SAR system over an area around Linshui in southern China that contains tropical vegetation and urban areas. Polarimetric analyses of typical tropical vegetation and man-made objects are presented, and polarimetric descriptors sensitive to vegetation and man-made objects are selected. The PolInSAR information contained in the data is then investigated and, considering the characteristics of the Multi-mode-XSAR dataset, a dual-baseline polarimetric interferometry method is proposed. The method both guarantees high coherence on fully polarimetric data and combines the benefits of short and long baselines, which aids phase unwrapping and improves height sensitivity. The PolInSAR experimental results demonstrate that Multi-mode-XSAR datasets are well suited to a range of applications, including land classification, object detection and DSM mapping.

  13. ProDaMa: an open source Python library to generate protein structure datasets.

    PubMed

    Armano, Giuliano; Manconi, Andrea

    2009-10-02

    The huge difference between the number of known sequences and known tertiary structures has justified the use of automated methods for protein analysis. Although a general methodology to solve these problems has not yet been devised, researchers are engaged in developing more accurate techniques and algorithms whose training plays a relevant role in determining their performance. From this perspective, particular importance is given to the training data used in experiments, and researchers are often engaged in the generation of specialized datasets that meet their requirements. To facilitate the task of generating specialized datasets we devised and implemented ProDaMa, an open source Python library that provides classes for retrieving, organizing, updating, analyzing, and filtering protein data. ProDaMa has been used to generate specialized datasets useful for secondary structure prediction and to develop a collaborative web application aimed at generating and sharing protein structure datasets. The library, the related database, and the documentation are freely available at the URL http://iasc.diee.unica.it/prodama.

  14. Automated quantitative muscle biopsy analysis system

    NASA Technical Reports Server (NTRS)

    Castleman, Kenneth R. (Inventor)

    1980-01-01

    An automated system to aid the diagnosis of neuromuscular diseases by producing fiber size histograms utilizing histochemically stained muscle biopsy tissue. Televised images of the microscopic fibers are processed electronically by a multi-microprocessor computer, which isolates, measures, and classifies the fibers and displays the fiber size distribution. The architecture of the multi-microprocessor computer, which can be iterated to any required degree of complexity, features a series of individual microprocessors P_n, each receiving data from a shared memory M_(n-1) and outputting processed data to a separate shared memory M_(n+1), under the control of a program stored in dedicated memory M_n.

  15. Large-scale, high-performance and cloud-enabled multi-model analytics experiments in the context of the Earth System Grid Federation

    NASA Astrophysics Data System (ADS)

    Fiore, S.; Płóciennik, M.; Doutriaux, C.; Blanquer, I.; Barbera, R.; Williams, D. N.; Anantharaj, V. G.; Evans, B. J. K.; Salomoni, D.; Aloisio, G.

    2017-12-01

    The increasing model resolution of comprehensive Earth System Models is rapidly leading to very large climate simulation outputs that pose significant scientific data management challenges in terms of data sharing, processing, analysis, visualization, preservation, curation, and archiving. Large-scale global experiments for Climate Model Intercomparison Projects (CMIP) have led to the development of the Earth System Grid Federation (ESGF), a federated data infrastructure which has been serving the CMIP5 experiment, providing access to 2 PB of data for the IPCC Assessment Reports. In such a context, running a multi-model data analysis experiment is very challenging, as it requires the availability of a large amount of data from multiple climate model simulations as well as scientific data management tools for large-scale data analytics. To address these challenges, a case study on climate model intercomparison data analysis has been defined and implemented in the context of the EU H2020 INDIGO-DataCloud project. The case study has been tested and validated on CMIP5 datasets in a large-scale, international testbed involving several ESGF sites (LLNL, ORNL and CMCC), one orchestrator site (PSNC) and one further site hosting INDIGO PaaS services (UPV). Additional ESGF sites, such as NCI (Australia) and several more in Europe, are also joining the testbed. The added value of the proposed solution is summarized in the following: it implements a server-side paradigm which limits data movement; it relies on a High-Performance Data Analytics (HPDA) stack to address performance; it exploits the INDIGO PaaS layer to support flexible, dynamic and automated deployment of software components; it provides user-friendly web access based on the INDIGO Future Gateway; and finally it integrates, complements and extends the support currently available through ESGF. Overall it provides a new "tool" for climate scientists to run multi-model experiments. At the time this contribution is being written, the proposed testbed represents the first implementation of a distributed large-scale, multi-model experiment in the ESGF/CMIP context, joining together server-side approaches for scientific data analysis, HPDA frameworks, end-to-end workflow management, and cloud computing.
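
    As a purely illustrative, client-side counterpart to the server-side multi-model analysis described above (which runs on ESGF nodes through the INDIGO services), the sketch below computes a multi-model ensemble mean with xarray; the CMIP5-style file names and the "tas" variable are assumptions, and the models are assumed to share a common grid.

      # Client-side illustration of a multi-model calculation (the testbed above
      # performs such analyses server-side on ESGF nodes); file and variable
      # names are hypothetical and the models are assumed to share one grid.
      import xarray as xr

      model_files = ["tas_Amon_ModelA_historical.nc",
                     "tas_Amon_ModelB_historical.nc"]

      members = [xr.open_dataset(f)["tas"] for f in model_files]
      ensemble = xr.concat(members, dim="model")            # stack runs along a new axis
      ensemble_mean = ensemble.mean(dim="model")            # multi-model mean field
      anomaly = ensemble_mean - ensemble_mean.mean(dim="time")
      anomaly.to_netcdf("ensemble_mean_anomaly.nc")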

  16. The Function Biomedical Informatics Research Network Data Repository

    PubMed Central

    Keator, David B.; van Erp, Theo G.M.; Turner, Jessica A.; Glover, Gary H.; Mueller, Bryon A.; Liu, Thomas T.; Voyvodic, James T.; Rasmussen, Jerod; Calhoun, Vince D.; Lee, Hyo Jong; Toga, Arthur W.; McEwen, Sarah; Ford, Judith M.; Mathalon, Daniel H.; Diaz, Michele; O’Leary, Daniel S.; Bockholt, H. Jeremy; Gadde, Syam; Preda, Adrian; Wible, Cynthia G.; Stern, Hal S.; Belger, Aysenil; McCarthy, Gregory; Ozyurt, Burak; Potkin, Steven G.

    2015-01-01

    The Function Biomedical Informatics Research Network (FBIRN) developed methods and tools for conducting multi-scanner functional magnetic resonance imaging (fMRI) studies. Method and tool development were based on two major goals: 1) to assess the major sources of variation in fMRI studies conducted across scanners, including instrumentation, acquisition protocols, challenge tasks, and analysis methods, and 2) to provide a distributed network infrastructure and an associated federated database to host and query large, multi-site, fMRI and clinical datasets. In the process of achieving these goals the FBIRN test bed generated several multi-scanner brain imaging data sets to be shared with the wider scientific community via the BIRN Data Repository (BDR). The FBIRN Phase 1 dataset consists of a traveling subject study of 5 healthy subjects, each scanned on 10 different 1.5 to 4 Tesla scanners. The FBIRN Phase 2 and Phase 3 datasets consist of subjects with schizophrenia or schizoaffective disorder along with healthy comparison subjects scanned at multiple sites. In this paper, we provide concise descriptions of FBIRN’s multi-scanner brain imaging data sets and details about the BIRN Data Repository instance of the Human Imaging Database (HID) used to publicly share the data. PMID:26364863

  17. Identifying spatially similar gene expression patterns in early stage fruit fly embryo images: binary feature versus invariant moment digital representations

    PubMed Central

    Gurunathan, Rajalakshmi; Van Emden, Bernard; Panchanathan, Sethuraman; Kumar, Sudhir

    2004-01-01

    Background Modern developmental biology relies heavily on the analysis of embryonic gene expression patterns. Investigators manually inspect hundreds or thousands of expression patterns to identify those that are spatially similar and to ultimately infer potential gene interactions. However, the rapid accumulation of gene expression pattern data over the last two decades, facilitated by high-throughput techniques, has produced a need for the development of efficient approaches for direct comparison of images, rather than their textual descriptions, to identify spatially similar expression patterns. Results The effectiveness of the Binary Feature Vector (BFV) and Invariant Moment Vector (IMV) based digital representations of the gene expression patterns in finding biologically meaningful patterns was compared for a small (226 images) and a large (1819 images) dataset. For each dataset, an ordered list of images, with respect to a query image, was generated to identify overlapping and similar gene expression patterns, in a manner comparable to what a developmental biologist might do. The results showed that the BFV representation consistently outperforms the IMV representation in finding biologically meaningful matches when spatial overlap of the gene expression pattern and the genes involved are considered. Furthermore, we explored the value of conducting image-content based searches in a dataset where individual expression components (or domains) of multi-domain expression patterns were also included separately. We found that this technique improves performance of both IMV and BFV based searches. Conclusions We conclude that the BFV representation consistently produces a more extensive and better list of biologically useful patterns than the IMV representation. The high quality of results obtained scales well as the search database becomes larger, which encourages efforts to build automated image query and retrieval systems for spatial gene expression patterns. PMID:15603586
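
    A minimal sketch of the binary-feature-vector comparison idea follows; the fixed threshold and the Jaccard similarity used here are illustrative stand-ins for the paper's exact feature construction and ranking measure.

      # Sketch: binary feature vectors from expression images and a similarity
      # ranking; the threshold and Jaccard score are illustrative choices.
      import numpy as np

      def binary_feature_vector(image, threshold=0.5):
          """Flatten a grayscale image (values in [0, 1]) into a 0/1 vector."""
          return (np.asarray(image, dtype=float) > threshold).ravel().astype(np.uint8)

      def jaccard(a, b):
          union = np.logical_or(a, b).sum()
          return np.logical_and(a, b).sum() / union if union else 0.0

      def rank_by_similarity(query_image, database_images):
          q = binary_feature_vector(query_image)
          scores = [jaccard(q, binary_feature_vector(img)) for img in database_images]
          return np.argsort(scores)[::-1]           # most similar patterns first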

  18. MIMoSA: An Automated Method for Intermodal Segmentation Analysis of Multiple Sclerosis Brain Lesions.

    PubMed

    Valcarcel, Alessandra M; Linn, Kristin A; Vandekar, Simon N; Satterthwaite, Theodore D; Muschelli, John; Calabresi, Peter A; Pham, Dzung L; Martin, Melissa Lynne; Shinohara, Russell T

    2018-03-08

    Magnetic resonance imaging (MRI) is crucial for in vivo detection and characterization of white matter lesions (WMLs) in multiple sclerosis. While WMLs have been studied for over two decades using MRI, automated segmentation remains challenging. Although the majority of statistical techniques for the automated segmentation of WMLs are based on single imaging modalities, recent advances have used multimodal techniques for identifying WMLs. Complementary modalities emphasize different tissue properties, which help identify interrelated features of lesions. Method for Inter-Modal Segmentation Analysis (MIMoSA), a fully automatic lesion segmentation algorithm that utilizes novel covariance features from intermodal coupling regression in addition to mean structure to model the probability that a lesion is contained in each voxel, is proposed. MIMoSA was validated by comparison with both expert manual and other automated segmentation methods in two datasets. The first included 98 subjects imaged at Johns Hopkins Hospital, in which bootstrap cross-validation was used to compare the performance of MIMoSA against OASIS and LesionTOADS, two popular automatic segmentation approaches. For a secondary validation, publicly available data from a segmentation challenge were used for performance benchmarking. In the Johns Hopkins study, MIMoSA yielded an average Sørensen-Dice coefficient (DSC) of 0.57 and a partial AUC of 0.68, calculated with false positive rates up to 1%. This was superior to performance using OASIS and LesionTOADS. The proposed method also performed competitively in the segmentation challenge dataset. MIMoSA resulted in statistically significant improvements in lesion segmentation performance compared with LesionTOADS and OASIS, and performed competitively in an additional validation study. Copyright © 2018 by the American Society of Neuroimaging.

  19. Automated segmentation of chronic stroke lesions using LINDA: Lesion Identification with Neighborhood Data Analysis

    PubMed Central

    Pustina, Dorian; Coslett, H. Branch; Turkeltaub, Peter E.; Tustison, Nicholas; Schwartz, Myrna F.; Avants, Brian

    2015-01-01

    The gold standard for identifying stroke lesions is manual tracing, a method that is known to be observer dependent and time consuming, thus impractical for big data studies. We propose LINDA (Lesion Identification with Neighborhood Data Analysis), an automated segmentation algorithm capable of learning the relationship between existing manual segmentations and a single T1-weighted MRI. A dataset of 60 left hemispheric chronic stroke patients is used to build the method and test it with k-fold and leave-one-out procedures. With respect to manual tracings, predicted lesion maps showed a mean Dice overlap of 0.696±0.16, Hausdorff distance of 17.9±9.8 mm, and average displacement of 2.54±1.38 mm. The manual and predicted lesion volumes correlated at r=0.961. An additional dataset of 45 patients was utilized to test LINDA with independent data, achieving high accuracy rates and confirming its cross-institutional applicability. To investigate the cost of moving from manual tracings to automated segmentation, we performed comparative lesion-to-symptom mapping (LSM) on five behavioral scores. Predicted and manual lesions produced similar neuro-cognitive maps, albeit with some discussed discrepancies. Of note, region-wise LSM was more robust to the prediction error than voxel-wise LSM. Our results show that, while several limitations exist, our current results compete with or exceed the state-of-the-art, producing consistent predictions, very low failure rates, and transferable knowledge between labs. This work also establishes a new viewpoint on evaluating automated methods not only with segmentation accuracy but also with brain-behavior relationships. LINDA is made available online with trained models from over 100 patients. PMID:26756101

  20. An Innovative Infrastructure with a Universal Geo-Spatiotemporal Data Representation Supporting Cost-Effective Integration of Diverse Earth Science Data

    NASA Technical Reports Server (NTRS)

    Rilee, Michael Lee; Kuo, Kwo-Sen

    2017-01-01

    The SpatioTemporal Adaptive Resolution Encoding (STARE) is a unifying scheme encoding geospatial and temporal information for organizing data on scalable computing/storage resources, minimizing expensive data transfers. STARE provides a compact representation that turns set-logic functions into integer operations, e.g. conditional sub-setting, taking into account the representative spatiotemporal resolutions of the data in the datasets. STARE geo-spatiotemporally aligns the placement of diverse data on massively parallel resources to maximize performance. Automating important scientific functions (e.g. regridding) and computational functions (e.g. data placement) allows scientists to focus on domain-specific questions instead of expending their efforts and expertise on data processing. With STARE-enabled automation, SciDB (Scientific Database) plus STARE provides a database interface, reducing costly data preparation, increasing the volume and variety of interoperable data, and easing result sharing. Using SciDB plus STARE as part of an integrated analysis infrastructure dramatically eases combining diametrically different datasets.

  1. Data Prospecting Framework - a new approach to explore "big data" in Earth Science

    NASA Astrophysics Data System (ADS)

    Ramachandran, R.; Rushing, J.; Lin, A.; Kuo, K.

    2012-12-01

    Due to advances in sensors, computation and storage, the cost and effort required to produce large datasets have been significantly reduced. As a result, we are seeing a proliferation of large-scale datasets being assembled in almost every science field, especially in the geosciences. Opportunities to exploit this "big data" are enormous, as new hypotheses can be generated by combining and analyzing large amounts of data. However, such a data-driven approach to science discovery assumes that scientists can find and isolate relevant subsets from vast amounts of available data. Current Earth Science data systems only provide data discovery through simple metadata and keyword-based searches and are not designed to support data exploration capabilities based on the actual content. Consequently, scientists often find themselves downloading large volumes of data, struggling with large amounts of storage and learning new analysis technologies that will help them separate the wheat from the chaff. New mechanisms of data exploration are needed to help scientists discover the relevant subsets. We present data prospecting, a new content-based data analysis paradigm to support data-intensive science. Data prospecting allows researchers to explore big data when determining and isolating data subsets for further analysis. This is akin to geo-prospecting, in which mineral sites of interest are determined over the landscape through screening methods. The resulting "data prospects" only provide an interaction with and feel for the data through first-look analytics; the researchers would still have to download the relevant datasets and analyze them deeply using their favorite analytical tools to determine if the datasets will yield new hypotheses. Data prospecting combines two traditional categories of data analysis, data exploration and data mining, within the discovery step. Data exploration utilizes manual/interactive methods for data analysis such as standard statistical analysis and visualization, usually on small datasets. On the other hand, data mining utilizes automated algorithms to extract useful information. Humans guide these automated algorithms and specify algorithm parameters (training samples, clustering size, etc.). Data prospecting combines these two approaches using high performance computing and new techniques for efficient distributed file access.

  2. Fuzzy multinomial logistic regression analysis: A multi-objective programming approach

    NASA Astrophysics Data System (ADS)

    Abdalla, Hesham A.; El-Sayed, Amany A.; Hamed, Ramadan

    2017-05-01

    Parameter estimation for multinomial logistic regression is usually based on maximizing the likelihood function. For large, well-balanced datasets, Maximum Likelihood (ML) estimation is a satisfactory approach. Unfortunately, ML can fail completely, or at least produce poor results in terms of estimated probabilities and confidence intervals of parameters, especially for small datasets. In this study, a new approach based on fuzzy concepts is proposed to estimate the parameters of multinomial logistic regression. The study assumes that the parameters of multinomial logistic regression are fuzzy. Based on the extension principle stated by Zadeh and Bárdossy's proposition, a multi-objective programming approach is suggested to estimate these fuzzy parameters. A simulation study is used to evaluate the performance of the new approach versus the ML approach. Results show that the proposed model outperforms ML for small datasets.

  3. BisQue: cloud-based system for management, annotation, visualization, analysis and data mining of underwater and remote sensing imagery

    NASA Astrophysics Data System (ADS)

    Fedorov, D.; Miller, R. J.; Kvilekval, K. G.; Doheny, B.; Sampson, S.; Manjunath, B. S.

    2016-02-01

    Logistical and financial limitations of underwater operations are inherent in marine science, including biodiversity observation. Imagery is a promising way to address these challenges, but the diversity of organisms thwarts simple automated analysis. Recent developments in computer vision methods, such as convolutional neural networks (CNN), are promising for automated classification and detection tasks but are typically very computationally expensive and require extensive training on large datasets. Therefore, managing and connecting distributed computation, large storage and human annotations of diverse marine datasets is crucial for effective application of these methods. BisQue is a cloud-based system for management, annotation, visualization, analysis and data mining of underwater and remote sensing imagery and associated data. Designed to hide the complexity of distributed storage, large computational clusters, diversity of data formats and inhomogeneous computational environments behind a user-friendly web-based interface, BisQue is built around the idea of flexible and hierarchical annotations defined by the user. Such textual and graphical annotations can describe captured attributes and the relationships between data elements. Annotations are powerful enough to describe cells in fluorescent 4D images, fish species in underwater videos and kelp beds in aerial imagery. Presently we are developing BisQue-based analysis modules for automated identification of benthic marine organisms. Recent experiments with dropout- and CNN-based classification of several thousand annotated underwater images demonstrated an overall accuracy above 70% for the 15 best performing species and above 85% for the top 5 species. Based on these promising results, we have extended BisQue with a CNN-based classification system allowing continuous training on user-provided data.

  4. Who shares? Who doesn't? Factors associated with openly archiving raw research data.

    PubMed

    Piwowar, Heather A

    2011-01-01

    Many initiatives encourage investigators to share their raw datasets in hopes of increasing research efficiency and quality. Despite these investments of time and money, we do not have a firm grasp of who openly shares raw research data, who doesn't, and which initiatives are correlated with high rates of data sharing. In this analysis I use bibliometric methods to identify patterns in the frequency with which investigators openly archive their raw gene expression microarray datasets after study publication. Automated methods identified 11,603 articles published between 2000 and 2009 that describe the creation of gene expression microarray data. Associated datasets in best-practice repositories were found for 25% of these articles, increasing from less than 5% in 2001 to 30%-35% in 2007-2009. Accounting for sensitivity of the automated methods, approximately 45% of recent gene expression studies made their data publicly available. First-order factor analysis on 124 diverse bibliometric attributes of the data creation articles revealed 15 factors describing authorship, funding, institution, publication, and domain environments. In multivariate regression, authors were most likely to share data if they had prior experience sharing or reusing data, if their study was published in an open access journal or a journal with a relatively strong data sharing policy, or if the study was funded by a large number of NIH grants. Authors of studies on cancer and human subjects were least likely to make their datasets available. These results suggest research data sharing levels are still low and increasing only slowly, and data is least available in areas where it could make the biggest impact. Let's learn from those with high rates of sharing to embrace the full potential of our research output.

  5. Merging of multi-string BWTs with applications

    PubMed Central

    Holt, James; McMillan, Leonard

    2014-01-01

    Motivation: The throughput of genomic sequencing has increased to the point that it is overrunning the rate of downstream analysis. This, along with the desire to revisit old data, has led to a situation where large quantities of raw, and nearly impenetrable, sequence data are rapidly filling the hard drives of modern biology labs. These datasets can be compressed via a multi-string variant of the Burrows–Wheeler Transform (BWT), which provides the side benefit of searches for arbitrary k-mers within the raw data as well as the ability to reconstitute arbitrary reads as needed. We propose a method for merging such datasets for both increased compression and downstream analysis. Results: We present a novel algorithm that merges multi-string BWTs in O(LCS×N) time, where LCS is the length of the longest common substring between any of the inputs and N is the total length of all inputs combined (number of symbols), using O(N×log2(F)) bits, where F is the number of multi-string BWTs merged. This merged multi-string BWT is also shown to have a higher compressibility compared with the input multi-string BWTs separately. Additionally, we explore some uses of a merged multi-string BWT for bioinformatics applications. Availability and implementation: The MSBWT package is available through PyPI with source code located at https://code.google.com/p/msbwt/. Contact: holtjma@cs.unc.edu PMID:25172922
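
    For readers unfamiliar with the transform itself, the toy sketch below builds the BWT of a single short string via sorted rotations; real multi-string BWTs over sequencing reads use far more scalable constructions, and this is neither the merge algorithm nor the MSBWT package API.

      # Toy Burrows-Wheeler Transform of one short string via sorted rotations.
      # Real multi-string BWTs of read collections use scalable constructions.
      def bwt(text, sentinel="$"):
          s = text + sentinel
          rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
          return "".join(rotation[-1] for rotation in rotations)

      print(bwt("ACGTACG"))   # prints the last column of the sorted rotations: GT$AACCG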

  6. Multi-modal data fusion using source separation: Two effective models based on ICA and IVA and their properties

    PubMed Central

    Adali, Tülay; Levin-Schwartz, Yuri; Calhoun, Vince D.

    2015-01-01

    Fusion of information from multiple sets of data in order to extract a set of features that are most useful and relevant for the given task is inherent to many problems we deal with today. Since, usually, very little is known about the actual interaction among the datasets, it is highly desirable to minimize the underlying assumptions. This has been the main reason for the growing importance of data-driven methods, and in particular of independent component analysis (ICA), as it provides useful decompositions with a simple generative model and using only the assumption of statistical independence. A recent extension of ICA, independent vector analysis (IVA), generalizes ICA to multiple datasets by exploiting the statistical dependence across the datasets, and hence, as we discuss in this paper, provides an attractive solution to fusion of data from multiple datasets along with ICA. In this paper, we focus on two multivariate solutions for multi-modal data fusion that let multiple modalities fully interact for the estimation of underlying features that jointly report on all modalities. One solution is the Joint ICA model that has found wide application in medical imaging, and the second is the Transposed IVA model introduced here as a generalization of an approach based on multi-set canonical correlation analysis. In the discussion, we emphasize the role of diversity in the decompositions achieved by these two models and present their properties and implementation details to enable the user to make informed decisions on the selection of a model along with its associated parameters. Discussions are supported by simulation results to help highlight the main issues in the implementation of these methods. PMID:26525830
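
    The sketch below illustrates the Joint ICA idea on synthetic data using scikit-learn's FastICA: the two modalities are concatenated along the feature axis so that each estimated component has a counterpart map in both modalities. The data shapes and number of components are arbitrary, and this is not the authors' implementation.

      # Joint ICA sketch on synthetic data: concatenating two modalities along the
      # feature axis means every estimated component has a map in both modalities.
      import numpy as np
      from sklearn.decomposition import FastICA

      rng = np.random.default_rng(0)
      n_subjects, n_feat1, n_feat2 = 50, 200, 300          # arbitrary sizes
      modality_1 = rng.standard_normal((n_subjects, n_feat1))
      modality_2 = rng.standard_normal((n_subjects, n_feat2))

      joint = np.hstack([modality_1, modality_2])          # subjects x joint features
      ica = FastICA(n_components=5, random_state=0, max_iter=1000)
      subject_loadings = ica.fit_transform(joint)          # shared across modalities
      maps_1 = ica.components_[:, :n_feat1]                # modality-1 part of each component
      maps_2 = ica.components_[:, n_feat1:]                # modality-2 part of each component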

  7. Kernel-aligned multi-view canonical correlation analysis for image recognition

    NASA Astrophysics Data System (ADS)

    Su, Shuzhi; Ge, Hongwei; Yuan, Yun-Hao

    2016-09-01

    Existing kernel-based correlation analysis methods mainly adopt a single kernel in each view. However, a single kernel is usually insufficient to characterize the nonlinear distribution information of a view. To solve this problem, we transform each original feature vector into a 2-dimensional feature matrix by means of kernel alignment, and then propose a novel kernel-aligned multi-view canonical correlation analysis (KAMCCA) method on the basis of the feature matrices. Our proposed method can simultaneously employ multiple kernels to better capture the nonlinear distribution information of each view, so that the correlation features learned by KAMCCA have good discriminating power in real-world image recognition. Extensive experiments are designed on five real-world image datasets, including NIR face images, thermal face images, visible face images, handwritten digit images, and object images. Promising experimental results on these datasets have demonstrated the effectiveness of our proposed method.

  8. Data-Driven Surface Traversability Analysis for Mars 2020 Landing Site Selection

    NASA Technical Reports Server (NTRS)

    Ono, Masahiro; Rothrock, Brandon; Almeida, Eduardo; Ansar, Adnan; Otero, Richard; Huertas, Andres; Heverly, Matthew

    2015-01-01

    The objective of this paper is three-fold: 1) to describe the engineering challenges in the surface mobility of the Mars 2020 Rover mission that are considered in the landing site selection process, 2) to introduce new automated traversability analysis capabilities, and 3) to present the preliminary analysis results for top candidate landing sites. The analysis capabilities presented in this paper include automated terrain classification, automated rock detection, digital elevation model (DEM) generation, and multi-ROI (region of interest) route planning. These capabilities make it possible to fully utilize the vast volume of high-resolution orbiter imagery, to quantitatively evaluate surface mobility requirements for each candidate site, and to remove subjectivity from the comparison between sites in terms of engineering considerations. The analysis results supported the discussion in the Second Landing Site Workshop held in August 2015, which resulted in selecting eight candidate sites that will be considered in the third workshop.
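
    One ingredient of such traversability analyses is a slope map derived from the DEM; the sketch below shows a generic version with NumPy, where the grid spacing and the 15-degree threshold are illustrative values rather than mission requirements.

      # Slope map from a DEM and a simple traversability mask; the 1 m grid
      # spacing and 15-degree threshold are illustrative values only.
      import numpy as np

      def slope_degrees(dem, cell_size_m=1.0):
          dz_dy, dz_dx = np.gradient(dem, cell_size_m)     # elevation change per metre
          return np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))

      dem = np.random.default_rng(1).random((100, 100)) * 5.0   # stand-in elevation grid [m]
      slopes = slope_degrees(dem)
      traversable = slopes < 15.0                          # boolean mask of gentle terrain
      print(f"{traversable.mean():.1%} of cells below the slope threshold")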

  9. Upper Blue Nile basin water budget from a multi-model perspective

    NASA Astrophysics Data System (ADS)

    Jung, Hahn Chul; Getirana, Augusto; Policelli, Frederick; McNally, Amy; Arsenault, Kristi R.; Kumar, Sujay; Tadesse, Tsegaye; Peters-Lidard, Christa D.

    2017-12-01

    Improved understanding of the water balance in the Blue Nile is of critical importance because of increasingly frequent hydroclimatic extremes under a changing climate. The intercomparison and evaluation of multiple land surface models (LSMs) associated with different meteorological forcing and precipitation datasets can offer a moderate range of water budget variable estimates. In this context, two LSMs, Noah version 3.3 (Noah3.3) and Catchment LSM version Fortuna 2.5 (CLSMF2.5) coupled with the Hydrological Modeling and Analysis Platform (HyMAP) river routing scheme are used to produce hydrological estimates over the region. The two LSMs were forced with different combinations of two reanalysis-based meteorological datasets from the Modern-Era Retrospective analysis for Research and Applications datasets (i.e., MERRA-Land and MERRA-2) and three observation-based precipitation datasets, generating a total of 16 experiments. Modeled evapotranspiration (ET), streamflow, and terrestrial water storage estimates were evaluated against the Atmosphere-Land Exchange Inverse (ALEXI) ET, in-situ streamflow observations, and NASA Gravity Recovery and Climate Experiment (GRACE) products, respectively. Results show that CLSMF2.5 provided better representation of the water budget variables than Noah3.3 in terms of Nash-Sutcliffe coefficient when considering all meteorological forcing datasets and precipitation datasets. The model experiments forced with observation-based products, the Climate Hazards group Infrared Precipitation with Stations (CHIRPS) and the Tropical Rainfall Measuring Mission (TRMM) Multi-Satellite Precipitation Analysis (TMPA), outperform those run with MERRA-Land and MERRA-2 precipitation. The results presented in this paper would suggest that the Famine Early Warning Systems Network (FEWS NET) Land Data Assimilation System incorporate CLSMF2.5 and HyMAP routing scheme to better represent the water balance in this region.
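
    For reference, the Nash-Sutcliffe efficiency used to score the model runs has the standard definition sketched below (1 is a perfect fit, 0 is no better than the observed mean); the example values are arbitrary.

      # Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed
      # mean, and negative values do worse than the observed mean.
      import numpy as np

      def nash_sutcliffe(observed, simulated):
          obs = np.asarray(observed, dtype=float)
          sim = np.asarray(simulated, dtype=float)
          return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

      print(nash_sutcliffe([10.0, 12.0, 15.0, 11.0], [9.5, 12.5, 14.0, 11.5]))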

  10. An Integrated Nonlinear Analysis library - (INA) for solar system plasma turbulence

    NASA Astrophysics Data System (ADS)

    Munteanu, Costel; Kovacs, Peter; Echim, Marius; Koppan, Andras

    2014-05-01

    We present an integrated software library dedicated to the analysis of time series recorded in space and adapted to investigate turbulence, intermittency and multifractals. The library is written in MATLAB and provides a graphical user interface (GUI) customized for the analysis of space physics data available online, such as: Coordinated Data Analysis Web (CDAWeb), Automated Multi Dataset Analysis system (AMDA), Planetary Science Archive (PSA), World Data Center Kyoto (WDC), Ulysses Final Archive (UFA) and Cluster Active Archive (CAA). Three main modules are already implemented in INA: the Power Spectral Density (PSD) Analysis, the Wavelet and Intermittency Analysis, and the Probability Density Functions (PDF) Analysis. The layered structure of the software allows the user to easily switch between different modules/methods while retaining the same time interval for the analysis. The wavelet analysis module includes algorithms to compute and analyse the PSD, the Scalogram, the Local Intermittency Measure (LIM) and the Flatness parameter. The PDF analysis module includes algorithms for computing the PDFs for a range of scales and parameters fully customizable by the user; it also computes the Flatness parameter and enables fast comparison with standard PDF profiles such as, for instance, the Gaussian PDF. The library has already been tested on Cluster and Venus Express data, and we will show relevant examples. Research supported by the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement no 313038/STORM, and a grant of the Romanian Ministry of National Education, CNCS UEFISCDI, project number PN-II-ID PCE-2012-4-0418.
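
    INA itself is a MATLAB library; as a rough illustration of what its PSD module produces, the sketch below estimates a Welch power spectral density and a spectral slope in Python with SciPy, using a synthetic signal and an arbitrary sampling rate.

      # Welch power spectral density and spectral slope for a synthetic signal;
      # the 22 Hz sampling rate and segment length are arbitrary choices.
      import numpy as np
      from scipy.signal import welch

      fs = 22.0
      t = np.arange(0, 600, 1.0 / fs)
      x = np.sin(2 * np.pi * 0.5 * t) + 0.3 * np.random.default_rng(2).standard_normal(t.size)

      freqs, psd = welch(x, fs=fs, nperseg=1024)
      slope = np.polyfit(np.log10(freqs[1:]), np.log10(psd[1:]), 1)[0]
      print(f"spectral slope ~ {slope:.2f}")      # turbulence studies inspect such slopes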

  11. FamPlex: a resource for entity recognition and relationship resolution of human protein families and complexes in biomedical text mining.

    PubMed

    Bachman, John A; Gyori, Benjamin M; Sorger, Peter K

    2018-06-28

    For automated reading of scientific publications to extract useful information about molecular mechanisms, it is critical that genes, proteins and other entities be correctly associated with uniform identifiers, a process known as named entity linking or "grounding." Correct grounding is essential for resolving relationships among mined information, curated interaction databases, and biological datasets. The accuracy of this process is largely dependent on the availability of machine-readable resources associating synonyms and abbreviations commonly found in biomedical literature with uniform identifiers. In a task involving automated reading of ∼215,000 articles using the REACH event extraction software, we found that grounding was disproportionately inaccurate for multi-protein families (e.g., "AKT") and complexes with multiple subunits (e.g., "NF-κB"). To address this problem we constructed FamPlex, a manually curated resource defining protein families and complexes as they are commonly encountered in biomedical text. In FamPlex the gene-level constituents of families and complexes are defined in a flexible format allowing for multi-level, hierarchical membership. To create FamPlex, text strings corresponding to entities were identified empirically from literature and linked manually to uniform identifiers; these identifiers were also mapped to equivalent entries in multiple related databases. FamPlex also includes curated prefix and suffix patterns that improve named entity recognition and event extraction. Evaluation of REACH extractions on a test corpus of ∼54,000 articles showed that FamPlex significantly increased grounding accuracy for families and complexes (from 15 to 71%). The hierarchical organization of entities in FamPlex also made it possible to integrate otherwise unconnected mechanistic information across families, subfamilies, and individual proteins. Applications of FamPlex to the TRIPS/DRUM reading system and the BioCreative VI Bioentity Normalization Task dataset demonstrated the utility of FamPlex in other settings. FamPlex is an effective resource for improving named entity recognition, grounding, and relationship resolution in automated reading of biomedical text. The content in FamPlex is available in both tabular and Open Biomedical Ontology formats at https://github.com/sorgerlab/famplex under the Creative Commons CC0 license and has been integrated into the TRIPS/DRUM and REACH reading systems.
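
    The toy sketch below illustrates the grounding step that a resource like FamPlex supports: a synonym table plus suffix stripping maps text mentions to family-level identifiers. The synonym entries, suffix list, and identifiers shown are invented for illustration and are not FamPlex content or its file format.

      # Toy grounding lookup; the synonym table, suffix patterns, and identifiers
      # are invented for illustration and are not FamPlex content.
      SYNONYMS = {
          "akt": "FPLX:AKT_example",
          "nf-kb": "FPLX:NFkB_example",
          "erk": "FPLX:ERK_example",
      }
      SUFFIXES = ("-1", "-2", " protein", " complex")

      def ground(mention):
          key = mention.strip().lower()
          for suffix in SUFFIXES:                 # strip common suffixes before lookup
              if key.endswith(suffix):
                  key = key[: -len(suffix)].strip()
          return SYNONYMS.get(key)                # None means the mention stays ungrounded

      print(ground("NF-kB complex"))              # -> FPLX:NFkB_example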

  12. Design, Development and Testing of Web Services for Multi-Sensor Snow Cover Mapping

    NASA Astrophysics Data System (ADS)

    Kadlec, Jiri

    This dissertation presents the design, development and validation of new data integration methods for mapping the extent of snow cover based on open access ground station measurements, remote sensing images, volunteer observer snow reports, and cross-country ski track recordings from location-enabled mobile devices. The first step of the data integration procedure includes data discovery, data retrieval, and data quality control of snow observations at ground stations. The WaterML R package developed in this work enables hydrologists to retrieve and analyze data from multiple organizations that are listed in the Consortium of Universities for the Advancement of Hydrologic Sciences Inc (CUAHSI) Water Data Center catalog directly within the R statistical software environment. Use of the WaterML R package is demonstrated by running an energy balance snowpack model in R with data inputs from CUAHSI, and by automating uploads of real time sensor observations to CUAHSI HydroServer. The second step of the procedure requires efficient access to multi-temporal remote sensing snow images. The Snow Inspector web application developed in this research enables users to retrieve a time series of fractional snow cover from the Moderate Resolution Imaging Spectroradiometer (MODIS) for any point on Earth. The time series retrieval method is based on automated data extraction from tile images provided by a Web Map Tile Service (WMTS). The average time required for retrieving 100 days of data using this technique is 5.4 seconds, which is significantly faster than other methods that require the download of large satellite image files. The presented data extraction technique and space-time visualization user interface can be used as a model for working with other multi-temporal hydrologic or climate data WMTS services. The third, final step of the data integration procedure is generating continuous daily snow cover maps. A custom inverse distance weighting method has been developed to combine volunteer snow reports, cross-country ski track reports and station measurements to fill cloud gaps in the MODIS snow cover product. The method is demonstrated by producing a continuous, daily time step snow presence probability map dataset for the Czech Republic region. The ability of the presented methodology to reconstruct MODIS snow cover under cloud is validated by simulating cloud cover datasets and comparing estimated snow cover to actual MODIS snow cover. The percent correctly classified indicator showed accuracy between 80 and 90% using this method. Using crowdsourcing data (volunteer snow reports and ski tracks) improves the map accuracy by 0.7-1.2%. The output snow probability map datasets are published online using web applications and web services. Keywords: crowdsourcing, image analysis, interpolation, MODIS, R statistical software, snow cover, snowpack probability, Tethys platform, time series, WaterML, web services, winter sports.
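
    The gap-filling step rests on inverse distance weighting; a generic sketch of that idea is given below, where the power parameter and the example report locations are illustrative and the dissertation's custom IDW variant differs in detail.

      # Inverse distance weighting: estimate snow presence at a cloud-covered
      # location from nearby reports; power and coordinates are illustrative.
      import numpy as np

      def idw(xy_known, values, xy_query, power=2.0, eps=1e-12):
          d = np.linalg.norm(np.asarray(xy_known, float) - np.asarray(xy_query, float), axis=1)
          if np.any(d < eps):                     # query coincides with a report
              return float(np.asarray(values)[np.argmin(d)])
          w = 1.0 / d ** power
          return float(np.sum(w * np.asarray(values, float)) / np.sum(w))

      reports = [(0.0, 0.0), (5.0, 1.0), (2.0, 4.0)]   # report locations [km]
      snow = [1.0, 0.0, 1.0]                            # 1 = snow reported, 0 = no snow
      print(idw(reports, snow, (1.0, 1.0)))             # probability-like value in [0, 1]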

  13. Detection of neuron membranes in electron microscopy images using a serial neural network architecture.

    PubMed

    Jurrus, Elizabeth; Paiva, Antonio R C; Watanabe, Shigeki; Anderson, James R; Jones, Bryan W; Whitaker, Ross T; Jorgensen, Erik M; Marc, Robert E; Tasdizen, Tolga

    2010-12-01

    Study of nervous systems via the connectome, the map of connectivities of all neurons in that system, is a challenging problem in neuroscience. Towards this goal, neurobiologists are acquiring large electron microscopy datasets. However, the sheer volume of these datasets renders manual analysis infeasible. Hence, automated image analysis methods are required for reconstructing the connectome from these very large image collections. Segmentation of neurons in these images, an essential step of the reconstruction pipeline, is challenging because of noise, anisotropic shapes and brightness, and the presence of confounding structures. The method described in this paper uses a series of artificial neural networks (ANNs) in a framework combined with a feature vector that is composed of image intensities sampled over a stencil neighborhood. Several ANNs are applied in series, allowing each ANN to use the classification context provided by the previous network to improve detection accuracy. We develop the method of serial ANNs and show that the learned context does improve detection over traditional ANNs. We also demonstrate advantages over previous membrane detection methods. The results are a significant step towards an automated system for the reconstruction of the connectome. Copyright 2010 Elsevier B.V. All rights reserved.
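
    A much-simplified sketch of the serial idea follows: a second classifier receives the first classifier's membrane probability as an extra feature. The paper samples that context over a stencil neighborhood with purpose-built ANNs; here scikit-learn MLPs, synthetic data, and a single per-pixel probability stand in, and in practice the stage-1 probabilities would come from held-out predictions.

      # Simplified two-stage sketch: stage 2 receives stage 1's membrane probability
      # as an extra feature. The paper samples this context over a stencil
      # neighborhood with purpose-built ANNs; synthetic data stand in here, and
      # stage-1 probabilities should really come from held-out predictions.
      import numpy as np
      from sklearn.neural_network import MLPClassifier

      rng = np.random.default_rng(3)
      X = rng.standard_normal((2000, 25))                # e.g., 5x5 intensity patches
      y = (X[:, 12] + 0.5 * rng.standard_normal(2000) > 0).astype(int)

      stage1 = MLPClassifier(hidden_layer_sizes=(20,), max_iter=500, random_state=0).fit(X, y)
      context = stage1.predict_proba(X)[:, 1:2]          # per-pixel membrane probability

      stage2 = MLPClassifier(hidden_layer_sizes=(20,), max_iter=500, random_state=0)
      stage2.fit(np.hstack([X, context]), y)             # stage 2 trains with added context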

  14. An Automated Method for High-Throughput Screening of Arabidopsis Rosette Growth in Multi-Well Plates and Its Validation in Stress Conditions.

    PubMed

    De Diego, Nuria; Fürst, Tomáš; Humplík, Jan F; Ugena, Lydia; Podlešáková, Kateřina; Spíchal, Lukáš

    2017-01-01

    High-throughput plant phenotyping platforms provide new possibilities for automated, fast scoring of several plant growth and development traits, followed over time using non-invasive sensors. Using Arabidopsis as a model offers important advantages for high-throughput screening, with the opportunity to extrapolate the results obtained to other crops of commercial interest. In this study we describe the development of a highly reproducible, high-throughput Arabidopsis in vitro bioassay established using our OloPhen platform, suitable for analysis of rosette growth in multi-well plates. This method was successfully validated on an example of multivariate analysis of Arabidopsis rosette growth at different salt concentrations and its interaction with varying nutritional composition of the growth medium. Several traits such as changes in rosette area, relative growth rate, survival rate and homogeneity of the population are scored using fully automated RGB imaging and subsequent image analysis. The assay can be used for fast screening of the biological activity of chemical libraries, phenotypes of transgenic or recombinant inbred lines, or to search for potential quantitative trait loci. It is especially valuable for selecting genotypes or growth conditions that improve plant stress tolerance.
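
    One of the scored traits, relative growth rate, follows the standard log-difference definition; a minimal sketch with illustrative area values is given below.

      # Relative growth rate from projected rosette area, using the standard
      # log-difference definition RGR = (ln A2 - ln A1) / (t2 - t1).
      import numpy as np

      def relative_growth_rate(areas_mm2, times_days):
          log_area = np.log(np.asarray(areas_mm2, dtype=float))
          t = np.asarray(times_days, dtype=float)
          return np.diff(log_area) / np.diff(t)     # per-interval RGR [1/day]

      print(relative_growth_rate([55.0, 80.0, 120.0, 160.0], [7, 9, 11, 13]))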

  15. The Dubious Benefits of Multi-Level Modeling

    ERIC Educational Resources Information Center

    Gorard, Stephen

    2007-01-01

    This paper presents an argument against the wider adoption of complex forms of data analysis, using multi-level modeling (MLM) as an extended case study. MLM was devised to overcome some deficiencies in existing datasets, such as the bias caused by clustering. The paper suggests that MLM has an unclear theoretical and empirical basis, has not led…

  16. Management and assimilation of diverse, distributed watershed datasets

    NASA Astrophysics Data System (ADS)

    Varadharajan, C.; Faybishenko, B.; Versteeg, R.; Agarwal, D.; Hubbard, S. S.; Hendrix, V.

    2016-12-01

    The U.S. Department of Energy's (DOE) Watershed Function Scientific Focus Area (SFA) seeks to determine how perturbations to mountainous watersheds (e.g., floods, drought, early snowmelt) impact the downstream delivery of water, nutrients, carbon, and metals over seasonal to decadal timescales. We are building a software platform that enables integration of diverse and disparate field, laboratory, and simulation datasets, of various types including hydrological, geological, meteorological, geophysical, geochemical, ecological and genomic datasets, across a range of spatial and temporal scales within the Rifle floodplain and the East River watershed, Colorado. We are using agile data management and assimilation approaches to enable web-based integration of heterogeneous, multi-scale datasets. Sensor-based observations of water level, vadose zone and groundwater temperature, water quality, and meteorology, as well as biogeochemical analyses of soil and groundwater samples, have been curated and archived in federated databases. Quality Assurance and Quality Control (QA/QC) are performed on priority datasets needed for on-going scientific analyses and for hydrological and geochemical modeling. Automated QA/QC methods are used to identify and flag issues in the datasets. Data integration is achieved via a brokering service that dynamically integrates data from distributed databases via web services, based on user queries. The integrated results are presented to users in a portal that enables intuitive search, interactive visualization and download of integrated datasets. The concepts, approaches and codes being used are shared across the data science components of several large DOE-funded projects such as the Watershed Function SFA, Next Generation Ecosystem Experiment (NGEE) Tropics, Ameriflux/FLUXNET, and Advanced Simulation Capability for Environmental Management (ASCEM), and together contribute towards DOE's cyberinfrastructure for data management and model-data integration.
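
    As an illustration of the automated QA/QC flagging mentioned above, the sketch below applies a simple range and spike check to a sensor time series with pandas; the variable name, bounds, and step threshold are hypothetical, not the project's actual rules.

      # Range and spike checks on a sensor series; the variable name, bounds, and
      # step threshold are hypothetical, not the project's actual QA/QC rules.
      import pandas as pd

      def flag_temperature(series, lower=-1.0, upper=40.0, max_step=5.0):
          out_of_range = (series < lower) | (series > upper)
          spike = series.diff().abs() > max_step
          return pd.DataFrame({"value": series, "flag_range": out_of_range, "flag_spike": spike})

      obs = pd.Series([12.1, 12.3, 44.0, 12.4, 12.2],
                      index=pd.date_range("2016-06-01", periods=5, freq="H"),
                      name="groundwater_temp_C")
      print(flag_temperature(obs))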

  17. Exploring the complementarity of THz pulse imaging and DCE-MRIs: Toward a unified multi-channel classification and a deep learning framework.

    PubMed

    Yin, X-X; Zhang, Y; Cao, J; Wu, J-L; Hadjiloucas, S

    2016-12-01

    We provide a comprehensive account of recent advances in biomedical image analysis and classification from two complementary imaging modalities: terahertz (THz) pulse imaging and dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI). The work aims to highlight underlining commonalities in both data structures so that a common multi-channel data fusion framework can be developed. Signal pre-processing in both datasets is discussed briefly taking into consideration advances in multi-resolution analysis and model based fractional order calculus system identification. Developments in statistical signal processing using principal component and independent component analysis are also considered. These algorithms have been developed independently by the THz-pulse imaging and DCE-MRI communities, and there is scope to place them in a common multi-channel framework to provide better software standardization at the pre-processing de-noising stage. A comprehensive discussion of feature selection strategies is also provided and the importance of preserving textural information is highlighted. Feature extraction and classification methods taking into consideration recent advances in support vector machine (SVM) and extreme learning machine (ELM) classifiers and their complex extensions are presented. An outlook on Clifford algebra classifiers and deep learning techniques suitable to both types of datasets is also provided. The work points toward the direction of developing a new unified multi-channel signal processing framework for biomedical image analysis that will explore synergies from both sensing modalities for inferring disease proliferation. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  18. Assembling Large, Multi-Sensor Climate Datasets Using the SciFlo Grid Workflow System

    NASA Astrophysics Data System (ADS)

    Wilson, B. D.; Manipon, G.; Xing, Z.; Fetzer, E.

    2008-12-01

    NASA's Earth Observing System (EOS) is the world's most ambitious facility for studying global climate change. The mandate now is to combine measurements from the instruments on the A-Train platforms (AIRS, AMSR-E, MODIS, MISR, MLS, and CloudSat) and other Earth probes to enable large-scale studies of climate change over periods of years to decades. However, moving from predominantly single-instrument studies to a multi-sensor, measurement-based model for long-duration analysis of important climate variables presents serious challenges for large-scale data mining and data fusion. For example, one might want to compare temperature and water vapor retrievals from one instrument (AIRS) to another instrument (MODIS), and to a model (ECMWF), stratify the comparisons using a classification of the cloud scenes from CloudSat, and repeat the entire analysis over years of AIRS data. To perform such an analysis, one must discover & access multiple datasets from remote sites, find the space/time matchups between instruments swaths and model grids, understand the quality flags and uncertainties for retrieved physical variables, and assemble merged datasets for further scientific and statistical analysis. To meet these large-scale challenges, we are utilizing a Grid computing and dataflow framework, named SciFlo, in which we are deploying a set of versatile and reusable operators for data query, access, subsetting, co-registration, mining, fusion, and advanced statistical analysis. SciFlo is a semantically-enabled ("smart") Grid Workflow system that ties together a peer-to-peer network of computers into an efficient engine for distributed computation. The SciFlo workflow engine enables scientists to do multi-instrument Earth Science by assembling remotely-invokable Web Services (SOAP or http GET URLs), native executables, command-line scripts, and Python codes into a distributed computing flow. A scientist visually authors the graph of operation in the VizFlow GUI, or uses a text editor to modify the simple XML workflow documents. The SciFlo client & server engines optimize the execution of such distributed workflows and allow the user to transparently find and use datasets and operators without worrying about the actual location of the Grid resources. The engine transparently moves data to the operators, and moves operators to the data (on the dozen trusted SciFlo nodes). SciFlo also deploys a variety of Data Grid services to: query datasets in space and time, locate & retrieve on-line data granules, provide on-the-fly variable and spatial subsetting, and perform pairwise instrument matchups for A-Train datasets. These services are combined into efficient workflows to assemble the desired large-scale, merged climate datasets. SciFlo is currently being applied in several large climate studies: comparisons of aerosol optical depth between MODIS, MISR, AERONET ground network, and U. Michigan's IMPACT aerosol transport model; characterization of long-term biases in microwave and infrared instruments (AIRS, MLS) by comparisons to GPS temperature retrievals accurate to 0.1 degrees Kelvin; and construction of a decade-long, multi-sensor water vapor climatology stratified by classified cloud scene by bringing together datasets from AIRS/AMSU, AMSR-E, MLS, MODIS, and CloudSat (NASA MEASUREs grant, Fetzer PI). The presentation will discuss the SciFlo technologies, their application in these distributed workflows, and the many challenges encountered in assembling and analyzing these massive datasets.
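
    The space/time matchup step can be illustrated with a generic nearest-neighbor search; the sketch below uses a SciPy KD-tree on synthetic footprints with an arbitrary distance radius and time window, and stands in for, rather than reproduces, SciFlo's matchup service.

      # Generic space/time matchup: nearest reference footprint for each target
      # pixel, kept only if within an arbitrary distance radius and time window.
      import numpy as np
      from scipy.spatial import cKDTree

      rng = np.random.default_rng(4)
      ref_lonlat = rng.uniform([-180, -60], [180, 60], size=(5000, 2))   # e.g., AIRS footprints
      ref_time = rng.uniform(0, 86400, size=5000)                        # seconds of day
      tgt_lonlat = rng.uniform([-180, -60], [180, 60], size=(1000, 2))   # e.g., MODIS pixels
      tgt_time = rng.uniform(0, 86400, size=1000)

      tree = cKDTree(ref_lonlat)                     # lon/lat treated as planar for brevity
      dist, idx = tree.query(tgt_lonlat, k=1)
      in_time = np.abs(ref_time[idx] - tgt_time) < 1800.0                # 30-minute window
      matchups = np.flatnonzero((dist < 0.5) & in_time)                  # ~0.5 degree radius
      print(f"{matchups.size} matchups found")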

  19. Assembling Large, Multi-Sensor Climate Datasets Using the SciFlo Grid Workflow System

    NASA Astrophysics Data System (ADS)

    Wilson, B.; Manipon, G.; Xing, Z.; Fetzer, E.

    2009-04-01

    NASA's Earth Observing System (EOS) is an ambitious facility for studying global climate change. The mandate now is to combine measurements from the instruments on the "A-Train" platforms (AIRS, AMSR-E, MODIS, MISR, MLS, and CloudSat) and other Earth probes to enable large-scale studies of climate change over periods of years to decades. However, moving from predominantly single-instrument studies to a multi-sensor, measurement-based model for long-duration analysis of important climate variables presents serious challenges for large-scale data mining and data fusion. For example, one might want to compare temperature and water vapor retrievals from one instrument (AIRS) to another instrument (MODIS), and to a model (ECMWF), stratify the comparisons using a classification of the "cloud scenes" from CloudSat, and repeat the entire analysis over years of AIRS data. To perform such an analysis, one must discover & access multiple datasets from remote sites, find the space/time "matchups" between instrument swaths and model grids, understand the quality flags and uncertainties for retrieved physical variables, assemble merged datasets, and compute fused products for further scientific and statistical analysis. To meet these large-scale challenges, we are utilizing a Grid computing and dataflow framework, named SciFlo, in which we are deploying a set of versatile and reusable operators for data query, access, subsetting, co-registration, mining, fusion, and advanced statistical analysis. SciFlo is a semantically-enabled ("smart") Grid Workflow system that ties together a peer-to-peer network of computers into an efficient engine for distributed computation. The SciFlo workflow engine enables scientists to do multi-instrument Earth Science by assembling remotely-invokable Web Services (SOAP or http GET URLs), native executables, command-line scripts, and Python codes into a distributed computing flow. A scientist visually authors the graph of operations in the VizFlow GUI, or uses a text editor to modify the simple XML workflow documents. The SciFlo client & server engines optimize the execution of such distributed workflows and allow the user to transparently find and use datasets and operators without worrying about the actual location of the Grid resources. The engine transparently moves data to the operators, and moves operators to the data (on the dozen trusted SciFlo nodes). SciFlo also deploys a variety of Data Grid services to: query datasets in space and time, locate & retrieve on-line data granules, provide on-the-fly variable and spatial subsetting, perform pairwise instrument matchups for A-Train datasets, and compute fused products. These services are combined into efficient workflows to assemble the desired large-scale, merged climate datasets. SciFlo is currently being applied in several large climate studies: comparisons of aerosol optical depth between MODIS, MISR, the AERONET ground network, and U. Michigan's IMPACT aerosol transport model; characterization of long-term biases in microwave and infrared instruments (AIRS, MLS) by comparisons to GPS temperature retrievals accurate to 0.1 K; and construction of a decade-long, multi-sensor water vapor climatology stratified by classified cloud scene by bringing together datasets from AIRS/AMSU, AMSR-E, MLS, MODIS, and CloudSat (NASA MEASUREs grant, Fetzer PI). The presentation will discuss the SciFlo technologies, their application in these distributed workflows, and the many challenges encountered in assembling and analyzing these massive datasets.
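
    The space/time "matchup" step mentioned above can be illustrated with a small, self-contained sketch. The code below is not part of SciFlo; it pairs points from two hypothetical instrument swaths whenever they fall within illustrative distance and time tolerances, using a KD-tree on unit-sphere coordinates. All array and function names are assumptions.

      # Minimal sketch of a space/time "matchup" between two instrument swaths.
      # Arrays and tolerances are illustrative; this is not the SciFlo API.
      import numpy as np
      from scipy.spatial import cKDTree

      def matchup(lat_a, lon_a, time_a, lat_b, lon_b, time_b,
                  max_km=50.0, max_minutes=30.0):
          """Return index pairs (i, j) where swath A point i and swath B point j
          fall within the given space and time tolerances (times in seconds)."""
          def to_xyz(lat, lon):
              # Project lat/lon to 3D unit vectors so great-circle proximity
              # can be approximated with a Euclidean KD-tree query.
              lat, lon = np.radians(lat), np.radians(lon)
              return np.column_stack((np.cos(lat) * np.cos(lon),
                                      np.cos(lat) * np.sin(lon),
                                      np.sin(lat)))
          earth_radius_km = 6371.0
          tree = cKDTree(to_xyz(lat_b, lon_b))
          # Chord length on the unit sphere corresponding to max_km
          chord = 2.0 * np.sin(max_km / (2.0 * earth_radius_km))
          pairs = []
          for i, point in enumerate(to_xyz(lat_a, lon_a)):
              for j in tree.query_ball_point(point, r=chord):
                  if abs(time_a[i] - time_b[j]) <= max_minutes * 60.0:
                      pairs.append((i, j))
          return pairs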

  20. An automated method for depth-dependent crustal anisotropy detection with receiver function

    NASA Astrophysics Data System (ADS)

    Licciardi, Andrea; Piana Agostinetti, Nicola

    2015-04-01

    Crustal seismic anisotropy can be generated by a variety of geological factors (e.g. alignment of minerals/cracks, presence of fluids, etc.). Under the transversely isotropic media approximation, information about the strength and orientation of the anisotropic symmetry axis (including dip) can be extracted from the analysis of P-to-S conversions by means of teleseismic receiver functions (RF). Classically, this has been achieved through probabilistic inversion encoding a forward solver for anisotropic media. This approach strongly relies on a priori choices regarding the parameterization and velocity structure of the Earth's crust, requires extensive knowledge of the RF method and involves time-consuming trial-and-error steps. We present an automated method for reducing the non-uniqueness in this kind of inversion and for retrieving depth-dependent seismic anisotropy parameters in the crust with a resolution of some hundreds of meters. The method involves a multi-frequency approach (for better absolute Vs determination) and the decomposition of the RF data-set into its azimuthal harmonics (to separate the effects of the isotropic and anisotropic components). A first inversion of the isotropic component (zero-order harmonic) by means of a Reversible jump Markov Chain Monte Carlo (RjMCMC) provides the posterior probability distribution for the position of the velocity jumps at depth, from which information on the number of layers and the S-wave velocity structure below a broadband seismic station can be extracted. This information, together with that encoded in the first-order harmonic, is used in an automated way to: (1) determine the number of anisotropic layers and their approximate position at depth, and (2) narrow the search boundaries for layer thickness and S-wave velocity. Finally, an inversion is carried out with a Neighbourhood Algorithm (NA), where the free parameters are represented by the anisotropic structure beneath the seismic station. We tested the method against synthetic RF with correlated Gaussian noise to investigate the resolution power for multiple and thin (1-5 km) anisotropic layers in the crust. The algorithm correctly retrieves the true models for the number and position of the anisotropic layers, their strength and the orientation of the anisotropic symmetry axis, although the trend direction is better constrained than the dip angle. The method is then applied to a real data-set and the results compared with previous RF studies.
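
    As a rough illustration of the azimuthal harmonic decomposition step described above (not the authors' code), the sketch below fits zeroth- and first-order harmonics to receiver-function amplitudes observed at a set of back-azimuths; the synthetic data and function names are assumptions.

      # Minimal sketch: decompose receiver-function amplitudes at one time sample
      # into zeroth- and first-order azimuthal harmonics by least squares.
      import numpy as np

      def harmonic_decomposition(baz_deg, amplitudes):
          """Fit A0 + A1*cos(baz) + B1*sin(baz) to amplitudes observed at
          back-azimuths baz_deg (degrees). Returns (A0, A1, B1)."""
          baz = np.radians(baz_deg)
          G = np.column_stack((np.ones_like(baz), np.cos(baz), np.sin(baz)))
          coeffs, *_ = np.linalg.lstsq(G, amplitudes, rcond=None)
          return coeffs  # A0 = isotropic part, (A1, B1) = first-order harmonic

      # Example with synthetic observations at irregular back-azimuths
      baz = np.array([10.0, 55.0, 120.0, 190.0, 250.0, 310.0])
      obs = 0.2 + 0.05 * np.cos(np.radians(baz - 30.0))
      a0, a1, b1 = harmonic_decomposition(baz, obs)
      print(a0, np.degrees(np.arctan2(b1, a1)))  # recovers ~0.2 and ~30 degrees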

  1. Analysis of PNGase F-resistant N-glycopeptides using SugarQb for Proteome Discoverer 2.1 reveals cryptic substrate specificities.

    PubMed

    Stadlmann, Johannes; Hoi, David M; Taubenschmid, Jasmin; Mechtler, Karl; Penninger, Josef M

    2018-05-18

    SugarQb (www.imba.oeaw.ac.at/sugarqb) is a freely available collection of computational tools for the automated identification of intact glycopeptides from high-resolution HCD MS/MS data-sets in the Proteome Discoverer environment. We report the migration of SugarQb to the latest and free version of Proteome Discoverer 2.1, and apply it to the analysis of PNGase F-resistant N-glycopeptides from mouse embryonic stem cells. The analysis of intact glycopeptides highlights unexpected technical limitations to PNGase F-dependent glycoproteomic workflows at the proteome level, and warrants a critical re-interpretation of seminal data-sets in the context of N-glycosylation-site prediction. This article is protected by copyright. All rights reserved.

  2. Analysis of calibrated seafloor backscatter for habitat classification methodology and case study of 158 spots in the Bay of Biscay and Celtic Sea

    NASA Astrophysics Data System (ADS)

    Fezzani, Ridha; Berger, Laurent

    2018-06-01

    An automated signal-based method was developed in order to analyse the seafloor backscatter data logged by a calibrated multibeam echosounder. The processing consists, first, of clustering each survey sub-area into a small number of homogeneous sediment types based on the average backscatter level at one or several incidence angles. Second, it uses their local average angular response to extract discriminant descriptors, obtained by fitting the field data to the Generic Seafloor Acoustic Backscatter parametric model. Third, the descriptors are used for seafloor type classification. The method was tested on the multi-year data recorded by a calibrated 90-kHz Simrad ME70 multibeam sonar operated in the Bay of Biscay, France, and the Celtic Sea, Ireland. It was applied, for seafloor-type classification into 12 classes, to a dataset of 158 spots surveyed for demersal and benthic fauna study and monitoring. Qualitative analyses and the clusters classified using the extracted parameters show good discriminatory potential, indicating the robustness of this approach.
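
    The first two processing steps can be sketched as follows. This is a hedged illustration only: the three-parameter angular-response function stands in for the Generic Seafloor Acoustic Backscatter model, which is not reproduced here, and the array names are assumptions.

      # Minimal sketch of the first two steps described above: (1) cluster survey
      # pixels by mean backscatter level, (2) fit a simple parametric angular-
      # response curve per cluster to obtain descriptors for classification.
      import numpy as np
      from scipy.optimize import curve_fit
      from sklearn.cluster import KMeans

      def angular_response(theta_deg, a, b, c):
          # Specular peak near nadir plus a Lambertian-like decay (illustrative)
          theta = np.radians(theta_deg)
          return a * np.exp(-(theta_deg / b) ** 2) + c * np.cos(theta) ** 2

      def cluster_and_fit(mean_level, angles_deg, responses, n_clusters=4):
          """mean_level: (n_pixels,) backscatter at a reference angle;
          responses: (n_pixels, n_angles) average angular response per pixel."""
          labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(
              mean_level.reshape(-1, 1))
          descriptors = {}
          for k in range(n_clusters):
              mean_curve = responses[labels == k].mean(axis=0)
              params, _ = curve_fit(angular_response, angles_deg, mean_curve,
                                    p0=[mean_curve.max(), 10.0, mean_curve.min()])
              descriptors[k] = params  # used later as classification features
          return labels, descriptors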

  3. Automatic liver volume segmentation and fibrosis classification

    NASA Astrophysics Data System (ADS)

    Bal, Evgeny; Klang, Eyal; Amitai, Michal; Greenspan, Hayit

    2018-02-01

    In this work, we present an automatic method for liver segmentation and fibrosis classification in liver computed-tomography (CT) portal phase scans. The input is a full abdomen CT scan with an unknown number of slices, and the output is a liver volume segmentation mask and a fibrosis grade. A multi-stage analysis scheme is applied to each scan, including: volume segmentation, texture feature extraction and SVM-based classification. The data contain portal phase CT examinations from 80 patients, acquired with different scanners. Each examination has a matching Fibroscan grade. The dataset was subdivided into two groups: the first group contains healthy cases and mild fibrosis; the second group contains moderate fibrosis, severe fibrosis and cirrhosis. Using our automated algorithm, we achieved an average Dice index of 0.93 ± 0.05 for segmentation, and a sensitivity of 0.92 and specificity of 0.81 for classification. To the best of our knowledge, this is the first end-to-end automatic framework for liver fibrosis classification, an approach that, once validated, can have great potential value in the clinic.
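
    A minimal sketch of the texture-plus-SVM classification stage is shown below, assuming the liver mask already exists; the specific Haralick-style features and hyperparameters are illustrative and not the authors' exact pipeline.

      # Minimal sketch of the classification stage: grey-level co-occurrence
      # texture features from a segmented liver slice feeding an SVM.
      import numpy as np
      from skimage.feature import graycomatrix, graycoprops
      from sklearn.svm import SVC
      from sklearn.pipeline import make_pipeline
      from sklearn.preprocessing import StandardScaler

      def texture_features(liver_slice_8bit):
          """Haralick-style descriptors from one 8-bit liver slice."""
          glcm = graycomatrix(liver_slice_8bit, distances=[1],
                              angles=[0, np.pi / 2], levels=256,
                              symmetric=True, normed=True)
          return np.hstack([graycoprops(glcm, prop).ravel()
                            for prop in ("contrast", "homogeneity", "energy",
                                         "correlation")])

      # X: one feature vector per examination, y: 0 = none/mild, 1 = moderate+
      def train_classifier(X, y):
          clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
          return clf.fit(X, y)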

  4. Inaugural Genomics Automation Congress and the coming deluge of sequencing data.

    PubMed

    Creighton, Chad J

    2010-10-01

    Presentations at Select Biosciences's first 'Genomics Automation Congress' (Boston, MA, USA) in 2010 focused on next-generation sequencing and the platforms and methodology around them. The meeting provided an overview of sequencing technologies, both new and emerging. Speakers shared their recent work on applying sequencing to profile cells for various levels of biomolecular complexity, including DNA sequences, DNA copy, DNA methylation, mRNA and microRNA. With sequencing time and costs continuing to drop dramatically, a virtual explosion of very large sequencing datasets is at hand, which will probably present challenges and opportunities for high-level data analysis and interpretation, as well as for information technology infrastructure.

  5. [Comparative Study of Patient Identifications for Conventional and Portable Chest Radiographs Utilizing ROC Analysis].

    PubMed

    Kawashima, Hiroki; Hayashi, Norio; Ohno, Naoki; Matsuura, Yukihiro; Sanada, Shigeru

    2015-08-01

    To evaluate the patient identification ability of radiographers, previous and current chest radiographs were assessed in an observer study utilizing receiver operating characteristic (ROC) analysis. This study included portable and conventional chest radiographs from 43 same-patient and 43 different-patient cases. The dataset used in this study was divided into the following three groups: (1) a pair of portable radiographs, (2) a pair of conventional radiographs, and (3) a combination of each type of radiograph. Seven observers participated in this ROC study, which aimed to identify same or different patients, using these datasets. ROC analysis was conducted to calculate the average area under the ROC curve obtained by each observer (AUCave), and a statistical test was performed using the multi-reader multi-case method. Comparable results were obtained with pairs of portable (AUCave: 0.949) and conventional radiographs (AUCave: 0.951). In comparisons within the same modality, there were no significant differences. In contrast, the ability to identify patients by comparing a portable and a conventional radiograph (AUCave: 0.873) was lower than with the matched-modality datasets (p=0.002 and p=0.004, respectively). In conclusion, the use of different imaging modalities reduces radiographers' ability to identify their patients.
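
    The ROC evaluation itself reduces to computing an area under the curve from ordinal confidence ratings, as in the short sketch below (the ratings are synthetic; this is not the study's multi-reader multi-case software).

      # Minimal sketch: observers rate each radiograph pair on a confidence
      # scale, and the AUC summarizes how well "same patient" pairs are
      # separated from "different patient" pairs.
      import numpy as np
      from sklearn.metrics import roc_auc_score

      truth = np.array([1, 1, 1, 0, 0, 0, 1, 0])      # 1 = same patient
      rating = np.array([5, 4, 3, 2, 1, 3, 5, 2])     # observer confidence (1-5)
      auc = roc_auc_score(truth, rating)
      print(f"AUC = {auc:.3f}")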

  6. Towards Careful Practices for Automated Linguistic Analysis of Group Learning

    ERIC Educational Resources Information Center

    Howley, Iris; Rosé, Carolyn Penstein

    2016-01-01

    The multifaceted nature of collaborative learning environments necessitates theory to investigate the cognitive, motivational, and relational dimensions of collaboration. Several existing frameworks include aspects related to each of these three. This article explores the capability of multi-dimensional frameworks for analysis of collaborative…

  7. Automated Essay Grading using Machine Learning Algorithm

    NASA Astrophysics Data System (ADS)

    Ramalingam, V. V.; Pandian, A.; Chetry, Prateek; Nigam, Himanshu

    2018-04-01

    Essays are paramount for assessing academic excellence, along with the ability to link different ideas and recall them, but they are notably time consuming to assess manually. Manual grading takes a significant amount of an evaluator's time and hence is an expensive process. Automated grading, if proven effective, will not only reduce the time for assessment; comparing it with human scores will also make the scores more realistic. The project aims to develop an automated essay assessment system by using machine learning techniques to classify a corpus of textual entities into a small number of discrete categories corresponding to possible grades. A linear regression technique will be utilized for training the model, along with various other classification and clustering techniques. We intend to train classifiers on the training set, run them over the downloaded dataset, and then measure performance by comparing the predicted values with the dataset values. We have implemented our model using Java.
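
    The regression-based grading idea can be sketched in a few lines: vectorize essays, fit a linear model to human scores, and predict a grade for a new essay. The corpus and scores below are placeholders, and the authors' own implementation is described as Java-based; this is an independent Python illustration.

      # Minimal sketch of regression-based essay scoring (placeholder data).
      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.linear_model import LinearRegression
      from sklearn.pipeline import make_pipeline

      essays = ["The industrial revolution transformed labour and cities.",
                "Cats are nice. I like cats. Cats cats cats.",
                "Renewable energy adoption depends on policy and storage costs."]
      scores = [5.0, 2.0, 4.5]   # human-assigned grades (placeholder)

      model = make_pipeline(TfidfVectorizer(min_df=1), LinearRegression())
      model.fit(essays, scores)
      print(model.predict(["Policy shapes how quickly cities adopt renewables."]))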

  8. Multiscale analysis of the correlation of processing parameters on viscidity of composites fabricated by automated fiber placement

    NASA Astrophysics Data System (ADS)

    Han, Zhenyu; Sun, Shouzheng; Fu, Yunzhong; Fu, Hongya

    2017-10-01

    Viscidity is an important physical indicator for assessing the fluidity of resin; good fluidity helps the resin contact the fibers effectively and reduces manufacturing defects during the automated fiber placement (AFP) process. However, the effect of processing parameters on viscidity evolution is rarely studied during the AFP process. In this paper, viscidities under different scales are analyzed based on a multi-scale analysis method. Firstly, the viscous dissipation energy (VDE) within the meso-unit under different processing parameters is assessed by using the finite element method (FEM). According to the multi-scale energy transfer model, the meso-unit energy is used as the boundary condition for microscopic analysis. Furthermore, the molecular structure of the micro-system is built by the molecular dynamics (MD) method. Viscosity curves are then obtained by integrating the stress autocorrelation function (SACF) over time. Finally, the correlation characteristics of processing parameters to viscosity are revealed by using the gray relational analysis method (GRAM). A set of processing parameters is identified that achieves stable viscosity and better resin fluidity.
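
    The step of obtaining viscosity by integrating the SACF over time corresponds to a Green-Kubo-style integral, sketched below with a synthetic autocorrelation function; the volume, temperature and units are placeholders rather than values from the paper.

      # Minimal sketch of a Green-Kubo style viscosity estimate: integrate the
      # shear-stress autocorrelation function over time.
      import numpy as np

      k_B = 1.380649e-23          # J/K
      V = 1.0e-25                 # system volume, m^3 (placeholder)
      T = 400.0                   # temperature, K (placeholder)

      t = np.linspace(0.0, 50e-12, 5001)             # time axis, s
      sacf = 1.0e10 * np.exp(-t / 5e-12)             # <P_xy(0) P_xy(t)>, Pa^2

      # eta(t) = (V / k_B T) * integral_0^t SACF dt'  (cumulative trapezoid)
      running_integral = np.concatenate(
          ([0.0], np.cumsum(0.5 * (sacf[1:] + sacf[:-1]) * np.diff(t))))
      viscosity = (V / (k_B * T)) * running_integral
      print(f"plateau viscosity ~ {viscosity[-1]:.3e} Pa*s")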

  9. MARVEL: A knowledge-based productivity enhancement tool for real-time multi-mission and multi-subsystem spacecraft operations

    NASA Astrophysics Data System (ADS)

    Schwuttke, Ursula M.; Veregge, John R.; Angelino, Robert; Childs, Cynthia L.

    1990-10-01

    The Monitor/Analyzer of Real-time Voyager Engineering Link (MARVEL) is described. It is the first automation tool to be used in an online mode for telemetry monitoring and analysis in mission operations. MARVEL combines standard automation techniques with embedded knowledge base systems to simultaneously provide real time monitoring of data from subsystems, near real time analysis of anomaly conditions, and both real time and non-real time user interface functions. MARVEL is currently capable of monitoring the Computer Command Subsystem (CCS), Flight Data Subsystem (FDS), and Attitude and Articulation Control Subsystem (AACS) for both Voyager spacecraft, simultaneously, on a single workstation. The goal of MARVEL is to provide cost savings and productivity enhancement in mission operations and to reduce the need for constant availability of subsystem expertise.

  10. Look@NanoSIMS--a tool for the analysis of nanoSIMS data in environmental microbiology.

    PubMed

    Polerecky, Lubos; Adam, Birgit; Milucka, Jana; Musat, Niculina; Vagner, Tomas; Kuypers, Marcel M M

    2012-04-01

    We describe an open-source freeware programme for high throughput analysis of nanoSIMS (nanometre-scale secondary ion mass spectrometry) data. The programme implements basic data processing and analytical functions, including display and drift-corrected accumulation of scanned planes, interactive and semi-automated definition of regions of interest (ROIs), and export of the ROIs' elemental and isotopic composition in graphical and text-based formats. Additionally, the programme offers new functions that were custom-designed to address the needs of environmental microbiologists. Specifically, it allows manual and automated classification of ROIs based on the information that is derived either from the nanoSIMS dataset itself (e.g. from labelling achieved by halogen in situ hybridization) or is provided externally (e.g. as a fluorescence in situ hybridization image). Moreover, by implementing post-processing routines coupled to built-in statistical tools, the programme allows rapid synthesis and comparative analysis of results from many different datasets. After validation of the programme, we illustrate how these new processing and analytical functions increase flexibility, efficiency and depth of the nanoSIMS data analysis. Through its custom-made and open-source design, the programme provides an efficient, reliable and easily expandable tool that can help a growing community of environmental microbiologists and researchers from other disciplines process and analyse their nanoSIMS data. © 2012 Society for Applied Microbiology and Blackwell Publishing Ltd.

  11. SU-F-I-45: An Automated Technique to Measure Image Contrast in Clinical CT Images

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sanders, J; Abadi, E; Meng, B

    Purpose: To develop and validate an automated technique for measuring image contrast in chest computed tomography (CT) exams. Methods: An automated computer algorithm was developed to measure the distribution of Hounsfield units (HUs) inside four major organs: the lungs, liver, aorta, and bones. These organs were first segmented or identified using computer vision and image processing techniques. Regions of interest (ROIs) were automatically placed inside the lungs, liver, and aorta and histograms of the HUs inside the ROIs were constructed. The mean and standard deviation of each histogram were computed for each CT dataset. Comparison of the mean and standard deviation of the HUs in the different organs provides different contrast values. The ROI for the bones is simply the segmentation mask of the bones. Since the histogram for bones does not follow a Gaussian distribution, the 25th and 75th percentiles were computed instead of the mean. The sensitivity and accuracy of the algorithm were investigated by comparing the automated measurements with manual measurements. Fifteen contrast enhanced and fifteen non-contrast enhanced chest CT clinical datasets were examined in the validation procedure. Results: The algorithm successfully measured the histograms of the four organs in both contrast and non-contrast enhanced chest CT exams. The automated measurements were in agreement with manual measurements. The algorithm has sufficient sensitivity as indicated by the near-unity slope of the automated versus manual measurement plots. Furthermore, the algorithm has sufficient accuracy as indicated by the high coefficient of determination, R^2, values ranging from 0.879 to 0.998. Conclusion: Patient-specific image contrast can be measured from clinical datasets. The algorithm can be run on both contrast enhanced and non-enhanced clinical datasets. The method can be applied to automatically assess the contrast characteristics of clinical chest CT images and quantify dependencies that may not be captured in phantom data.
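
    The organ-wise HU statistics described above can be sketched as follows, assuming the segmentation masks have already been produced by the computer-vision stage; the mask names and the contrast index at the end are illustrative.

      # Minimal sketch of per-organ HU statistics: means and standard deviations
      # inside soft-tissue ROIs, percentiles for the (non-Gaussian) bone mask.
      import numpy as np

      def organ_statistics(ct_volume, masks):
          """ct_volume: 3D array of HUs; masks: dict of boolean 3D arrays."""
          stats = {}
          for organ in ("lungs", "liver", "aorta"):
              hu = ct_volume[masks[organ]]
              stats[organ] = {"mean": float(hu.mean()), "std": float(hu.std())}
          bone_hu = ct_volume[masks["bones"]]
          stats["bones"] = {"p25": float(np.percentile(bone_hu, 25)),
                            "p75": float(np.percentile(bone_hu, 75))}
          # e.g. a simple vascular-contrast index: aorta mean minus liver mean
          stats["aorta_minus_liver"] = (stats["aorta"]["mean"]
                                        - stats["liver"]["mean"])
          return stats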

  12. PyFDAP: automated analysis of fluorescence decay after photoconversion (FDAP) experiments.

    PubMed

    Bläßle, Alexander; Müller, Patrick

    2015-03-15

    We developed the graphical user interface PyFDAP for the fitting of linear and non-linear decay functions to data from fluorescence decay after photoconversion (FDAP) experiments. PyFDAP structures and analyses large FDAP datasets and features multiple fitting and plotting options. PyFDAP was written in Python and runs on Ubuntu Linux, Mac OS X and Microsoft Windows operating systems. The software, a user guide and a test FDAP dataset are freely available for download from http://people.tuebingen.mpg.de/mueller-lab. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
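
    In the spirit of (but independent from) PyFDAP, the sketch below fits a single-exponential decay with a constant offset to synthetic FDAP-style intensity data using SciPy; the parameter values are placeholders.

      # Minimal sketch of fitting a decay curve to FDAP-style intensity data.
      import numpy as np
      from scipy.optimize import curve_fit

      def decay(t, c0, k, offset):
          """Single-exponential decay with a constant offset."""
          return c0 * np.exp(-k * t) + offset

      t = np.linspace(0, 300, 61)                       # minutes
      intensity = decay(t, 100.0, 0.02, 8.0)
      intensity += np.random.default_rng(0).normal(0, 1.5, t.size)

      params, cov = curve_fit(decay, t, intensity, p0=[80.0, 0.01, 5.0])
      c0, k, offset = params
      print(f"half-life ~ {np.log(2) / k:.1f} min")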

  13. Comparing algorithms for automated vessel segmentation in computed tomography scans of the lung: the VESSEL12 study

    PubMed Central

    Rudyanto, Rina D.; Kerkstra, Sjoerd; van Rikxoort, Eva M.; Fetita, Catalin; Brillet, Pierre-Yves; Lefevre, Christophe; Xue, Wenzhe; Zhu, Xiangjun; Liang, Jianming; Öksüz, İlkay; Ünay, Devrim; Kadipaşaoğlu, Kamuran; Estépar, Raúl San José; Ross, James C.; Washko, George R.; Prieto, Juan-Carlos; Hoyos, Marcela Hernández; Orkisz, Maciej; Meine, Hans; Hüllebrand, Markus; Stöcker, Christina; Mir, Fernando Lopez; Naranjo, Valery; Villanueva, Eliseo; Staring, Marius; Xiao, Changyan; Stoel, Berend C.; Fabijanska, Anna; Smistad, Erik; Elster, Anne C.; Lindseth, Frank; Foruzan, Amir Hossein; Kiros, Ryan; Popuri, Karteek; Cobzas, Dana; Jimenez-Carretero, Daniel; Santos, Andres; Ledesma-Carbayo, Maria J.; Helmberger, Michael; Urschler, Martin; Pienn, Michael; Bosboom, Dennis G.H.; Campo, Arantza; Prokop, Mathias; de Jong, Pim A.; Ortiz-de-Solorzano, Carlos; Muñoz-Barrutia, Arrate; van Ginneken, Bram

    2016-01-01

    The VESSEL12 (VESsel SEgmentation in the Lung) challenge objectively compares the performance of different algorithms to identify vessels in thoracic computed tomography (CT) scans. Vessel segmentation is fundamental in computer aided processing of data generated by 3D imaging modalities. As manual vessel segmentation is prohibitively time consuming, any real world application requires some form of automation. Several approaches exist for automated vessel segmentation, but judging their relative merits is difficult due to a lack of standardized evaluation. We present an annotated reference dataset containing 20 CT scans and propose nine categories to perform a comprehensive evaluation of vessel segmentation algorithms from both academia and industry. Twenty algorithms participated in the VESSEL12 challenge, held at International Symposium on Biomedical Imaging (ISBI) 2012. All results have been published at the VESSEL12 website http://vessel12.grand-challenge.org. The challenge remains ongoing and open to new participants. Our three contributions are: (1) an annotated reference dataset available online for evaluation of new algorithms; (2) a quantitative scoring system for objective comparison of algorithms; and (3) performance analysis of the strengths and weaknesses of the various vessel segmentation methods in the presence of various lung diseases. PMID:25113321

  14. Automated brain tumour detection and segmentation using superpixel-based extremely randomized trees in FLAIR MRI.

    PubMed

    Soltaninejad, Mohammadreza; Yang, Guang; Lambrou, Tryphon; Allinson, Nigel; Jones, Timothy L; Barrick, Thomas R; Howe, Franklyn A; Ye, Xujiong

    2017-02-01

    We propose a fully automated method for detection and segmentation of the abnormal tissue associated with brain tumour (tumour core and oedema) from Fluid-Attenuated Inversion Recovery (FLAIR) Magnetic Resonance Imaging (MRI). The method is based on a superpixel technique and the classification of each superpixel. A number of novel image features including intensity-based, Gabor textons, fractal analysis and curvatures are calculated from each superpixel within the entire brain area in FLAIR MRI to ensure a robust classification. An extremely randomized trees (ERT) classifier is compared with a support vector machine (SVM) to classify each superpixel as tumour or non-tumour. The proposed method is evaluated on two datasets: (1) Our own clinical dataset: 19 MRI FLAIR images of patients with gliomas of grade II to IV, and (2) BRATS 2012 dataset: 30 FLAIR images with 10 low-grade and 20 high-grade gliomas. The experimental results demonstrate the high detection and segmentation performance of the proposed method using the ERT classifier. For our own cohort, the average detection sensitivity, balanced error rate and the Dice overlap measure for the segmented tumour against the ground truth are 89.48 %, 6 % and 0.91, respectively, while, for the BRATS dataset, the corresponding evaluation results are 88.09 %, 6 % and 0.88, respectively. This provides a close match to expert delineation across all grades of glioma, leading to a faster and more reproducible method of brain tumour detection and delineation to aid patient management.
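
    The classifier-comparison step can be sketched with scikit-learn as below, assuming per-superpixel feature vectors have already been extracted; the placeholder features and labels are random and stand in for the intensity, texton, fractal and curvature features of the paper.

      # Minimal sketch: extremely randomized trees versus an SVM on
      # per-superpixel feature vectors (placeholder data).
      import numpy as np
      from sklearn.ensemble import ExtraTreesClassifier
      from sklearn.svm import SVC
      from sklearn.model_selection import cross_val_score

      rng = np.random.default_rng(1)
      X = rng.normal(size=(500, 20))                    # placeholder features
      y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)     # placeholder labels

      for name, clf in [("ERT", ExtraTreesClassifier(n_estimators=200)),
                        ("SVM", SVC(kernel="rbf", C=1.0, gamma="scale"))]:
          scores = cross_val_score(clf, X, y, cv=5, scoring="balanced_accuracy")
          print(name, scores.mean())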

  15. Progress connecting multi-disciplinary geoscience communities through the VIVO semantic web application

    NASA Astrophysics Data System (ADS)

    Gross, M. B.; Mayernik, M. S.; Rowan, L. R.; Khan, H.; Boler, F. M.; Maull, K. E.; Stott, D.; Williams, S.; Corson-Rikert, J.; Johns, E. M.; Daniels, M. D.; Krafft, D. B.

    2015-12-01

    UNAVCO, UCAR, and Cornell University are working together to leverage semantic web technologies to enable discovery of people, datasets, publications and other research products, as well as the connections between them. The EarthCollab project, an EarthCube Building Block, is enhancing an existing open-source semantic web application, VIVO, to address connectivity gaps across distributed networks of researchers and resources related to the following two geoscience-based communities: (1) the Bering Sea Project, an interdisciplinary field program whose data archive is hosted by NCAR's Earth Observing Laboratory (EOL), and (2) UNAVCO, a geodetic facility and consortium that supports diverse research projects informed by geodesy. People, publications, datasets and grant information have been mapped to an extended version of the VIVO-ISF ontology and ingested into VIVO's database. Data is ingested using a custom set of scripts that include the ability to perform basic automated and curated disambiguation. VIVO can display a page for every object ingested, including connections to other objects in the VIVO database. A dataset page, for example, includes the dataset type, time interval, DOI, related publications, and authors. The dataset type field provides a connection to all other datasets of the same type. The author's page will show, among other information, related datasets and co-authors. Information previously spread across several unconnected databases is now stored in a single location. In addition to VIVO's default display, the new database can also be queried using SPARQL, a query language for semantic data. EarthCollab will also extend the VIVO web application. One such extension is the ability to cross-link separate VIVO instances across institutions, allowing local display of externally curated information. For example, Cornell's VIVO faculty pages will display UNAVCO's dataset information and UNAVCO's VIVO will display Cornell faculty member contact and position information. Additional extensions, including enhanced geospatial capabilities, will be developed following task-centered usability testing.

  16. Depth-area-duration characteristics of storm rainfall in Texas using Multi-Sensor Precipitation Estimates

    NASA Astrophysics Data System (ADS)

    McEnery, J. A.; Jitkajornwanich, K.

    2012-12-01

    This presentation will describe the methodology and overall system development by which a benchmark dataset of precipitation information has been used to characterize the depth-area-duration relations in heavy rain storms occurring over regions of Texas. Over the past two years, project investigators, along with the National Weather Service (NWS) West Gulf River Forecast Center (WGRFC), have developed and operated a gateway data system to ingest, store, and disseminate NWS multi-sensor precipitation estimates (MPE). As a pilot project of the Integrated Water Resources Science and Services (IWRSS) initiative, this testbed uses a Structured Query Language (SQL) server to maintain a full archive of current and historic MPE values within the WGRFC service area. These time series values are made available for public access as web services in the standard WaterML format. Having this volume of information maintained in a comprehensive database now allows the use of relational analysis capabilities within SQL to leverage these multi-sensor precipitation values and produce a valuable derivative product. The area of focus for this study is North Texas; the analysis utilizes values that originated from the WGRFC, one of three River Forecast Centers currently represented in the holdings of this data system. Over the past two decades, NEXRAD radar has dramatically improved the ability to record rainfall. The resulting hourly MPE values, distributed over an approximate 4 km by 4 km grid, are considered by the NWS to be the "best estimate" of rainfall. The data server provides an accepted standard interface for internet access to the largest time-series dataset of NEXRAD-based MPE values ever assembled. An automated script has been written to search and extract storms over the 18-year period of record from the contents of this massive historical precipitation database. Not only can it extract site-specific storms, but also duration-specific storms and storms separated by user-defined inter-event periods. A separate storm database has been created to store the selected output. By storing output within tables in a separate database, we can make use of powerful SQL capabilities to perform flexible pattern analysis. Previous efforts have made use of historic data from limited clusters of irregularly spaced physical gauges. The spatial extent of the observational network has been a limiting factor. The relatively dense distribution of MPE provides a virtual mesh of observations stretched over the landscape. This work combines a unique hydrologic data resource with programming and database analysis to characterize storm depth-area-duration relationships.
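
    The storm-extraction logic can be illustrated with a small sketch that splits an hourly rainfall series for one grid cell into events separated by a minimum dry period; the thresholds and data are placeholders, and this is not the project's SQL implementation.

      # Minimal sketch of extracting storm events from an hourly rainfall series.
      def extract_storms(hours, depths, min_interevent_hours=6, wet_threshold=0.0):
          """hours: sorted hour indices; depths: rainfall depth for each hour.
          Returns a list of (start_hour, end_hour, total_depth) tuples."""
          storms, current = [], None
          for h, d in zip(hours, depths):
              if d > wet_threshold:
                  if current is None or h - current[1] > min_interevent_hours:
                      if current is not None:
                          storms.append(tuple(current))
                      current = [h, h, 0.0]          # start a new event
                  current[1] = h
                  current[2] += d
          if current is not None:
              storms.append(tuple(current))
          return storms

      print(extract_storms(range(24),
                           [0, 0, 2, 5, 1, 0, 0, 0, 0, 0, 0, 3, 4, 0, 0, 0, 0, 0,
                            0, 0, 0, 0, 1, 2]))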

  17. Automated, contour-based tracking and analysis of cell behaviour over long time scales in environments of varying complexity and cell density.

    PubMed

    Baker, Richard M; Brasch, Megan E; Manning, M Lisa; Henderson, James H

    2014-08-06

    Understanding single and collective cell motility in model environments is foundational to many current research efforts in biology and bioengineering. To elucidate subtle differences in cell behaviour despite cell-to-cell variability, we introduce an algorithm for tracking large numbers of cells for long time periods and present a set of physics-based metrics that quantify differences in cell trajectories. Our algorithm, termed automated contour-based tracking for in vitro environments (ACTIVE), was designed for adherent cell populations subject to nuclear staining or transfection. ACTIVE is distinct from existing tracking software because it accommodates both variability in image intensity and multi-cell interactions, such as divisions and occlusions. When applied to low-contrast images from live-cell experiments, ACTIVE reduced error in analysing cell occlusion events by as much as 43% compared with a benchmark-tracking program while simultaneously tracking cell divisions and resulting daughter-daughter cell relationships. The large dataset generated by ACTIVE allowed us to develop metrics that capture subtle differences between cell trajectories on different substrates. We present cell motility data for thousands of cells studied at varying densities on shape-memory-polymer-based nanotopographies and identify several quantitative differences, including an unanticipated difference between two 'control' substrates. We expect that ACTIVE will be immediately useful to researchers who require accurate, long-time-scale motility data for many cells. © 2014 The Author(s) Published by the Royal Society. All rights reserved.
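
    One example of a physics-based trajectory metric is the ensemble mean squared displacement, sketched below for a list of tracked centroids; the array layout is an assumption rather than ACTIVE's actual output format.

      # Minimal sketch: ensemble mean squared displacement from cell tracks.
      import numpy as np

      def mean_squared_displacement(tracks, max_lag):
          """tracks: list of (T_i, 2) arrays of x,y centroids per cell.
          Returns MSD averaged over cells for lags 1..max_lag (frames)."""
          msd = np.zeros(max_lag)
          counts = np.zeros(max_lag)
          for xy in tracks:
              for lag in range(1, min(max_lag, len(xy) - 1) + 1):
                  d = xy[lag:] - xy[:-lag]
                  msd[lag - 1] += (d ** 2).sum(axis=1).mean()
                  counts[lag - 1] += 1
          return msd / np.maximum(counts, 1)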

  18. Image Quality Ranking Method for Microscopy

    PubMed Central

    Koho, Sami; Fazeli, Elnaz; Eriksson, John E.; Hänninen, Pekka E.

    2016-01-01

    Automated analysis of microscope images is necessitated by the increased need for high-resolution follow-up of events in time. Manually finding the right images to analyze, or to eliminate from data analysis, is a common day-to-day problem in microscopy research today, and the constantly growing size of image datasets does not help matters. We propose a simple method and a software tool for sorting images within a dataset according to their relative quality. We demonstrate the applicability of our method in finding good quality images in a STED microscope sample preparation optimization image dataset. The results are validated by comparisons to subjective opinion scores, as well as five state-of-the-art blind image quality assessment methods. We also show how our method can be applied to eliminate useless out-of-focus images in a High-Content-Screening experiment. We further evaluate the ability of our image quality ranking method to detect out-of-focus images, by extensive simulations, and by comparing its performance against previously published, well-established microscopy autofocus metrics. PMID:27364703

  19. MEGA-CC: computing core of molecular evolutionary genetics analysis program for automated and iterative data analysis.

    PubMed

    Kumar, Sudhir; Stecher, Glen; Peterson, Daniel; Tamura, Koichiro

    2012-10-15

    There is a growing need in the research community to apply the molecular evolutionary genetics analysis (MEGA) software tool for batch processing a large number of datasets and to integrate it into analysis workflows. Therefore, we now make available the computing core of the MEGA software as a stand-alone executable (MEGA-CC), along with an analysis prototyper (MEGA-Proto). MEGA-CC provides users with access to all the computational analyses available through MEGA's graphical user interface version. This includes methods for multiple sequence alignment, substitution model selection, evolutionary distance estimation, phylogeny inference, substitution rate and pattern estimation, tests of natural selection and ancestral sequence inference. Additionally, we have upgraded the source code for phylogenetic analysis using the maximum likelihood methods for parallel execution on multiple processors and cores. Here, we describe MEGA-CC and outline the steps for using MEGA-CC in tandem with MEGA-Proto for iterative and automated data analysis. http://www.megasoftware.net/.

  20. Modeling EEG Waveforms with Semi-Supervised Deep Belief Nets: Fast Classification and Anomaly Measurement

    PubMed Central

    Wulsin, D. F.; Gupta, J. R.; Mani, R.; Blanco, J. A.; Litt, B.

    2011-01-01

    Clinical electroencephalography (EEG) records vast amounts of complex human data yet is still reviewed primarily by human readers. Deep Belief Nets (DBNs) are a relatively new type of multi-layer neural network commonly tested on two-dimensional image data, but are rarely applied to time-series data such as EEG. We apply DBNs in a semi-supervised paradigm to model EEG waveforms for classification and anomaly detection. DBN performance was comparable to standard classifiers on our EEG dataset, and classification was found to be 1.7 to 103.7 times faster than the other high-performing classifiers. We demonstrate how the unsupervised step of DBN learning produces an autoencoder that can naturally be used in anomaly measurement. We compare the use of raw, unprocessed data (a rarity in automated physiological waveform analysis) to hand-chosen features and find that raw data produces comparable classification and better anomaly measurement performance. These results indicate that DBNs and raw data inputs may be more effective for online automated EEG waveform recognition than other common techniques. PMID:21525569

  1. Autoreject: Automated artifact rejection for MEG and EEG data.

    PubMed

    Jas, Mainak; Engemann, Denis A; Bekhti, Yousra; Raimondo, Federico; Gramfort, Alexandre

    2017-10-01

    We present an automated algorithm for unified rejection and repair of bad trials in magnetoencephalography (MEG) and electroencephalography (EEG) signals. Our method capitalizes on cross-validation in conjunction with a robust evaluation metric to estimate the optimal peak-to-peak threshold - a quantity commonly used for identifying bad trials in M/EEG. This approach is then extended to a more sophisticated algorithm which estimates this threshold for each sensor yielding trial-wise bad sensors. Depending on the number of bad sensors, the trial is then repaired by interpolation or by excluding it from subsequent analysis. All steps of the algorithm are fully automated thus lending itself to the name Autoreject. In order to assess the practical significance of the algorithm, we conducted extensive validation and comparisons with state-of-the-art methods on four public datasets containing MEG and EEG recordings from more than 200 subjects. The comparisons include purely qualitative efforts as well as quantitatively benchmarking against human supervised and semi-automated preprocessing pipelines. The algorithm allowed us to automate the preprocessing of MEG data from the Human Connectome Project (HCP) going up to the computation of the evoked responses. The automated nature of our method minimizes the burden of human inspection, hence supporting scalability and reliability demanded by data analysis in modern neuroscience. Copyright © 2017 Elsevier Inc. All rights reserved.
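
    The core idea, choosing the peak-to-peak rejection threshold whose cleaned average best predicts held-out trials, can be illustrated with the toy sketch below; it is a simplification for a single sensor and is not the autoreject package itself.

      # Toy illustration: pick the peak-to-peak rejection threshold whose
      # "clean" training average best matches the held-out mean under K-fold CV.
      import numpy as np
      from sklearn.model_selection import KFold

      def cv_peak_to_peak_threshold(epochs, candidate_thresholds, n_splits=5):
          """epochs: (n_trials, n_times) array for one sensor."""
          ptp = epochs.max(axis=1) - epochs.min(axis=1)
          errors = np.zeros(len(candidate_thresholds))
          for train, test in KFold(n_splits=n_splits, shuffle=True,
                                   random_state=0).split(epochs):
              test_mean = epochs[test].mean(axis=0)
              for i, thr in enumerate(candidate_thresholds):
                  keep = train[ptp[train] < thr]
                  train_mean = (epochs[keep].mean(axis=0) if keep.size
                                else np.zeros(epochs.shape[1]))
                  errors[i] += np.sqrt(((train_mean - test_mean) ** 2).mean())
          return candidate_thresholds[int(np.argmin(errors))]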

  2. IntelliHealth: A medical decision support application using a novel weighted multi-layer classifier ensemble framework.

    PubMed

    Bashir, Saba; Qamar, Usman; Khan, Farhan Hassan

    2016-02-01

    Accuracy plays a vital role in the medical field as it concerns the life of an individual. Extensive research has been conducted on disease classification and prediction using machine learning techniques. However, there is no agreement on which classifier produces the best results. A specific classifier may be better than others for a specific dataset, but another classifier could perform better for some other dataset. Ensembles of classifiers have proved to be an effective way to improve classification accuracy. In this research we present an ensemble framework with multi-layer classification using enhanced bagging and optimized weighting. The proposed model, called "HM-BagMoov", overcomes the limitations of conventional performance bottlenecks by utilizing an ensemble of seven heterogeneous classifiers. The framework is evaluated on five different heart disease datasets, four breast cancer datasets, two diabetes datasets, two liver disease datasets and one hepatitis dataset obtained from public repositories. The analysis of the results shows that the ensemble framework achieved the highest accuracy, sensitivity and F-Measure when compared with individual classifiers for all the diseases. In addition to this, the ensemble framework also achieved the highest accuracy when compared with the state-of-the-art techniques. An application named "IntelliHealth" has also been developed based on the proposed model; it may be used by hospitals/doctors for diagnostic advice. Copyright © 2015 Elsevier Inc. All rights reserved.
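
    A weighted, soft-voting ensemble in the spirit of the framework above can be sketched with scikit-learn as follows; the member classifiers, weights and benchmark dataset are illustrative and do not reproduce HM-BagMoov.

      # Minimal sketch of a weighted heterogeneous classifier ensemble.
      from sklearn.ensemble import VotingClassifier, RandomForestClassifier
      from sklearn.linear_model import LogisticRegression
      from sklearn.naive_bayes import GaussianNB
      from sklearn.neighbors import KNeighborsClassifier
      from sklearn.svm import SVC
      from sklearn.model_selection import cross_val_score
      from sklearn.datasets import load_breast_cancer

      X, y = load_breast_cancer(return_X_y=True)
      ensemble = VotingClassifier(
          estimators=[("lr", LogisticRegression(max_iter=5000)),
                      ("nb", GaussianNB()),
                      ("knn", KNeighborsClassifier()),
                      ("svm", SVC(probability=True)),
                      ("rf", RandomForestClassifier(n_estimators=200))],
          voting="soft",
          weights=[2, 1, 1, 2, 3])   # accuracy-derived weights (placeholder)
      print(cross_val_score(ensemble, X, y, cv=5, scoring="accuracy").mean())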

  3. A New Femtosecond Laser-Based Three-Dimensional Tomography Technique

    NASA Astrophysics Data System (ADS)

    Echlin, McLean P.

    2011-12-01

    Tomographic imaging has dramatically changed science, most notably in the fields of medicine and biology, by producing 3D views of structures which are too complex to understand in any other way. Current tomographic techniques require extensive time both for post-processing and data collection. Femtosecond laser based tomographic techniques have been developed in both standard atmosphere (femtosecond laser-based serial sectioning technique - FSLSS) and in vacuum (Tri-Beam System) for the fast collection (10^5 μm^3/s) of mm^3-sized 3D datasets. Both techniques use femtosecond laser pulses to selectively remove layer-by-layer areas of material with low collateral damage and a negligible heat-affected zone. To the author's knowledge, femtosecond lasers had never before been used for serial sectioning, and these techniques have been entirely and uniquely developed by the author and his collaborators at the University of Michigan and University of California Santa Barbara. The FSLSS was applied to measure the 3D distribution of TiN particles in a 4330 steel. Single pulse ablation morphologies and rates were measured and collected from literature. Simultaneous two-phase ablation of TiN and steel matrix was shown to occur at fluences of 0.9-2 J/cm^2. Laser scanning protocols were developed minimizing surface roughness to 0.1-0.4 μm for laser-based sectioning. The FSLSS technique was used to section and 3D reconstruct titanium nitride (TiN) containing 4330 steel. 3D TiN particle sizes, distribution parameters, and particle density were measured statistically. A methodology was developed to use the 3D datasets to produce statistical volume elements (SVEs) for toughness modeling. Six FSLSS TiN datasets were sub-sampled into 48 SVEs for statistical analysis and toughness modeling using the Rice-Tracey and Garrison-Moody models. A two-parameter Weibull analysis was performed and variability in the toughness data agreed well with the bulk toughness measurements of Ruggieri et al. The Tri-Beam system combines the benefits of laser based material removal (speed, low-damage, automated) with detectors that collect chemical, structural, and topological information. Multi-modal sectioning information was collected after many laser scanning passes, demonstrating the capability of the Tri-Beam system.

  4. Automatic cerebrospinal fluid segmentation in non-contrast CT images using a 3D convolutional network

    NASA Astrophysics Data System (ADS)

    Patel, Ajay; van de Leemput, Sil C.; Prokop, Mathias; van Ginneken, Bram; Manniesing, Rashindra

    2017-03-01

    Segmentation of anatomical structures is fundamental in the development of computer aided diagnosis systems for cerebral pathologies. Manual annotations are laborious, time consuming and subject to human error and observer variability. Accurate quantification of cerebrospinal fluid (CSF) can be employed as a morphometric measure for diagnosis and patient outcome prediction. However, segmenting CSF in non-contrast CT images is complicated by low soft tissue contrast and image noise. In this paper we propose a state-of-the-art method using a multi-scale three-dimensional (3D) fully convolutional neural network (CNN) to automatically segment all CSF within the cranial cavity. The method is trained on a small dataset comprised of four manually annotated cerebral CT images. Quantitative evaluation of a separate test dataset of four images shows a mean Dice similarity coefficient of 0.87 +/- 0.01 and mean absolute volume difference of 4.77 +/- 2.70 %. The average prediction time was 68 seconds. Our method allows for fast and fully automated 3D segmentation of cerebral CSF in non-contrast CT, and shows promising results despite a limited amount of training data.
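
    The two evaluation metrics quoted above, the Dice similarity coefficient and the absolute volume difference, can be computed from boolean masks as in the short sketch below (the voxel-volume argument is a placeholder).

      # Minimal sketch of the two evaluation metrics for a predicted CSF mask
      # against a manual annotation (both boolean 3D arrays).
      import numpy as np

      def dice_coefficient(pred, truth):
          intersection = np.logical_and(pred, truth).sum()
          return 2.0 * intersection / (pred.sum() + truth.sum())

      def abs_volume_difference_percent(pred, truth, voxel_volume_ml=1.0):
          v_pred = pred.sum() * voxel_volume_ml
          v_truth = truth.sum() * voxel_volume_ml
          return 100.0 * abs(v_pred - v_truth) / v_truth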

  5. Progress on Platforms, Sensors and Applications with Unmanned Aerial Vehicles in soil science and geomorphology

    NASA Astrophysics Data System (ADS)

    Anders, Niels; Suomalainen, Juha; Seeger, Manuel; Keesstra, Saskia; Bartholomeus, Harm; Paron, Paolo

    2014-05-01

    The recent increase in the performance and endurance of electronically controlled flying platforms, such as multi-copters and fixed-wing airplanes, and the decreasing size and weight of sensors and batteries have led to the increasing popularity of Unmanned Aerial Systems (UAS) for scientific purposes. Modern workflows that implement UAS include guided flight plan generation, 3D GPS navigation for fully automated piloting, and automated processing with new techniques such as "Structure from Motion" photogrammetry. UAS are often equipped with normal RGB cameras, multi- and hyperspectral sensors, radar, or other sensors, and provide a cheap and flexible solution for creating multi-temporal data sets. UAS have revolutionized multi-temporal research, allowing new applications related to change analysis and process monitoring. The EGU General Assembly 2014 is hosting a session on platforms, sensors and applications with UAS in soil science and geomorphology. This presentation briefly summarizes the outcome of this session, addressing the current state and future challenges of small-platform data acquisition in soil science and geomorphology.

  6. Cell nuclei segmentation in fluorescence microscopy images using inter- and intra-region discriminative information.

    PubMed

    Song, Yang; Cai, Weidong; Feng, David Dagan; Chen, Mei

    2013-01-01

    Automated segmentation of cell nuclei in microscopic images is critical to high-throughput analysis of the ever-increasing amount of data. Although cell nuclei are generally visually distinguishable for humans, automated segmentation faces challenges when there is significant intensity inhomogeneity among cell nuclei or in the background. In this paper, we propose an effective method for automated cell nucleus segmentation using a three-step approach. It first obtains an initial segmentation by extracting salient regions in the image, then reduces false positives using inter-region feature discrimination, and finally refines the boundary of the cell nuclei using intra-region contrast information. This method has been evaluated on two publicly available datasets of fluorescence microscopic images with 4009 cells, and has achieved superior performance compared to popular state-of-the-art methods using established metrics.

  7. Statistical link between external climate forcings and modes of ocean variability

    NASA Astrophysics Data System (ADS)

    Malik, Abdul; Brönnimann, Stefan; Perona, Paolo

    2017-07-01

    In this study we investigate the statistical link between external climate forcings and modes of ocean variability on inter-annual (3-year) to centennial (100-year) timescales using a de-trended semi-partial-cross-correlation analysis technique. To investigate this link we employ observations (AD 1854-1999), climate proxies (AD 1600-1999), and coupled Atmosphere-Ocean-Chemistry Climate Model simulations with SOCOL-MPIOM (AD 1600-1999). We find robust statistical evidence that the Atlantic multi-decadal oscillation (AMO) has an intrinsic positive correlation with solar activity in all datasets employed. The strength of the relationship between AMO and solar activity is modulated by volcanic eruptions and complex interactions among modes of ocean variability. The observational dataset reveals that the El Niño-Southern Oscillation (ENSO) has a statistically significant negative intrinsic correlation with solar activity on decadal to multi-decadal timescales (16-27-year), whereas there is no evidence of a link on a typical ENSO timescale (2-7-year). In the observational dataset, the volcanic eruptions do not have a link with AMO on a typical AMO timescale (55-80-year); however, the long-term datasets (proxies and SOCOL-MPIOM output) show that volcanic eruptions have an intrinsic negative correlation with AMO on inter-annual to multi-decadal timescales. The Pacific decadal oscillation has no link with solar activity; however, it has a positive intrinsic correlation with volcanic eruptions on multi-decadal timescales (47-54-year) in the reconstruction and on decadal to multi-decadal timescales (16-32-year) in climate model simulations. We also find evidence of a link between volcanic eruptions and ENSO; however, the sign of the relationship is not consistent between observations/proxies and climate model simulations.
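
    A simplified illustration of a de-trended semi-partial correlation is given below: the series are detrended, the covariate is regressed out of the ocean-mode index only, and the residual is correlated with the forcing. The timescale decomposition used in the study is omitted, and the synthetic series are placeholders.

      # Simplified de-trended semi-partial correlation sketch (synthetic data).
      import numpy as np
      from scipy.signal import detrend
      from scipy.stats import pearsonr

      def semi_partial_corr(mode_index, forcing, covariate):
          """corr(forcing, mode_index with covariate regressed out of the mode)."""
          y = detrend(np.asarray(mode_index, dtype=float))
          x = detrend(np.asarray(forcing, dtype=float))
          z = detrend(np.asarray(covariate, dtype=float))
          # Residual of the mode index after removing the covariate's linear part
          slope = np.dot(y, z) / np.dot(z, z)
          residual = y - slope * z
          return pearsonr(x, residual)

      rng = np.random.default_rng(0)
      solar = rng.normal(size=400)
      volcanic = rng.normal(size=400)
      amo = 0.4 * solar - 0.3 * volcanic + rng.normal(scale=0.5, size=400)
      r, p = semi_partial_corr(amo, solar, volcanic)
      print(r, p)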

  8. Automated video feature extraction : workshop summary report October 10-11 2012.

    DOT National Transportation Integrated Search

    2012-12-01

    This report summarizes a 2-day workshop on automated video feature extraction. Discussion focused on the Naturalistic Driving Study, funded by the second Strategic Highway Research Program, and also involved the companion roadway inventory dataset....

  9. I'll take that to go: Big data bags and minimal identifiers for exchange of large, complex datasets

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chard, Kyle; D'Arcy, Mike; Heavner, Benjamin D.

    Big data workflows often require the assembly and exchange of complex, multi-element datasets. For example, in biomedical applications, the input to an analytic pipeline can be a dataset consisting of thousands of images and genome sequences assembled from diverse repositories, requiring a description of the contents of the dataset in a concise and unambiguous form. Typical approaches to creating datasets for big data workflows assume that all data reside in a single location, requiring costly data marshaling and permitting errors of omission and commission because dataset members are not explicitly specified. We address these issues by proposing simple methods and tools for assembling, sharing, and analyzing large and complex datasets that scientists can easily integrate into their daily workflows. These tools combine a simple and robust method for describing data collections (BDBags), data descriptions (Research Objects), and simple persistent identifiers (Minids) to create a powerful ecosystem of tools and services for big data analysis and sharing. We present these tools and use biomedical case studies to illustrate their use for the rapid assembly, sharing, and analysis of large datasets.

  10. Empirical study of seven data mining algorithms on different characteristics of datasets for biomedical classification applications.

    PubMed

    Zhang, Yiyan; Xin, Yi; Li, Qin; Ma, Jianshe; Li, Shuai; Lv, Xiaodan; Lv, Weiqi

    2017-11-02

    Various kinds of data mining algorithms are continually proposed as related disciplines develop. These algorithms differ in their applicable scope and performance. Hence, finding a suitable algorithm for a given dataset is becoming an important concern for biomedical researchers who need to solve practical problems promptly. In this paper, seven established algorithms, namely C4.5, support vector machine, AdaBoost, k-nearest neighbor, naïve Bayes, random forest, and logistic regression, were selected as the research objects. The seven algorithms were applied to the 12 top-click UCI public datasets with the task of classification, and their performances were compared through induction and analysis. The sample size, number of attributes, number of missing values, sample size of each class, correlation coefficients between variables, class entropy of the task variable, and the ratio of the sample size of the largest class to that of the smallest class were calculated to characterize the 12 research datasets. The two ensemble algorithms reach high classification accuracy on most datasets. Moreover, random forest performs better than AdaBoost on unbalanced datasets with multi-class tasks. Simple algorithms, such as naïve Bayes and the logistic regression model, are suitable for small datasets with high correlation between the task variable and the other, non-task attribute variables. The k-nearest neighbor and C4.5 decision tree algorithms perform well on binary- and multi-class task datasets. The support vector machine is best suited to balanced small datasets with binary-class tasks. No algorithm can maintain the best performance on all datasets. The applicability of the seven data mining algorithms to datasets with different characteristics was summarized to provide a reference for biomedical researchers or beginners in different fields.
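
    A comparison of the seven classifier families on a single public dataset can be sketched with scikit-learn as below; default hyperparameters and the wine dataset are stand-ins for the paper's settings and the 12 UCI datasets.

      # Minimal sketch: cross-validated comparison of seven classifier families.
      from sklearn.datasets import load_wine
      from sklearn.model_selection import cross_val_score
      from sklearn.tree import DecisionTreeClassifier
      from sklearn.svm import SVC
      from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
      from sklearn.neighbors import KNeighborsClassifier
      from sklearn.naive_bayes import GaussianNB
      from sklearn.linear_model import LogisticRegression

      X, y = load_wine(return_X_y=True)
      classifiers = {
          "C4.5-like tree": DecisionTreeClassifier(criterion="entropy"),
          "SVM": SVC(),
          "AdaBoost": AdaBoostClassifier(),
          "kNN": KNeighborsClassifier(),
          "naive Bayes": GaussianNB(),
          "random forest": RandomForestClassifier(),
          "logistic regression": LogisticRegression(max_iter=5000),
      }
      for name, clf in classifiers.items():
          scores = cross_val_score(clf, X, y, cv=10)
          print(f"{name:20s} mean accuracy = {scores.mean():.3f}")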

  11. Effective Materials Property Information Management for the 21st Century

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ren, Weiju; Cebon, David; Barabash, Oleg M

    2011-01-01

    This paper discusses key principles for the development of materials property information management software systems. There are growing needs for automated materials information management in various organizations. In part these are fuelled by the demands for higher efficiency in material testing, product design and engineering analysis. But equally important, organizations are being driven by the needs for consistency, quality and traceability of data, as well as control of access to proprietary or sensitive information. Further, the use of increasingly sophisticated nonlinear, anisotropic and multi-scale engineering analyses requires both processing of large volumes of test data for development of constitutive models and complex materials data input for Computer-Aided Engineering (CAE) software. Finally, the globalization of the economy often generates great needs for sharing a single gold source of materials information among members of global engineering teams in extended supply chains. Fortunately, materials property management systems have kept pace with the growing user demands and evolved into versatile data management systems that can be customized to specific user needs. The more sophisticated of these provide facilities for: (i) data management functions such as access, version, and quality controls; (ii) a wide range of data import, export and analysis capabilities; (iii) data pedigree traceability mechanisms; (iv) data searching, reporting and viewing tools; and (v) access to the information via a wide range of interfaces. In this paper the important requirements for advanced material data management systems, future challenges and opportunities such as automated error checking, data quality characterization, identification of gaps in datasets, as well as functionalities and business models to fuel database growth and maintenance are discussed.

  12. Automated image-based phenotypic analysis in zebrafish embryos

    PubMed Central

    Vogt, Andreas; Cholewinski, Andrzej; Shen, Xiaoqiang; Nelson, Scott; Lazo, John S.; Tsang, Michael; Hukriede, Neil A.

    2009-01-01

    Presently, the zebrafish is the only vertebrate model compatible with contemporary paradigms of drug discovery. Zebrafish embryos are amenable to the automation necessary for high-throughput chemical screens, and optical transparency makes them potentially suited for image-based screening. However, the lack of tools for automated analysis of complex images presents an obstacle to utilizing the zebrafish as a high-throughput screening model. We have developed an automated system for imaging and analyzing zebrafish embryos in multi-well plates regardless of embryo orientation and without user intervention. Images of fluorescent embryos were acquired on a high-content reader and analyzed using an artificial intelligence-based image analysis method termed Cognition Network Technology (CNT). CNT reliably detected transgenic fluorescent embryos (Tg(fli1:EGFP)y1) arrayed in 96-well plates and quantified intersegmental blood vessel development in embryos treated with small molecule inhibitors of angiogenesis. The results demonstrate it is feasible to adapt image-based high-content screening methodology to measure complex whole-organism phenotypes. PMID:19235725

  13. Development of an Uncertainty Quantification Predictive Chemical Reaction Model for Syngas Combustion

    DOE PAGES

    Slavinskaya, N. A.; Abbasi, M.; Starcke, J. H.; ...

    2017-01-24

    An automated data-centric infrastructure, Process Informatics Model (PrIMe), was applied to validation and optimization of a syngas combustion model. The Bound-to-Bound Data Collaboration (B2BDC) module of PrIMe was employed to discover the limits of parameter modifications based on uncertainty quantification (UQ) and consistency analysis of the model–data system and experimental data, including shock-tube ignition delay times and laminar flame speeds. Existing syngas reaction models are reviewed, and the selected kinetic data are described in detail. Empirical rules were developed and applied to evaluate the uncertainty bounds of the literature experimental data. Here, the initial H2/CO reaction model, assembled from 73 reactions and 17 species, was subjected to a B2BDC analysis. For this purpose, a dataset was constructed that included a total of 167 experimental targets and 55 active model parameters. Consistency analysis of the composed dataset revealed disagreement between models and data. Further analysis suggested that removing 45 experimental targets, 8 of which were self-inconsistent, would lead to a consistent dataset. This dataset was subjected to a correlation analysis, which highlights possible directions for parameter modification and model improvement. Additionally, several methods of parameter optimization were applied, some of them unique to the B2BDC framework. The optimized models demonstrated improved agreement with experiments compared to the initially assembled model, and their predictions for experiments not included in the initial dataset (i.e., a blind prediction) were investigated. The results demonstrate benefits of applying the B2BDC methodology for developing predictive kinetic models.

  14. Development of an Uncertainty Quantification Predictive Chemical Reaction Model for Syngas Combustion

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Slavinskaya, N. A.; Abbasi, M.; Starcke, J. H.

    An automated data-centric infrastructure, Process Informatics Model (PrIMe), was applied to validation and optimization of a syngas combustion model. The Bound-to-Bound Data Collaboration (B2BDC) module of PrIMe was employed to discover the limits of parameter modifications based on uncertainty quantification (UQ) and consistency analysis of the model–data system and experimental data, including shock-tube ignition delay times and laminar flame speeds. Existing syngas reaction models are reviewed, and the selected kinetic data are described in detail. Empirical rules were developed and applied to evaluate the uncertainty bounds of the literature experimental data. Here, the initial H2/CO reaction model, assembled from 73 reactions and 17 species, was subjected to a B2BDC analysis. For this purpose, a dataset was constructed that included a total of 167 experimental targets and 55 active model parameters. Consistency analysis of the composed dataset revealed disagreement between models and data. Further analysis suggested that removing 45 experimental targets, 8 of which were self-inconsistent, would lead to a consistent dataset. This dataset was subjected to a correlation analysis, which highlights possible directions for parameter modification and model improvement. Additionally, several methods of parameter optimization were applied, some of them unique to the B2BDC framework. The optimized models demonstrated improved agreement with experiments compared to the initially assembled model, and their predictions for experiments not included in the initial dataset (i.e., a blind prediction) were investigated. The results demonstrate benefits of applying the B2BDC methodology for developing predictive kinetic models.

  15. An Automatic Multi-Target Independent Analysis Framework for Non-Planar Infrared-Visible Registration.

    PubMed

    Sun, Xinglong; Xu, Tingfa; Zhang, Jizhou; Zhao, Zishu; Li, Yuankun

    2017-07-26

    In this paper, we propose a novel automatic multi-target registration framework for non-planar infrared-visible videos. Previous approaches usually analyzed multiple targets together and then estimated a global homography for the whole scene; however, such approaches cannot achieve precise multi-target registration when the scenes are non-planar. Our framework is devoted to solving this problem using feature matching and multi-target tracking. The key idea is to analyze and register each target independently. We present a fast and robust feature matching strategy in which only the features on the corresponding foreground pairs are matched. In addition, new reservoirs based on the Gaussian criterion are created for all targets, and a multi-target tracking method is adopted to determine the relationships between the reservoirs and foreground blobs. With the matches in the corresponding reservoir, the homography of each target is computed according to its moving state. We tested our framework on both public near-planar and non-planar datasets. The results demonstrate that the proposed framework outperforms the state-of-the-art global registration method and the manual global registration matrix on all tested datasets.
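
    The per-target idea can be illustrated with a short sketch: restrict feature detection to each target's foreground mask and estimate one homography per target rather than a single global one. This is only a minimal illustration using ORB features and RANSAC; the paper's own matching strategy, Gaussian-criterion reservoirs and tracking are more elaborate, and the function name and pre-computed mask inputs are assumptions.

```python
# Minimal sketch: one homography per target instead of a single global one,
# assuming binary foreground masks (uint8, 255 inside the target) are already
# available for each target in both modalities.
import cv2
import numpy as np

def per_target_homographies(ir_img, vis_img, ir_masks, vis_masks):
    """Return a list of 3x3 homographies, one per corresponding mask pair.
    ir_img and vis_img are single-channel uint8 images."""
    orb = cv2.ORB_create(nfeatures=1000)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    homographies = []
    for ir_mask, vis_mask in zip(ir_masks, vis_masks):
        # Restrict feature extraction to the foreground of this target only.
        kp1, des1 = orb.detectAndCompute(ir_img, ir_mask)
        kp2, des2 = orb.detectAndCompute(vis_img, vis_mask)
        if des1 is None or des2 is None:
            homographies.append(None)
            continue
        matches = matcher.match(des1, des2)
        if len(matches) < 4:              # need at least 4 correspondences
            homographies.append(None)
            continue
        src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
        dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
        H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
        homographies.append(H)
    return homographies
```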

  16. Issues and Solutions for Bringing Heterogeneous Water Cycle Data Sets Together

    NASA Technical Reports Server (NTRS)

    Acker, James; Kempler, Steven; Teng, William; Belvedere, Deborah; Liu, Zhong; Leptoukh, Gregory

    2010-01-01

    The water cycle research community has generated many regional to global scale products using data from individual NASA missions or sensors (e.g., TRMM, AMSR-E); multiple ground- and space-based data sources (e.g., Global Precipitation Climatology Project [GPCP] products); and sophisticated data assimilation systems (e.g., Land Data Assimilation Systems [LDAS]). However, it is often difficult to access, explore, merge, analyze, and inter-compare these data in a coherent manner due to issues of data resolution, format, and structure. These difficulties were substantiated at the recent Collaborative Energy and Water Cycle Information Services (CEWIS) Workshop, where members of the NASA Energy and Water cycle Study (NEWS) community gave presentations, provided feedback, and developed scenarios which illustrated the difficulties and techniques for bringing together heterogeneous datasets. This presentation reports on the findings of the workshop, thus defining the problems and challenges of multi-dataset research. In addition, the CEWIS prototype shown at the workshop will be presented to illustrate new technologies that can mitigate data access roadblocks encountered in multi-dataset research, including: (1) Quick and easy search and access of selected NEWS data sets. (2) Multi-parameter data subsetting, manipulation, analysis, and display tools. (3) Access to input and derived water cycle data (data lineage). It is hoped that this presentation will encourage community discussion and feedback on heterogeneous data analysis scenarios, issues, and remedies.

  17. Regional Landslide Mapping Aided by Automated Classification of SqueeSAR™ Time Series (Northern Apennines, Italy)

    NASA Astrophysics Data System (ADS)

    Iannacone, J.; Berti, M.; Allievi, J.; Del Conte, S.; Corsini, A.

    2013-12-01

    Spaceborne InSAR has proven to be very valuable for landslide detection. In particular, extremely slow landslides (Cruden and Varnes, 1996) can now be clearly identified, thanks to the millimetric precision reached by recent multi-interferometric algorithms. The typical approach in radar interpretation for landslide mapping is based on the average annual velocity of the deformation, calculated over the entire time series. The Hotspot and Cluster Analysis (Lu et al., 2012) and the PSI-based matrix approach (Cigna et al., 2013) are examples of landslide mapping techniques based on average annual velocities. However, slope movements can be affected by non-linear deformation trends (e.g. reactivation of dormant landslides, deceleration due to natural or man-made slope stabilization, seasonal activity, etc.). Therefore, analyzing deformation time series is crucial in order to fully characterize slope dynamics. While this is relatively simple to carry out manually when dealing with small datasets, time series analysis over regional-scale datasets requires automated classification procedures. Berti et al. (2013) developed an automatic procedure for the analysis of InSAR time series based on a sequence of statistical tests. The analysis classifies each time series into one of six distinctive target trends (0=uncorrelated; 1=linear; 2=quadratic; 3=bilinear; 4=discontinuous without constant velocity; 5=discontinuous with change in velocity) which are likely to represent different slope processes. The analysis also provides a series of descriptive parameters which can be used to characterize the temporal changes of ground motion. All the classification algorithms were integrated into a Graphical User Interface called PSTime. We investigated an area of about 2000 km² in the Northern Apennines of Italy using the SqueeSAR™ algorithm (Ferretti et al., 2011). Two Radarsat-1 data stacks, comprising 112 scenes in descending orbit and 124 scenes in ascending orbit, were processed. The time coverage spans April 2003 to November 2012, with an average temporal frequency of 1 scene/month. Radar interpretation was carried out by considering average annual velocities as well as acceleration/deceleration trends evidenced by PSTime. Altogether, from ascending and descending geometries respectively, this approach allowed the detection of 115 and 112 potential landslides on the basis of average displacement rate, and 77 and 79 landslides on the basis of acceleration trends. In conclusion, time series analysis proved to be very valuable for landslide mapping. In particular, it highlighted areas with marked acceleration during a specific period while still showing a low average annual velocity over the entire analysis period. On the other hand, even in areas with high average annual velocity, time series analysis was of primary importance to characterize the slope dynamics in terms of acceleration events.
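
    As a rough illustration of the trend-classification idea, the sketch below compares constant, linear and quadratic fits to a single displacement time series and picks the best one with an information criterion. PSTime itself applies a sequence of statistical tests and also handles the bilinear and discontinuous classes; the BIC-based selection, function name and synthetic series are assumptions made purely for illustration.

```python
# Illustrative trend classification for a single InSAR displacement series.
import numpy as np

def classify_trend(t, d):
    """t: acquisition times (years), d: displacements (mm). Returns a label."""
    n = len(d)
    labels = ["uncorrelated", "linear", "quadratic"]
    bics = []
    for degree in (0, 1, 2):
        coeffs = np.polyfit(t, d, degree)
        residuals = d - np.polyval(coeffs, t)
        rss = np.sum(residuals ** 2)
        k = degree + 1                              # fitted parameters
        bics.append(n * np.log(rss / n) + k * np.log(n))
    return labels[int(np.argmin(bics))]

# Example: synthetic accelerating series, ~1 scene/month over nine years.
rng = np.random.default_rng(0)
t = np.linspace(2003.3, 2012.9, 115)
d = -2.0 * (t - 2003.3) - 0.8 * (t - 2008.0).clip(0) ** 2 + rng.normal(0, 1.5, t.size)
print(classify_trend(t, d))
```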

  18. BaTMAn: Bayesian Technique for Multi-image Analysis

    NASA Astrophysics Data System (ADS)

    Casado, J.; Ascasibar, Y.; García-Benito, R.; Guidi, G.; Choudhury, O. S.; Bellocchi, E.; Sánchez, S. F.; Díaz, A. I.

    2016-12-01

    Bayesian Technique for Multi-image Analysis (BaTMAn) characterizes any astronomical dataset containing spatial information and performs a tessellation based on the measurements and errors provided as input. The algorithm iteratively merges spatial elements as long as they are statistically consistent with carrying the same information (i.e. identical signal within the errors). The output segmentations successfully adapt to the underlying spatial structure, regardless of its morphology and/or the statistical properties of the noise. BaTMAn identifies (and keeps) all the statistically-significant information contained in the input multi-image (e.g. an IFS datacube). The main aim of the algorithm is to characterize spatially-resolved data prior to their analysis.
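
    A minimal sketch of the merging criterion, under the assumption that two spatial elements are combined only when their measurements agree within the quoted errors, might look as follows; the n-sigma test and the inverse-variance combination are illustrative stand-ins for the statistical test actually used by BaTMAn.

```python
# Sketch: merge two spatial bins only if they are statistically consistent
# with carrying the same signal within the errors; BaTMAn iterates such a
# test over a whole tessellation, here we only handle a single pair.
import numpy as np

def consistent(signal_a, error_a, signal_b, error_b, n_sigma=3.0):
    """True if the two measurement vectors agree within n_sigma."""
    diff = np.asarray(signal_a) - np.asarray(signal_b)
    sigma = np.sqrt(np.asarray(error_a) ** 2 + np.asarray(error_b) ** 2)
    return np.all(np.abs(diff) <= n_sigma * sigma)

def merge(signal_a, error_a, signal_b, error_b):
    """Inverse-variance weighted combination of two consistent bins."""
    sa, sb = np.asarray(signal_a), np.asarray(signal_b)
    w_a = 1.0 / np.asarray(error_a) ** 2
    w_b = 1.0 / np.asarray(error_b) ** 2
    return (w_a * sa + w_b * sb) / (w_a + w_b), np.sqrt(1.0 / (w_a + w_b))

# Example: two neighbouring spaxels of an IFS datacube (flux per wavelength).
a, ea = np.array([1.0, 2.1, 3.0]), np.array([0.1, 0.1, 0.2])
b, eb = np.array([1.1, 2.0, 3.2]), np.array([0.1, 0.1, 0.2])
if consistent(a, ea, b, eb):
    print(merge(a, ea, b, eb))
```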

  19. The connectome viewer toolkit: an open source framework to manage, analyze, and visualize connectomes.

    PubMed

    Gerhard, Stephan; Daducci, Alessandro; Lemkaddem, Alia; Meuli, Reto; Thiran, Jean-Philippe; Hagmann, Patric

    2011-01-01

    Advanced neuroinformatics tools are required for methods of connectome mapping, analysis, and visualization. The inherent multi-modality of connectome datasets poses new challenges for data organization, integration, and sharing. We have designed and implemented the Connectome Viewer Toolkit - a set of free and extensible open source neuroimaging tools written in Python. The key components of the toolkit are as follows: (1) The Connectome File Format is an XML-based container format to standardize multi-modal data integration and structured metadata annotation. (2) The Connectome File Format Library enables management and sharing of connectome files. (3) The Connectome Viewer is an integrated research and development environment for visualization and analysis of multi-modal connectome data. The Connectome Viewer's plugin architecture supports extensions with network analysis packages and an interactive scripting shell, to enable easy development and community contributions. Integration with tools from the scientific Python community allows the leveraging of numerous existing libraries for powerful connectome data mining, exploration, and comparison. We demonstrate the applicability of the Connectome Viewer Toolkit using Diffusion MRI datasets processed by the Connectome Mapper. The Connectome Viewer Toolkit is available from http://www.cmtk.org/

  20. The Connectome Viewer Toolkit: An Open Source Framework to Manage, Analyze, and Visualize Connectomes

    PubMed Central

    Gerhard, Stephan; Daducci, Alessandro; Lemkaddem, Alia; Meuli, Reto; Thiran, Jean-Philippe; Hagmann, Patric

    2011-01-01

    Advanced neuroinformatics tools are required for methods of connectome mapping, analysis, and visualization. The inherent multi-modality of connectome datasets poses new challenges for data organization, integration, and sharing. We have designed and implemented the Connectome Viewer Toolkit – a set of free and extensible open source neuroimaging tools written in Python. The key components of the toolkit are as follows: (1) The Connectome File Format is an XML-based container format to standardize multi-modal data integration and structured metadata annotation. (2) The Connectome File Format Library enables management and sharing of connectome files. (3) The Connectome Viewer is an integrated research and development environment for visualization and analysis of multi-modal connectome data. The Connectome Viewer's plugin architecture supports extensions with network analysis packages and an interactive scripting shell, to enable easy development and community contributions. Integration with tools from the scientific Python community allows the leveraging of numerous existing libraries for powerful connectome data mining, exploration, and comparison. We demonstrate the applicability of the Connectome Viewer Toolkit using Diffusion MRI datasets processed by the Connectome Mapper. The Connectome Viewer Toolkit is available from http://www.cmtk.org/ PMID:21713110

  1. MVP-CA Methodology for the Expert System Advocate's Advisor (ESAA)

    DOT National Transportation Integrated Search

    1997-11-01

    The Multi-Viewpoint Clustering Analysis (MVP-CA) tool is a semi-automated tool to provide a valuable aid for comprehension, verification, validation, maintenance, integration, and evolution of complex knowledge-based software systems. In this report,...

  2. Automation system for measurement of gamma-ray spectra of induced activity for multi-element high volume neutron activation analysis at the reactor IBR-2 of Frank Laboratory of Neutron Physics at the joint institute for nuclear research

    NASA Astrophysics Data System (ADS)

    Pavlov, S. S.; Dmitriev, A. Yu.; Chepurchenko, I. A.; Frontasyeva, M. V.

    2014-11-01

    An automation system for the measurement of gamma-ray spectra of induced activity for multi-element high-volume neutron activation analysis (NAA) was designed, developed and implemented at the IBR-2 reactor of the Frank Laboratory of Neutron Physics. The system consists of three automatic sample changers for three Canberra HPGe detector-based gamma spectrometry systems. Each sample changer consists of a two-axis linear positioning module (DriveSet M202A) and a disk with 45 slots for containers with samples. The automatic sample changer is controlled by a Systec Xemo S360U controller. Positioning accuracy can reach 0.1 mm. Special software performs automatic changing of samples and measurement of gamma-ray spectra, interacting continuously with the NAA database.

  3. [Time consumption and quality of an automated fusion tool for SPECT and MRI images of the brain].

    PubMed

    Fiedler, E; Platsch, G; Schwarz, A; Schmiedehausen, K; Tomandl, B; Huk, W; Rupprecht, Th; Rahn, N; Kuwert, T

    2003-10-01

    Although the fusion of images from different modalities may improve diagnostic accuracy, it is rarely used in clinical routine work due to logistic problems. Therefore we evaluated the performance and time needed for fusing MRI and SPECT images using a semiautomated dedicated software. PATIENTS, MATERIAL AND METHOD: In 32 patients, regional cerebral blood flow was measured using (99m)Tc ethylcystein dimer (ECD) and the three-headed SPECT camera MultiSPECT 3. MRI scans of the brain were performed using either a 0.2 T Open or a 1.5 T Sonata scanner. Twelve of the MRI data sets were acquired using a 3D T1-weighted MPRAGE sequence, 20 with a 2D acquisition technique and different echo sequences. Image fusion was performed on a Syngo workstation using an entropy-minimizing algorithm by an experienced user of the software. The fusion results were classified. We measured the time needed for the automated fusion procedure and, where necessary, that for manual realignment after an automated but insufficient fusion. The mean time of the automated fusion procedure was 123 s; it was significantly shorter for the 2D than for the 3D MRI datasets. For four of the 2D data sets and two of the 3D data sets, an optimal fit was reached using the automated approach. The remaining 26 data sets required manual correction. The sum of the time required for automated fusion and that needed for manual correction averaged 320 s (50-886 s). The fusion of 3D MRI data sets lasted significantly longer than that of the 2D MRI data. The automated fusion tool delivered an optimal fit in 20% of cases; in 80%, manual correction was necessary. Nevertheless, each of the 32 SPECT data sets could be merged with the corresponding MRI data in less than 15 min, which seems acceptable for clinical routine use.

  4. Nephele: a cloud platform for simplified, standardized and reproducible microbiome data analysis.

    PubMed

    Weber, Nick; Liou, David; Dommer, Jennifer; MacMenamin, Philip; Quiñones, Mariam; Misner, Ian; Oler, Andrew J; Wan, Joe; Kim, Lewis; Coakley McCarthy, Meghan; Ezeji, Samuel; Noble, Karlynn; Hurt, Darrell E

    2018-04-15

    Widespread interest in the study of the microbiome has resulted in data proliferation and the development of powerful computational tools. However, many scientific researchers lack the time, training, or infrastructure to work with large datasets or to install and use command line tools. The National Institute of Allergy and Infectious Diseases (NIAID) has created Nephele, a cloud-based microbiome data analysis platform with standardized pipelines and a simple web interface for transforming raw data into biological insights. Nephele integrates common microbiome analysis tools as well as valuable reference datasets like the healthy human subjects cohort of the Human Microbiome Project (HMP). Nephele is built on the Amazon Web Services cloud, which provides centralized and automated storage and compute capacity, thereby reducing the burden on researchers and their institutions. Availability: https://nephele.niaid.nih.gov and https://github.com/niaid/Nephele. Contact: darrell.hurt@nih.gov.

  5. Nephele: a cloud platform for simplified, standardized and reproducible microbiome data analysis

    PubMed Central

    Weber, Nick; Liou, David; Dommer, Jennifer; MacMenamin, Philip; Quiñones, Mariam; Misner, Ian; Oler, Andrew J; Wan, Joe; Kim, Lewis; Coakley McCarthy, Meghan; Ezeji, Samuel; Noble, Karlynn; Hurt, Darrell E

    2018-01-01

    Motivation: Widespread interest in the study of the microbiome has resulted in data proliferation and the development of powerful computational tools. However, many scientific researchers lack the time, training, or infrastructure to work with large datasets or to install and use command line tools. Results: The National Institute of Allergy and Infectious Diseases (NIAID) has created Nephele, a cloud-based microbiome data analysis platform with standardized pipelines and a simple web interface for transforming raw data into biological insights. Nephele integrates common microbiome analysis tools as well as valuable reference datasets like the healthy human subjects cohort of the Human Microbiome Project (HMP). Nephele is built on the Amazon Web Services cloud, which provides centralized and automated storage and compute capacity, thereby reducing the burden on researchers and their institutions. Availability and implementation: https://nephele.niaid.nih.gov and https://github.com/niaid/Nephele. Contact: darrell.hurt@nih.gov PMID:29028892

  6. Automatic identification of variables in epidemiological datasets using logic regression.

    PubMed

    Lorenz, Matthias W; Abdi, Negin Ashtiani; Scheckenbach, Frank; Pflug, Anja; Bülbül, Alpaslan; Catapano, Alberico L; Agewall, Stefan; Ezhov, Marat; Bots, Michiel L; Kiechl, Stefan; Orth, Andreas

    2017-04-13

    For an individual participant data (IPD) meta-analysis, multiple datasets must be transformed into a consistent format, e.g. using uniform variable names. When large numbers of datasets have to be processed, this can be a time-consuming and error-prone task. Automated or semi-automated identification of variables can help to reduce the workload and improve data quality. For semi-automation, high sensitivity in the recognition of matching variables is particularly important, because it allows creating software which, for a target variable, presents a choice of source variables from which a user can choose the matching one, with only a low risk of having missed a correct source variable. For each variable in a set of target variables, a number of simple rules were manually created. With logic regression, an optimal Boolean combination of these rules was searched for every target variable, using a random subset of a large database of epidemiological and clinical cohort data (construction subset). In a second subset of this database (validation subset), these optimal rule combinations were validated. In the construction sample, the 41 target variables were allocated on average with a positive predictive value (PPV) of 34% and a negative predictive value (NPV) of 95%. In the validation sample, PPV was 33%, whereas NPV remained at 94%. In the construction sample, PPV was 50% or less for 63% of all variables; in the validation sample, for 71% of all variables. We demonstrated that the application of logic regression in a complex data management task in large epidemiological IPD meta-analyses is feasible. However, the performance of the algorithm is poor, which may require backup strategies.
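
    The evaluation logic can be sketched for a single, hand-written rule combination as follows; the variable names, units and example rule are invented for illustration, and the paper searches the optimal Boolean combination with logic regression rather than fixing it manually.

```python
# Hedged sketch: evaluate PPV and NPV of one Boolean rule combination that
# flags candidate source variables for a hypothetical target variable
# ("systolic blood pressure") against manually matched labels.
import re

def rule_combination(name, unit):
    """Example: (name mentions 'sys' AND 'bp'/'press') OR unit is mmHg."""
    name = name.lower()
    return (("sys" in name and re.search(r"bp|press", name) is not None)
            or unit.lower() == "mmhg")

candidates = [  # (variable name, unit, truly matches the target?)
    ("sysbp_v1", "mmHg", True),
    ("bp_systolic", "mmHg", True),
    ("sys_vol", "ml", False),
    ("heart_rate", "bpm", False),
]

tp = sum(rule_combination(n, u) and y for n, u, y in candidates)
fp = sum(rule_combination(n, u) and not y for n, u, y in candidates)
tn = sum(not rule_combination(n, u) and not y for n, u, y in candidates)
fn = sum(not rule_combination(n, u) and y for n, u, y in candidates)
ppv = tp / (tp + fp) if tp + fp else float("nan")
npv = tn / (tn + fn) if tn + fn else float("nan")
print(f"PPV={ppv:.2f}, NPV={npv:.2f}")
```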

  7. Computerized detection of breast cancer on automated breast ultrasound imaging of women with dense breasts

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Drukker, Karen, E-mail: kdrukker@uchicago.edu; Sennett, Charlene A.; Giger, Maryellen L.

    2014-01-15

    Purpose: Develop a computer-aided detection method and investigate its feasibility for detection of breast cancer in automated 3D ultrasound images of women with dense breasts. Methods: The HIPAA compliant study involved a dataset of volumetric ultrasound image data, “views,” acquired with an automated U-Systems Somo•V® ABUS system for 185 asymptomatic women with dense breasts (BI-RADS Composition/Density 3 or 4). For each patient, three whole-breast views (3D image volumes) per breast were acquired. A total of 52 patients had breast cancer (61 cancers), diagnosed through any follow-up at most 365 days after the original screening mammogram. Thirty-one of these patients (32 cancers) had a screening mammogram with a clinically assigned BI-RADS Assessment Category 1 or 2, i.e., were mammographically negative. All software used for analysis was developed in-house and involved 3 steps: (1) detection of initial tumor candidates, (2) characterization of candidates, and (3) elimination of false-positive candidates. Performance was assessed by calculating the cancer detection sensitivity as a function of the number of “marks” (detections) per view. Results: At a single mark per view, i.e., six marks per patient, the median detection sensitivity by cancer was 50.0% (16/32) ± 6% for patients with a screening mammogram-assigned BI-RADS category 1 or 2 (similar to radiologists’ performance sensitivity of 49.9% for this dataset from a prior reader study), and 45.9% (28/61) ± 4% for all patients. Conclusions: Promising detection sensitivity was obtained for the computer on a 3D ultrasound dataset of women with dense breasts at a rate of false-positive detections that may be acceptable for clinical implementation.

  8. Scalable Visual Analytics of Massive Textual Datasets

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Krishnan, Manoj Kumar; Bohn, Shawn J.; Cowley, Wendy E.

    2007-04-01

    This paper describes the first scalable implementation of a text processing engine used in visual analytics tools. These tools aid information analysts in interacting with and understanding large textual information content through visual interfaces. By developing a parallel implementation of the text processing engine, we enabled visual analytics tools to exploit cluster architectures and handle massive datasets. The paper describes key elements of our parallelization approach and demonstrates virtually linear scaling when processing multi-gigabyte data sets such as PubMed. This approach enables interactive analysis of large datasets beyond the capabilities of existing state-of-the-art visual analytics tools.

  9. Retinal fundus images for glaucoma analysis: the RIGA dataset

    NASA Astrophysics Data System (ADS)

    Almazroa, Ahmed; Alodhayb, Sami; Osman, Essameldin; Ramadan, Eslam; Hummadi, Mohammed; Dlaim, Mohammed; Alkatee, Muhannad; Raahemifar, Kaamran; Lakshminarayanan, Vasudevan

    2018-03-01

    Glaucoma neuropathy is a major cause of irreversible blindness worldwide. Current models of chronic care will not be able to close the gap between the growing prevalence of glaucoma and the challenges of access to healthcare services. Teleophthalmology is being developed to close this gap. In order to develop automated techniques for glaucoma detection which can be used in teleophthalmology, we have developed a large retinal fundus dataset. A de-identified dataset of retinal fundus images for glaucoma analysis (RIGA) was derived from three sources for a total of 750 images. The optic cup and disc boundaries for each image were marked and annotated manually by six experienced ophthalmologists, and cup-to-disc ratio (CDR) estimates were included. Six parameters were extracted and assessed (the disc area and centroid, cup area and centroid, and horizontal and vertical cup-to-disc ratios) among the ophthalmologists. The inter-observer annotations were compared by calculating, for every image, the standard deviation (SD) between the six ophthalmologists in order to determine the outliers amongst the six; the SD was then used to filter the corresponding images. The dataset will be made available to the research community to crowd-source further analyses from other research groups in order to develop, validate and implement analysis algorithms appropriate for tele-glaucoma assessment. The RIGA dataset can be freely accessed online through the University of Michigan Deep Blue website (doi:10.7302/Z23R0R29).
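
    A minimal sketch of the inter-observer comparison is given below, assuming a single parameter (here the cup-to-disc ratio) and an illustrative SD threshold that is not the one used for RIGA.

```python
# Sketch: per-image standard deviation of CDR across the six ophthalmologists,
# used to flag images with excessive disagreement. Threshold is an assumption.
import numpy as np

def flag_discordant_images(cdr, threshold=0.10):
    """cdr: array of shape (n_images, 6), one CDR estimate per observer.
    Returns per-image SD and a boolean mask of images to filter out."""
    sd = np.std(cdr, axis=1, ddof=1)        # sample SD across the 6 observers
    return sd, sd > threshold

# Example with three images annotated by six observers.
cdr = np.array([
    [0.42, 0.45, 0.44, 0.43, 0.46, 0.44],   # good agreement
    [0.30, 0.55, 0.40, 0.62, 0.35, 0.58],   # poor agreement -> filtered
    [0.70, 0.72, 0.71, 0.69, 0.73, 0.70],
])
sd, discard = flag_discordant_images(cdr)
print(np.round(sd, 3), discard)
```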

  10. SPICE: exploration and analysis of post-cytometric complex multivariate datasets.

    PubMed

    Roederer, Mario; Nozzi, Joshua L; Nason, Martha C

    2011-02-01

    Polychromatic flow cytometry results in complex, multivariate datasets. To date, tools for the aggregate analysis of these datasets across multiple specimens grouped by different categorical variables, such as demographic information, have not been optimized. Often, the exploration of such datasets is accomplished by visualization of patterns with pie charts or bar charts, without easy access to statistical comparisons of measurements that comprise multiple components. Here we report on algorithms and a graphical interface we developed for these purposes. In particular, we discuss the thresholding necessary for accurate representation of data in pie charts, the implications for display and comparison of normalized versus unnormalized data, and the effects of averaging when samples with significant background noise are present. Finally, we define a statistic for the nonparametric comparison of complex distributions to test for differences between groups of samples based on multi-component measurements. While originally developed to support the analysis of T cell functional profiles, these techniques are amenable to a broad range of data types. Published 2011 Wiley-Liss, Inc.
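
    The group-comparison idea can be illustrated with a generic permutation test on multi-component profiles; the L1 distance between group-mean profiles used below is a simplified stand-in for the statistic defined in the paper, and the example data are synthetic.

```python
# Hedged sketch of a nonparametric comparison of multi-component functional
# profiles (e.g. fractions of cells in each Boolean cytokine combination).
import numpy as np

def permutation_test(group_a, group_b, n_perm=10000, seed=0):
    """group_a, group_b: arrays of shape (n_samples, n_components) of
    per-sample component frequencies. Returns a permutation p-value."""
    rng = np.random.default_rng(seed)
    pooled = np.vstack([group_a, group_b])
    n_a = len(group_a)
    observed = np.abs(group_a.mean(0) - group_b.mean(0)).sum()
    count = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        a, b = pooled[idx[:n_a]], pooled[idx[n_a:]]
        if np.abs(a.mean(0) - b.mean(0)).sum() >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)

# Example: 8 vs 8 samples with 4-component profiles.
rng = np.random.default_rng(1)
a = rng.dirichlet([5, 3, 1, 1], size=8)
b = rng.dirichlet([3, 5, 1, 1], size=8)
print(permutation_test(a, b))
```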

  11. Automated retinofugal visual pathway reconstruction with multi-shell HARDI and FOD-based analysis.

    PubMed

    Kammen, Alexandra; Law, Meng; Tjan, Bosco S; Toga, Arthur W; Shi, Yonggang

    2016-01-15

    Diffusion MRI tractography provides a non-invasive modality to examine the human retinofugal projection, which consists of the optic nerves, optic chiasm, optic tracts, the lateral geniculate nuclei (LGN) and the optic radiations. However, the pathway has several anatomic features that make it particularly challenging to study with tractography, including its location near blood vessels and bone-air interface at the base of the cerebrum, crossing fibers at the chiasm, somewhat-tortuous course around the temporal horn via Meyer's Loop, and multiple closely neighboring fiber bundles. To date, these unique complexities of the visual pathway have impeded the development of a robust and automated reconstruction method using tractography. To overcome these challenges, we develop a novel, fully automated system to reconstruct the retinofugal visual pathway from high-resolution diffusion imaging data. Using multi-shell, high angular resolution diffusion imaging (HARDI) data, we reconstruct precise fiber orientation distributions (FODs) with high order spherical harmonics (SPHARM) to resolve fiber crossings, which allows the tractography algorithm to successfully navigate the complicated anatomy surrounding the retinofugal pathway. We also develop automated algorithms for the identification of ROIs used for fiber bundle reconstruction. In particular, we develop a novel approach to extract the LGN region of interest (ROI) based on intrinsic shape analysis of a fiber bundle computed from a seed region at the optic chiasm to a target at the primary visual cortex. By combining automatically identified ROIs and FOD-based tractography, we obtain a fully automated system to compute the main components of the retinofugal pathway, including the optic tract and the optic radiation. We apply our method to the multi-shell HARDI data of 215 subjects from the Human Connectome Project (HCP). Through comparisons with post-mortem dissection measurements, we demonstrate the retinotopic organization of the optic radiation including a successful reconstruction of Meyer's loop. Then, using the reconstructed optic radiation bundle from the HCP cohort, we construct a probabilistic atlas and demonstrate its consistency with a post-mortem atlas. Finally, we generate a shape-based representation of the optic radiation for morphometry analysis. Copyright © 2015 Elsevier Inc. All rights reserved.

  12. Recognition of Damaged Arrow-Road Markings by Visible Light Camera Sensor Based on Convolutional Neural Network.

    PubMed

    Vokhidov, Husan; Hong, Hyung Gil; Kang, Jin Kyu; Hoang, Toan Minh; Park, Kang Ryoung

    2016-12-16

    Automobile driver information as displayed on marked road signs indicates the state of the road, traffic conditions, proximity to schools, etc. These signs are important to ensure the safety of the driver and pedestrians. They are also important input to the automated advanced driver assistance systems (ADAS) installed in many automobiles. Over time, the arrow-road markings may be eroded or otherwise damaged by automobile contact, making it difficult for the driver to correctly identify the marking. Failure to properly identify an arrow-road marker creates a dangerous situation that may result in traffic accidents or pedestrian injury. Very little research exists that studies the problem of automated identification of damaged arrow-road markings painted on the road. In this study, we propose a method that uses a convolutional neural network (CNN) to recognize six types of arrow-road markings, possibly damaged, by visible light camera sensor. Experimental results with six databases (the Road marking dataset, KITTI dataset, Málaga dataset 2009, Málaga urban dataset, Naver street view dataset, and Road/Lane detection evaluation 2013 dataset) show that our method outperforms conventional methods.
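
    A minimal sketch of a six-class CNN classifier for cropped marking candidates is shown below; the architecture, the 64x64 grayscale input size and the PyTorch implementation are assumptions for illustration and do not reproduce the network described in the paper.

```python
# Minimal PyTorch CNN that maps a cropped arrow-marking candidate to one of
# six classes. Hyperparameters are illustrative only.
import torch
import torch.nn as nn

class ArrowCNN(nn.Module):
    def __init__(self, n_classes=6):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 8 * 8, 128), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(128, n_classes),
        )

    def forward(self, x):                 # x: (batch, 1, 64, 64)
        return self.classifier(self.features(x))

model = ArrowCNN()
logits = model(torch.randn(4, 1, 64, 64))   # dummy batch of 4 crops
print(logits.shape)                         # torch.Size([4, 6])
```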

  13. Recognition of Damaged Arrow-Road Markings by Visible Light Camera Sensor Based on Convolutional Neural Network

    PubMed Central

    Vokhidov, Husan; Hong, Hyung Gil; Kang, Jin Kyu; Hoang, Toan Minh; Park, Kang Ryoung

    2016-01-01

    Automobile driver information as displayed on marked road signs indicates the state of the road, traffic conditions, proximity to schools, etc. These signs are important to ensure the safety of the driver and pedestrians. They are also important input to the automated advanced driver assistance systems (ADAS) installed in many automobiles. Over time, the arrow-road markings may be eroded or otherwise damaged by automobile contact, making it difficult for the driver to correctly identify the marking. Failure to properly identify an arrow-road marker creates a dangerous situation that may result in traffic accidents or pedestrian injury. Very little research exists that studies the problem of automated identification of damaged arrow-road markings painted on the road. In this study, we propose a method that uses a convolutional neural network (CNN) to recognize six types of arrow-road markings, possibly damaged, by visible light camera sensor. Experimental results with six databases (the Road marking dataset, KITTI dataset, Málaga dataset 2009, Málaga urban dataset, Naver street view dataset, and Road/Lane detection evaluation 2013 dataset) show that our method outperforms conventional methods. PMID:27999301

  14. Optical Coherence Tomography in the UK Biobank Study - Rapid Automated Analysis of Retinal Thickness for Large Population-Based Studies.

    PubMed

    Keane, Pearse A; Grossi, Carlota M; Foster, Paul J; Yang, Qi; Reisman, Charles A; Chan, Kinpui; Peto, Tunde; Thomas, Dhanes; Patel, Praveen J

    2016-01-01

    To describe an approach to the use of optical coherence tomography (OCT) imaging in large, population-based studies, including methods for OCT image acquisition, storage, and the remote, rapid, automated analysis of retinal thickness. In UK Biobank, OCT images were acquired between 2009 and 2010 using a commercially available "spectral domain" OCT device (3D OCT-1000, Topcon). Images were obtained using a raster scan protocol, 6 mm x 6 mm in area, and consisting of 128 B-scans. OCT image sets were stored on UK Biobank servers in a central repository, adjacent to high performance computers. Rapid, automated analysis of retinal thickness was performed using custom image segmentation software developed by the Topcon Advanced Biomedical Imaging Laboratory (TABIL). This software employs dual-scale gradient information to allow for automated segmentation of nine intraretinal boundaries in a rapid fashion. 67,321 participants (134,642 eyes) in UK Biobank underwent OCT imaging of both eyes as part of the ocular module. 134,611 images were successfully processed, with 31 images failing segmentation analysis due to corrupted OCT files or withdrawal of subject consent for UK Biobank study participation. The average time taken to call up an image from the database and complete segmentation analysis was approximately 120 seconds per data set per login, and analysis of the entire dataset was completed in approximately 28 days. We report an approach to the rapid, automated measurement of retinal thickness from nearly 140,000 OCT image sets from the UK Biobank. In the near future, these measurements will be publicly available for utilization by researchers around the world, and thus for correlation with the wealth of other data collected in UK Biobank. The automated analysis approaches we describe may be of utility for future large population-based epidemiological studies, clinical trials, and screening programs that employ OCT imaging.

  15. Optical Coherence Tomography in the UK Biobank Study – Rapid Automated Analysis of Retinal Thickness for Large Population-Based Studies

    PubMed Central

    Grossi, Carlota M.; Foster, Paul J.; Yang, Qi; Reisman, Charles A.; Chan, Kinpui; Peto, Tunde; Thomas, Dhanes; Patel, Praveen J.

    2016-01-01

    Purpose: To describe an approach to the use of optical coherence tomography (OCT) imaging in large, population-based studies, including methods for OCT image acquisition, storage, and the remote, rapid, automated analysis of retinal thickness. Methods: In UK Biobank, OCT images were acquired between 2009 and 2010 using a commercially available “spectral domain” OCT device (3D OCT-1000, Topcon). Images were obtained using a raster scan protocol, 6 mm x 6 mm in area, and consisting of 128 B-scans. OCT image sets were stored on UK Biobank servers in a central repository, adjacent to high performance computers. Rapid, automated analysis of retinal thickness was performed using custom image segmentation software developed by the Topcon Advanced Biomedical Imaging Laboratory (TABIL). This software employs dual-scale gradient information to allow for automated segmentation of nine intraretinal boundaries in a rapid fashion. Results: 67,321 participants (134,642 eyes) in UK Biobank underwent OCT imaging of both eyes as part of the ocular module. 134,611 images were successfully processed, with 31 images failing segmentation analysis due to corrupted OCT files or withdrawal of subject consent for UK Biobank study participation. The average time taken to call up an image from the database and complete segmentation analysis was approximately 120 seconds per data set per login, and analysis of the entire dataset was completed in approximately 28 days. Conclusions: We report an approach to the rapid, automated measurement of retinal thickness from nearly 140,000 OCT image sets from the UK Biobank. In the near future, these measurements will be publicly available for utilization by researchers around the world, and thus for correlation with the wealth of other data collected in UK Biobank. The automated analysis approaches we describe may be of utility for future large population-based epidemiological studies, clinical trials, and screening programs that employ OCT imaging. PMID:27716837

  16. Extraction of Urban Trees from Integrated Airborne Based Digital Image and LIDAR Point Cloud Datasets - Initial Results

    NASA Astrophysics Data System (ADS)

    Dogon-yaro, M. A.; Kumar, P.; Rahman, A. Abdul; Buyuksalih, G.

    2016-10-01

    Timely and accurate acquisition of information on the condition and structural changes of urban trees serves as a tool for decision makers to better appreciate urban ecosystems and their numerous values, which are critical to building up strategies for sustainable development. The conventional techniques used for extracting tree features include ground surveying and interpretation of aerial photography. However, these techniques are associated with constraints such as labour-intensive field work, high cost, and sensitivity to weather conditions and topographic cover, which can be overcome by means of integrated airborne LiDAR and very high resolution digital image datasets. This study presents a semi-automated approach for extracting urban trees from integrated airborne LiDAR and multispectral digital image datasets over Istanbul city, Turkey. The scheme includes detection and extraction of shadow-free vegetation features based on the spectral properties of the digital images, using shadow-index and NDVI techniques, followed by automated extraction of 3D information about vegetation features from the integrated processing of the shadow-free vegetation image and the LiDAR point cloud. The developed algorithms show promising results as an automated and cost-effective approach to estimating and delineating 3D information of urban trees. The research also showed that integrated datasets are a suitable technology and a viable source of information for city managers to use in urban tree management.
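
    The spectral pre-selection step can be sketched as an NDVI threshold combined with a simple brightness-based shadow mask; the thresholds and the brightness proxy are assumptions (the study uses a dedicated shadow index), and the bands below are synthetic.

```python
# Sketch: keep only sunlit vegetation pixels using NDVI and a crude shadow
# proxy before combining with the LiDAR point cloud. Thresholds are assumed.
import numpy as np

def sunlit_vegetation_mask(red, nir, green, blue,
                           ndvi_thresh=0.3, shadow_thresh=0.15):
    """All inputs are reflectance bands scaled to [0, 1] as float arrays."""
    ndvi = (nir - red) / (nir + red + 1e-6)
    brightness = (red + green + blue + nir) / 4.0   # crude shadow proxy
    vegetation = ndvi > ndvi_thresh
    shadow = brightness < shadow_thresh
    return vegetation & ~shadow

# Example on a random 4-band tile.
rng = np.random.default_rng(0)
bands = rng.uniform(0.0, 0.6, size=(4, 256, 256))
mask = sunlit_vegetation_mask(*bands)
print(mask.mean())      # fraction of pixels kept for the 3D/LiDAR step
```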

  17. Survey and analysis of multiresolution methods for turbulence data

    DOE PAGES

    Pulido, Jesus; Livescu, Daniel; Woodring, Jonathan; ...

    2015-11-10

    This paper compares the effectiveness of various multi-resolution geometric representation methods, such as B-spline, Daubechies, Coiflet and Dual-tree wavelets, curvelets and surfacelets, to capture the structure of fully developed turbulence using a truncated set of coefficients. The turbulence dataset is obtained from a Direct Numerical Simulation of buoyancy driven turbulence on a 512³ mesh, with an Atwood number A = 0.05 and a turbulent Reynolds number Re_t = 1800, and the methods are tested against quantities pertaining to both velocities and active scalar (density) fields and their derivatives, spectra, and the properties of constant density surfaces. The comparisons between the algorithms are given in terms of performance, accuracy, and compression properties. The results should provide useful information for multi-resolution analysis of turbulence, coherent feature extraction, compression for large datasets handling, as well as simulation algorithms based on multi-resolution methods. In conclusion, the final section provides recommendations for best decomposition algorithms based on several metrics related to computational efficiency and preservation of turbulence properties using a reduced set of coefficients.
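
    The basic truncation experiment can be sketched in 1D with PyWavelets: decompose, keep only the largest coefficients, reconstruct and measure the error. The study itself works with 3D fields and additional bases (B-splines, Coiflets, curvelets, surfacelets); the wavelet choice, keep fraction and test signal below are illustrative assumptions.

```python
# Sketch of coefficient truncation and reconstruction error in 1D.
import numpy as np
import pywt

def truncated_reconstruction(signal, wavelet="db4", keep_fraction=0.05):
    coeffs = pywt.wavedec(signal, wavelet)
    flat = np.concatenate(coeffs)
    threshold = np.quantile(np.abs(flat), 1.0 - keep_fraction)
    truncated = [np.where(np.abs(c) >= threshold, c, 0.0) for c in coeffs]
    return pywt.waverec(truncated, wavelet)

x = np.linspace(0, 1, 4096)
signal = np.sin(40 * np.pi * x) + 0.3 * np.sin(170 * np.pi * x)
recon = truncated_reconstruction(signal)[: signal.size]
rel_err = np.linalg.norm(signal - recon) / np.linalg.norm(signal)
print(f"relative error with 5% of coefficients: {rel_err:.3f}")
```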

  18. Flow Cytometry Data Preparation Guidelines for Improved Automated Phenotypic Analysis.

    PubMed

    Jimenez-Carretero, Daniel; Ligos, José M; Martínez-López, María; Sancho, David; Montoya, María C

    2018-05-15

    Advances in flow cytometry (FCM) increasingly demand adoption of computational analysis tools to tackle the ever-growing data dimensionality. In this study, we tested different data input modes to evaluate how cytometry acquisition configuration and data compensation procedures affect the performance of unsupervised phenotyping tools. An analysis workflow was set up and tested for the detection of changes in reference bead subsets and in a rare subpopulation of murine lymph node CD103+ dendritic cells acquired by conventional or spectral cytometry. Raw spectral data or pseudospectral data acquired with the full set of available detectors by conventional cytometry consistently outperformed datasets acquired and compensated according to FCM standards. Our results thus challenge the paradigm of one-fluorochrome/one-parameter acquisition in FCM for unsupervised cluster-based analysis. Instead, we propose to configure instrument acquisition to use all available fluorescence detectors and to avoid integration and compensation procedures, thereby using raw spectral or pseudospectral data for improved automated phenotypic analysis. Copyright © 2018 by The American Association of Immunologists, Inc.

  19. Automated Image Registration Using Morphological Region of Interest Feature Extraction

    NASA Technical Reports Server (NTRS)

    Plaza, Antonio; LeMoigne, Jacqueline; Netanyahu, Nathan S.

    2005-01-01

    With the recent explosion in the amount of remotely sensed imagery and the corresponding interest in temporal change detection and modeling, image registration has become increasingly important as a necessary first step in the integration of multi-temporal and multi-sensor data for applications such as the analysis of seasonal and annual global climate changes, as well as land use/cover changes. The task of image registration can be divided into two major components: (1) the extraction of control points or features from images; and (2) the search among the extracted features for the matching pairs that represent the same feature in the images to be matched. Manual control feature extraction can be subjective and extremely time consuming, and often results in few usable points. Automated feature extraction is a solution to this problem, where desired target features are invariant, and represent evenly distributed landmarks such as edges, corners and line intersections. In this paper, we develop a novel automated registration approach based on the following steps. First, a mathematical morphology (MM)-based method is used to obtain a scale-orientation morphological profile at each image pixel. Next, a spectral dissimilarity metric such as the spectral information divergence is applied for automated extraction of landmark chips, followed by an initial approximate matching. This initial condition is then refined using a hierarchical robust feature matching (RFM) procedure. Experimental results reveal that the proposed registration technique offers a robust solution in the presence of seasonal changes and other interfering factors. Keywords: automated image registration, multi-temporal imagery, mathematical morphology, robust feature matching.

  20. Robust and automated three-dimensional segmentation of densely packed cell nuclei in different biological specimens with Lines-of-Sight decomposition.

    PubMed

    Mathew, B; Schmitz, A; Muñoz-Descalzo, S; Ansari, N; Pampaloni, F; Stelzer, E H K; Fischer, S C

    2015-06-08

    Due to the large amount of data produced by advanced microscopy, automated image analysis is crucial in modern biology. Most applications require reliable cell nuclei segmentation. However, in many biological specimens cell nuclei are densely packed and appear to touch one another in the images. Therefore, a major difficulty of three-dimensional cell nuclei segmentation is the decomposition of cell nuclei that apparently touch each other. Current methods are highly adapted to a certain biological specimen or a specific microscope. They do not ensure similarly accurate segmentation performance, i.e. their robustness for different datasets is not guaranteed. Hence, these methods require elaborate adjustments to each dataset. We present an advanced three-dimensional cell nuclei segmentation algorithm that is accurate and robust. Our approach combines local adaptive pre-processing with decomposition based on Lines-of-Sight (LoS) to separate apparently touching cell nuclei into approximately convex parts. We demonstrate the superior performance of our algorithm using data from different specimens recorded with different microscopes. The three-dimensional images were recorded with confocal and light sheet-based fluorescence microscopes. The specimens are an early mouse embryo and two different cellular spheroids. We compared the segmentation accuracy of our algorithm with ground truth data for the test images and results from state-of-the-art methods. The analysis shows that our method is accurate throughout all test datasets (mean F-measure: 91%) whereas the other methods each failed for at least one dataset (F-measure≤69%). Furthermore, nuclei volume measurements are improved for LoS decomposition. The state-of-the-art methods required laborious adjustments of parameter values to achieve these results. Our LoS algorithm did not require parameter value adjustments. The accurate performance was achieved with one fixed set of parameter values. We developed a novel and fully automated three-dimensional cell nuclei segmentation method incorporating LoS decomposition. LoS are easily accessible features that ensure correct splitting of apparently touching cell nuclei independent of their shape, size or intensity. Our method showed superior performance compared to state-of-the-art methods, performing accurately for a variety of test images. Hence, our LoS approach can be readily applied to quantitative evaluation in drug testing, developmental and cell biology.

  1. Multi-spectrometer calibration transfer based on independent component analysis.

    PubMed

    Liu, Yan; Xu, Hao; Xia, Zhenzhen; Gong, Zhiyong

    2018-02-26

    Calibration transfer is indispensable for practical applications of near infrared (NIR) spectroscopy due to the need for precise and consistent measurements across different spectrometers. In this work, a method for multi-spectrometer calibration transfer based on independent component analysis (ICA) is described. A spectral matrix is first obtained by aligning the spectra measured on different spectrometers. Then, by using independent component analysis, the aligned spectral matrix is decomposed into the mixing matrix and the independent components of the different spectrometers. The differences in measurements between spectrometers can then be standardized by correcting the coefficients within the independent components. Two NIR datasets of corn and edible oil samples, measured with three and four spectrometers respectively, were used to test the reliability of this method. The results for both datasets reveal that spectral measurements across different spectrometers can be transferred simultaneously and that partial least squares (PLS) models built with the measurements from one spectrometer can correctly predict samples whose spectra have been transferred from another.
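
    The decomposition step can be sketched with scikit-learn's FastICA applied to the stacked (aligned) spectral matrix; the synthetic data, the number of components and the instrument distortions are assumptions, and the paper's subsequent correction of the coefficients, which performs the actual transfer, is not reproduced here.

```python
# Sketch: factor the aligned spectral matrix into independent components and
# a mixing matrix with FastICA. Data are synthetic stand-ins.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
n_samples, n_wavelengths = 40, 700

# Same samples "measured" on two instruments; instrument B adds gain + offset.
true_spectra = rng.uniform(0.1, 1.0, size=(n_samples, n_wavelengths))
instrument_a = true_spectra + rng.normal(0, 0.01, true_spectra.shape)
instrument_b = 1.05 * true_spectra + 0.02 + rng.normal(0, 0.01, true_spectra.shape)

aligned = np.vstack([instrument_a, instrument_b])   # (2*n_samples, n_wavelengths)

ica = FastICA(n_components=10, random_state=0, max_iter=1000)
sources = ica.fit_transform(aligned)    # per-spectrum coefficients, (80, 10)
mixing = ica.mixing_                    # mixing matrix, (n_wavelengths, 10)
# sklearn's model: aligned ~= sources @ mixing.T + ica.mean_
print(sources.shape, mixing.shape)
```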

  2. Deep machine learning provides state-of-the-art performance in image-based plant phenotyping.

    PubMed

    Pound, Michael P; Atkinson, Jonathan A; Townsend, Alexandra J; Wilson, Michael H; Griffiths, Marcus; Jackson, Aaron S; Bulat, Adrian; Tzimiropoulos, Georgios; Wells, Darren M; Murchie, Erik H; Pridmore, Tony P; French, Andrew P

    2017-10-01

    In plant phenotyping, it has become important to be able to measure many features on large image sets in order to aid genetic discovery. The size of the datasets, now often captured robotically, often precludes manual inspection, hence the motivation for finding a fully automated approach. Deep learning is an emerging field that promises unparalleled results on many data analysis problems. Building on artificial neural networks, deep approaches have many more hidden layers in the network, and hence have greater discriminative and predictive power. We demonstrate the use of such approaches as part of a plant phenotyping pipeline. We show the success offered by such techniques when applied to the challenging problem of image-based plant phenotyping and demonstrate state-of-the-art results (>97% accuracy) for root and shoot feature identification and localization. We use fully automated trait identification using deep learning to identify quantitative trait loci in root architecture datasets. The majority (12 out of 14) of manually identified quantitative trait loci were also discovered using our automated approach based on deep learning detection to locate plant features. We have shown deep learning-based phenotyping to have very good detection and localization accuracy in validation and testing image sets. We have shown that such features can be used to derive meaningful biological traits, which in turn can be used in quantitative trait loci discovery pipelines. This process can be completely automated. We predict a paradigm shift in image-based phenotyping brought about by such deep learning approaches, given sufficient training sets. © The Authors 2017. Published by Oxford University Press.

  3. DSAP: deep-sequencing small RNA analysis pipeline.

    PubMed

    Huang, Po-Jung; Liu, Yi-Chung; Lee, Chi-Ching; Lin, Wei-Chen; Gan, Richie Ruei-Chi; Lyu, Ping-Chiang; Tang, Petrus

    2010-07-01

    DSAP is an automated multiple-task web service designed to provide a total solution to analyzing deep-sequencing small RNA datasets generated by next-generation sequencing technology. DSAP uses a tab-delimited file as an input format, which holds the unique sequence reads (tags) and their corresponding number of copies generated by the Solexa sequencing platform. The input data will go through four analysis steps in DSAP: (i) cleanup: removal of adaptors and poly-A/T/C/G/N nucleotides; (ii) clustering: grouping of cleaned sequence tags into unique sequence clusters; (iii) non-coding RNA (ncRNA) matching: sequence homology mapping against a transcribed sequence library from the ncRNA database Rfam (http://rfam.sanger.ac.uk/); and (iv) known miRNA matching: detection of known miRNAs in miRBase (http://www.mirbase.org/) based on sequence homology. The expression levels corresponding to matched ncRNAs and miRNAs are summarized in multi-color clickable bar charts linked to external databases. DSAP is also capable of displaying miRNA expression levels from different jobs using a log(2)-scaled color matrix. Furthermore, a cross-species comparative function is also provided to show the distribution of identified miRNAs in different species as deposited in miRBase. DSAP is available at http://dsap.cgu.edu.tw.
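
    The first two steps (clean-up and clustering) can be sketched on a toy read set as follows; the adaptor sequence and reads are invented for illustration, and real adaptor trimming also handles partial matches and quality filtering.

```python
# Sketch of clean-up and clustering: trim a 3' adaptor, drop homopolymer
# reads, and collapse identical tags into clusters with copy numbers.
from collections import Counter

ADAPTOR = "TCGTATGCCGTCTTCTGCTTG"   # assumed 3' adaptor, for illustration only

def clean(read):
    """Trim the adaptor (if present) and reject short or homopolymer reads."""
    idx = read.find(ADAPTOR)
    if idx != -1:
        read = read[:idx]
    if len(read) < 16 or len(set(read)) <= 1:
        return None
    return read

reads = [
    "TGAGGTAGTAGGTTGTATAGTT" + ADAPTOR,
    "TGAGGTAGTAGGTTGTATAGTT" + ADAPTOR,
    "AAAAAAAAAAAAAAAAAAAAAA",            # poly-A read, discarded
    "ACTGGCCTTGGAGTCAGAAGGC" + ADAPTOR,
]

clusters = Counter(r for r in (clean(x) for x in reads) if r is not None)
for tag, copies in clusters.items():
    print(tag, copies)
```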

  4. Visualising nursing data using correspondence analysis.

    PubMed

    Kokol, Peter; Blažun Vošner, Helena; Železnik, Danica

    2016-09-01

    Digitally stored, large healthcare datasets enable nurses to use 'big data' techniques and tools in nursing research. Big data is complex and multi-dimensional, so visualisation may be a preferable approach to analyse and understand it. The aim of this paper is to demonstrate the use of visualisation of big data through a technique called correspondence analysis. In the authors' study, relations among data in a nursing dataset were shown visually in graphs using correspondence analysis. The case presented demonstrates that correspondence analysis is easy to use, shows relations between data visually in a form that is simple to interpret, and can reveal hidden associations between data. Correspondence analysis supports the discovery of new knowledge. Implications for practice: knowledge obtained using correspondence analysis can be transferred immediately into practice or used to foster further research.
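
    For readers who prefer to see the mechanics, correspondence analysis can be computed directly from the singular value decomposition of the standardized residuals of a contingency table; the small ward-by-care-category table below is invented for illustration, and library implementations offer the same result with plotting support.

```python
# Minimal correspondence analysis from first principles (SVD of standardized
# residuals) on a hypothetical nursing contingency table.
import numpy as np

N = np.array([            # rows: wards, columns: documented care categories
    [120,  30,  15],
    [ 60,  80,  20],
    [ 25,  40,  90],
], dtype=float)

P = N / N.sum()
r = P.sum(axis=1)                      # row masses
c = P.sum(axis=0)                      # column masses
S = np.diag(r ** -0.5) @ (P - np.outer(r, c)) @ np.diag(c ** -0.5)
U, sv, Vt = np.linalg.svd(S, full_matrices=False)

row_coords = np.diag(r ** -0.5) @ U * sv      # principal coordinates of rows
col_coords = np.diag(c ** -0.5) @ Vt.T * sv   # principal coordinates of columns
inertia = sv ** 2 / (sv ** 2).sum()           # share of inertia per axis

print(np.round(row_coords[:, :2], 3))
print(np.round(col_coords[:, :2], 3))
print(np.round(inertia, 3))
```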

  5. Can single empirical algorithms accurately predict inland shallow water quality status from high resolution, multi-sensor, multi-temporal satellite data?

    NASA Astrophysics Data System (ADS)

    Theologou, I.; Patelaki, M.; Karantzalos, K.

    2015-04-01

    Assessing and monitoring water quality status in a timely, cost-effective and accurate manner is of fundamental importance for numerous environmental management and policy making purposes. Therefore, there is a current need for validated methodologies which can effectively exploit, in an unsupervised way, the enormous amount of earth observation imaging datasets from various high-resolution satellite multispectral sensors. To this end, many research efforts are based on building concrete relationships and empirical algorithms from concurrent satellite and in-situ data collection campaigns. We have experimented with Landsat 7 and Landsat 8 multi-temporal satellite data, coupled with hyperspectral data from a field spectroradiometer and in-situ ground truth data with several physico-chemical and other key monitoring indicators. All available datasets, covering a 4-year period in our case study, Lake Karla in Greece, were processed and fused under a quantitative evaluation framework. The comprehensive analysis performed posed certain questions regarding the applicability of single empirical models across multi-temporal, multi-sensor datasets for the accurate prediction of key water quality indicators in shallow inland systems. Single linear regression models did not establish concrete relations across multi-temporal, multi-sensor observations. Moreover, the shallower parts of the inland system followed, in accordance with the literature, different regression patterns. Landsat 7 and 8 yielded quite promising results, indicating that from the recreation of the lake onward, consistent per-sensor, per-depth prediction models can be successfully established. The highest rates were for chl-a (r²=89.80%), dissolved oxygen (r²=88.53%), conductivity (r²=88.18%), ammonium (r²=87.2%) and pH (r²=86.35%), while total phosphorus (r²=70.55%) and nitrates (r²=55.50%) resulted in lower correlation rates.
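
    A single empirical model of the kind discussed here can be sketched as an ordinary least-squares fit of an in-situ indicator on a spectral predictor, scored with r²; the band ratio, indicator and data below are illustrative assumptions, and the study evaluates many indicator/sensor/depth combinations.

```python
# Sketch: fit and score one per-sensor, per-depth empirical model.
import numpy as np

def fit_and_score(x, y):
    slope, intercept = np.polyfit(x, y, 1)
    predicted = slope * x + intercept
    ss_res = np.sum((y - predicted) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return slope, intercept, 1.0 - ss_res / ss_tot   # last value is r^2

# Synthetic matchups: a band ratio vs measured chl-a (ug/L).
rng = np.random.default_rng(0)
band_ratio = rng.uniform(0.5, 2.0, 40)
chla = 12.0 * band_ratio + 3.0 + rng.normal(0, 2.0, 40)
print(fit_and_score(band_ratio, chla))
```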

  6. Evaluation of Uncertainty in Precipitation Datasets for New Mexico, USA

    NASA Astrophysics Data System (ADS)

    Besha, A. A.; Steele, C. M.; Fernald, A.

    2014-12-01

    Climate change, population growth and other factors are endangering water availability and sustainability in semiarid/arid areas, particularly in the southwestern United States. Wide spatial and temporal coverage of precipitation measurements is key for regional water budget analysis and hydrological operations, which are themselves valuable tools for water resource planning and management. Rain gauge measurements are usually reliable and accurate at a point. They measure rainfall continuously, but spatial sampling is limited. Ground-based radar and satellite remotely sensed precipitation have wide spatial and temporal coverage. However, these measurements are indirect and subject to errors because of equipment, meteorological variability, the heterogeneity of the land surface itself and lack of regular recording. This study seeks to understand precipitation uncertainty and, in doing so, lessen uncertainty propagation into hydrological applications and operations. We reviewed, compared and evaluated the TRMM (Tropical Rainfall Measuring Mission) precipitation products, NOAA's (National Oceanic and Atmospheric Administration) Global Precipitation Climatology Centre (GPCC) monthly precipitation dataset, PRISM (Parameter elevation Regression on Independent Slopes Model) data and data from individual climate stations including Cooperative Observer Program (COOP), Remote Automated Weather Stations (RAWS), Soil Climate Analysis Network (SCAN) and Snowpack Telemetry (SNOTEL) stations. Though not yet finalized, this study finds that the uncertainty within precipitation datasets is influenced by regional topography, season, climate and precipitation rate. Ongoing work aims to further evaluate the precipitation datasets based on the relative influence of these phenomena so that we can identify the optimum datasets for input to statewide water budget analysis.

  7. From data to knowledge: The future of multi-omics data analysis for the rhizosphere

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Allen White, Richard; Borkum, Mark I.; Rivas-Ubach, Albert

    The rhizosphere is the interface between a plant's roots and its surrounding soil. The rhizosphere microbiome, a complex microbial ecosystem, nourishes the terrestrial biosphere. Integrated multi-omics is a modern approach to systems biology that analyzes and interprets the datasets of multiple -omes of both individual organisms and multi-organism communities and consortia. The successful usage and application of integrated multi-omics to rhizospheric science is predicated upon the availability of rhizosphere-specific data, metadata and software. This review analyzes the availability of multi-omics data, metadata and software for rhizospheric science, identifying potential issues, challenges and opportunities.

  8. Multi-fractal detrended texture feature for brain tumor classification

    NASA Astrophysics Data System (ADS)

    Reza, Syed M. S.; Mays, Randall; Iftekharuddin, Khan M.

    2015-03-01

    We propose a novel non-invasive brain tumor type classification using Multi-fractal Detrended Fluctuation Analysis (MFDFA) [1] in structural magnetic resonance (MR) images. This preliminary work investigates the efficacy of the MFDFA features, along with our novel texture feature known as multifractional Brownian motion (mBm) [2], in classifying (grading) brain tumors as High Grade (HG) and Low Grade (LG). Based on prior performance, Random Forest (RF) [3] is employed for tumor grading using two different datasets, BRATS-2013 [4] and BRATS-2014 [5]. Quantitative scores such as precision, recall and accuracy are obtained using the confusion matrix. On average, 90% precision and 85% recall from the inter-dataset cross-validation confirm the efficacy of the proposed method.
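
    A compact sketch of MFDFA for a one-dimensional signal is given below to show where the generalized Hurst exponents h(q) come from; it follows the textbook formulation (q = 0 omitted) and is not the exact texture-feature pipeline applied to MR images and fed to the Random Forest in the paper.

```python
# Compact MFDFA sketch for a 1D signal (e.g. an image intensity profile).
import numpy as np

def mfdfa(signal, scales, q_values, order=1):
    profile = np.cumsum(signal - np.mean(signal))
    h = {}
    for q in q_values:                       # q = 0 omitted for simplicity
        log_f = []
        for s in scales:
            n_seg = len(profile) // s
            f2 = []
            for v in range(n_seg):
                seg = profile[v * s:(v + 1) * s]
                t = np.arange(s)
                trend = np.polyval(np.polyfit(t, seg, order), t)
                f2.append(np.mean((seg - trend) ** 2))
            fq = (np.mean(np.array(f2) ** (q / 2.0))) ** (1.0 / q)
            log_f.append(np.log(fq))
        h[q] = np.polyfit(np.log(scales), log_f, 1)[0]   # generalized Hurst exponent
    return h

rng = np.random.default_rng(0)
noise = rng.normal(size=4096)                # white noise: h(q) close to 0.5
print(mfdfa(noise, scales=[16, 32, 64, 128, 256], q_values=[-2, 2]))
```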

  9. Automated segmentation and tracking of non-rigid objects in time-lapse microscopy videos of polymorphonuclear neutrophils.

    PubMed

    Brandes, Susanne; Mokhtari, Zeinab; Essig, Fabian; Hünniger, Kerstin; Kurzai, Oliver; Figge, Marc Thilo

    2015-02-01

    Time-lapse microscopy is an important technique to study the dynamics of various biological processes. The labor-intensive manual analysis of microscopy videos is increasingly replaced by automated segmentation and tracking methods. These methods are often limited to certain cell morphologies and/or cell stainings. In this paper, we present an automated segmentation and tracking framework that does not have these restrictions. In particular, our framework handles highly variable cell shapes and does not rely on any cell stainings. Our segmentation approach is based on a combination of spatial and temporal image variations to detect moving cells in microscopy videos. This method yields a sensitivity of 99% and a precision of 95% in object detection. The tracking of cells consists of different steps, starting from single-cell tracking based on a nearest-neighbor-approach, detection of cell-cell interactions and splitting of cell clusters, and finally combining tracklets using methods from graph theory. The segmentation and tracking framework was applied to synthetic as well as experimental datasets with varying cell densities implying different numbers of cell-cell interactions. We established a validation framework to measure the performance of our tracking technique. The cell tracking accuracy was found to be >99% for all datasets indicating a high accuracy for connecting the detected cells between different time points. Copyright © 2014 Elsevier B.V. All rights reserved.
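
    A minimal sketch of the two ideas named in this abstract, under illustrative assumptions: moving cells are flagged by combining temporal (frame difference) and spatial (gradient) image variation, and detected centroids are linked between frames by a nearest-neighbour rule. Thresholds, the gating distance and the helper names are placeholders, not values from the paper.

```python
# Combine temporal and spatial variation to mask moving cells, then link
# centroids frame-to-frame by greedy nearest-neighbour assignment.
import numpy as np
from scipy import ndimage

def detect_moving_cells(prev_frame, frame, t_thresh=10.0, s_thresh=5.0):
    temporal = np.abs(frame.astype(float) - prev_frame.astype(float))
    spatial = ndimage.gaussian_gradient_magnitude(frame.astype(float), sigma=2)
    mask = (temporal > t_thresh) & (spatial > s_thresh)
    labels, n = ndimage.label(mask)
    # Centroids of the detected objects (one row per labelled region).
    return np.array(ndimage.center_of_mass(mask, labels, range(1, n + 1)))

def link_nearest(prev_centroids, centroids, max_dist=20.0):
    """Link each current detection to its nearest previous detection."""
    links = []
    for i, c in enumerate(centroids):
        d = np.linalg.norm(prev_centroids - c, axis=1)
        j = int(np.argmin(d))
        if d[j] <= max_dist:
            links.append((j, i))      # (previous index, current index)
    return links
```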

  10. Multi-source Geospatial Data Analysis with Google Earth Engine

    NASA Astrophysics Data System (ADS)

    Erickson, T.

    2014-12-01

    The Google Earth Engine platform is a cloud computing environment for data analysis that combines a public data catalog with a large-scale computational facility optimized for parallel processing of geospatial data. The data catalog is a multi-petabyte archive of georeferenced datasets that include images from Earth observing satellite and airborne sensors (examples: USGS Landsat, NASA MODIS, USDA NAIP), weather and climate datasets, and digital elevation models. Earth Engine supports both a just-in-time computation model that enables real-time preview and debugging during algorithm development for open-ended data exploration, and a batch computation mode for applying algorithms over large spatial and temporal extents. The platform automatically handles many traditionally onerous data management tasks, such as data format conversion, reprojection, and resampling, which facilitates writing algorithms that combine data from multiple sensors and/or models. Although the primary use of Earth Engine, to date, has been the analysis of large Earth observing satellite datasets, the computational platform is generally applicable to a wide variety of use cases that require large-scale geospatial data analyses. This presentation will focus on how Earth Engine facilitates the analysis of geospatial data streams that originate from multiple separate sources (and often communities) and how it enables collaboration during algorithm development and data exploration. The talk will highlight current projects/analyses that are enabled by this functionality. https://earthengine.google.org
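
    For illustration, a short sketch of the just-in-time computation model using the Earth Engine Python API. The specific collection and band identifiers below are assumptions chosen for the example, and prior authentication of the client is assumed.

```python
# Filter a public image collection, build a median composite, and sample it.
# Computation runs server-side and only executes when results are requested.
import ee

ee.Initialize()                                  # assumes prior ee.Authenticate()

point = ee.Geometry.Point(-122.45, 37.77)
collection = (ee.ImageCollection('LANDSAT/LC08/C02/T1_L2')   # illustrative ID
              .filterBounds(point)
              .filterDate('2020-01-01', '2020-12-31'))

composite = collection.median()
value = composite.select('SR_B4').reduceRegion(               # illustrative band
    reducer=ee.Reducer.mean(), geometry=point, scale=30)
print(value.getInfo())
```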

  11. Deep convolutional neural network training enrichment using multi-view object-based analysis of Unmanned Aerial systems imagery for wetlands classification

    NASA Astrophysics Data System (ADS)

    Liu, Tao; Abd-Elrahman, Amr

    2018-05-01

    A deep convolutional neural network (DCNN) requires massive training datasets to realize its image classification power, while collecting training samples for remote sensing applications is usually an expensive process. When a DCNN is simply combined with traditional object-based image analysis (OBIA) for classification of an Unmanned Aerial Systems (UAS) orthoimage, its power may be undermined if the number of training samples is relatively small. This research aims to develop a novel OBIA classification approach that can take advantage of DCNN by enriching the training dataset automatically using multi-view data. Specifically, this study introduces a Multi-View Object-based classification using Deep convolutional neural network (MODe) method to process UAS images for land cover classification. MODe conducts the classification on multi-view UAS images instead of directly on the orthoimage, and obtains the final results via a voting procedure. 10-fold cross-validation results show the mean overall classification accuracy increasing substantially from 65.32% when DCNN was applied to the orthoimage to 82.08% when MODe was implemented. This study also compared the performance of support vector machine (SVM) and random forest (RF) classifiers with DCNN under the traditional OBIA and the proposed multi-view OBIA frameworks. The results indicate that the accuracy advantage of DCNN over traditional classifiers is more pronounced when the classifiers were applied within the proposed multi-view OBIA framework than within the traditional OBIA framework.
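
    A minimal sketch of the voting step in a multi-view classification scheme such as the one described: each view yields a per-object class prediction and the final label is the plurality vote across views. The class identifiers and array shapes are illustrative, not taken from the paper.

```python
# Plurality vote over per-view class predictions.
import numpy as np

def vote(per_view_predictions):
    """per_view_predictions: array of shape (n_views, n_objects) of class ids."""
    preds = np.asarray(per_view_predictions)
    n_classes = preds.max() + 1
    # Count votes per object and class, then take the most frequent class.
    counts = np.stack([(preds == c).sum(axis=0) for c in range(n_classes)])
    return counts.argmax(axis=0)

views = np.array([[0, 2, 1], [0, 1, 1], [2, 2, 1]])   # 3 views, 3 objects
print(vote(views))                                    # -> [0 2 1]
```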

  12. Automated Topographic Change Detection via DEM Differencing at Large Scales Using the ArcticDEM Database

    NASA Astrophysics Data System (ADS)

    Candela, S. G.; Howat, I.; Noh, M. J.; Porter, C. C.; Morin, P. J.

    2016-12-01

    In the last decade, high-resolution satellite imagery has become an increasingly accessible tool for geoscientists to quantify changes in the Arctic land surface due to geophysical, ecological and anthropogenic processes. However, the trade-off between spatial coverage and spatio-temporal resolution has limited detailed, process-level change detection over large (i.e., continental) scales. The ArcticDEM project utilized over 300,000 WorldView image pairs to produce a nearly 100% coverage elevation model above 60°N, offering the first polar, high-coverage, high-resolution (2-8 m by region) dataset, often with multiple repeats in areas of particular interest to geoscientists. A dataset of this size (nearly 250 TB) offers endless new avenues of scientific inquiry, but quickly becomes unmanageable computationally and logistically for the computing resources available to the average scientist. Here we present TopoDiff, a framework for a generalized, automated workflow that requires minimal input from the end user about a study site and utilizes cloud computing resources to provide a temporally sorted and differenced dataset, ready for geostatistical analysis. This hands-off approach allows the end user to focus on the science without having to manage thousands of files or petabytes of data. At the same time, TopoDiff provides a consistent and accurate workflow for image sorting, selection, and co-registration, enabling cross-comparisons between research projects.
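
    A minimal sketch of the core differencing step, assuming two already co-registered DEM tiles on the same grid; the file names are placeholders, and the sorting, selection and co-registration stages of the actual workflow are omitted.

```python
# Difference two aligned DEM rasters and write the elevation-change grid.
import numpy as np
import rasterio

with rasterio.open("dem_epoch1.tif") as a, rasterio.open("dem_epoch2.tif") as b:
    z1 = a.read(1, masked=True).astype("float64")
    z2 = b.read(1, masked=True).astype("float64")
    profile = a.profile

dz = z2 - z1                      # elevation change, epoch 2 minus epoch 1
print("mean change (m):", float(dz.mean()),
      "cells changed by > 1 m:", int(np.sum(np.abs(dz.filled(0)) > 1.0)))

profile.update(dtype="float64", nodata=np.nan)
with rasterio.open("dem_difference.tif", "w", **profile) as dst:
    dst.write(dz.filled(np.nan), 1)
```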

  13. The Automation and Exoplanet Orbital Characterization from the Gemini Planet Imager Exoplanet Survey

    NASA Astrophysics Data System (ADS)

    Jinfei Wang, Jason; Graham, James; Perrin, Marshall; Pueyo, Laurent; Savransky, Dmitry; Kalas, Paul; Arriaga, Pauline; Chilcote, Jeffrey K.; De Rosa, Robert J.; Ruffio, Jean-Baptiste; Sivaramakrishnan, Anand; Gemini Planet Imager Exoplanet Survey Collaboration

    2018-01-01

    The Gemini Planet Imager (GPI) Exoplanet Survey (GPIES) is a multi-year 600-star survey to discover and characterize young Jovian exoplanets and their planet-forming environments. For large surveys like GPIES, it is critical to have a uniform dataset processed with the latest techniques and calibrations. I will describe the GPI Data Cruncher, an automated data processing framework that is able to generate fully reduced data minutes after the data are taken and can also reprocess the entire campaign in a single day on a supercomputer. The Data Cruncher integrates into a larger automated data processing infrastructure which syncs, logs, and displays the data. I will discuss the benefits of the GPIES data infrastructure, including optimizing observing strategies, finding planets, characterizing instrument performance, and constraining giant planet occurrence. I will also discuss my work in characterizing the exoplanets we have imaged in GPIES through monitoring their orbits. Using advanced data processing algorithms and GPI's precise astrometric calibration, I will show that GPI can achieve one milliarcsecond astrometry on the extensively studied planet Beta Pic b. With GPI, we can confidently rule out a possible transit of Beta Pic b, but have precise timing for a Hill sphere transit, and I will discuss efforts to search for transiting circumplanetary material this year. I will also discuss the orbital monitoring of other exoplanets as part of GPIES.

  14. Measuring Performance with Library Automated Systems.

    ERIC Educational Resources Information Center

    O'Farrell, John P.

    2000-01-01

    Investigates the capability of three library automated systems to generate some of the datasets necessary to form the ISO (International Standards Organization) standard on performance measurement within libraries, based on research in Liverpool John Moores University (United Kingdom). Concludes that the systems are weak in generating the…

  15. MaxBin 2.0: an automated binning algorithm to recover genomes from multiple metagenomic datasets

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wu, Yu-Wei; Simmons, Blake A.; Singer, Steven W.

    The recovery of genomes from metagenomic datasets is a critical step to defining the functional roles of the underlying uncultivated populations. We previously developed MaxBin, an automated binning approach for high-throughput recovery of microbial genomes from metagenomes. Here, we present an expanded binning algorithm, MaxBin 2.0, which recovers genomes from co-assembly of a collection of metagenomic datasets. Tests on simulated datasets revealed that MaxBin 2.0 is highly accurate in recovering individual genomes, and the application of MaxBin 2.0 to several metagenomes from environmental samples demonstrated that it could achieve two complementary goals: recovering more bacterial genomes compared to binning a single sample as well as comparing the microbial community composition between different sampling environments. Availability and implementation: MaxBin 2.0 is freely available at http://sourceforge.net/projects/maxbin/ under BSD license. Supplementary information: Supplementary data are available at Bioinformatics online.

  16. 9th Annual Systems Engineering Conference: Volume 4 Thursday

    DTIC Science & Technology

    2006-10-26

    Connectivity, Speed, Volume • Enterprise application integration • Workflow integration or multi-media • Federated search capability • Link analysis and...categorization, federated search & automated discovery of information — Collaborative tools to quickly share relevant information Built on commercial

  17. Method of multi-dimensional moment analysis for the characterization of signal peaks

    DOEpatents

    Pfeifer, Kent B; Yelton, William G; Kerr, Dayle R; Bouchier, Francis A

    2012-10-23

    A method of multi-dimensional moment analysis for the characterization of signal peaks can be used to optimize the operation of an analytical system. With a two-dimensional Peclet analysis, the quality and signal fidelity of peaks in a two-dimensional experimental space can be analyzed and scored. This method is particularly useful in determining optimum operational parameters for an analytical system which requires the automated analysis of large numbers of analyte data peaks. For example, the method can be used to optimize analytical systems including an ion mobility spectrometer that uses a temperature stepped desorption technique for the detection of explosive mixtures.
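
    For orientation, a one-dimensional analogue of moment-based peak characterization is sketched below: the zeroth through fourth moments of a signal peak give its area, centroid, width, skewness and kurtosis, which can then be combined into a quality score. The Gaussian test peak is synthetic, and this is not the patented multi-dimensional procedure itself.

```python
# Moment analysis of a single signal peak.
import numpy as np

def peak_moments(t, y):
    area = np.trapz(y, t)                         # zeroth moment (area)
    mu = np.trapz(t * y, t) / area                # centroid (first moment)
    var = np.trapz((t - mu) ** 2 * y, t) / area   # variance (second moment)
    skew = np.trapz((t - mu) ** 3 * y, t) / area / var ** 1.5
    kurt = np.trapz((t - mu) ** 4 * y, t) / area / var ** 2
    return area, mu, np.sqrt(var), skew, kurt

t = np.linspace(0, 10, 1000)
y = np.exp(-0.5 * ((t - 4.0) / 0.3) ** 2)         # synthetic Gaussian peak
print(peak_moments(t, y))
```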

  18. [Quantification of pulmonary emphysema in multislice-CT using different software tools].

    PubMed

    Heussel, C P; Achenbach, T; Buschsieweke, C; Kuhnigk, J; Weinheimer, O; Hammer, G; Düber, C; Kauczor, H-U

    2006-10-01

    Thin-section MSCT datasets of the lung, comprising approximately 300 images, are difficult to evaluate manually. A computer-assisted pre-diagnosis can help with reporting. Furthermore, post-processing techniques, for instance for the quantification of emphysema on the basis of three-dimensional anatomical information, might be improved and the workflow further automated. The results of 4 programs (Pulmo, Volume, YACTA and PulmoFUNC) for the quantitative analysis of emphysema (lung and emphysema volume, mean lung density and emphysema index) were compared on 30 consecutive thin-section MSCT datasets with different emphysema severity levels. The classification result of the YACTA program for different types of emphysema was also analyzed. Pulmo and Volume have median operating times of 105 and 59 minutes, respectively, owing to the need for extensive manual correction of the lung segmentation. The largely automated programs PulmoFUNC and YACTA have median runtimes of 26 and 16 minutes, respectively. For 2 datasets, evaluation with Pulmo and Volume produced implausible values, and PulmoFUNC reproducibly crashed on 2 other datasets. Only YACTA could evaluate all image datasets. The lung volume, emphysema volume, emphysema index and mean lung density determined by YACTA and PulmoFUNC are significantly larger than the corresponding values of Volume and Pulmo (differences: Volume: 119 cm³/65 cm³/1%/17 HU, Pulmo: 60 cm³/96 cm³/1%/37 HU). Classification of the emphysema type was in agreement with that of the radiologist in 26 panlobular, 22 paraseptal and 15 centrilobular emphysema cases. The substantial time required hinders the use of quantitative emphysema analysis in the clinical routine. The results of YACTA and PulmoFUNC are affected by the dedicated exclusion of the tracheobronchial system. These fully automatic tools enable not only fast quantification without manual interaction, but also reproducible measurements without user dependence.

  19. Implementation of a Flexible Tool for Automated Literature-Mining and Knowledgebase Development (DevToxMine)

    EPA Science Inventory

    Deriving novel relationships from the scientific literature is an important adjunct to datamining activities for complex datasets in genomics and high-throughput screening activities. Automated text-mining algorithms can be used to extract relevant content from the literature and...

  20. Recognition of skin melanoma through dermoscopic image analysis

    NASA Astrophysics Data System (ADS)

    Gómez, Catalina; Herrera, Diana Sofia

    2017-11-01

    Melanoma skin cancer diagnosis can be challenging due to the similarity of early-stage symptoms to regular moles. Standardized visual parameters can be determined and characterized to flag a suspected melanoma. Automating this diagnosis could have an impact in the medical field by providing specialists with a high-accuracy support tool. The objective of this study is to develop an algorithm trained to distinguish a highly probable melanoma from a non-dangerous mole through the segmentation and classification of dermoscopic mole images. We evaluate our approach on the dataset provided by the International Skin Imaging Collaboration for the International Challenge on Skin Lesion Analysis Towards Melanoma Detection. For the segmentation task, we apply a preprocessing algorithm and use Otsu's thresholding in the best-performing color space; the average Jaccard index on the test dataset is 70.05%. For the subsequent classification stage, we use joint histograms in the YCbCr color space, an RBF (Gaussian) SVM trained with five features describing the circularity and irregularity of the segmented lesion, and Gray Level Co-occurrence Matrix features for texture analysis. These features are combined to obtain an average classification accuracy of 63.3% on the test dataset.
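
    A minimal sketch of the segmentation stage as described: Otsu's threshold applied to one image channel, followed by the Jaccard index against a ground-truth mask. The channel choice and the synthetic image and mask are illustrative only.

```python
# Otsu-based lesion segmentation plus Jaccard overlap against a reference mask.
import numpy as np
from skimage.filters import threshold_otsu

def segment_lesion(channel):
    thresh = threshold_otsu(channel)
    return channel < thresh          # lesions are typically darker than skin

def jaccard(pred, truth):
    inter = np.logical_and(pred, truth).sum()
    union = np.logical_or(pred, truth).sum()
    return inter / union if union else 1.0

rng = np.random.default_rng(1)
img = rng.uniform(0.4, 1.0, (64, 64))
img[20:40, 20:40] = rng.uniform(0.0, 0.3, (20, 20))   # dark "lesion" patch
mask = segment_lesion(img)
truth = np.zeros_like(img, dtype=bool)
truth[20:40, 20:40] = True
print("Jaccard:", round(jaccard(mask, truth), 3))
```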

  1. Annotating spatio-temporal datasets for meaningful analysis in the Web

    NASA Astrophysics Data System (ADS)

    Stasch, Christoph; Pebesma, Edzer; Scheider, Simon

    2014-05-01

    More and more environmental datasets that vary in space and time are available on the Web. This offers the advantage of using the data for purposes other than originally foreseen, but also carries the danger that users may apply inappropriate analysis procedures because they are unaware of important assumptions made during the data collection process. To guide users towards a meaningful (statistical) analysis of spatio-temporal datasets available on the Web, we developed a Higher-Order-Logic formalism that captures some relevant assumptions in our previous work [1]. It allows proofs about meaningful spatial prediction and aggregation to be carried out in a semi-automated fashion. In this poster presentation, we present a concept for annotating spatio-temporal datasets available on the Web with concepts defined in our formalism. To this end, we have defined a subset of the formalism as a Web Ontology Language (OWL) pattern. It captures the distinction between the different spatio-temporal variable types, i.e. point patterns, fields, lattices and trajectories, which in turn determine whether a particular dataset can be interpolated or aggregated in a meaningful way using a certain procedure. The actual annotations that link spatio-temporal datasets with the concepts in the ontology pattern are provided as Linked Data. To allow data producers to add the annotations to their datasets, we have implemented a Web portal that uses a triple store at the backend to store the annotations and make them available in the Linked Data cloud. Furthermore, we have implemented functions in the statistical environment R to retrieve the RDF annotations and, based on these annotations, to support stronger typing of spatio-temporal datatypes, guiding towards a meaningful analysis in R. [1] Stasch, C., Scheider, S., Pebesma, E., Kuhn, W. (2014): "Meaningful spatial prediction and aggregation", Environmental Modelling & Software, 51, 149-165.
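
    A minimal sketch of publishing such a dataset annotation as Linked Data with rdflib. The namespace, class and property names (e.g. "Field", "supportsProcedure") stand in for the OWL pattern described in the abstract and are hypothetical, not the authors' actual ontology.

```python
# Annotate a dataset with its spatio-temporal variable type and serialize as Turtle.
from rdflib import Graph, Namespace, URIRef, RDF

MEAN = Namespace("http://example.org/meaningful-analysis#")   # hypothetical vocabulary
g = Graph()
g.bind("mean", MEAN)

dataset = URIRef("http://example.org/datasets/pm10-interpolation")
g.add((dataset, RDF.type, MEAN.Field))                        # variable type: field
g.add((dataset, MEAN.supportsProcedure, MEAN.SpatialInterpolation))

print(g.serialize(format="turtle"))
```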

  2. Evaluation of linear discriminant analysis for automated Raman histological mapping of esophageal high-grade dysplasia

    NASA Astrophysics Data System (ADS)

    Hutchings, Joanne; Kendall, Catherine; Shepherd, Neil; Barr, Hugh; Stone, Nicholas

    2010-11-01

    Rapid Raman mapping has the potential to be used for automated histopathology diagnosis, providing an adjunct technique to histology diagnosis. The aim of this work is to evaluate the feasibility of automated and objective pathology classification of Raman maps using linear discriminant analysis. Raman maps of esophageal tissue sections are acquired. Principal component (PC)-fed linear discriminant analysis (LDA) is carried out using subsets of the Raman map data (6483 spectra). An overall (validated) training classification model performance of 97.7% (sensitivity 95.0 to 100% and specificity 98.6 to 100%) is obtained. The remainder of the map spectra (131,672 spectra) are projected onto the classification model, resulting in Raman images that demonstrate good correlation with contiguous hematoxylin and eosin (HE) sections. Initial results suggest that LDA has the potential to automate pathology diagnosis of esophageal Raman images, but since the classification of test spectra is forced into existing training groups, further work is required to optimize the training model. A small pixel size is advantageous for developing the training datasets using mapping data, despite lengthy mapping times, due to the additional morphological information gained, and could facilitate differentiation of further tissue groups, such as the basal cells/lamina propria, in the future, but larger pixel sizes (and faster mapping) may be more feasible for clinical application.
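
    A minimal sketch of PC-fed LDA of spectra: reduce the spectra with PCA, classify with linear discriminant analysis, and validate by cross-validation. The random data, class labels and number of components are placeholders rather than values from the study.

```python
# PCA feature reduction feeding an LDA classifier, with cross-validation.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
spectra = rng.normal(size=(300, 1024))        # placeholder Raman spectra
labels = rng.integers(0, 3, size=300)         # e.g. normal / dysplasia / cancer

model = make_pipeline(PCA(n_components=20), LinearDiscriminantAnalysis())
scores = cross_val_score(model, spectra, labels, cv=5)
print("validated accuracy: %.1f%%" % (100 * scores.mean()))
```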

  3. Spinal cord grey matter segmentation challenge.

    PubMed

    Prados, Ferran; Ashburner, John; Blaiotta, Claudia; Brosch, Tom; Carballido-Gamio, Julio; Cardoso, Manuel Jorge; Conrad, Benjamin N; Datta, Esha; Dávid, Gergely; Leener, Benjamin De; Dupont, Sara M; Freund, Patrick; Wheeler-Kingshott, Claudia A M Gandini; Grussu, Francesco; Henry, Roland; Landman, Bennett A; Ljungberg, Emil; Lyttle, Bailey; Ourselin, Sebastien; Papinutto, Nico; Saporito, Salvatore; Schlaeger, Regina; Smith, Seth A; Summers, Paul; Tam, Roger; Yiannakas, Marios C; Zhu, Alyssa; Cohen-Adad, Julien

    2017-05-15

    An important image processing step in spinal cord magnetic resonance imaging is the ability to reliably and accurately segment grey and white matter for tissue-specific analysis. There are several semi- or fully-automated segmentation methods for cervical cord cross-sectional area measurement with excellent performance, close or equal to that of manual segmentation. However, grey matter segmentation is still challenging due to its small cross-sectional size and shape, and active research is being conducted by several groups around the world in this field. Therefore, a spinal cord grey matter segmentation challenge was organised to test the capabilities of various methods using the same multi-centre and multi-vendor dataset acquired with distinct 3D gradient-echo sequences. This challenge aimed to characterize the state of the art in the field and to identify new opportunities for future improvements. Six different spinal cord grey matter segmentation methods, developed independently by research groups across the world, were compared to manual segmentation outcomes, the present gold standard. All algorithms provided good overall results for detecting the grey matter butterfly, albeit with variable performance in certain quality-of-segmentation metrics. The data have been made publicly available and the challenge web site remains open to new submissions. No modifications were introduced to any of the presented methods as a result of this challenge for the purposes of this publication. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.

  4. Automated multivariate analysis of multi-sensor data submitted online: Real-time environmental monitoring.

    PubMed

    Eide, Ingvar; Westad, Frank

    2018-01-01

    A pilot study demonstrating real-time environmental monitoring with automated multivariate analysis of multi-sensor data submitted online has been performed at the cabled LoVe Ocean Observatory located at 258 m depth, 20 km off the coast of Lofoten-Vesterålen, Norway. The major purpose was efficient monitoring of many variables simultaneously and early detection of changes and time-trends in the overall response pattern before changes were evident in individual variables. The pilot study was performed with 12 sensors from May 16 to August 31, 2015. The sensors provided data for chlorophyll, turbidity, conductivity, temperature (three sensors), salinity (calculated from temperature and conductivity), biomass at three different depth intervals (5-50, 50-120, 120-250 m), and current speed measured in two directions (east and north) using two sensors covering different depths with overlap. A total of 88 variables were monitored, 78 from the two current speed sensors. The time resolution varied; thus, the data had to be aligned to a common time resolution. After alignment, the data were interpreted using principal component analysis (PCA). Initially, a calibration model was established using data from May 16 to July 31. The data on current speed from the two sensors were subjected to two separate PCA models, and the score vectors from these two models were combined with the other 10 variables in a multi-block PCA model. The observations from August were projected onto the calibration model consecutively, one at a time, and the result was visualized in a score plot. Automated PCA of multi-sensor data submitted online is illustrated with an attached time-lapse video covering the relatively short time period used in the pilot study. Methods for statistical validation, and warning and alarm limits are described. Redundant sensors enable sensor diagnostics and quality assurance. In a future perspective, the concept may be used in integrated environmental monitoring.
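
    A minimal sketch of the monitoring idea: fit a PCA model on a calibration period of aligned multi-sensor data, project each new observation onto it, and flag departures from the calibration score cloud. The data, the two-component model and the simple warning limit are illustrative; the study itself used a multi-block PCA with separate current-speed sub-models.

```python
# PCA calibration model with projection of new observations and a warning rule.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
calibration = rng.normal(size=(500, 10))        # placeholder aligned variables
new_obs = rng.normal(loc=0.5, size=(50, 10))    # later observations to project

scaler = StandardScaler().fit(calibration)
pca = PCA(n_components=2).fit(scaler.transform(calibration))

cal_scores = pca.transform(scaler.transform(calibration))
limit = 3.0 * cal_scores.std(axis=0)            # simple warning limit per component

for t, x in enumerate(new_obs):
    score = pca.transform(scaler.transform(x.reshape(1, -1)))[0]
    if np.any(np.abs(score) > limit):
        print(f"observation {t}: outside calibration region, scores {score.round(2)}")
```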

  5. Slaying Hydra: A Python-Based Reduction Pipeline for the Hydra Multi-Object Spectrograph

    NASA Astrophysics Data System (ADS)

    Seifert, Richard; Mann, Andrew

    2018-01-01

    We present a Python-based data reduction pipeline for the Hydra Multi-Object Spectrograph on the WIYN 3.5 m telescope, an instrument which enables simultaneous spectroscopy of up to 93 targets. The reduction steps carried out include flat-fielding, dynamic fiber tracing, wavelength calibration, optimal fiber extraction, and sky subtraction. The pipeline also supports the use of sky lines to correct for zero-point offsets between fibers. To account for the moving parts on the instrument and telescope, fiber positions and wavelength solutions are derived in real-time for each dataset. The end result is a one-dimensional spectrum for each target fiber. Quick and fully automated, the pipeline enables on-the-fly reduction while observing, and has been known to outperform the IRAF pipeline by more accurately reproducing known RVs. While Hydra has many configurations in both high- and low-resolution, the pipeline was developed and tested with only one high-resolution mode. In the future we plan to expand the pipeline to work in most commonly used modes.

  6. Integrative Exploratory Analysis of Two or More Genomic Datasets.

    PubMed

    Meng, Chen; Culhane, Aedin

    2016-01-01

    Exploratory analysis is an essential step in the analysis of high-throughput data. Multivariate approaches such as correspondence analysis (CA), principal component analysis, and multidimensional scaling are widely used in the exploratory analysis of a single dataset. Modern biological studies often assay multiple types of biological molecules (e.g., mRNA, protein, phosphoproteins) on the same set of biological samples, thereby creating multiple different types of omics data or multiassay data. Integrative exploratory analysis of these multiple omics data is required to leverage the potential of multiple omics studies. In this chapter, we describe the application of co-inertia analysis (CIA; for analyzing two datasets) and multiple co-inertia analysis (MCIA; for three or more datasets) to address this problem. These methods are powerful yet simple multivariate approaches that represent samples using a lower number of variables, allowing easier identification of the correlated structure within and between multiple high-dimensional datasets. Graphical representations can be employed for this purpose. In addition, the methods simultaneously project samples and variables (genes, proteins) onto the same lower dimensional space, so the most variant variables from each dataset can be selected and associated with samples, which can be further used to facilitate biological interpretation and pathway analysis. We applied CIA to explore the concordance between mRNA and protein expression in a panel of 60 tumor cell lines from the National Cancer Institute. In the same 60 cell lines, we used MCIA to perform a cross-platform comparison of mRNA gene expression profiles obtained on four different microarray platforms. Last, as an example of integrative analysis of multiassay or multi-omics data, we analyzed transcriptomic, proteomic, and phosphoproteomic data from pluripotent (iPS) and embryonic stem (ES) cell lines.

  7. Chemical elements in the environment: multi-element geochemical datasets from continental to national scale surveys on four continents

    USGS Publications Warehouse

    Caritat, Patrice de; Reimann, Clemens; Smith, David; Wang, Xueqiu

    2017-01-01

    During the last 10-20 years, Geological Surveys around the world have undertaken a major effort towards delivering fully harmonized and tightly quality-controlled low-density multi-element soil geochemical maps and datasets of vast regions including up to whole continents. Concentrations of between 45 and 60 elements commonly have been determined in a variety of different regolith types (e.g., sediment, soil). The multi-element datasets are published as complete geochemical atlases and made available to the general public. Several other geochemical datasets covering smaller areas but generally at a higher spatial density are also available. These datasets may, however, not be found by superficial internet-based searches because the elements are not mentioned individually either in the title or in the keyword lists of the original references. This publication attempts to increase the visibility and discoverability of these fundamental background datasets covering large areas up to whole continents.

  8. Test-retest reliability and longitudinal analysis of automated hippocampal subregion volumes in healthy ageing and Alzheimer's disease populations.

    PubMed

    Worker, Amanda; Dima, Danai; Combes, Anna; Crum, William R; Streffer, Johannes; Einstein, Steven; Mehta, Mitul A; Barker, Gareth J; C R Williams, Steve; O'daly, Owen

    2018-04-01

    The hippocampal formation is a complex brain structure that is important in cognitive processes such as memory, mood, reward processing and other executive functions. Histological and neuroimaging studies have implicated the hippocampal region in neuropsychiatric disorders as well as in neurodegenerative diseases. This highly plastic limbic region is made up of several subregions that are believed to have different functional roles. Therefore, there is a growing interest in imaging the subregions of the hippocampal formation rather than modelling the hippocampus as a homogeneous structure, driving the development of new automated analysis tools. Consequently, there is a pressing need to understand the stability of the measures derived from these new techniques. In this study, an automated hippocampal subregion segmentation pipeline, released as a developmental version of Freesurfer (v6.0), was applied to T1-weighted magnetic resonance imaging (MRI) scans of 22 healthy older participants, scanned on 3 separate occasions, and to a separate longitudinal dataset of 40 Alzheimer's disease (AD) patients. Test-retest reliability of hippocampal subregion volumes was assessed using the intra-class correlation coefficient (ICC), percentage volume difference and percentage volume overlap (Dice). Sensitivity of the regional estimates to longitudinal change was estimated using linear mixed effects (LME) modelling. The results show that out of the 24 hippocampal subregions, 20 had ICC scores of 0.9 or higher in both samples; these regions include the molecular layer, granule cell layer of the dentate gyrus, CA1, CA3 and the subiculum (ICC > 0.9), whilst the hippocampal fissure and fimbria had lower ICC scores (0.73-0.88). Furthermore, LME analysis of the independent AD dataset demonstrated sensitivity to group and individual differences in the rate of volume change over time in several hippocampal subregions (CA1, molecular layer, CA3, hippocampal tail, fissure and presubiculum). These results indicate that this automated segmentation method provides a robust means of measuring hippocampal subregions, and may be useful in tracking disease progression and measuring the effects of pharmacological intervention. © 2018 Wiley Periodicals, Inc.
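
    For illustration, a minimal sketch of one common test-retest reliability statistic, ICC(3,1) (two-way mixed effects, consistency, single measurement), computed on a synthetic subjects-by-sessions volume table. This is one standard formulation and not necessarily the exact ICC variant used in the study.

```python
# ICC(3,1) for repeated volume measurements: rows = subjects, columns = sessions.
import numpy as np

def icc_3_1(data):
    data = np.asarray(data, float)
    n, k = data.shape
    grand = data.mean()
    msb = k * ((data.mean(axis=1) - grand) ** 2).sum() / (n - 1)   # between subjects
    ssc = n * ((data.mean(axis=0) - grand) ** 2).sum()             # between sessions
    sse = ((data - grand) ** 2).sum() - (n - 1) * msb - ssc
    mse = sse / ((n - 1) * (k - 1))
    return (msb - mse) / (msb + (k - 1) * mse)

rng = np.random.default_rng(0)
true_vol = rng.normal(600, 60, size=(22, 1))             # e.g. subregion volumes, mm^3
sessions = true_vol + rng.normal(0, 10, size=(22, 3))    # 3 scan occasions
print("ICC(3,1):", round(icc_3_1(sessions), 3))
```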

  9. Detection of breast cancer in automated 3D breast ultrasound

    NASA Astrophysics Data System (ADS)

    Tan, Tao; Platel, Bram; Mus, Roel; Karssemeijer, Nico

    2012-03-01

    Automated 3D breast ultrasound (ABUS) is a novel imaging modality, in which motorized scans of the breasts are made with a wide transducer through a membrane under modest compression. The technology has attracted considerable interest and may become widely used in screening of dense breasts, where the sensitivity of mammography is poor. ABUS has a high sensitivity for detecting solid breast lesions. However, reading ABUS images is time-consuming, and subtle abnormalities may be missed. Therefore, we are developing a computer-aided detection (CAD) system to help reduce reading time and errors. In the multi-stage system we propose, segmentations of the breast and nipple are performed, providing landmarks for the detection algorithm. Subsequently, voxel features characterizing coronal spiculation patterns, blobness, contrast, and locations with respect to landmarks are extracted. Using an ensemble of classifiers, a likelihood map indicating potential malignancies is computed. Local maxima in the likelihood map are determined using a local maxima detector and form a set of candidate lesions in each view. These candidates are further processed in a second detection stage, which includes region segmentation, feature extraction and a final classification. Region segmentation is performed using a 3D spiral-scanning dynamic programming method. Region features include descriptors of shape, acoustic behavior and texture. Performance was determined using a 78-patient dataset with 93 images, including 50 malignant lesions. We used 10-fold cross-validation. Using FROC analysis, we found that the system obtains a lesion sensitivity of 60% and 70% at 2 and 4 false positives per image, respectively.
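
    A minimal sketch of the candidate-generation step named above: finding local maxima in a 3D likelihood map above a threshold. The map here is random; in the CAD system it would come from the voxel classifier ensemble, and the threshold and neighbourhood size are illustrative.

```python
# Local maxima of a 3D likelihood map as lesion candidates.
import numpy as np
from scipy.ndimage import maximum_filter

def local_maxima(likelihood, threshold=0.5, size=5):
    peaks = (likelihood == maximum_filter(likelihood, size=size))
    peaks &= likelihood > threshold
    return np.argwhere(peaks)          # (z, y, x) candidate positions

rng = np.random.default_rng(0)
lmap = rng.uniform(size=(40, 128, 128))
candidates = local_maxima(lmap, threshold=0.99)
print(len(candidates), "candidate lesions")
```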

  10. Semi-automatic mapping of geological Structures using UAV-based photogrammetric data: An image analysis approach

    NASA Astrophysics Data System (ADS)

    Vasuki, Yathunanthan; Holden, Eun-Jung; Kovesi, Peter; Micklethwaite, Steven

    2014-08-01

    Recent advances in data acquisition technologies, such as Unmanned Aerial Vehicles (UAVs), have led to a growing interest in capturing high-resolution rock surface images. However, due to the large volumes of data that can be captured in a short flight, efficient analysis of these data brings new challenges, especially the time it takes to digitise maps and extract orientation data. We outline a semi-automated method that allows efficient mapping of geological faults using photogrammetric data of rock surfaces, which was generated from aerial photographs collected by a UAV. Our method harnesses advanced automated image analysis techniques and human data interaction to rapidly map structures and then calculate their dip and dip directions. Geological structures (faults, joints and fractures) are first detected from the primary photographic dataset and the equivalent three-dimensional (3D) structures are then identified within a 3D surface model generated by structure from motion (SfM). From this information, the location, dip and dip direction of the geological structures are calculated. A structure map generated by our semi-automated method obtained a recall rate of 79.8% when compared against a fault map produced using expert manual digitising and interpretation methods. The semi-automated structure map was produced in 10 min whereas the manual method took approximately 7 h. In addition, the dip and dip direction calculation, using our automated method, shows a mean ± standard error of 1.9° ± 2.2° and 4.4° ± 2.6°, respectively, relative to field measurements. This shows the potential of using our semi-automated method for accurate and efficient mapping of geological structures, particularly from remote, inaccessible or hazardous sites.
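
    A minimal sketch of the dip and dip-direction calculation under simple assumptions: fit a plane z = ax + by + c to 3D points on a mapped structure (with x = east, y = north, z = up) and convert the fitted gradient into a dip angle and a compass dip direction. This is a generic least-squares formulation, not necessarily the authors' exact procedure.

```python
# Dip and dip direction from a least-squares plane fit to 3D structure points.
import numpy as np

def dip_and_direction(points):
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    A = np.column_stack([x, y, np.ones_like(x)])
    (a, b, c), *_ = np.linalg.lstsq(A, z, rcond=None)
    dip = np.degrees(np.arctan(np.hypot(a, b)))                 # steepest slope angle
    dip_dir = (np.degrees(np.arctan2(-a, -b)) + 360.0) % 360.0  # azimuth of descent
    return dip, dip_dir

# Synthetic fault surface dipping ~30 degrees towards the east (azimuth 090).
rng = np.random.default_rng(0)
xy = rng.uniform(0, 100, size=(200, 2))
z = -np.tan(np.radians(30)) * xy[:, 0] + rng.normal(0, 0.2, 200)
print(dip_and_direction(np.column_stack([xy, z])))   # approximately (30.0, 90.0)
```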

  11. Development of Multi-perspective Diagnostics and Analysis Algorithms with Applications to Subsonic and Supersonic Combustors

    NASA Astrophysics Data System (ADS)

    Wickersham, Andrew Joseph

    There are two critical research needs for the study of hydrocarbon combustion in high-speed flows: 1) combustion diagnostics with adequate temporal and spatial resolution, and 2) mathematical techniques that can extract key information from large datasets. The goal of this work is to address these needs, respectively, by the use of high-speed, multi-perspective chemiluminescence and advanced mathematical algorithms. To obtain the measurements, this work explored the application of high-speed chemiluminescence diagnostics and the use of fiber-based endoscopes (FBEs) for non-intrusive and multi-perspective chemiluminescence imaging up to 20 kHz. Non-intrusive and full-field imaging measurements provide a wealth of information for model validation and design optimization of propulsion systems. However, it is challenging to obtain such measurements due to various implementation difficulties such as optical access, thermal management, and equipment cost. This work therefore explores the application of FBEs for non-intrusive imaging of supersonic propulsion systems. The FBEs used in this work are demonstrated to overcome many of the aforementioned difficulties and provided datasets from multiple angular positions up to 20 kHz in a supersonic combustor. The combustor operated on ethylene fuel at Mach 2 with an inlet stagnation temperature and pressure of approximately 640 degrees Fahrenheit and 70 psia, respectively. The imaging measurements were obtained from eight perspectives simultaneously, providing full-field datasets under such flow conditions for the first time and allowing the possibility of inferring multi-dimensional measurements. Due to their high-speed and multi-perspective nature, such new diagnostic capabilities generate a large volume of data and call for analysis algorithms that can process the data and extract key physics effectively. To extract the key combustion dynamics from the measurements, three mathematical methods were investigated in this work: Fourier analysis, proper orthogonal decomposition (POD), and wavelet analysis (WA). These algorithms were first demonstrated and tested on imaging measurements obtained from one perspective in a subsonic combustor (up to Mach 0.2). The results show that these algorithms are effective in extracting the key physics from large datasets, including the characteristic frequencies of flow-flame interactions, especially during transient processes such as lean blow-off and ignition. After these relatively simple tests and demonstrations, the algorithms were applied to process the multi-perspective measurements obtained in the supersonic combustor. Compared to past analyses (which have been limited to data obtained from one perspective only), the availability of data from multiple perspectives provides further insights into the flame and flow structures in high-speed flows. In summary, this work shows that high-speed chemiluminescence is a simple yet powerful combustion diagnostic. Especially when combined with FBEs and the analysis algorithms described in this work, such diagnostics provide full-field imaging at high repetition rates in challenging flows. Based on such measurements, a wealth of information can be obtained from proper analysis algorithms, including characteristic frequencies, dominating flame modes, and even multi-dimensional flame and flow structures.
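
    One of the three methods named, proper orthogonal decomposition, is commonly implemented via the singular value decomposition of a snapshot matrix; a minimal sketch under that assumption follows, with random placeholder data standing in for the vectorized chemiluminescence images.

```python
# Snapshot POD via SVD: columns are vectorized images, left singular vectors
# are spatial modes, and squared singular values rank the modes by energy.
import numpy as np

rng = np.random.default_rng(0)
n_pixels, n_snapshots = 64 * 64, 500
snapshots = rng.normal(size=(n_pixels, n_snapshots))      # placeholder image sequence

mean_field = snapshots.mean(axis=1, keepdims=True)
fluct = snapshots - mean_field                            # remove the mean field

U, s, Vt = np.linalg.svd(fluct, full_matrices=False)
energy = s**2 / np.sum(s**2)
print("energy captured by first 5 modes: %.1f%%" % (100 * energy[:5].sum()))

# Temporal coefficients of each mode over the snapshot sequence.
a_k = s[:, None] * Vt                                     # shape (n_modes, n_snapshots)
```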

  12. Automatic cardiac cycle determination directly from EEG-fMRI data by multi-scale peak detection method.

    PubMed

    Wong, Chung-Ki; Luo, Qingfei; Zotev, Vadim; Phillips, Raquel; Chan, Kam Wai Clifford; Bodurka, Jerzy

    2018-03-31

    In simultaneous EEG-fMRI, identification of the period of the cardioballistic artifact (BCG) in EEG is required for artifact removal. Recording the electrocardiogram (ECG) waveform during fMRI is difficult, often causing inaccurate period detection. Since the waveform of the BCG extracted by independent component analysis (ICA) is relatively invariable compared to the ECG waveform, we propose a multiple-scale peak-detection algorithm to determine the BCG cycle directly from the EEG data. The algorithm first extracts the high-contrast BCG component from the EEG data by ICA. The BCG cycle is then estimated by band-pass filtering the component around the fundamental frequency identified from its energy spectral density, and the peak of BCG artifact occurrence is selected from each estimated cycle. The algorithm is shown to achieve a high accuracy on a large EEG-fMRI dataset. It is also adaptive to various heart rates without the need to adjust threshold parameters. The cycle detection remains accurate with the scan duration reduced to half a minute. Additionally, the algorithm gives a figure of merit to evaluate the reliability of the detection accuracy. The algorithm is shown to give a higher detection accuracy than the commonly used cycle detection algorithm fmrib_qrsdetect implemented in EEGLAB. The achieved high cycle detection accuracy of our algorithm without using the ECG waveforms makes it possible to create and automate pipelines for processing large EEG-fMRI datasets, and virtually eliminates the need for ECG recordings for BCG artifact removal. Copyright © 2018 The Authors. Published by Elsevier B.V. All rights reserved.
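
    A minimal sketch of the cycle estimation step as described: band-pass the extracted BCG component around its fundamental frequency and take the peaks of the filtered signal as one marker per cardiac cycle. The sampling rate, fundamental frequency, band edges and synthetic signal are illustrative assumptions.

```python
# Band-pass around the fundamental frequency, then detect one peak per cycle.
import numpy as np
from scipy.signal import butter, filtfilt, find_peaks

fs = 250.0                                   # assumed EEG sampling rate, Hz
t = np.arange(0, 60, 1 / fs)
f0 = 1.1                                     # assumed fundamental (~66 bpm)
bcg = np.sin(2 * np.pi * f0 * t) + 0.5 * np.random.default_rng(0).normal(size=t.size)

b, a = butter(3, [0.7 * f0, 1.3 * f0], btype="bandpass", fs=fs)
filtered = filtfilt(b, a, bcg)

peaks, _ = find_peaks(filtered, distance=int(0.6 * fs / f0))
print("detected cycles:", len(peaks),
      "mean period (s):", round(float(np.diff(peaks).mean() / fs), 3))
```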

  13. Data fusion for automated non-destructive inspection

    PubMed Central

    Brierley, N.; Tippetts, T.; Cawley, P.

    2014-01-01

    In industrial non-destructive evaluation (NDE), it is increasingly common for data acquisition to be automated, driving a recent substantial increase in the availability of data. The collected data need to be analysed, typically necessitating the painstaking manual labour of a skilled operator. Moreover, in automated NDE a region of an inspected component is typically interrogated several times, be it within a single data channel due to multiple probe passes, across several channels acquired simultaneously or over the course of repeated inspections. The systematic combination of these diverse readings is recognized to offer an opportunity to improve the reliability of the inspection, but is not achievable in a manual analysis. This paper describes a data-fusion-based software framework providing a partial automation capability, allowing component regions to be declared defect-free to a very high probability while readily identifying defect indications, thereby optimizing the use of the operator's time. The system is designed to be applicable to a wide range of automated NDE scenarios, but the processing is exemplified using the industrial ultrasonic immersion inspection of aerospace turbine discs. Results obtained for industrial datasets demonstrate an orders-of-magnitude reduction in false-call rates, for a given probability of detection, achievable using the developed software system. PMID:25002828

  14. Integrating multiple immunogenetic data sources for feature extraction and mining somatic hypermutation patterns: the case of "towards analysis" in chronic lymphocytic leukaemia.

    PubMed

    Kavakiotis, Ioannis; Xochelli, Aliki; Agathangelidis, Andreas; Tsoumakas, Grigorios; Maglaveras, Nicos; Stamatopoulos, Kostas; Hadzidimitriou, Anastasia; Vlahavas, Ioannis; Chouvarda, Ioanna

    2016-06-06

    Somatic Hypermutation (SHM) refers to the introduction of mutations within rearranged V(D)J genes, a process that increases the diversity of Immunoglobulins (IGs). The analysis of SHM has offered critical insight into the physiology and pathology of B cells, leading to strong prognostication markers for clinical outcome in chronic lymphocytic leukaemia (CLL), the most frequent adult B-cell malignancy. In this paper, we present a methodology for integrating multiple immunogenetic and clinicobiological data sources in order to extract features and create high-quality datasets for SHM analysis in IG receptors of CLL patients. This dataset is used as the basis for a higher-level integration procedure, inspired by social choice theory. This is applied in the Towards Analysis, our attempt to investigate the potential ontogenetic transformation of genes belonging to specific stereotyped CLL subsets towards other genes or gene families, through SHM. The data integration process, followed by feature extraction, resulted in the generation of a dataset containing information about mutations occurring through SHM. The Towards analysis, performed on the integrated dataset using voting techniques, revealed the distinct behaviour of subset #201 compared to other subsets as regards SHM-related movements among gene clans, both in allele-conserved and non-conserved gene areas. With respect to movement between genes, a high percentage of movements towards pseudogenes was found in all CLL subsets. This data integration and feature extraction process can set the basis for exploratory analysis or a fully automated computational data mining approach on many as yet unanswered, clinically relevant biological questions.

  15. SamuROI, a Python-Based Software Tool for Visualization and Analysis of Dynamic Time Series Imaging at Multiple Spatial Scales.

    PubMed

    Rueckl, Martin; Lenzi, Stephen C; Moreno-Velasquez, Laura; Parthier, Daniel; Schmitz, Dietmar; Ruediger, Sten; Johenning, Friedrich W

    2017-01-01

    The measurement of activity in vivo and in vitro has shifted from electrical to optical methods. While the indicators for imaging activity have improved significantly over the last decade, tools for analysing optical data have not kept pace. Most available analysis tools are limited in their flexibility and applicability to datasets obtained at different spatial scales. Here, we present SamuROI (Structured analysis of multiple user-defined ROIs), an open source Python-based analysis environment for imaging data. SamuROI simplifies exploratory analysis and visualization of image series of fluorescence changes in complex structures over time and is readily applicable at different spatial scales. In this paper, we show the utility of SamuROI in Ca2+-imaging based applications at three spatial scales: the micro-scale (i.e., sub-cellular compartments including cell bodies, dendrites and spines); the meso-scale (i.e., whole cell and population imaging with single-cell resolution); and the macro-scale (i.e., imaging of changes in bulk fluorescence in large brain areas, without cellular resolution). The software described here provides a graphical user interface for intuitive data exploration and region of interest (ROI) management that can be used interactively within Jupyter Notebook: a publicly available interactive Python platform that allows simple integration of our software with existing tools for automated ROI generation and post-processing, as well as custom analysis pipelines. SamuROI software, source code and installation instructions are publicly available on GitHub and documentation is available online. SamuROI reduces the energy barrier for manual exploration and semi-automated analysis of spatially complex Ca2+ imaging datasets, particularly when these have been acquired at different spatial scales.

  16. SamuROI, a Python-Based Software Tool for Visualization and Analysis of Dynamic Time Series Imaging at Multiple Spatial Scales

    PubMed Central

    Rueckl, Martin; Lenzi, Stephen C.; Moreno-Velasquez, Laura; Parthier, Daniel; Schmitz, Dietmar; Ruediger, Sten; Johenning, Friedrich W.

    2017-01-01

    The measurement of activity in vivo and in vitro has shifted from electrical to optical methods. While the indicators for imaging activity have improved significantly over the last decade, tools for analysing optical data have not kept pace. Most available analysis tools are limited in their flexibility and applicability to datasets obtained at different spatial scales. Here, we present SamuROI (Structured analysis of multiple user-defined ROIs), an open source Python-based analysis environment for imaging data. SamuROI simplifies exploratory analysis and visualization of image series of fluorescence changes in complex structures over time and is readily applicable at different spatial scales. In this paper, we show the utility of SamuROI in Ca2+-imaging based applications at three spatial scales: the micro-scale (i.e., sub-cellular compartments including cell bodies, dendrites and spines); the meso-scale, (i.e., whole cell and population imaging with single-cell resolution); and the macro-scale (i.e., imaging of changes in bulk fluorescence in large brain areas, without cellular resolution). The software described here provides a graphical user interface for intuitive data exploration and region of interest (ROI) management that can be used interactively within Jupyter Notebook: a publicly available interactive Python platform that allows simple integration of our software with existing tools for automated ROI generation and post-processing, as well as custom analysis pipelines. SamuROI software, source code and installation instructions are publicly available on GitHub and documentation is available online. SamuROI reduces the energy barrier for manual exploration and semi-automated analysis of spatially complex Ca2+ imaging datasets, particularly when these have been acquired at different spatial scales. PMID:28706482

  17. Expert diagnostics system as a part of analysis software for power mission operations

    NASA Technical Reports Server (NTRS)

    Harris, Jennifer A.; Bahrami, Khosrow A.

    1993-01-01

    The operation of interplanetary spacecraft at JPL has become an increasingly complex activity. This complexity is due to advanced spacecraft designs and ambitious mission objectives which lead to operations requirements that are more demanding than those of any previous mission. For this reason, several productivity enhancement measures are underway at JPL within mission operations, particularly in the spacecraft analysis area. These measures aimed at spacecraft analysis include: the development of a multi-mission, multi-subsystem operations environment; the introduction of automated tools into this environment; and the development of an expert diagnostics system. This paper discusses an effort to integrate the above mentioned productivity enhancement measures. A prototype was developed that integrates an expert diagnostics system into a multi-mission, multi-subsystem operations environment using the Galileo Power / Pyro Subsystem as a testbed. This prototype will be discussed in addition to background information associated with it.

  18. Fast Lamb wave energy shift approach using fully contactless ultrasonic system to characterize concrete structures

    NASA Astrophysics Data System (ADS)

    Ham, Suyun; Popovics, John S.

    2015-03-01

    Ultrasonic techniques provide an effective non-destructive evaluation (NDE) method to monitor concrete structures, but the need to perform rapid and accurate structural assessment requires evaluation of hundreds, or even thousands, of measurement datasets. Use of a fully contactless ultrasonic system can save time and labor through rapid implementation, and can enable automated and controlled data acquisition, for example through robotic scanning. Here we present results using a fully contactless ultrasonic system. This paper describes our efforts to develop a contactless ultrasonic guided wave NDE approach to detect and characterize delamination defects in concrete structures. The developed contactless sensors, controlled scanning system, and employed Multi-channel Analysis of Surface Waves (MASW) signal processing scheme are reviewed. Then a guided wave interpretation approach for MASW data is described. The presence of delamination is interpreted through guided plate wave (Lamb wave) behavior, where a shift in the excited Lamb mode phase velocity is monitored. Numerically simulated and experimental ultrasonic data collected from a concrete sample with simulated delamination defects are presented, where the occurrence of delamination is shown to be associated with a mode shift in Lamb wave energy.

  19. EEG feature selection method based on decision tree.

    PubMed

    Duan, Lijuan; Ge, Hui; Ma, Wei; Miao, Jun

    2015-01-01

    This paper aims to solve the automated feature selection problem in brain computer interfaces (BCI). In order to automate the feature selection process, we proposed a novel EEG feature selection method based on a decision tree (DT). During electroencephalogram (EEG) signal processing, a feature extraction method based on principal component analysis (PCA) was used, and the selection process based on the decision tree was performed by searching the feature space and automatically selecting optimal features. Considering that EEG signals are a series of non-linear signals, a generalized linear classifier, the support vector machine (SVM), was chosen. In order to test the validity of the proposed method, we applied the EEG feature selection method based on the decision tree to BCI Competition II dataset Ia, and the experiment showed encouraging results.
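
    A minimal sketch of the processing chain named in this abstract: PCA feature extraction, decision-tree-based ranking to pick a feature subset, and an SVM classifier. The EEG data, component counts and subset size are placeholders, not the paper's settings.

```python
# PCA -> decision-tree feature ranking -> SVM classification (illustrative).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64))            # placeholder EEG feature vectors
y = rng.integers(0, 2, size=200)          # two mental states

components = PCA(n_components=20).fit_transform(X)

# Rank PCA components by decision-tree importance and keep the best ones.
tree = DecisionTreeClassifier(random_state=0).fit(components, y)
selected = np.argsort(tree.feature_importances_)[::-1][:8]

svm = SVC(kernel="linear")
print("CV accuracy:", cross_val_score(svm, components[:, selected], y, cv=5).mean())
```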

  20. Leveraging freely available remote sensing and ancillary datasets for semi-automated identification of potential wetland areas using a Geographic Information System (GIS).

    DOT National Transportation Integrated Search

    2016-06-01

    The purpose of this study was to develop a wetland identification tool that makes use of freely available geospatial : datasets to identify potential wetland locations at a spatial scale relevant for transportation corridor assessments. The tool was ...

  1. A novel image processing technique for 3D volumetric analysis of severely resorbed alveolar sockets with CBCT.

    PubMed

    Manavella, Valeria; Romano, Federica; Garrone, Federica; Terzini, Mara; Bignardi, Cristina; Aimetti, Mario

    2017-06-01

    The aim of this study was to present and validate a novel procedure for the quantitative volumetric assessment of extraction sockets that combines cone-beam computed tomography (CBCT) and image processing techniques. The CBCT dataset of 9 severely resorbed extraction sockets was analyzed by means of two image processing software packages, ImageJ and Mimics, using manual and automated segmentation techniques. The techniques were also applied to 5-mm spherical aluminum markers of known volume and to a polyvinyl chloride model of one alveolar socket scanned with Micro-CT to test the accuracy. Statistical differences in alveolar socket volume were found between the different methods of volumetric analysis (P<0.0001). The automated segmentation using Mimics was the most reliable and accurate method, with a relative error of 1.5%, considerably smaller than the errors of 7% and 10% introduced by the manual method using Mimics and by the automated method using ImageJ. The currently proposed automated segmentation protocol for the three-dimensional rendering of alveolar sockets showed more accurate results, excellent inter-observer similarity and increased user friendliness. The clinical application of this method enables a three-dimensional evaluation of extraction socket healing after reconstructive procedures and during follow-up visits.

  2. Ecological Momentary Assessments and Automated Time Series Analysis to Promote Tailored Health Care: A Proof-of-Principle Study.

    PubMed

    van der Krieke, Lian; Emerencia, Ando C; Bos, Elisabeth H; Rosmalen, Judith Gm; Riese, Harriëtte; Aiello, Marco; Sytema, Sjoerd; de Jonge, Peter

    2015-08-07

    Health promotion can be tailored by combining ecological momentary assessments (EMA) with time series analysis. This combined method allows for studying the temporal order of dynamic relationships among variables, which may provide concrete indications for intervention. However, application of this method in health care practice is hampered because analyses are conducted manually and advanced statistical expertise is required. This study aims to show how this limitation can be overcome by introducing automated vector autoregressive modeling (VAR) of EMA data and to evaluate its feasibility through comparisons with results of previously published manual analyses. We developed a Web-based open source application, called AutoVAR, which automates time series analyses of EMA data and provides output that is intended to be interpretable by nonexperts. The statistical technique we used was VAR. AutoVAR tests and evaluates all possible VAR models within a given combinatorial search space and summarizes their results, thereby replacing the researcher's tasks of conducting the analysis, making an informed selection of models, and choosing the best model. We compared the output of AutoVAR to the output of a previously published manual analysis (n=4). An illustrative example consisting of 4 analyses was provided. Compared to the manual output, the AutoVAR output presents similar model characteristics and statistical results in terms of the Akaike information criterion, the Bayesian information criterion, and the test statistic of the Granger causality test. Results suggest that automated analysis and interpretation of time series is feasible. Compared to a manual procedure, the automated procedure is more robust and can save days of time. These findings may pave the way for using time series analysis for health promotion on a larger scale. AutoVAR was evaluated using the results of a previously conducted manual analysis. Analysis of additional datasets is needed in order to validate and refine the application for general use.
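
    A minimal sketch of the underlying modelling idea using statsmodels rather than AutoVAR itself: fit VAR models over a small lag range, select the order by an information criterion, and run a Granger causality test. The EMA variables and data are synthetic, and AutoVAR searches a much larger model space than this.

```python
# VAR model selection by AIC and a Granger causality test on synthetic EMA data.
import numpy as np
import pandas as pd
from statsmodels.tsa.api import VAR

rng = np.random.default_rng(0)
n = 90                                        # e.g. 90 daily EMA assessments
mood = rng.normal(size=n)
activity = 0.5 * np.roll(mood, 1) + 0.5 * rng.normal(size=n)
data = pd.DataFrame({"mood": mood, "activity": activity})

results = VAR(data).fit(maxlags=5, ic="aic")  # lag order chosen by AIC
print("selected lag order:", results.k_ar)
print("AIC: %.2f  BIC: %.2f" % (results.aic, results.bic))

gc = results.test_causality("activity", ["mood"], kind="f")
print(gc.summary())
```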

  3. Ecological Momentary Assessments and Automated Time Series Analysis to Promote Tailored Health Care: A Proof-of-Principle Study

    PubMed Central

    Emerencia, Ando C; Bos, Elisabeth H; Rosmalen, Judith GM; Riese, Harriëtte; Aiello, Marco; Sytema, Sjoerd; de Jonge, Peter

    2015-01-01

    Background Health promotion can be tailored by combining ecological momentary assessments (EMA) with time series analysis. This combined method allows for studying the temporal order of dynamic relationships among variables, which may provide concrete indications for intervention. However, application of this method in health care practice is hampered because analyses are conducted manually and advanced statistical expertise is required. Objective This study aims to show how this limitation can be overcome by introducing automated vector autoregressive modeling (VAR) of EMA data and to evaluate its feasibility through comparisons with results of previously published manual analyses. Methods We developed a Web-based open source application, called AutoVAR, which automates time series analyses of EMA data and provides output that is intended to be interpretable by nonexperts. The statistical technique we used was VAR. AutoVAR tests and evaluates all possible VAR models within a given combinatorial search space and summarizes their results, thereby replacing the researcher’s tasks of conducting the analysis, making an informed selection of models, and choosing the best model. We compared the output of AutoVAR to the output of a previously published manual analysis (n=4). Results An illustrative example consisting of 4 analyses was provided. Compared to the manual output, the AutoVAR output presents similar model characteristics and statistical results in terms of the Akaike information criterion, the Bayesian information criterion, and the test statistic of the Granger causality test. Conclusions Results suggest that automated analysis and interpretation of time series is feasible. Compared to a manual procedure, the automated procedure is more robust and can save days of time. These findings may pave the way for using time series analysis for health promotion on a larger scale. AutoVAR was evaluated using the results of a previously conducted manual analysis. Analysis of additional datasets is needed in order to validate and refine the application for general use. PMID:26254160

  4. Automated detection of exudates for diabetic retinopathy screening

    NASA Astrophysics Data System (ADS)

    Fleming, Alan D.; Philip, Sam; Goatman, Keith A.; Williams, Graeme J.; Olson, John A.; Sharp, Peter F.

    2007-12-01

    Automated image analysis is being widely sought to reduce the workload required for grading images resulting from diabetic retinopathy screening programmes. The recognition of exudates in retinal images is an important goal for automated analysis since these are one of the indicators that the disease has progressed to a stage requiring referral to an ophthalmologist. Candidate exudates were detected using a multi-scale morphological process. Based on local properties, the likelihoods of a candidate being a member of classes exudate, drusen or background were determined. This leads to a likelihood of the image containing exudates which can be thresholded to create a binary decision. Compared to a clinical reference standard, images containing exudates were detected with sensitivity 95.0% and specificity 84.6% in a test set of 13 219 images of which 300 contained exudates. Depending on requirements, this method could form part of an automated system to detect images showing either any diabetic retinopathy or referable diabetic retinopathy.
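
    A hedged sketch of the multi-scale morphological step described above, using scikit-image rather than the authors' implementation; the synthetic image, radii and threshold are assumptions chosen only for illustration.

    ```python
    # Hedged sketch: bright, compact structures (exudate candidates) respond
    # strongly to a white top-hat at several scales (scikit-image >= 0.19).
    import numpy as np
    from skimage.morphology import white_tophat, disk

    rng = np.random.default_rng(0)
    image = rng.normal(0.3, 0.05, size=(256, 256))     # synthetic fundus-like background
    image[100:104, 120:124] += 0.4                      # a few bright, compact "lesions"
    image[40:43, 60:63] += 0.4

    # Combine white top-hat responses over several structuring-element radii (scales).
    response = np.zeros_like(image)
    for radius in (3, 6, 12):
        response = np.maximum(response, white_tophat(image, footprint=disk(radius)))

    # A simple threshold on the multi-scale response yields candidate pixels; the
    # paper then assigns each candidate a likelihood of exudate / drusen / background.
    candidates = response > 0.2
    print("candidate pixels:", int(candidates.sum()))
    ```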

  5. Advancements in automated tissue segmentation pipeline for contrast-enhanced CT scans of adult and pediatric patients

    NASA Astrophysics Data System (ADS)

    Somasundaram, Elanchezhian; Kaufman, Robert; Brady, Samuel

    2017-03-01

    The development of a random forests machine learning technique is presented for fully-automated neck, chest, abdomen, and pelvis tissue segmentation of CT images using the Trainable WEKA (Waikato Environment for Knowledge Analysis) Segmentation (TWS) plugin of FIJI (ImageJ, NIH). The use of a single classifier model to segment six tissue classes (lung, fat, muscle, solid organ, blood/contrast agent, bone) in the CT images is studied. An automated unbiased scheme to sample pixels from the training images and generate a balanced training dataset over the seven classes is also developed. Two independent training datasets are generated from a pool of 4 adult (>55 kg) and 3 pediatric patients (<=55 kg) with 7 manually contoured slices for each patient. Classifier training investigated 28 image filters comprising a total of 272 features. Highly correlated and insignificant features are eliminated using Correlated Feature Subset (CFS) selection with Best First Search (BFS) algorithms in WEKA. The 2 training models (from the 2 training datasets) had 74 and 71 input training features, respectively. The study also investigated the effect of varying the number of trees (25, 50, 100, and 200) in the random forest algorithm. The performance of the 2 classifier models is evaluated on inter-patient intra-slice, intra-patient inter-slice and inter-patient inter-slice test datasets. The Dice similarity coefficients (DSC) and confusion matrices are used to understand the performance of the classifiers across the tissue segments. The effect of the number of features in the training input on the performance of the classifiers for tissue classes with less than optimal DSC values is also studied. The average DSC values for the two training models on the inter-patient intra-slice test data are: 0.98, 0.89, 0.87, 0.79, 0.68, and 0.84, for lung, fat, muscle, solid organ, blood/contrast agent, and bone, respectively. The study demonstrated robust segmentation accuracy for the lung, muscle and fat tissue classes. For solid-organ, blood/contrast and bone, the performance of the segmentation pipeline improved significantly when using the advanced capabilities of WEKA. However, further improvements are needed to reduce the noise in the segmentation.
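
    The following is a minimal sketch of the same idea using scikit-learn in place of the WEKA/TWS pipeline, with simple correlation-based pruning standing in for CFS/BFS feature selection; the feature matrix and labels are synthetic placeholders.

    ```python
    # Hedged sketch: drop highly correlated features, then train a random forest
    # on per-pixel feature vectors. scikit-learn stands in for the WEKA/TWS
    # pipeline described above; the data here are synthetic placeholders.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(1)
    X = rng.normal(size=(5000, 272))          # pixel feature vectors (272 filter features)
    y = rng.integers(0, 6, size=5000)         # tissue class labels (6 classes)

    # Remove one feature from every pair with |correlation| > 0.95 (CFS stand-in).
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = np.ones(X.shape[1], dtype=bool)
    for i in range(X.shape[1]):
        if keep[i]:
            keep[np.where(corr[i, i + 1:] > 0.95)[0] + i + 1] = False
    X = X[:, keep]

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
    print("kept features:", int(keep.sum()), " test accuracy:", clf.score(X_te, y_te))
    ```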

  6. Automated reconstruction of standing posture panoramas from multi-sector long limb x-ray images

    NASA Astrophysics Data System (ADS)

    Miller, Linzey; Trier, Caroline; Ben-Zikri, Yehuda K.; Linte, Cristian A.

    2016-03-01

    Due to the digital X-ray imaging system's limited field of view, several individual sector images are required to capture the posture of an individual in standing position. These images are then "stitched together" to reconstruct the standing posture. We have created an image processing application that automates the stitching, thereby minimizing user input, optimizing workflow, and reducing human error. The application begins by pre-processing the input images: removing artifacts, filtering out isolated noisy regions, and amplifying a seamless bone edge. The resulting binary images are then registered together using a rigid-body intensity-based registration algorithm. The identified registration transformations are then used to map the original sector images into the panorama image. Our method relies primarily on the anatomical content of the images to generate the panoramas, as opposed to external markers employed to aid the alignment process. Current results show robust edge detection prior to registration, and we have tested our approach by comparing the resulting automatically stitched panoramas to manually stitched panoramas in terms of registration parameters, target registration error of homologous markers, and the homogeneity of the digitally subtracted automatically and manually stitched images, using 26 patient datasets.
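
    As a hedged illustration of the registration step, the sketch below estimates only the translational offset between two overlapping sectors with phase correlation (scikit-image); the paper's rigid-body intensity-based registration also recovers rotation, and the images here are synthetic.

    ```python
    # Hedged sketch: estimate the translational offset between two overlapping
    # X-ray sectors with phase correlation. This shows only the translation part
    # of a rigid-body alignment, on simulated data.
    import numpy as np
    from skimage.registration import phase_cross_correlation
    from scipy.ndimage import shift as nd_shift

    rng = np.random.default_rng(2)
    upper = rng.normal(size=(256, 256))
    lower = nd_shift(upper, shift=(40.0, 3.0))      # simulated lower sector: offset copy

    # Per scikit-image's convention, the returned shift registers the moving
    # image (lower) with the reference image (upper).
    estimated_shift, error, _ = phase_cross_correlation(upper, lower)
    print("estimated (row, col) registration shift:", estimated_shift)

    aligned = nd_shift(lower, shift=estimated_shift)  # map the sector into the panorama frame
    ```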

  7. Feature Based Retention Time Alignment for Improved HDX MS Analysis

    NASA Astrophysics Data System (ADS)

    Venable, John D.; Scuba, William; Brock, Ansgar

    2013-04-01

    An algorithm for retention time alignment of mass shifted hydrogen-deuterium exchange (HDX) data based on an iterative distance minimization procedure is described. The algorithm performs pairwise comparisons in an iterative fashion between a list of features from a reference file and a file to be time aligned to calculate a retention time mapping function. Features are characterized by their charge, retention time and mass of the monoisotopic peak. The algorithm is able to align datasets with mass shifted features, which is a prerequisite for aligning hydrogen-deuterium exchange mass spectrometry datasets. Confidence assignments from the fully automated processing of a commercial HDX software package are shown to benefit significantly from retention time alignment prior to extraction of deuterium incorporation values.
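
    A toy, single-pass illustration of the underlying idea (not the published algorithm, which iterates the matching and tolerates deuterium-induced mass shifts): pair features between two runs by charge and approximate monoisotopic mass, then fit a retention-time mapping from the matched pairs.

    ```python
    # Hedged illustration: pair features (charge, monoisotopic mass, retention time)
    # between a reference run and a target run, then fit a linear RT mapping from
    # the matched pairs. Values are toy placeholders; one pass is shown.
    import numpy as np

    reference = np.array([(2, 721.35, 10.2), (2, 915.48, 15.7), (3, 1204.61, 22.4)])
    target    = np.array([(2, 721.37, 11.0), (2, 915.50, 16.9), (3, 1204.63, 24.1)])

    pairs = []
    for z, m, rt in target:
        same_charge = reference[reference[:, 0] == z]
        nearest = same_charge[np.argmin(np.abs(same_charge[:, 1] - m))]
        if abs(nearest[1] - m) < 0.05:             # toy mass tolerance in Da
            pairs.append((rt, nearest[2]))          # (target RT, reference RT)

    pairs = np.array(pairs)
    slope, intercept = np.polyfit(pairs[:, 0], pairs[:, 1], 1)   # linear RT mapping
    print(f"mapped RT = {slope:.3f} * RT + {intercept:.3f}")
    ```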

  8. An automated curation procedure for addressing chemical errors and inconsistencies in public datasets used in QSAR modelling.

    PubMed

    Mansouri, K; Grulke, C M; Richard, A M; Judson, R S; Williams, A J

    2016-11-01

    The increasing availability of large collections of chemical structures and associated experimental data provides an opportunity to build robust QSAR models for applications in different fields. One common concern is the quality of both the chemical structure information and associated experimental data. Here we describe the development of an automated KNIME workflow to curate and correct errors in the structure and identity of chemicals using the publicly available PHYSPROP physicochemical properties and environmental fate datasets. The workflow first assembles structure-identity pairs using up to four provided chemical identifiers, including chemical name, CASRNs, SMILES, and MolBlock. Problems detected included errors and mismatches in chemical structure formats, identifiers and various structure validation issues, including hypervalency and stereochemistry descriptions. Subsequently, a machine learning procedure was applied to evaluate the impact of this curation process. The performance of QSAR models built on only the highest-quality subset of the original dataset was compared with the larger curated and corrected dataset. The latter showed statistically improved predictive performance. The final workflow was used to curate the full list of PHYSPROP datasets, and is being made publicly available for further usage and integration by the scientific community.
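
    A hedged Python stand-in for one curation step of the kind performed by the KNIME workflow, using RDKit to catch structures that fail sanitization and to emit canonical SMILES; the input records are invented for illustration.

    ```python
    # Hedged stand-in for one curation step: parse each structure, let RDKit's
    # sanitization catch valence problems, and emit a canonical SMILES for
    # records that pass. The input records below are made up.
    from rdkit import Chem

    records = [
        ("aspirin", "CC(=O)Oc1ccccc1C(=O)O"),
        ("bad_valence", "C(C)(C)(C)(C)C"),      # pentavalent carbon -> sanitization fails
    ]

    for name, smiles in records:
        mol = Chem.MolFromSmiles(smiles)        # returns None if parsing/sanitization fails
        if mol is None:
            print(f"{name}: structure rejected by sanitization")
        else:
            print(f"{name}: canonical SMILES = {Chem.MolToSmiles(mol)}")
    ```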

  9. Automation of lidar-based hydrologic feature extraction workflows using GIS

    NASA Astrophysics Data System (ADS)

    Borlongan, Noel Jerome B.; de la Cruz, Roel M.; Olfindo, Nestor T.; Perez, Anjillyn Mae C.

    2016-10-01

    With the advent of LiDAR technology, higher resolution datasets have become available for use in different remote sensing and GIS applications. One significant application of LiDAR datasets in the Philippines is in resource feature extraction. Feature extraction using LiDAR datasets requires complex and repetitive workflows that can consume a great deal of researcher time under manual execution and supervision. The Development of the Philippine Hydrologic Dataset for Watersheds from LiDAR Surveys (PHD), a project under the Nationwide Detailed Resources Assessment Using LiDAR (Phil-LiDAR 2) program, created a set of scripts, the PHD Toolkit, to automate the processes and workflows necessary for hydrologic feature extraction, specifically Streams and Drainages, Irrigation Network, and Inland Wetlands, using LiDAR datasets. These scripts are written in Python and can be added to the ArcGIS® environment as a toolbox. The toolkit is currently being used as an aid for researchers in hydrologic feature extraction, simplifying the workflows, eliminating human error when providing inputs, and offering quick and easy-to-use tools for repetitive tasks. This paper discusses the actual implementation of the different workflows developed by Phil-LiDAR 2 Project 4 for Streams, Irrigation Network and Inland Wetlands extraction.

  10. Integrating disparate lidar datasets for a regional storm tide inundation analysis of Hurricane Katrina

    USGS Publications Warehouse

    Stoker, Jason M.; Tyler, Dean J.; Turnipseed, D. Phil; Van Wilson, K.; Oimoen, Michael J.

    2009-01-01

    Hurricane Katrina was one of the largest natural disasters in U.S. history. Due to the sheer size of the affected areas, an unprecedented regional analysis at very high resolution and accuracy was needed to properly quantify and understand the effects of the hurricane and the storm tide. Many disparate sources of lidar data were acquired and processed for varying environmental reasons by pre- and post-Katrina projects. The datasets were in several formats and projections and were processed to varying phases of completion, and as a result the task of producing a seamless digital elevation dataset required a high level of coordination, research, and revision. To create a seamless digital elevation dataset, many technical issues had to be resolved before producing the desired 1/9-arc-second (3-meter) grid needed as the map base for projecting the Katrina peak storm tide throughout the affected coastal region. This report presents the methodology that was developed to construct seamless digital elevation datasets from multipurpose, multi-use, and disparate lidar datasets, and describes an easily accessible Web application for viewing the maximum storm tide caused by Hurricane Katrina in southeastern Louisiana, Mississippi, and Alabama.

  11. Detecting the presence-absence of bluefin tuna by automated analysis of medium-range sonars on fishing vessels.

    PubMed

    Uranga, Jon; Arrizabalaga, Haritz; Boyra, Guillermo; Hernandez, Maria Carmen; Goñi, Nicolas; Arregui, Igor; Fernandes, Jose A; Yurramendi, Yosu; Santiago, Josu

    2017-01-01

    This study presents a methodology for the automated analysis of commercial medium-range sonar signals for detecting presence/absence of bluefin tuna (Thunnus thynnus) in the Bay of Biscay. The approach uses image processing techniques to analyze sonar screenshots. For each sonar image we extracted measurable regions and analyzed their characteristics. Scientific data was used to classify each region into a class ("tuna" or "no-tuna") and build a dataset to train and evaluate classification models by using supervised learning. The methodology performed well when validated with commercial sonar screenshots, and has the potential to automatically analyze high volumes of data at a low cost. This represents a first milestone towards the development of acoustic, fishery-independent indices of abundance for bluefin tuna in the Bay of Biscay. Future research lines and additional alternatives to inform stock assessments are also discussed.
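
    A hedged sketch of the screenshot-processing idea: segment bright regions, compute simple per-region descriptors with scikit-image, and classify them with a supervised model; the features, threshold and labels below are placeholders, not the paper's.

    ```python
    # Hedged sketch: extract measurable regions from a (synthetic) sonar screenshot,
    # describe each region with a few simple properties, and feed the descriptors
    # to a supervised classifier. Feature choices and labels are placeholders.
    import numpy as np
    from skimage.measure import label, regionprops
    from sklearn.ensemble import RandomForestClassifier

    def region_features(screenshot, threshold=0.6):
        """Return one (area, eccentricity, mean_intensity) row per bright region."""
        labels = label(screenshot > threshold)
        return np.array([
            [r.area, r.eccentricity, r.mean_intensity]
            for r in regionprops(labels, intensity_image=screenshot)
        ])

    rng = np.random.default_rng(3)
    screenshot = rng.random((128, 128))          # stand-in for a sonar screenshot
    X = region_features(screenshot)
    y = rng.integers(0, 2, size=len(X))          # placeholder "tuna"/"no-tuna" labels

    clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
    print("regions:", len(X), " predicted classes:", clf.predict(X[:5]))
    ```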

  12. Detecting the presence-absence of bluefin tuna by automated analysis of medium-range sonars on fishing vessels

    PubMed Central

    Uranga, Jon; Arrizabalaga, Haritz; Boyra, Guillermo; Hernandez, Maria Carmen; Goñi, Nicolas; Arregui, Igor; Fernandes, Jose A.; Yurramendi, Yosu; Santiago, Josu

    2017-01-01

    This study presents a methodology for the automated analysis of commercial medium-range sonar signals for detecting presence/absence of bluefin tuna (Thunnus thynnus) in the Bay of Biscay. The approach uses image processing techniques to analyze sonar screenshots. For each sonar image we extracted measurable regions and analyzed their characteristics. Scientific data was used to classify each region into a class (“tuna” or “no-tuna”) and build a dataset to train and evaluate classification models by using supervised learning. The methodology performed well when validated with commercial sonar screenshots, and has the potential to automatically analyze high volumes of data at a low cost. This represents a first milestone towards the development of acoustic, fishery-independent indices of abundance for bluefin tuna in the Bay of Biscay. Future research lines and additional alternatives to inform stock assessments are also discussed. PMID:28152032

  13. Multi-scale Gaussian representation and outline-learning based cell image segmentation.

    PubMed

    Farhan, Muhammad; Ruusuvuori, Pekka; Emmenlauer, Mario; Rämö, Pauli; Dehio, Christoph; Yli-Harja, Olli

    2013-01-01

    High-throughput genome-wide screening to study gene-specific functions, e.g. for drug discovery, demands fast automated image analysis methods to assist in unraveling the full potential of such studies. Image segmentation is typically at the forefront of such analysis, as the performance of the subsequent steps, for example, cell classification, cell tracking etc., often relies on the results of segmentation. We present a cell cytoplasm segmentation framework which first separates cell cytoplasm from the image background using a novel approach based on image enhancement and the coefficient of variation of a multi-scale Gaussian scale-space representation. A novel outline-learning based classification method is developed using regularized logistic regression with embedded feature selection, which classifies image pixels as outline/non-outline to give cytoplasm outlines. Refinement of the detected outlines to separate cells from each other is performed in a post-processing step where the nuclei segmentation is used as contextual information. We evaluate the proposed segmentation methodology using two challenging test cases, presenting images with completely different characteristics, with cells of varying size, shape, texture and degrees of overlap. The feature selection and classification framework for outline detection produces very simple sparse models which use only a small subset of the large, generic feature set, that is, only 7 and 5 features for the two cases. Quantitative comparison of the results for the two test cases against state-of-the-art methods shows that our methodology outperforms them, with an increase of 4-9% in segmentation accuracy and a maximum accuracy of 93%. Finally, the results obtained for diverse datasets demonstrate that our framework not only produces accurate segmentation but also generalizes well to different segmentation tasks.

  14. Multi-scale Gaussian representation and outline-learning based cell image segmentation

    PubMed Central

    2013-01-01

    Background High-throughput genome-wide screening to study gene-specific functions, e.g. for drug discovery, demands fast automated image analysis methods to assist in unraveling the full potential of such studies. Image segmentation is typically at the forefront of such analysis, as the performance of the subsequent steps, for example, cell classification, cell tracking etc., often relies on the results of segmentation. Methods We present a cell cytoplasm segmentation framework which first separates cell cytoplasm from the image background using a novel approach based on image enhancement and the coefficient of variation of a multi-scale Gaussian scale-space representation. A novel outline-learning based classification method is developed using regularized logistic regression with embedded feature selection, which classifies image pixels as outline/non-outline to give cytoplasm outlines. Refinement of the detected outlines to separate cells from each other is performed in a post-processing step where the nuclei segmentation is used as contextual information. Results and conclusions We evaluate the proposed segmentation methodology using two challenging test cases, presenting images with completely different characteristics, with cells of varying size, shape, texture and degrees of overlap. The feature selection and classification framework for outline detection produces very simple sparse models which use only a small subset of the large, generic feature set, that is, only 7 and 5 features for the two cases. Quantitative comparison of the results for the two test cases against state-of-the-art methods shows that our methodology outperforms them, with an increase of 4-9% in segmentation accuracy and a maximum accuracy of 93%. Finally, the results obtained for diverse datasets demonstrate that our framework not only produces accurate segmentation but also generalizes well to different segmentation tasks. PMID:24267488

  15. An automated multi-model based evapotranspiration estimation framework for understanding crop-climate interactions in India

    NASA Astrophysics Data System (ADS)

    Bhattarai, N.; Jain, M.; Mallick, K.

    2017-12-01

    A remote sensing-based multi-model evapotranspiration (ET) estimation framework is developed using MODIS and NASA MERRA-2 reanalysis data for data-poor regions, and we apply this framework to the Indian subcontinent. The framework eliminates the need for in-situ calibration data, hence estimates ET entirely from space, and is replicable across all regions of the world. Currently, six surface energy balance models, ranging from the widely used SEBAL, METRIC, and SEBS to the moderately used S-SEBI and SSEBop and a relatively new model, STIC1.2, are being integrated and validated. Preliminary analysis suggests good predictability of the models for estimating near-real-time ET under clear-sky conditions for various crop types in India, with coefficients of determination of 0.32-0.55 and percent bias of -15% to 28% when compared against Bowen ratio-based ET estimates. The results are particularly encouraging given that no direct ground input data were used in the analysis. The framework is currently being extended to estimate seasonal ET across the Indian subcontinent using a model-ensemble approach that uses all available MODIS 8-day datasets since 2000. These ET products are being used to monitor inter-seasonal and inter-annual dynamics of ET and crop water use across different crop and irrigation practices in India. In particular, the potential impacts of changes in precipitation patterns and extreme heat (e.g., extreme degree days) on seasonal crop water consumption are being studied. Our ET products are able to locate the water stress hotspots that need to be targeted with water-saving interventions to maintain agricultural production in the face of climate variability and change.

  16. Particle Swarm Optimization approach to defect detection in armour ceramics.

    PubMed

    Kesharaju, Manasa; Nagarajah, Romesh

    2017-03-01

    In this research, various extracted features were used in the development of an automated ultrasonic sensor-based inspection system that enables defect classification in each ceramic component prior to despatch to the field. Classification is an important task, and the large number of irrelevant and redundant features commonly introduced into a dataset reduces classifier performance. Feature selection aims to reduce the dimensionality of the dataset while improving the performance of a classification system. In the context of a multi-criteria optimization problem (i.e., minimizing the classification error rate and reducing the number of features) such as the one discussed in this research, the literature suggests that evolutionary algorithms offer good results. Moreover, Particle Swarm Optimization (PSO) has scarcely been explored in the classification of high-frequency ultrasonic signals. Hence, a binary-coded Particle Swarm Optimization (BPSO) technique is investigated for feature subset selection and for optimizing the classification error rate. In the proposed method, the population data are used as input to an Artificial Neural Network (ANN)-based classification system to obtain the error rate, with the ANN serving as the evaluator of the PSO fitness function. Copyright © 2016. Published by Elsevier B.V.
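
    A minimal, hedged sketch of binary PSO for feature-subset selection with a small ANN (scikit-learn's MLPClassifier) as the fitness evaluator, as described above; swarm parameters, network size and data are illustrative only.

    ```python
    # Hedged sketch of binary PSO feature selection: particles are binary feature
    # masks, and fitness is the cross-validated error of an ANN trained on the
    # selected subset. All parameters here are placeholders.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.neural_network import MLPClassifier

    rng = np.random.default_rng(4)
    X, y = make_classification(n_samples=300, n_features=30, n_informative=8, random_state=0)

    def fitness(mask):
        """Classification error of an ANN trained on the selected feature subset."""
        if mask.sum() == 0:
            return 1.0
        clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=300, random_state=0)
        return 1.0 - cross_val_score(clf, X[:, mask.astype(bool)], y, cv=3).mean()

    n_particles, n_features, w, c1, c2 = 6, X.shape[1], 0.7, 1.5, 1.5
    position = rng.integers(0, 2, size=(n_particles, n_features))
    velocity = rng.normal(scale=0.1, size=(n_particles, n_features))
    personal_best = position.copy()
    personal_cost = np.array([fitness(p) for p in position])
    global_best = personal_best[personal_cost.argmin()].copy()

    for _ in range(3):                              # a few BPSO iterations
        r1, r2 = rng.random(velocity.shape), rng.random(velocity.shape)
        velocity = (w * velocity + c1 * r1 * (personal_best - position)
                    + c2 * r2 * (global_best - position))
        position = (rng.random(velocity.shape) < 1 / (1 + np.exp(-velocity))).astype(int)
        cost = np.array([fitness(p) for p in position])
        improved = cost < personal_cost
        personal_best[improved], personal_cost[improved] = position[improved], cost[improved]
        global_best = personal_best[personal_cost.argmin()].copy()

    print("selected features:", np.flatnonzero(global_best), " error:", personal_cost.min())
    ```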

  17. Statistical segmentation of multidimensional brain datasets

    NASA Astrophysics Data System (ADS)

    Desco, Manuel; Gispert, Juan D.; Reig, Santiago; Santos, Andres; Pascau, Javier; Malpica, Norberto; Garcia-Barreno, Pedro

    2001-07-01

    This paper presents an automatic segmentation procedure for MRI neuroimages that overcomes some of the problems involved in multidimensional clustering techniques, such as partial volume effects (PVE), processing speed and the difficulty of incorporating a priori knowledge. The method is a three-stage procedure: 1) Exclusion of background and skull voxels using threshold-based region growing techniques with fully automated seed selection. 2) Expectation Maximization algorithms are used to estimate the probability density function (PDF) of the remaining pixels, which are assumed to be mixtures of Gaussians. These pixels can then be classified into cerebrospinal fluid (CSF), white matter and grey matter. In this step, our method takes advantage of the full covariance matrix (instead of only its diagonal) for the joint PDF estimation. On the other hand, logistic discrimination techniques are more robust against violation of multi-Gaussian assumptions. 3) A priori knowledge is added using Markov Random Field techniques. The algorithm has been tested with a dataset of 30 brain MRI studies (co-registered T1 and T2 MRI). Our method was compared with clustering techniques and with template-based statistical segmentation, using manual segmentation as a gold standard. Our results were more robust and closer to the gold standard.
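
    A hedged sketch of the mixture-model stage (stage 2) using scikit-learn's GaussianMixture with full covariance matrices on joint T1/T2 intensities; the voxel intensities are simulated and the Markov Random Field stage is omitted.

    ```python
    # Hedged sketch: fit a 3-class Gaussian mixture with full covariance matrices
    # to joint (T1, T2) voxel intensities and label each voxel as CSF, grey matter
    # or white matter. Intensities here are simulated placeholders.
    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(5)
    csf   = rng.multivariate_normal([30, 90], [[20, 5], [5, 25]], size=2000)
    grey  = rng.multivariate_normal([60, 60], [[15, 4], [4, 15]], size=3000)
    white = rng.multivariate_normal([85, 40], [[10, 2], [2, 12]], size=3000)
    voxels = np.vstack([csf, grey, white])

    gmm = GaussianMixture(n_components=3, covariance_type="full", random_state=0)
    labels = gmm.fit_predict(voxels)              # EM estimation of the joint PDF

    for k in range(3):
        print(f"class {k}: mean intensity = {gmm.means_[k].round(1)}, "
              f"{(labels == k).sum()} voxels")
    ```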

  18. Multi-view L2-SVM and its multi-view core vector machine.

    PubMed

    Huang, Chengquan; Chung, Fu-lai; Wang, Shitong

    2016-03-01

    In this paper, a novel L2-SVM based classifier, Multi-view L2-SVM, is proposed to address multi-view classification tasks. The proposed Multi-view L2-SVM classifier does not have any bias in its objective function and hence offers flexibility similar to that of ν-SVC, in the sense that the number of yielded support vectors can be controlled by a pre-specified parameter. The proposed Multi-view L2-SVM classifier can make full use of the coherence and the difference of different views by imposing consensus among multiple views to improve the overall classification performance. In addition, based on the generalized core vector machine (GCVM), the proposed Multi-view L2-SVM classifier is extended into its GCVM version, MvCVM, which enables fast training on large-scale multi-view datasets, with asymptotic time complexity linear in the sample size and space complexity independent of the sample size. Our experimental results demonstrated the effectiveness of the proposed Multi-view L2-SVM classifier for small-scale multi-view datasets and of the proposed MvCVM classifier for large-scale multi-view datasets. Copyright © 2015 Elsevier Ltd. All rights reserved.

  19. Semi-automated Neuron Boundary Detection and Nonbranching Process Segmentation in Electron Microscopy Images

    PubMed Central

    Jurrus, Elizabeth; Watanabe, Shigeki; Giuly, Richard J.; Paiva, Antonio R. C.; Ellisman, Mark H.; Jorgensen, Erik M.; Tasdizen, Tolga

    2013-01-01

    Neuroscientists are developing new imaging techniques and generating large volumes of data in an effort to understand the complex structure of the nervous system. The complexity and size of this data makes human interpretation a labor-intensive task. To aid in the analysis, new segmentation techniques for identifying neurons in these feature-rich datasets are required. This paper presents a method for neuron boundary detection and nonbranching process segmentation in electron microscopy images, and for visualizing the results in three dimensions. It combines automated segmentation techniques with a graphical user interface for correction of mistakes in the automated process. The automated process first uses machine learning and image processing techniques to identify neuron membranes that delineate the cells in each two-dimensional section. To segment nonbranching processes, the cell regions in each two-dimensional section are connected in 3D using correlation of regions between sections. The combination of this method with a graphical user interface specially designed for this purpose enables users to quickly segment cellular processes in large volumes. PMID:22644867

  20. Semi-Automated Neuron Boundary Detection and Nonbranching Process Segmentation in Electron Microscopy Images

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jurrus, Elizabeth R.; Watanabe, Shigeki; Giuly, Richard J.

    2013-01-01

    Neuroscientists are developing new imaging techniques and generating large volumes of data in an effort to understand the complex structure of the nervous system. The complexity and size of this data makes human interpretation a labor-intensive task. To aid in the analysis, new segmentation techniques for identifying neurons in these feature-rich datasets are required. This paper presents a method for neuron boundary detection and nonbranching process segmentation in electron microscopy images, and for visualizing the results in three dimensions. It combines automated segmentation techniques with a graphical user interface for correction of mistakes in the automated process. The automated process first uses machine learning and image processing techniques to identify neuron membranes that delineate the cells in each two-dimensional section. To segment nonbranching processes, the cell regions in each two-dimensional section are connected in 3D using correlation of regions between sections. The combination of this method with a graphical user interface specially designed for this purpose enables users to quickly segment cellular processes in large volumes.

  1. An Analysis of the Climate Data Initiative's Data Collection

    NASA Astrophysics Data System (ADS)

    Ramachandran, R.; Bugbee, K.

    2015-12-01

    The Climate Data Initiative (CDI) is a broad multi-agency effort of the U.S. government that seeks to leverage the extensive existing federal climate-relevant data to stimulate innovation and private-sector entrepreneurship to support national climate-change preparedness. The CDI project is a systematic effort to manually curate and share openly available climate data from various federal agencies. To date, the CDI has curated seven themes, or topics, relevant to climate change resiliency. These themes include Coastal Flooding, Food Resilience, Water, Ecosystem Vulnerability, Human Health, Energy Infrastructure, and Transportation. Each theme was curated by subject matter experts who selected datasets relevant to the topic at hand. An analysis of the entire Climate Data Initiative data collection and the data curated for each theme offers insights into which datasets are considered most relevant in addressing climate resiliency. Other aspects of the data collection will be examined including which datasets were the most visited or popular and which datasets were the most sought after for curation by the theme teams. Results from the analysis of the CDI collection will be presented in this talk.

  2. Joining the yellow hub: Uses of the Simple Application Messaging Protocol in Space Physics analysis tools

    NASA Astrophysics Data System (ADS)

    Génot, V.; André, N.; Cecconi, B.; Bouchemit, M.; Budnik, E.; Bourrel, N.; Gangloff, M.; Dufourg, N.; Hess, S.; Modolo, R.; Renard, B.; Lormant, N.; Beigbeder, L.; Popescu, D.; Toniutti, J.-P.

    2014-11-01

    The interest in data communication between analysis tools in planetary sciences and space physics is illustrated in this paper via several examples of the use of SAMP. The Simple Application Messaging Protocol was developed in the framework of the IVOA from an earlier protocol called PLASTIC. SAMP enables easy communication and interoperability between astronomy software, stand-alone and web-based, and is now increasingly adopted by the planetary sciences and space physics community. Its attractiveness is based, on one hand, on the use of common file formats for exchange and, on the other hand, on established messaging models. Examples of uses at the CDPP and elsewhere are presented. The CDPP (Centre de Données de la Physique des Plasmas, http://cdpp.eu/), the French data center for plasma physics, has been engaged for more than a decade in the archiving and dissemination of data products from space missions and ground observatories. Besides these activities, the CDPP has developed services like AMDA (Automated Multi Dataset Analysis, http://amda.cdpp.eu/), which enables in-depth analysis of large amounts of data through dedicated functionalities such as visualization, conditional search and cataloging. Besides AMDA, the 3DView (http://3dview.cdpp.eu/) tool provides immersive visualizations and is being further developed to include simulation and observational data. These tools and their interactions with each other, notably via SAMP, are presented via science cases of interest to the planetary sciences and space physics communities.
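
    As a hedged illustration of how a tool can join a SAMP hub from Python, the snippet below uses astropy's SAMP client to broadcast a VOTable to any connected applications; a hub must already be running (for example one started by a SAMP-aware desktop tool), and the file URL is hypothetical.

    ```python
    # Hedged illustration: register with a running SAMP hub using astropy's SAMP
    # client and broadcast a VOTable to all connected clients. The exported file
    # URL and client name are hypothetical.
    from astropy.samp import SAMPIntegratedClient

    client = SAMPIntegratedClient(name="demo-broadcaster")
    client.connect()                               # register with the running hub

    message = {
        "samp.mtype": "table.load.votable",        # standard mtype for table exchange
        "samp.params": {
            "url": "file:///tmp/amda_export.xml",  # hypothetical exported dataset
            "name": "AMDA export",
        },
    }
    client.notify_all(message)                     # broadcast to all connected clients
    client.disconnect()
    ```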

  3. An Integrated Systems Genetics and Omics Toolkit to Probe Gene Function.

    PubMed

    Li, Hao; Wang, Xu; Rukina, Daria; Huang, Qingyao; Lin, Tao; Sorrentino, Vincenzo; Zhang, Hongbo; Bou Sleiman, Maroun; Arends, Danny; McDaid, Aaron; Luan, Peiling; Ziari, Naveed; Velázquez-Villegas, Laura A; Gariani, Karim; Kutalik, Zoltan; Schoonjans, Kristina; Radcliffe, Richard A; Prins, Pjotr; Morgenthaler, Stephan; Williams, Robert W; Auwerx, Johan

    2018-01-24

    Identifying genetic and environmental factors that impact complex traits and common diseases is a high biomedical priority. Here, we developed, validated, and implemented a series of multi-layered systems approaches, including (expression-based) phenome-wide association, transcriptome-/proteome-wide association, and (reverse-) mediation analysis, in an open-access web server (systems-genetics.org) to expedite the systems dissection of gene function. We applied these approaches to multi-omics datasets from the BXD mouse genetic reference population, and identified and validated associations between genes and clinical and molecular phenotypes, including previously unreported links between Rpl26 and body weight, and Cpt1a and lipid metabolism. Furthermore, through mediation and reverse-mediation analysis we established regulatory relations between genes, such as the co-regulation of BCKDHA and BCKDHB protein levels, and identified targets of transcription factors E2F6, ZFP277, and ZKSCAN1. Our multifaceted toolkit enabled the identification of gene-gene and gene-phenotype links that are robust and that translate well across populations and species, and can be universally applied to any populations with multi-omics datasets. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.

  4. SHARP: A multi-mission artificial intelligence system for spacecraft telemetry monitoring and diagnosis

    NASA Technical Reports Server (NTRS)

    Lawson, Denise L.; James, Mark L.

    1989-01-01

    The Spacecraft Health Automated Reasoning Prototype (SHARP) is a system designed to demonstrate automated health and status analysis for multi-mission spacecraft and ground data systems operations. Telecommunications link analysis of the Voyager 2 spacecraft is the initial focus for the SHARP system demonstration which will occur during Voyager's encounter with the planet Neptune in August, 1989, in parallel with real time Voyager operations. The SHARP system combines conventional computer science methodologies with artificial intelligence techniques to produce an effective method for detecting and analyzing potential spacecraft and ground systems problems. The system performs real time analysis of spacecraft and other related telemetry, and is also capable of examining data in historical context. A brief introduction is given to the spacecraft and ground systems monitoring process at the Jet Propulsion Laboratory. The current method of operation for monitoring the Voyager Telecommunications subsystem is described, and the difficulties associated with the existing technology are highlighted. The approach taken in the SHARP system to overcome the current limitations is also described, as well as both the conventional and artificial intelligence solutions developed in SHARP.

  5. SHARP: A multi-mission AI system for spacecraft telemetry monitoring and diagnosis

    NASA Technical Reports Server (NTRS)

    Lawson, Denise L.; James, Mark L.

    1989-01-01

    The Spacecraft Health Automated Reasoning Prototype (SHARP) is a system designed to demonstrate automated health and status analysis for multi-mission spacecraft and ground data systems operations. Telecommunications link analysis of the Voyager II spacecraft is the initial focus for the SHARP system demonstration which will occur during Voyager's encounter with the planet Neptune in August, 1989, in parallel with real-time Voyager operations. The SHARP system combines conventional computer science methodologies with artificial intelligence techniques to produce an effective method for detecting and analyzing potential spacecraft and ground systems problems. The system performs real-time analysis of spacecraft and other related telemetry, and is also capable of examining data in historical context. A brief introduction is given to the spacecraft and ground systems monitoring process at the Jet Propulsion Laboratory. The current method of operation for monitoring the Voyager Telecommunications subsystem is described, and the difficulties associated with the existing technology are highlighted. The approach taken in the SHARP system to overcome the current limitations is also described, as well as both the conventional and artificial intelligence solutions developed in SHARP.

  6. Transthoracic 3D echocardiographic left heart chamber quantification in patients with bicuspid aortic valve disease.

    PubMed

    van den Hoven, Allard T; Mc-Ghie, Jackie S; Chelu, Raluca G; Duijnhouwer, Anthonie L; Baggen, Vivan J M; Coenen, Adriaan; Vletter, Wim B; Dijkshoorn, Marcel L; van den Bosch, Annemien E; Roos-Hesselink, Jolien W

    2017-12-01

    Integration of volumetric heart chamber quantification by 3D echocardiography into clinical practice has been hampered by several factors, which a new fully automated algorithm (Left Heart Model, LHM) may help overcome. This study therefore aims to evaluate the feasibility and accuracy of the LHM software in quantifying left atrial and left ventricular volumes and left ventricular ejection fraction in a cohort of patients with a bicuspid aortic valve. Patients with a bicuspid aortic valve were prospectively included. All patients underwent 2D and 3D transthoracic echocardiography and computed tomography. Left atrial and ventricular volumes were obtained using the automated program, which did not require manual contour detection. For comparison, manual and semi-automated measurements were performed using conventional 2D and 3D datasets. 53 patients were included; in four of these patients no 3D dataset could be acquired. Additionally, 12 patients were excluded based on poor imaging quality. Left ventricular end-diastolic and end-systolic volumes and ejection fraction calculated by the LHM correlated well with manual 2D and 3D measurements (Pearson's r between 0.43 and 0.97, p < 0.05). Left atrial volume (LAV) also correlated significantly, although the LHM estimated larger LAV than both 2DE and 3DE (Pearson's r between 0.61 and 0.81, p < 0.01). The fully automated software works well in a real-world setting and helps to overcome some of the major hurdles in integrating 3D analysis into daily practice, as it is user-independent and highly reproducible in a group of patients with a clearly defined and well-studied valvular abnormality.

  7. Innovative techniques with multi-purpose survey vehicle for automated analysis of cross-slope data.

    DOT National Transportation Integrated Search

    2007-11-02

    Manual surveying methods have long been used in the field of highway engineering to determine : the cross-slope, and longitudinal grade of an existing roadway. However, these methods are : slow, tedious and labor intensive. Moreover, manual survey me...

  8. Information Graphic Classification, Decomposition and Alternative Representation

    ERIC Educational Resources Information Center

    Gao, Jinglun

    2012-01-01

    This thesis work is mainly focused on two problems related to improving accessibility of information graphics for visually impaired users. The first problem is automated analysis of information graphics for information extraction and the second problem is multi-modal representations for accessibility. Information graphics are graphical…

  9. Validation of geometric measurements of the left atrium and pulmonary veins for analysis of reverse structural remodeling following ablation therapy

    NASA Astrophysics Data System (ADS)

    Rettmann, M. E.; Holmes, D. R., III; Gunawan, M. S.; Ge, X.; Karwoski, R. A.; Breen, J. F.; Packer, D. L.; Robb, R. A.

    2012-03-01

    Geometric analysis of the left atrium and pulmonary veins is important for studying reverse structural remodeling following cardiac ablation therapy. It has been shown that the left atrium decreases in volume and the pulmonary vein ostia decrease in diameter following ablation therapy. Most analysis techniques, however, require laborious manual tracing of image cross-sections. Pulmonary vein diameters are typically measured at the junction between the left atrium and pulmonary veins, called the pulmonary vein ostia, with manually drawn lines on volume renderings or on image cross-sections. In this work, we describe a technique for making semi-automatic measurements of the left atrium and pulmonary vein ostial diameters from high resolution CT scans and multi-phase datasets. The left atrium and pulmonary veins are segmented from a CT volume using a 3D volume approach and cut planes are interactively positioned to separate the pulmonary veins from the body of the left atrium. The cut plane is also used to compute the pulmonary vein ostial diameter. Validation experiments are presented which demonstrate the ability to repeatedly measure left atrial volume and pulmonary vein diameters from high resolution CT scans, as well as the feasibility of this approach for analyzing dynamic, multi-phase datasets. In the high resolution CT scans the left atrial volume measurements show high repeatability with approximately 4% intra-rater repeatability and 8% inter-rater repeatability. Intra- and inter-rater repeatability for pulmonary vein diameter measurements range from approximately 2 to 4 mm. For the multi-phase CT datasets, differences in left atrial volumes between a standard slice-by-slice approach and the proposed 3D volume approach are small, with percent differences on the order of 3% to 6%.

  10. Confidence-based ensemble for GBM brain tumor segmentation

    NASA Astrophysics Data System (ADS)

    Huo, Jing; van Rikxoort, Eva M.; Okada, Kazunori; Kim, Hyun J.; Pope, Whitney; Goldin, Jonathan; Brown, Matthew

    2011-03-01

    It is a challenging task to automatically segment glioblastoma multiforme (GBM) brain tumors on T1w post-contrast isotropic MR images. A semi-automated system using fuzzy connectedness has recently been developed for computing the tumor volume, reducing the cost of manual annotation. In this study, we propose an ensemble method that combines multiple segmentation results into a final ensemble segmentation. The method is evaluated on a dataset of 20 cases from a multi-center pharmaceutical drug trial and compared to the fuzzy connectedness method. Three individual methods were used in the framework: fuzzy connectedness, GrowCut, and voxel classification. The combination method is a confidence map averaging (CMA) method. The CMA method shows an improved ROC curve compared to the fuzzy connectedness method (p < 0.001). The CMA ensemble result is more robust compared to the three individual methods.
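
    A hedged numpy sketch of confidence map averaging: each individual method contributes a per-voxel tumor confidence map, and the ensemble averages them and applies a threshold; the maps below are random placeholders.

    ```python
    # Hedged sketch of confidence-map averaging (CMA): average per-voxel tumor
    # confidences from several segmenters and threshold the result. The maps
    # here are random placeholders standing in for real method outputs.
    import numpy as np

    rng = np.random.default_rng(6)
    shape = (64, 64, 32)
    fuzzy_connectedness = rng.random(shape)        # placeholder confidence maps for
    growcut             = rng.random(shape)        # the three individual methods
    voxel_classifier    = rng.random(shape)

    ensemble_confidence = np.mean([fuzzy_connectedness, growcut, voxel_classifier], axis=0)
    tumor_mask = ensemble_confidence > 0.5         # sweeping this threshold traces an ROC curve

    print("ensemble tumor voxels:", int(tumor_mask.sum()))
    ```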

  11. Automated image based prominent nucleoli detection

    PubMed Central

    Yap, Choon K.; Kalaw, Emarene M.; Singh, Malay; Chong, Kian T.; Giron, Danilo M.; Huang, Chao-Hui; Cheng, Li; Law, Yan N.; Lee, Hwee Kuan

    2015-01-01

    Introduction: Nucleolar changes in cancer cells are one of the cytologic features important to the tumor pathologist in cancer assessments of tissue biopsies. However, inter-observer variability and the manual approach to this work hamper the accuracy of the assessment by pathologists. In this paper, we propose a computational method for prominent nucleoli pattern detection. Materials and Methods: Thirty-five hematoxylin and eosin stained images were acquired from prostate cancer, breast cancer, renal clear cell cancer and renal papillary cell cancer tissues. Prostate cancer images were used for the development of a computer-based automated prominent nucleoli pattern detector built on a cascade farm. An ensemble of approximately 1000 cascades was constructed by permuting different combinations of classifiers such as support vector machines, eXclusive component analysis, boosting, and logistic regression. The output of cascades was then combined using the RankBoost algorithm. The output of our prominent nucleoli pattern detector is a ranked set of detected image patches of patterns of prominent nucleoli. Results: The mean number of detected prominent nucleoli patterns in the top 100 ranked detected objects was 58 in the prostate cancer dataset, 68 in the breast cancer dataset, 86 in the renal clear cell cancer dataset, and 76 in the renal papillary cell cancer dataset. The proposed cascade farm performs twice as well as the single cascade proposed in the seminal paper by Viola and Jones. For comparison, a naive algorithm that randomly chooses a pixel as a nucleoli pattern would detect five correct patterns in the first 100 ranked objects. Conclusions: Detection of sparse nucleoli patterns in a large background of highly variable tissue patterns is a difficult challenge our method has overcome. This study developed an accurate prominent nucleoli pattern detector with the potential to be used in clinical settings. PMID:26167383

  12. Automated image based prominent nucleoli detection.

    PubMed

    Yap, Choon K; Kalaw, Emarene M; Singh, Malay; Chong, Kian T; Giron, Danilo M; Huang, Chao-Hui; Cheng, Li; Law, Yan N; Lee, Hwee Kuan

    2015-01-01

    Nucleolar changes in cancer cells are one of the cytologic features important to the tumor pathologist in cancer assessments of tissue biopsies. However, inter-observer variability and the manual approach to this work hamper the accuracy of the assessment by pathologists. In this paper, we propose a computational method for prominent nucleoli pattern detection. Thirty-five hematoxylin and eosin stained images were acquired from prostate cancer, breast cancer, renal clear cell cancer and renal papillary cell cancer tissues. Prostate cancer images were used for the development of a computer-based automated prominent nucleoli pattern detector built on a cascade farm. An ensemble of approximately 1000 cascades was constructed by permuting different combinations of classifiers such as support vector machines, eXclusive component analysis, boosting, and logistic regression. The output of cascades was then combined using the RankBoost algorithm. The output of our prominent nucleoli pattern detector is a ranked set of detected image patches of patterns of prominent nucleoli. The mean number of detected prominent nucleoli patterns in the top 100 ranked detected objects was 58 in the prostate cancer dataset, 68 in the breast cancer dataset, 86 in the renal clear cell cancer dataset, and 76 in the renal papillary cell cancer dataset. The proposed cascade farm performs twice as well as the single cascade proposed in the seminal paper by Viola and Jones. For comparison, a naive algorithm that randomly chooses a pixel as a nucleoli pattern would detect five correct patterns in the first 100 ranked objects. Detection of sparse nucleoli patterns in a large background of highly variable tissue patterns is a difficult challenge our method has overcome. This study developed an accurate prominent nucleoli pattern detector with the potential to be used in clinical settings.

  13. PANDA: a pipeline toolbox for analyzing brain diffusion images.

    PubMed

    Cui, Zaixu; Zhong, Suyu; Xu, Pengfei; He, Yong; Gong, Gaolang

    2013-01-01

    Diffusion magnetic resonance imaging (dMRI) is widely used in both scientific research and clinical practice in in-vivo studies of the human brain. While a number of post-processing packages have been developed, fully automated processing of dMRI datasets remains challenging. Here, we developed a MATLAB toolbox named "Pipeline for Analyzing braiN Diffusion imAges" (PANDA) for fully automated processing of brain diffusion images. The processing modules of a few established packages, including FMRIB Software Library (FSL), Pipeline System for Octave and Matlab (PSOM), Diffusion Toolkit and MRIcron, were employed in PANDA. Using any number of raw dMRI datasets from different subjects, in either DICOM or NIfTI format, PANDA can automatically perform a series of steps to process DICOM/NIfTI to diffusion metrics [e.g., fractional anisotropy (FA) and mean diffusivity (MD)] that are ready for statistical analysis at the voxel-level, the atlas-level and the Tract-Based Spatial Statistics (TBSS)-level and can finish the construction of anatomical brain networks for all subjects. In particular, PANDA can process different subjects in parallel, using multiple cores either in a single computer or in a distributed computing environment, thus greatly reducing the time cost when dealing with a large number of datasets. In addition, PANDA has a friendly graphical user interface (GUI), allowing the user to be interactive and to adjust the input/output settings, as well as the processing parameters. As an open-source package, PANDA is freely available at http://www.nitrc.org/projects/panda/. This novel toolbox is expected to substantially simplify the image processing of dMRI datasets and facilitate human structural connectome studies.

  14. PANDA: a pipeline toolbox for analyzing brain diffusion images

    PubMed Central

    Cui, Zaixu; Zhong, Suyu; Xu, Pengfei; He, Yong; Gong, Gaolang

    2013-01-01

    Diffusion magnetic resonance imaging (dMRI) is widely used in both scientific research and clinical practice in in-vivo studies of the human brain. While a number of post-processing packages have been developed, fully automated processing of dMRI datasets remains challenging. Here, we developed a MATLAB toolbox named “Pipeline for Analyzing braiN Diffusion imAges” (PANDA) for fully automated processing of brain diffusion images. The processing modules of a few established packages, including FMRIB Software Library (FSL), Pipeline System for Octave and Matlab (PSOM), Diffusion Toolkit and MRIcron, were employed in PANDA. Using any number of raw dMRI datasets from different subjects, in either DICOM or NIfTI format, PANDA can automatically perform a series of steps to process DICOM/NIfTI to diffusion metrics [e.g., fractional anisotropy (FA) and mean diffusivity (MD)] that are ready for statistical analysis at the voxel-level, the atlas-level and the Tract-Based Spatial Statistics (TBSS)-level and can finish the construction of anatomical brain networks for all subjects. In particular, PANDA can process different subjects in parallel, using multiple cores either in a single computer or in a distributed computing environment, thus greatly reducing the time cost when dealing with a large number of datasets. In addition, PANDA has a friendly graphical user interface (GUI), allowing the user to be interactive and to adjust the input/output settings, as well as the processing parameters. As an open-source package, PANDA is freely available at http://www.nitrc.org/projects/panda/. This novel toolbox is expected to substantially simplify the image processing of dMRI datasets and facilitate human structural connectome studies. PMID:23439846

  15. Understanding reliance on automation: effects of error type, error distribution, age and experience

    PubMed Central

    Sanchez, Julian; Rogers, Wendy A.; Fisk, Arthur D.; Rovira, Ericka

    2015-01-01

    An obstacle detection task supported by “imperfect” automation was used with the goal of understanding the effects of automation error types and age on automation reliance. Sixty younger and sixty older adults interacted with a multi-task simulation of an agricultural vehicle (i.e. a virtual harvesting combine). The simulator included an obstacle detection task and a fully manual tracking task. A micro-level analysis provided insight into the way reliance patterns change over time. The results indicated that there are distinct patterns of reliance that develop as a function of error type. A prevalence of automation false alarms led participants to under-rely on the automation during alarm states while over-relying on it during non-alarm states. Conversely, a prevalence of automation misses led participants to over-rely on automated alarms and under-rely on the automation during non-alarm states. Older adults adjusted their behavior according to the characteristics of the automation similarly to younger adults, although it took them longer to do so. The results of this study suggest the relationship between automation reliability and reliance depends on the prevalence of specific errors and on the state of the system. Understanding the effects of automation detection criterion settings on human-automation interaction can help designers of automated systems make predictions about human behavior and system performance as a function of the characteristics of the automation. PMID:25642142

  16. Understanding reliance on automation: effects of error type, error distribution, age and experience.

    PubMed

    Sanchez, Julian; Rogers, Wendy A; Fisk, Arthur D; Rovira, Ericka

    2014-03-01

    An obstacle detection task supported by "imperfect" automation was used with the goal of understanding the effects of automation error types and age on automation reliance. Sixty younger and sixty older adults interacted with a multi-task simulation of an agricultural vehicle (i.e. a virtual harvesting combine). The simulator included an obstacle detection task and a fully manual tracking task. A micro-level analysis provided insight into the way reliance patterns change over time. The results indicated that there are distinct patterns of reliance that develop as a function of error type. A prevalence of automation false alarms led participants to under-rely on the automation during alarm states while over-relying on it during non-alarm states. Conversely, a prevalence of automation misses led participants to over-rely on automated alarms and under-rely on the automation during non-alarm states. Older adults adjusted their behavior according to the characteristics of the automation similarly to younger adults, although it took them longer to do so. The results of this study suggest the relationship between automation reliability and reliance depends on the prevalence of specific errors and on the state of the system. Understanding the effects of automation detection criterion settings on human-automation interaction can help designers of automated systems make predictions about human behavior and system performance as a function of the characteristics of the automation.

  17. Two novel motion-based algorithms for surveillance video analysis on embedded platforms

    NASA Astrophysics Data System (ADS)

    Vijverberg, Julien A.; Loomans, Marijn J. H.; Koeleman, Cornelis J.; de With, Peter H. N.

    2010-05-01

    This paper proposes two novel motion-vector based techniques for target detection and target tracking in surveillance videos. The algorithms are designed to operate on a resource-constrained device, such as a surveillance camera, and to reuse the motion vectors generated by the video encoder. The first novel algorithm for target detection uses motion vectors to construct a consistent motion mask, which is combined with a simple background segmentation technique to obtain a segmentation mask. The second proposed algorithm aims at multi-target tracking and uses motion vectors to assign blocks to targets employing five features. The weights of these features are adapted based on the interaction between targets. These algorithms are combined in one complete analysis application. The performance of this application for target detection has been evaluated for the i-LIDS sterile zone dataset and achieves an F1-score of 0.40-0.69. The performance of the analysis algorithm for multi-target tracking has been evaluated using the CAVIAR dataset and achieves an MOTP of around 9.7 and MOTA of 0.17-0.25. On a selection of targets in videos from other datasets, the achieved MOTP and MOTA are 8.8-10.5 and 0.32-0.49 respectively. The execution time on a PC-based platform is 36 ms. This includes the 20 ms for generating motion vectors, which are also required by the video encoder.

  18. Automated seed localization from CT datasets of the prostate.

    PubMed

    Brinkmann, D H; Kline, R W

    1998-09-01

    With the increasing utilization of permanent brachytherapy implants for treating carcinoma of the prostate, the importance of accurate post-treatment dose calculation also increases for assessing patient outcome and planning future treatments. An automatic method for seed localization of permanent brachytherapy implants, using CT datasets of the prostate, has been developed and tested on a phantom using an actual patient planned seed distribution. This method was also compared to results with the three-film technique for three patient datasets. The automatic method is as accurate as, or more accurate than, the three-film technique for 1 mm, 3 mm, and 5 mm contiguous CT slices, and eliminates the inter- and intra-observer variability of the manual methods. The automated method improves the localization of brachytherapy seeds while reducing the time required for the user to input information, and is demonstrated to be less operator dependent, less time consuming, and potentially more accurate than the three-film technique.

  19. Multimodal Teaching Analytics: Automated Extraction of Orchestration Graphs from Wearable Sensor Data.

    PubMed

    Prieto, Luis P; Sharma, Kshitij; Kidzinski, Łukasz; Rodríguez-Triana, María Jesús; Dillenbourg, Pierre

    2018-04-01

    The pedagogical modelling of everyday classroom practice is an interesting kind of evidence, both for educational research and teachers' own professional development. This paper explores the usage of wearable sensors and machine learning techniques to automatically extract orchestration graphs (teaching activities and their social plane over time), on a dataset of 12 classroom sessions enacted by two different teachers in different classroom settings. The dataset included mobile eye-tracking as well as audiovisual and accelerometry data from sensors worn by the teacher. We evaluated both time-independent and time-aware models, achieving median F1 scores of about 0.7-0.8 on leave-one-session-out k-fold cross-validation. Although these results show the feasibility of this approach, they also highlight the need for larger datasets, recorded in a wider variety of classroom settings, to provide automated tagging of classroom practice that can be used in everyday practice across multiple teachers.
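    The evaluation protocol described here (leave-one-session-out cross-validation with an F1 metric) can be sketched with scikit-learn's grouped splitter. The feature matrix, labels, session ids, and the choice of a random-forest classifier below are placeholders, not the models used in the paper.

      import numpy as np
      from sklearn.ensemble import RandomForestClassifier
      from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

      # Hypothetical stand-ins: per-window sensor features, activity labels, and session ids.
      rng = np.random.default_rng(0)
      X = rng.normal(size=(600, 20))
      y = rng.integers(0, 4, size=600)
      sessions = np.repeat(np.arange(12), 50)   # 12 classroom sessions, 50 windows each

      scores = cross_val_score(RandomForestClassifier(n_estimators=200, random_state=0),
                               X, y, groups=sessions,
                               cv=LeaveOneGroupOut(), scoring="f1_macro")
      print("median F1 across held-out sessions:", np.median(scores))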

  20. An assessment of the cultivated cropland class of NLCD 2006 using a multi-source and multi-criteria approach

    USGS Publications Warehouse

    Danielson, Patrick; Yang, Limin; Jin, Suming; Homer, Collin G.; Napton, Darrell

    2016-01-01

    We developed a method that analyzes the quality of the cultivated cropland class mapped in the USA National Land Cover Database (NLCD) 2006. The method integrates multiple geospatial datasets and a Multi Index Integrated Change Analysis (MIICA) change detection method that captures spectral changes to identify the spatial distribution and magnitude of potential commission and omission errors for the cultivated cropland class in NLCD 2006. The majority of the commission and omission errors in NLCD 2006 are in areas where cultivated cropland is not the most dominant land cover type. The errors are primarily attributed to the less accurate training dataset derived from the National Agricultural Statistics Service Cropland Data Layer dataset. In contrast, error rates are low in areas where cultivated cropland is the dominant land cover. Agreement between model-identified commission errors and independently interpreted reference data was high (79%). Agreement was low (40%) for omission error comparison. The majority of the commission errors in the NLCD 2006 cultivated crops were confused with low-intensity developed classes, while the majority of omission errors were from herbaceous and shrub classes. Some errors were caused by inaccurate land cover change from misclassification in NLCD 2001 and the subsequent land cover post-classification process.

  1. A multi-source dataset of urban life in the city of Milan and the Province of Trentino.

    PubMed

    Barlacchi, Gianni; De Nadai, Marco; Larcher, Roberto; Casella, Antonio; Chitic, Cristiana; Torrisi, Giovanni; Antonelli, Fabrizio; Vespignani, Alessandro; Pentland, Alex; Lepri, Bruno

    2015-01-01

    The study of socio-technical systems has been revolutionized by the unprecedented amount of digital records that are constantly being produced by human activities such as accessing Internet services, using mobile devices, and consuming energy and knowledge. In this paper, we describe the richest open multi-source dataset ever released on two geographical areas. The dataset is composed of telecommunications, weather, news, social networks and electricity data from the city of Milan and the Province of Trentino. The unique multi-source composition of the dataset makes it an ideal testbed for methodologies and approaches aimed at tackling a wide range of problems including energy consumption, mobility planning, tourist and migrant flows, urban structures and interactions, event detection, urban well-being and many others.

  2. A multi-source dataset of urban life in the city of Milan and the Province of Trentino

    NASA Astrophysics Data System (ADS)

    Barlacchi, Gianni; de Nadai, Marco; Larcher, Roberto; Casella, Antonio; Chitic, Cristiana; Torrisi, Giovanni; Antonelli, Fabrizio; Vespignani, Alessandro; Pentland, Alex; Lepri, Bruno

    2015-10-01

    The study of socio-technical systems has been revolutionized by the unprecedented amount of digital records that are constantly being produced by human activities such as accessing Internet services, using mobile devices, and consuming energy and knowledge. In this paper, we describe the richest open multi-source dataset ever released on two geographical areas. The dataset is composed of telecommunications, weather, news, social networks and electricity data from the city of Milan and the Province of Trentino. The unique multi-source composition of the dataset makes it an ideal testbed for methodologies and approaches aimed at tackling a wide range of problems including energy consumption, mobility planning, tourist and migrant flows, urban structures and interactions, event detection, urban well-being and many others.

  3. A multi-source dataset of urban life in the city of Milan and the Province of Trentino

    PubMed Central

    Barlacchi, Gianni; De Nadai, Marco; Larcher, Roberto; Casella, Antonio; Chitic, Cristiana; Torrisi, Giovanni; Antonelli, Fabrizio; Vespignani, Alessandro; Pentland, Alex; Lepri, Bruno

    2015-01-01

    The study of socio-technical systems has been revolutionized by the unprecedented amount of digital records that are constantly being produced by human activities such as accessing Internet services, using mobile devices, and consuming energy and knowledge. In this paper, we describe the richest open multi-source dataset ever released on two geographical areas. The dataset is composed of telecommunications, weather, news, social networks and electricity data from the city of Milan and the Province of Trentino. The unique multi-source composition of the dataset makes it an ideal testbed for methodologies and approaches aimed at tackling a wide range of problems including energy consumption, mobility planning, tourist and migrant flows, urban structures and interactions, event detection, urban well-being and many others. PMID:26528394

  4. Automated estimation of image quality for coronary computed tomographic angiography using machine learning.

    PubMed

    Nakanishi, Rine; Sankaran, Sethuraman; Grady, Leo; Malpeso, Jenifer; Yousfi, Razik; Osawa, Kazuhiro; Ceponiene, Indre; Nazarat, Negin; Rahmani, Sina; Kissel, Kendall; Jayawardena, Eranthi; Dailing, Christopher; Zarins, Christopher; Koo, Bon-Kwon; Min, James K; Taylor, Charles A; Budoff, Matthew J

    2018-03-23

    Our goal was to evaluate the efficacy of a fully automated method for assessing the image quality (IQ) of coronary computed tomography angiography (CCTA). The machine learning method was trained using 75 CCTA studies by mapping features (noise, contrast, misregistration scores, and un-interpretability index) to an IQ score based on manual ground truth data. The automated method was validated on a set of 50 CCTA studies and subsequently tested on a new set of 172 CCTA studies against visual IQ scores on a 5-point Likert scale. The area under the curve in the validation set was 0.96. In the 172 CCTA studies, our method yielded a Cohen's kappa statistic for the agreement between automated and visual IQ assessment of 0.67 (p < 0.01). Among the studies visually graded as good to excellent (n = 163), fair (n = 6), and poor (n = 3), 155, 5, and 2, respectively, received an automated IQ score > 50%. Fully automated assessment of the IQ of CCTA data sets by machine learning was reproducible and provided similar results compared with visual analysis within the limits of inter-operator variability. • The proposed method enables automated and reproducible image quality assessment. • Machine learning and visual assessments yielded comparable estimates of image quality. • Automated assessment potentially allows for more standardised image quality. • Image quality assessment enables standardization of clinical trial results across different datasets.
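    A minimal sketch of the general workflow described (mapping per-study features such as noise, contrast, and misregistration scores to an IQ score, then thresholding at 50%). The regressor choice and the synthetic feature/score values are assumptions; the paper does not specify the model.

      import numpy as np
      from sklearn.ensemble import GradientBoostingRegressor

      # Hypothetical per-study features: [noise, contrast, misregistration, uninterpretability]
      rng = np.random.default_rng(1)
      features = rng.uniform(size=(75, 4))
      # Synthetic ground-truth IQ scores in [0, 100] standing in for manual grading
      iq_scores = 100 * (0.5 * features[:, 1] - 0.3 * features[:, 0] + 0.4)

      model = GradientBoostingRegressor().fit(features, iq_scores)
      predicted = model.predict(rng.uniform(size=(10, 4)))
      acceptable = predicted > 50   # flag studies with an automated IQ score above 50%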

  5. Multi-task learning for cross-platform siRNA efficacy prediction: an in-silico study

    PubMed Central

    2010-01-01

    Background Gene silencing using exogenous small interfering RNAs (siRNAs) is now a widespread molecular tool for gene functional study and new-drug target identification. The key mechanism in this technique is to design efficient siRNAs that are incorporated into the RNA-induced silencing complex (RISC) to bind and interact with the mRNA targets and repress their translation into proteins. Although considerable progress has been made in the computational analysis of siRNA binding efficacy, little joint analysis of RNAi experiments conducted under different experimental scenarios has been done so far, even though such joint analysis is an important issue in cross-platform siRNA efficacy prediction. A collective analysis of RNAi mechanisms for different datasets and experimental conditions can often provide new clues on the design of potent siRNAs. Results An elegant multi-task learning paradigm for cross-platform siRNA efficacy prediction is proposed. Experimental studies were performed on a large dataset of siRNA sequences which encompasses several RNAi experiments recently conducted by different research groups. By using our multi-task learning method, the synergy among different experiments is exploited and an efficient multi-task predictor for siRNA efficacy prediction is obtained. The 19 most popular biological features for siRNA were ranked according to their joint importance in multi-task learning. Furthermore, the hypothesis is validated that the siRNA binding efficacies on different messenger RNAs (mRNAs) have different conditional distributions; thus, multi-task learning can be conducted by viewing tasks at the "mRNA" level rather than the "experiment" level. Such distribution diversity across siRNAs bound to different mRNAs indicates that the properties of the target mRNA have important implications for siRNA binding efficacy. Conclusions The knowledge gained from our study provides useful insights into how to analyze various cross-platform RNAi data to uncover their complex mechanisms. PMID:20380733

  6. Multi-task learning for cross-platform siRNA efficacy prediction: an in-silico study.

    PubMed

    Liu, Qi; Xu, Qian; Zheng, Vincent W; Xue, Hong; Cao, Zhiwei; Yang, Qiang

    2010-04-10

    Gene silencing using exogenous small interfering RNAs (siRNAs) is now a widespread molecular tool for gene functional study and new-drug target identification. The key mechanism in this technique is to design efficient siRNAs that are incorporated into the RNA-induced silencing complex (RISC) to bind and interact with the mRNA targets and repress their translation into proteins. Although considerable progress has been made in the computational analysis of siRNA binding efficacy, little joint analysis of RNAi experiments conducted under different experimental scenarios has been done so far, even though such joint analysis is an important issue in cross-platform siRNA efficacy prediction. A collective analysis of RNAi mechanisms for different datasets and experimental conditions can often provide new clues on the design of potent siRNAs. An elegant multi-task learning paradigm for cross-platform siRNA efficacy prediction is proposed. Experimental studies were performed on a large dataset of siRNA sequences which encompasses several RNAi experiments recently conducted by different research groups. By using our multi-task learning method, the synergy among different experiments is exploited and an efficient multi-task predictor for siRNA efficacy prediction is obtained. The 19 most popular biological features for siRNA were ranked according to their joint importance in multi-task learning. Furthermore, the hypothesis is validated that the siRNA binding efficacies on different messenger RNAs (mRNAs) have different conditional distributions; thus, multi-task learning can be conducted by viewing tasks at the "mRNA" level rather than the "experiment" level. Such distribution diversity across siRNAs bound to different mRNAs indicates that the properties of the target mRNA have important implications for siRNA binding efficacy. The knowledge gained from our study provides useful insights into how to analyze various cross-platform RNAi data to uncover their complex mechanisms.

  7. Gender classification of running subjects using full-body kinematics

    NASA Astrophysics Data System (ADS)

    Williams, Christina M.; Flora, Jeffrey B.; Iftekharuddin, Khan M.

    2016-05-01

    This paper proposes novel automated gender classification of subjects while engaged in running activity. The machine learning pipeline includes a principal component analysis preprocessing step followed by classification with linear discriminant analysis, nonlinear support vector machines, or decision stumps with AdaBoost. The dataset consists of 49 subjects (25 males, 24 females, 2 trials each) all equipped with approximately 80 retroreflective markers. The trials are reflective of the subject's entire body moving unrestrained through a capture volume at a self-selected running speed, thus producing highly realistic data. The classification accuracy using leave-one-out cross validation for the 49 subjects is improved from 66.33% using linear discriminant analysis to 86.74% using the nonlinear support vector machine. Results are further improved to 87.76% by implementing a decision-stump AdaBoost classifier. The experimental findings suggest that the linear classification approaches are inadequate in classifying gender for a large dataset with subjects running in a moderately uninhibited environment.
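    The comparison described (PCA preprocessing followed by a linear versus a nonlinear classifier, scored with leave-one-out cross-validation) can be sketched as below. The feature matrix, the number of PCA components, and the SVM hyperparameters are placeholders, not the study's actual settings.

      import numpy as np
      from sklearn.decomposition import PCA
      from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
      from sklearn.model_selection import LeaveOneOut, cross_val_score
      from sklearn.pipeline import make_pipeline
      from sklearn.preprocessing import StandardScaler
      from sklearn.svm import SVC

      # Hypothetical stand-ins: flattened marker-trajectory features per trial and gender labels.
      rng = np.random.default_rng(0)
      X = rng.normal(size=(98, 240))
      y = rng.integers(0, 2, size=98)

      for name, clf in [("LDA", LinearDiscriminantAnalysis()),
                        ("RBF-SVM", SVC(kernel="rbf", C=10, gamma="scale"))]:
          pipe = make_pipeline(StandardScaler(), PCA(n_components=20), clf)
          acc = cross_val_score(pipe, X, y, cv=LeaveOneOut()).mean()
          print(f"{name}: leave-one-out accuracy = {acc:.3f}")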

  8. AstroML: Python-powered Machine Learning for Astronomy

    NASA Astrophysics Data System (ADS)

    Vander Plas, Jake; Connolly, A. J.; Ivezic, Z.

    2014-01-01

    As astronomical data sets grow in size and complexity, automated machine learning and data mining methods are becoming an increasingly fundamental component of research in the field. The astroML project (http://astroML.org) provides a common repository for practical examples of the data mining and machine learning tools used and developed by astronomical researchers, written in Python. The astroML module contains a host of general-purpose data analysis and machine learning routines, loaders for openly-available astronomical datasets, and fast implementations of specific computational methods often used in astronomy and astrophysics. The associated website features hundreds of examples of these routines being used for analysis of real astronomical datasets, while the associated textbook provides a curriculum resource for graduate-level courses focusing on practical statistics, machine learning, and data mining approaches within Astronomical research. This poster will highlight several of the more powerful and unique examples of analysis performed with astroML, all of which can be reproduced in their entirety on any computer with the proper packages installed.

  9. Imaging mass spectrometry data reduction: automated feature identification and extraction.

    PubMed

    McDonnell, Liam A; van Remoortere, Alexandra; de Velde, Nico; van Zeijl, René J M; Deelder, André M

    2010-12-01

    Imaging MS now enables the parallel analysis of hundreds of biomolecules, spanning multiple molecular classes, which allows tissues to be described by their molecular content and distribution. When combined with advanced data analysis routines, tissues can be analyzed and classified based solely on their molecular content. Such molecular histology techniques have been used to distinguish regions with differential molecular signatures that could not be distinguished using established histologic tools. However, its potential to provide an independent, complementary analysis of clinical tissues has been limited by the very large file sizes and large number of discrete variables associated with imaging MS experiments. Here we demonstrate data reduction tools, based on automated feature identification and extraction, for peptide, protein, and lipid imaging MS, using multiple imaging MS technologies, that reduce data loads and the number of variables by >100×, and that highlight highly-localized features that can be missed using standard data analysis strategies. It is then demonstrated how these capabilities enable multivariate analysis on large imaging MS datasets spanning multiple tissues. Copyright © 2010 American Society for Mass Spectrometry. Published by Elsevier Inc. All rights reserved.

  10. Knowledge-Guided Robust MRI Brain Extraction for Diverse Large-Scale Neuroimaging Studies on Humans and Non-Human Primates

    PubMed Central

    Wang, Yaping; Nie, Jingxin; Yap, Pew-Thian; Li, Gang; Shi, Feng; Geng, Xiujuan; Guo, Lei; Shen, Dinggang

    2014-01-01

    Accurate and robust brain extraction is a critical step in most neuroimaging analysis pipelines. In particular, for the large-scale multi-site neuroimaging studies involving a significant number of subjects with diverse age and diagnostic groups, accurate and robust extraction of the brain automatically and consistently is highly desirable. In this paper, we introduce population-specific probability maps to guide the brain extraction of diverse subject groups, including both healthy and diseased adult human populations, both developing and aging human populations, as well as non-human primates. Specifically, the proposed method combines an atlas-based approach, for coarse skull-stripping, with a deformable-surface-based approach that is guided by local intensity information and population-specific prior information learned from a set of real brain images for more localized refinement. Comprehensive quantitative evaluations were performed on the diverse large-scale populations of ADNI dataset with over 800 subjects (55∼90 years of age, multi-site, various diagnosis groups), OASIS dataset with over 400 subjects (18∼96 years of age, wide age range, various diagnosis groups), and NIH pediatrics dataset with 150 subjects (5∼18 years of age, multi-site, wide age range as a complementary age group to the adult dataset). The results demonstrate that our method consistently yields the best overall results across almost the entire human life span, with only a single set of parameters. To demonstrate its capability to work on non-human primates, the proposed method is further evaluated using a rhesus macaque dataset with 20 subjects. Quantitative comparisons with popularly used state-of-the-art methods, including BET, Two-pass BET, BET-B, BSE, HWA, ROBEX and AFNI, demonstrate that the proposed method performs favorably with superior performance on all testing datasets, indicating its robustness and effectiveness. PMID:24489639

  11. Hessian-LoG filtering for enhancement and detection of photoreceptor cells in adaptive optics retinal images.

    PubMed

    Lazareva, Anfisa; Liatsis, Panos; Rauscher, Franziska G

    2016-01-01

    Automated analysis of retinal images plays a vital role in the examination, diagnosis, and prognosis of healthy and pathological retinas. Retinal disorders and the associated visual loss can be interpreted via quantitative correlations, based on measurements of photoreceptor loss. Therefore, it is important to develop reliable tools for identification of photoreceptor cells. In this paper, an automated algorithm is proposed, based on the use of the Hessian-Laplacian of Gaussian filter, which allows enhancement and detection of photoreceptor cells. The performance of the proposed technique is evaluated on both synthetic and high-resolution retinal images, in terms of packing density. The results on the synthetic data were compared against ground truth as well as cone counts obtained by the Li and Roorda algorithm. For the synthetic datasets, our method showed an average detection accuracy of 98.8%, compared to 93.9% for the Li and Roorda approach. The packing density estimates calculated on the retinal datasets were validated against manual counts and the results obtained by proprietary software from Imagine Eyes and by the Li and Roorda algorithm. Among the tested methods, the proposed approach showed the closest agreement with manual counting.
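    A minimal sketch of the Laplacian-of-Gaussian part of this idea: bright, roughly circular cells produce peaks in the negative LoG response, and local maxima above a threshold become candidate detections. This is a plain LoG detector, not the authors' combined Hessian-LoG filter; sigma and the threshold are illustrative assumptions.

      import numpy as np
      from scipy import ndimage

      def detect_cells(image, sigma=2.0, threshold=0.02):
          """Detect candidate photoreceptor centres as local maxima of the negative
          Laplacian-of-Gaussian response above a fixed threshold (toy sketch)."""
          log_response = -ndimage.gaussian_laplace(image.astype(float), sigma=sigma)
          local_max = ndimage.maximum_filter(log_response, size=int(3 * sigma)) == log_response
          candidates = local_max & (log_response > threshold)
          return np.argwhere(candidates)   # (row, col) coordinates of detected cells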

  12. Global Characterization and Monitoring of Forest Cover Using Landsat Data: Opportunities and Challenges

    NASA Technical Reports Server (NTRS)

    Townshend, John R.; Masek, Jeffrey G.; Huang, ChengQuan; Vermote, Eric F.; Gao, Feng; Channan, Saurabh; Sexton, Joseph O.; Feng, Min; Narasimhan, Ramghuram; Kim, Dohyung

    2012-01-01

    The compilation of global Landsat data-sets and the ever-lowering costs of computing now make it feasible to monitor the Earth's land cover at Landsat resolutions of 30 m. In this article, we describe the methods to create global products of forest cover and cover change at Landsat resolutions. Nevertheless, there are many challenges in ensuring the creation of high-quality products, and we propose various ways in which they can be overcome. Among the challenges are the need for atmospheric correction, incorrect calibration coefficients in some of the data-sets, the different phenologies between compilations, the need for terrain correction, the lack of consistent reference data for training and accuracy assessment, and the need for highly automated characterization and change detection. We propose and evaluate the creation and use of surface reflectance products, improved selection of scenes to reduce phenological differences, terrain illumination correction, automated training selection, and the use of information extraction procedures robust to errors in training data along with several other issues. At several stages we use Moderate Resolution Imaging Spectroradiometer (MODIS) data and products to assist our analysis. A global working prototype product of forest cover and forest cover change is included.

  13. Automating an integrated spatial data-mining model for landfill site selection

    NASA Astrophysics Data System (ADS)

    Abujayyab, Sohaib K. M.; Ahamad, Mohd Sanusi S.; Yahya, Ahmad Shukri; Ahmad, Siti Zubaidah; Aziz, Hamidi Abdul

    2017-10-01

    An integrated programming environment represents a robust approach to building a valid model for landfill site selection. One of the main challenges in the integrated model is the complicated processing and modelling due to the programming stages and several limitations. An automation process helps avoid these limitations and improves the interoperability between the integrated programming environments. This work automates a spatial data-mining model for landfill site selection by integrating a spatial programming environment (Python-ArcGIS) with a non-spatial environment (MATLAB). The model was constructed using neural networks and is divided into nine stages distributed between MATLAB and Python-ArcGIS. A case study was taken from the northern part of Peninsular Malaysia, and 22 criteria were selected as input data to build the training and testing datasets. The model achieved an accuracy of 98.2% on the testing dataset using 10-fold cross-validation. The automated spatial data mining model provides a solid platform for decision makers to perform landfill site selection and planning operations on a regional scale.

  14. Public Data Set: Control and Automation of the Pegasus Multi-point Thomson Scattering System

    DOE Data Explorer

    Bodner, Grant M. [University of Wisconsin-Madison] (ORCID:0000000324979172); Bongard, Michael W. [University of Wisconsin-Madison] (ORCID:0000000231609746); Fonck, Raymond J. [University of Wisconsin-Madison] (ORCID:0000000294386762); Reusch, Joshua A. [University of Wisconsin-Madison] (ORCID:0000000284249422); Rodriguez Sanchez, Cuauhtemoc [University of Wisconsin-Madison] (ORCID:0000000334712586); Schlossberg, David J. [University of Wisconsin-Madison] (ORCID:0000000287139448)

    2016-08-12

    This public data set contains openly-documented, machine readable digital research data corresponding to figures published in G.M. Bodner et al., 'Control and Automation of the Pegasus Multi-point Thomson Scattering System,' Rev. Sci. Instrum. 87, 11E523 (2016).

  15. Multi-scale curvature for automated identification of glaciated mountain landscapes

    NASA Astrophysics Data System (ADS)

    Prasicek, Günther; Otto, Jan-Christoph; Montgomery, David; Schrott, Lothar

    2014-05-01

    Automated morphometric interpretation of digital terrain data based on impartial rule sets holds substantial promise for large dataset processing and objective landscape classification. However, the geomorphological realm presents tremendous complexity in the translation of qualitative descriptions into geomorphometric semantics. Here, the simple, conventional distinction of V-shaped fluvial and U-shaped glacial valleys is analyzed quantitatively using the relation of multi-scale curvature and drainage area. Glacial and fluvial erosion shapes mountain landscapes in a long-recognized and characteristic way. Valleys incised by fluvial processes typically have V-shaped cross-sections with uniform and moderately steep slopes, whereas glacial valleys tend to have U-shaped profiles and topographic gradients steepening with distance from valley floor. On a DEM, thalweg cells are determined by a drainage area cutoff and multiple moving window sizes are used to derive per-cell curvature over a variety of scales ranging from the vicinity of the flow path at the valley bottom to catchment sections fully including valley sides. The relation of the curvatures calculated for the user-defined minimum scale and the automatically detected maximum scale is presented as a novel morphometric variable termed Difference of Minimum Curvature (DMC). DMC thresholds determined from typical glacial and fluvial sample catchments are employed to identify quadrats of glaciated and non-glaciated mountain landscapes and the distinctions are validated by field-based geological and geomorphological maps. A first test of the novel algorithm at three study sites in the western United States and a subsequent application to Europe and western Asia demonstrate the transferability of the approach.
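    The following is only a toy analogue of the multi-scale curvature idea described above: approximate curvature by smoothing the DEM at several window sizes and taking the Laplacian, then compare the smallest- and largest-scale responses at thalweg cells. It is not the authors' Difference of Minimum Curvature implementation; the scale list, the smoothing/Laplacian approximation, and the fixed-scale comparison are assumptions.

      import numpy as np
      from scipy import ndimage

      def curvature_at_scale(dem, window):
          """Approximate terrain curvature at a given window size by smoothing the DEM
          with a moving average and taking the Laplacian of the smoothed surface."""
          smoothed = ndimage.uniform_filter(dem.astype(float), size=window)
          return ndimage.laplace(smoothed)

      def difference_of_curvature(dem, thalweg_mask, scales=(3, 9, 27, 81)):
          """Compare curvature at the smallest and largest scale, evaluated only at
          thalweg cells (a simplified stand-in for the DMC variable)."""
          c_min = curvature_at_scale(dem, scales[0])
          c_max = curvature_at_scale(dem, scales[-1])
          return np.where(thalweg_mask, c_max - c_min, np.nan)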

  16. Evaluating Climate Causation of Conflict in Darfur Using Multi-temporal, Multi-resolution Satellite Image Datasets With Novel Analyses

    NASA Astrophysics Data System (ADS)

    Brown, I.; Wennbom, M.

    2013-12-01

    Climate change, population growth and changes in traditional lifestyles have led to instabilities in traditional demarcations between neighboring ethnic and religious groups in the Sahel region. This has resulted in a number of conflicts as groups resort to arms to settle disputes. Such disputes often centre on or are justified by competition for resources. The conflict in Darfur has been controversially explained by resource scarcity resulting from climate change. Here we analyse established methods of using satellite imagery to assess vegetation health in Darfur. Multi-decadal time series of observations are available using low spatial resolution visible-near infrared imagery. Typically, normalized difference vegetation index (NDVI) analyses are produced to describe changes in vegetation 'greenness' or 'health'. Such approaches have been widely used to evaluate the long term development of vegetation in relation to climate variations across a wide range of environments from the Arctic to the Sahel. These datasets typically measure peak NDVI observed over a given interval and may introduce bias. It is furthermore unclear how the spatial organization of sparse vegetation may affect low resolution NDVI products. We develop and assess alternative measures of vegetation including descriptors of the growing season, wetness and resource availability. Expanding the range of parameters used in the analysis reduces our dependence on peak NDVI. Furthermore, these descriptors provide a better characterization of the growing season than the single NDVI measure. Using multi-sensor data we combine high temporal/moderate spatial resolution data with low temporal/high spatial resolution data to improve the spatial representativity of the observations and to provide improved spatial analysis of vegetation patterns. The approach places the high resolution observations in the NDVI context space using a longer time series of lower resolution imagery. The vegetation descriptors derived are evaluated using independent high spatial resolution datasets that reveal the pattern and health of vegetation at metre scales. We also use climate variables to support the interpretation of these data. We conclude that the spatio-temporal patterns in Darfur vegetation and climate datasets suggest that labelling the conflict a climate-change conflict is inaccurate and premature.

  17. VideoWeb Dataset for Multi-camera Activities and Non-verbal Communication

    NASA Astrophysics Data System (ADS)

    Denina, Giovanni; Bhanu, Bir; Nguyen, Hoang Thanh; Ding, Chong; Kamal, Ahmed; Ravishankar, Chinya; Roy-Chowdhury, Amit; Ivers, Allen; Varda, Brenda

    Human-activity recognition is one of the most challenging problems in computer vision. Researchers from around the world have tried to solve this problem and have come a long way in recognizing simple motions and atomic activities. As the computer vision community heads toward fully recognizing human activities, a challenging and labeled dataset is needed. To respond to that need, we collected a dataset of realistic scenarios in a multi-camera network environment (VideoWeb) involving multiple persons performing dozens of different repetitive and non-repetitive activities. This chapter describes the details of the dataset. We believe that this VideoWeb Activities dataset is unique and it is one of the most challenging datasets available today. The dataset is publicly available online at http://vwdata.ee.ucr.edu/ along with the data annotation.

  18. Challenges in Extracting Information From Large Hydrogeophysical-monitoring Datasets

    NASA Astrophysics Data System (ADS)

    Day-Lewis, F. D.; Slater, L. D.; Johnson, T.

    2012-12-01

    Over the last decade, new automated geophysical data-acquisition systems have enabled collection of increasingly large and information-rich geophysical datasets. Concurrent advances in field instrumentation, web services, and high-performance computing have made real-time processing, inversion, and visualization of large three-dimensional tomographic datasets practical. Geophysical-monitoring datasets have provided high-resolution insights into diverse hydrologic processes including groundwater/surface-water exchange, infiltration, solute transport, and bioremediation. Despite the high information content of such datasets, extraction of quantitative or diagnostic hydrologic information is challenging. Visual inspection and interpretation for specific hydrologic processes is difficult for datasets that are large, complex, and (or) affected by forcings (e.g., seasonal variations) unrelated to the target hydrologic process. New strategies are needed to identify salient features in spatially distributed time-series data and to relate temporal changes in geophysical properties to hydrologic processes of interest while effectively filtering unrelated changes. Here, we review recent work using time-series and digital-signal-processing approaches in hydrogeophysics. Examples include applications of cross-correlation, spectral, and time-frequency (e.g., wavelet and Stockwell transforms) approaches to (1) identify salient features in large geophysical time series; (2) examine correlation or coherence between geophysical and hydrologic signals, even in the presence of non-stationarity; and (3) condense large datasets while preserving information of interest. Examples demonstrate analysis of large time-lapse electrical tomography and fiber-optic temperature datasets to extract information about groundwater/surface-water exchange and contaminant transport.
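    A minimal sketch of the cross-correlation approach mentioned above: normalize a geophysical time series (e.g. bulk resistivity) and a hydrologic series (e.g. river stage), cross-correlate them, and report the lag of strongest association. The sampling rate and variable names are placeholders; this does not cover the spectral or wavelet analyses also discussed.

      import numpy as np
      from scipy import signal

      def lagged_correlation(geophys, hydro, fs=24.0):
          """Return the lag (in units implied by fs, e.g. days for fs samples/day) and the
          normalized cross-correlation value at which the two series are most related."""
          g = (geophys - np.mean(geophys)) / np.std(geophys)
          h = (hydro - np.mean(hydro)) / np.std(hydro)
          xcorr = signal.correlate(g, h, mode="full") / len(g)
          lags = signal.correlation_lags(len(g), len(h), mode="full") / fs
          best = np.argmax(np.abs(xcorr))
          return lags[best], xcorr[best]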

  19. Hexicon 2: Automated Processing of Hydrogen-Deuterium Exchange Mass Spectrometry Data with Improved Deuteration Distribution Estimation

    NASA Astrophysics Data System (ADS)

    Lindner, Robert; Lou, Xinghua; Reinstein, Jochen; Shoeman, Robert L.; Hamprecht, Fred A.; Winkler, Andreas

    2014-06-01

    Hydrogen-deuterium exchange (HDX) experiments analyzed by mass spectrometry (MS) provide information about the dynamics and the solvent accessibility of protein backbone amide hydrogen atoms. Continuous improvement of MS instrumentation has contributed to the increasing popularity of this method; however, comprehensive automated data analysis is only beginning to mature. We present Hexicon 2, an automated pipeline for data analysis and visualization based on the previously published program Hexicon (Lou et al. 2010). Hexicon 2 employs the sensitive NITPICK peak detection algorithm of its predecessor in a divide-and-conquer strategy and adds new features, such as chromatogram alignment and improved peptide sequence assignment. The unique feature of deuteration distribution estimation was retained in Hexicon 2 and improved using an iterative deconvolution algorithm that is robust even to noisy data. In addition, Hexicon 2 provides a data browser that facilitates quality control and provides convenient access to common data visualization tasks. Analysis of a benchmark dataset demonstrates superior performance of Hexicon 2 compared with its predecessor in terms of deuteration centroid recovery and deuteration distribution estimation. Hexicon 2 greatly reduces data analysis time compared with manual analysis, whereas the increased number of peptides provides redundant coverage of the entire protein sequence. Hexicon 2 is a standalone application available free of charge under http://hx2.mpimf-heidelberg.mpg.de.

  20. Hexicon 2: automated processing of hydrogen-deuterium exchange mass spectrometry data with improved deuteration distribution estimation.

    PubMed

    Lindner, Robert; Lou, Xinghua; Reinstein, Jochen; Shoeman, Robert L; Hamprecht, Fred A; Winkler, Andreas

    2014-06-01

    Hydrogen-deuterium exchange (HDX) experiments analyzed by mass spectrometry (MS) provide information about the dynamics and the solvent accessibility of protein backbone amide hydrogen atoms. Continuous improvement of MS instrumentation has contributed to the increasing popularity of this method; however, comprehensive automated data analysis is only beginning to mature. We present Hexicon 2, an automated pipeline for data analysis and visualization based on the previously published program Hexicon (Lou et al. 2010). Hexicon 2 employs the sensitive NITPICK peak detection algorithm of its predecessor in a divide-and-conquer strategy and adds new features, such as chromatogram alignment and improved peptide sequence assignment. The unique feature of deuteration distribution estimation was retained in Hexicon 2 and improved using an iterative deconvolution algorithm that is robust even to noisy data. In addition, Hexicon 2 provides a data browser that facilitates quality control and provides convenient access to common data visualization tasks. Analysis of a benchmark dataset demonstrates superior performance of Hexicon 2 compared with its predecessor in terms of deuteration centroid recovery and deuteration distribution estimation. Hexicon 2 greatly reduces data analysis time compared with manual analysis, whereas the increased number of peptides provides redundant coverage of the entire protein sequence. Hexicon 2 is a standalone application available free of charge under http://hx2.mpimf-heidelberg.mpg.de.

  1. Human Vision-Motivated Algorithm Allows Consistent Retinal Vessel Classification Based on Local Color Contrast for Advancing General Diagnostic Exams.

    PubMed

    Ivanov, Iliya V; Leitritz, Martin A; Norrenberg, Lars A; Völker, Michael; Dynowski, Marek; Ueffing, Marius; Dietter, Johannes

    2016-02-01

    Abnormalities of blood vessel anatomy, morphology, and ratio can serve as important diagnostic markers for retinal diseases such as AMD or diabetic retinopathy. Large cohort studies demand automated and quantitative image analysis of vascular abnormalities. Therefore, we developed an analytical software tool to enable automated standardized classification of blood vessels supporting clinical reading. A dataset of 61 images was collected from a total of 33 women and 8 men with a median age of 38 years. The pupils were not dilated, and images were taken after dark adaptation. In contrast to current methods in which classification is based on vessel profile intensity averages, and similar to human vision, local color contrast was chosen as a discriminator to allow artery vein discrimination and arterial-venous ratio (AVR) calculation without vessel tracking. With 83% ± 1% (standard error of the mean) on our dataset, the best classification was achieved using weighted lightness information from a combination of the red, green, and blue channels. Tested on an independent dataset, our method reached 89% correct classification, which, when benchmarked against conventional ophthalmologic classification, shows significantly improved classification scores. Our study demonstrates that vessel classification based on local color contrast can cope with inter- or intraimage lightness variability and allows consistent AVR calculation. We offer an open-source implementation of this method upon request, which can be integrated into existing tool sets and applied to general diagnostic exams.

  2. Toward automated face detection in thermal and polarimetric thermal imagery

    NASA Astrophysics Data System (ADS)

    Gordon, Christopher; Acosta, Mark; Short, Nathan; Hu, Shuowen; Chan, Alex L.

    2016-05-01

    Visible-spectrum face detection algorithms perform reliably under controlled lighting conditions. However, variations in illumination and application of cosmetics can distort the features used by common face detectors, thereby degrading their detection performance. Thermal and polarimetric thermal facial imaging are relatively invariant to illumination and robust to the application of makeup, due to their measurement of emitted radiation instead of reflected light signals. The objective of this work is to evaluate a government off-the-shelf wavelet-based naïve Bayes face detection algorithm and a commercial off-the-shelf Viola-Jones cascade face detection algorithm on face imagery acquired in different spectral bands. New classifiers were trained using the Viola-Jones cascade object detection framework with preprocessed facial imagery. Preprocessing using Difference of Gaussians (DoG) filtering reduces the modality gap between facial signatures across the different spectral bands, thus enabling more correlated histogram of oriented gradients (HOG) features to be extracted from the preprocessed thermal and visible face images. Since the availability of training data is much more limited in the thermal spectrum than in the visible spectrum, it is not feasible to train a robust multi-modal face detector using thermal imagery alone. A large training dataset was constituted with DoG filtered visible and thermal imagery, which was subsequently used to generate a custom trained Viola-Jones detector. A 40% increase in face detection rate was achieved on a testing dataset, as compared to the performance of a pre-trained/baseline face detector. Insights gained in this research are valuable in the development of more robust multi-modal face detectors.
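    A minimal sketch of the Difference-of-Gaussians preprocessing step described above, which band-pass filters an image so that visible and thermal face images share more similar edge-like structure. The sigma values and the rescaling to [0, 1] are illustrative assumptions, not the paper's exact settings.

      import numpy as np
      from scipy import ndimage

      def dog_filter(image, sigma_low=1.0, sigma_high=2.0):
          """Difference-of-Gaussians band-pass filter: suppresses low-frequency
          illumination/emission offsets while keeping edge-like facial structure,
          narrowing the appearance gap between visible and thermal imagery."""
          img = image.astype(float)
          dog = ndimage.gaussian_filter(img, sigma_low) - ndimage.gaussian_filter(img, sigma_high)
          dog -= dog.min()
          return dog / (dog.max() + 1e-12)   # rescale so both modalities share a common range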

  3. A LabVIEW®-based software for the control of the AUTORAD platform: a fully automated multisequential flow injection analysis Lab-on-Valve (MSFIA-LOV) system for radiochemical analysis.

    PubMed

    Barbesi, Donato; Vicente Vilas, Víctor; Millet, Sylvain; Sandow, Miguel; Colle, Jean-Yves; Aldave de Las Heras, Laura

    2017-01-01

    LabVIEW®-based software for the control of the fully automated multi-sequential flow injection analysis Lab-on-Valve (MSFIA-LOV) platform AutoRAD, which performs radiochemical analysis, is described. The analytical platform interfaces with an Arduino®-based device that triggers multiple detectors, providing a flexible and fit-for-purpose choice of detection systems. The different analytical devices are connected to the PC running the LabVIEW® VI software via USB and RS232 interfaces, both for sending commands and for receiving confirmation or error responses. The AUTORAD platform has been successfully applied for the chemical separation and determination of Sr, an important fission product pertinent to nuclear waste.

  4. Benefit Analysis of SPC Panel SP-10 Projects

    DTIC Science & Technology

    1995-12-01

    started by stating the definition of Flexible Automation as: “The combination of reprogrammable single and multi-functional manipulators and fixed...shipyards (like the automobile industry) for them to put a plan together, hence the need for a consultant or contractor. To assist in developing a plan

  5. Integrating Automation into a Multi-Mission Operations Center

    NASA Technical Reports Server (NTRS)

    Surka, Derek M.; Jones, Lori; Crouse, Patrick; Cary, Everett A, Jr.; Esposito, Timothy C.

    2007-01-01

    NASA Goddard Space Flight Center's Space Science Mission Operations (SSMO) Project is currently tackling the challenge of minimizing ground operations costs for multiple satellites that have surpassed their prime mission phase and are well into extended mission. These missions are being reengineered into a multi-mission operations center built around modern information technologies and a common ground system infrastructure. The effort began with the integration of four SMEX missions into a similar architecture that provides command and control capabilities and demonstrates fleet automation and control concepts as a pathfinder for additional mission integrations. The reengineered ground system, called the Multi-Mission Operations Center (MMOC), is now undergoing a transformation to support other SSMO missions, which include SOHO, Wind, and ACE. This paper presents the automation principles and lessons learned to date for integrating automation into an existing operations environment for multiple satellites.

  6. Automatic Denoising of Functional MRI Data: Combining Independent Component Analysis and Hierarchical Fusion of Classifiers

    PubMed Central

    Salimi-Khorshidi, Gholamreza; Douaud, Gwenaëlle; Beckmann, Christian F; Glasser, Matthew F; Griffanti, Ludovica; Smith, Stephen M

    2014-01-01

    Many sources of fluctuation contribute to the fMRI signal, and this makes identifying the effects that are truly related to the underlying neuronal activity difficult. Independent component analysis (ICA) - one of the most widely used techniques for the exploratory analysis of fMRI data - has been shown to be a powerful technique in identifying various sources of neuronally-related and artefactual fluctuation in fMRI data (both with the application of external stimuli and with the subject “at rest”). ICA decomposes fMRI data into patterns of activity (a set of spatial maps and their corresponding time series) that are statistically independent and add linearly to explain voxel-wise time series. Given the set of ICA components, if the components representing “signal” (brain activity) can be distinguished from the “noise” components (effects of motion, non-neuronal physiology, scanner artefacts and other nuisance sources), the latter can then be removed from the data, providing an effective cleanup of structured noise. Manual classification of components is labour intensive and requires expertise; hence, a fully automatic noise detection algorithm that can reliably detect various types of noise sources (in both task and resting fMRI) is desirable. In this paper, we introduce FIX (“FMRIB’s ICA-based X-noiseifier”), which provides an automatic solution for denoising fMRI data via accurate classification of ICA components. For each ICA component FIX generates a large number of distinct spatial and temporal features, each describing a different aspect of the data (e.g., what proportion of temporal fluctuations are at high frequencies). The set of features is then fed into a multi-level classifier (built around several different classifiers). Once trained through the hand-classification of a sufficient number of training datasets, the classifier can then automatically classify new datasets. The noise components can then be subtracted from (or regressed out of) the original data, to provide automated cleanup. On conventional resting-state fMRI (rfMRI) single-run datasets, FIX achieved about 95% overall accuracy. On high-quality rfMRI data from the Human Connectome Project, FIX achieves over 99% classification accuracy, and as a result is being used in the default rfMRI processing pipeline for generating HCP connectomes. FIX is publicly available as a plugin for FSL. PMID:24389422
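    As a small illustration of the kind of temporal feature mentioned in the abstract (the proportion of a component's fluctuations at high frequencies), the sketch below computes one such feature for a component time series. The TR, cutoff frequency, and function name are assumptions; FIX itself uses many more features and a multi-level classifier.

      import numpy as np
      from scipy import signal

      def high_freq_fraction(timeseries, tr=0.72, cutoff_hz=0.1):
          """One illustrative FIX-style temporal feature: the fraction of a component's
          spectral power above a cutoff frequency. Noise components typically carry more
          of their power at high frequencies than neuronal signal does."""
          freqs, psd = signal.periodogram(timeseries, fs=1.0 / tr)
          return psd[freqs > cutoff_hz].sum() / psd.sum()

      # A vector of such features per component would then be fed to a trained classifier.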

  7. Differentiating Obstructive from Central and Complex Sleep Apnea Using an Automated Electrocardiogram-Based Method

    PubMed Central

    Thomas, Robert Joseph; Mietus, Joseph E.; Peng, Chung-Kang; Gilmartin, Geoffrey; Daly, Robert W.; Goldberger, Ary L.; Gottlieb, Daniel J.

    2007-01-01

    Study Objectives: Complex sleep apnea is defined as sleep disordered breathing secondary to simultaneous upper airway obstruction and respiratory control dysfunction. The objective of this study was to assess the utility of an electrocardiogram (ECG)-based cardiopulmonary coupling technique to distinguish obstructive from central or complex sleep apnea. Design: Analysis of archived polysomnographic datasets. Setting: A laboratory for computational signal analysis. Interventions: None. Measurements and Results: The PhysioNet Sleep Apnea Database, consisting of 70 polysomnograms including single-lead ECG signals of approximately 8 hours duration, was used to train an ECG-based measure of autonomic and respiratory interactions (cardiopulmonary coupling) to detect periods of apnea and hypopnea, based on the presence of elevated low-frequency coupling (e-LFC). In the PhysioNet BIDMC Congestive Heart Failure Database (ECGs of 15 subjects), a pattern of “narrow spectral band” e-LFC was especially common. The algorithm was then applied to the Sleep Heart Health Study–I dataset, to select the 15 records with the highest amounts of broad and narrow spectral band e-LFC. The latter spectral characteristic seemed to detect not only periods of central apnea, but also obstructive hypopneas with a periodic breathing pattern. Applying the algorithm to 77 sleep laboratory split-night studies showed that the presence of narrow band e-LFC predicted an increased sensitivity to induction of central apneas by positive airway pressure. Conclusions: ECG-based spectral analysis allows automated, operator-independent characterization of probable interactions between respiratory dyscontrol and upper airway anatomical obstruction. The clinical utility of spectrographic phenotyping, especially in predicting failure of positive airway pressure therapy, remains to be more thoroughly tested. Citation: Thomas RJ; Mietus JE; Peng CK; Gilmartin G; Daly RW; Goldberger AL; Gottlieb DJ. Differentiating obstructive from central and complex sleep apnea using an automated electrocardiogram-based method. SLEEP 2007;30(12):1756-1769. PMID:18246985

  8. Automated vessel segmentation using cross-correlation and pooled covariance matrix analysis.

    PubMed

    Du, Jiang; Karimi, Afshin; Wu, Yijing; Korosec, Frank R; Grist, Thomas M; Mistretta, Charles A

    2011-04-01

    Time-resolved contrast-enhanced magnetic resonance angiography (CE-MRA) provides contrast dynamics in the vasculature and allows vessel segmentation based on temporal correlation analysis. Here we present an automated vessel segmentation algorithm including automated generation of regions of interest (ROIs), cross-correlation and pooled sample covariance matrix analysis. The dynamic images are divided into multiple equal-sized regions. In each region, ROIs for artery, vein and background are generated using an iterative thresholding algorithm based on the contrast arrival time map and contrast enhancement map. Region-specific multi-feature cross-correlation analysis and pooled covariance matrix analysis are performed to calculate the Mahalanobis distances (MDs), which are used to automatically separate arteries from veins. This segmentation algorithm is applied to a dual-phase dynamic imaging acquisition scheme where low-resolution time-resolved images are acquired during the dynamic phase followed by high-frequency data acquisition at the steady-state phase. The segmented low-resolution arterial and venous images are then combined with the high-frequency data in k-space and inverse Fourier transformed to form the final segmented arterial and venous images. Results from volunteer and patient studies demonstrate the advantages of this automated vessel segmentation and dual phase data acquisition technique. Copyright © 2011 Elsevier Inc. All rights reserved.
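    A minimal sketch of the pooled-covariance Mahalanobis-distance step described above: estimate a pooled covariance from artery and vein ROI samples, then assign each voxel to the class with the smaller distance. The feature layout and function name are assumptions; the full method also includes automated ROI generation and cross-correlation features.

      import numpy as np

      def pooled_mahalanobis(features, artery_samples, vein_samples):
          """Classify voxels as artery or vein by (squared) Mahalanobis distance to each
          class mean, using a pooled sample covariance from the two ROI populations.
          `features` is (n_voxels, n_features); the ROI arrays share the same width."""
          mu_a, mu_v = artery_samples.mean(0), vein_samples.mean(0)
          na, nv = len(artery_samples), len(vein_samples)
          pooled_cov = ((na - 1) * np.cov(artery_samples, rowvar=False) +
                        (nv - 1) * np.cov(vein_samples, rowvar=False)) / (na + nv - 2)
          inv_cov = np.linalg.inv(pooled_cov)

          def md(x, mu):
              d = x - mu
              return np.einsum("ij,jk,ik->i", d, inv_cov, d)   # squared distance per voxel

          return np.where(md(features, mu_a) < md(features, mu_v), "artery", "vein")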

  9. Long-Term Quantitative Precipitation Estimates (QPE) at High Spatial and Temporal Resolution over CONUS: Bias-Adjustment of the Radar-Only National Mosaic and Multi-sensor QPE (NMQ/Q2) Precipitation Reanalysis (2001-2012)

    NASA Astrophysics Data System (ADS)

    Prat, Olivier; Nelson, Brian; Stevens, Scott; Seo, Dong-Jun; Kim, Beomgeun

    2015-04-01

    The processing of radar-only precipitation via the reanalysis from the National Mosaic and Multi-Sensor Quantitative (NMQ/Q2) based on the WSR-88D Next-generation Radar (NEXRAD) network over Continental United States (CONUS) is completed for the period covering from 2001 to 2012. This important milestone constitutes a unique opportunity to study precipitation processes at a 1-km spatial resolution and a 5-min temporal resolution. However, in order to be suitable for hydrological, meteorological and climatological applications, the radar-only product needs to be bias-adjusted and merged with in-situ rain gauge information. Several in-situ datasets are available to assess the biases of the radar-only product and to adjust for those biases to provide a multi-sensor QPE. The rain gauge networks that are used, such as the Global Historical Climatology Network-Daily (GHCN-D), the Hydrometeorological Automated Data System (HADS), the Automated Surface Observing Systems (ASOS), and the Climate Reference Network (CRN), have different spatial density and temporal resolution. Incorporating non-homogeneous networks over a vast area and a long-term record poses considerable challenges, among them the difficulty of combining surface measurements of differing resolution and quality to adjust gridded precipitation estimates, and the choice of adjustment technique. The objective of this work is threefold. First, we investigate how the different in-situ networks can impact the precipitation estimates as a function of the spatial density, sensor type, and temporal resolution. Second, we assess conditional and unconditional biases of the radar-only QPE for various time scales (daily, hourly, 5-min) using in-situ precipitation observations. Finally, after assessing the bias and applying reduction or elimination techniques, we use a unique in-situ dataset merging the different rain gauge networks (CRN, ASOS, HADS, GHCN-D) to adjust the radar-only QPE product via an Inverse Distance Weighting (IDW) approach. In addition, we also investigate alternate adjustment techniques such as the kriging method and its variants (Simple Kriging: SK; Ordinary Kriging: OK; Conditional Bias-Penalized Kriging: CBPK). From this approach, we also hope to generate estimates of uncertainty for the gridded bias-adjusted QPE. Further comparison with a suite of lower resolution QPEs derived from ground-based radar measurements (Stage IV) and satellite products (TMPA, CMORPH, PERSIANN) is also provided in order to give a detailed picture of the improvements and remaining challenges.
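    A minimal sketch of the Inverse Distance Weighting adjustment mentioned above: compute gauge-minus-radar biases at the gauge locations, spread them over the grid with inverse-distance weights, and add the interpolated bias back to the radar field. The power parameter, coordinate layout, and helper function are illustrative assumptions, not the reanalysis implementation.

      import numpy as np

      def find_nearest_cells(grid_xy, gauge_xy):
          """Index of the grid cell closest to each gauge (brute force, for illustration)."""
          d = np.linalg.norm(grid_xy[:, None, :] - gauge_xy[None, :, :], axis=2)
          return d.argmin(axis=0)

      def idw_bias_adjust(radar_grid, grid_xy, gauge_xy, gauge_vals, power=2.0, eps=1e-6):
          """Adjust a radar-only QPE field toward gauges: interpolate the gauge-minus-radar
          bias to every grid cell with inverse-distance weighting and add it back.
          radar_grid is flattened to match grid_xy, an (n_cells, 2) array of coordinates."""
          radar_at_gauges = radar_grid.ravel()[find_nearest_cells(grid_xy, gauge_xy)]
          bias = gauge_vals - radar_at_gauges                    # point biases at the gauges
          dist = np.linalg.norm(grid_xy[:, None, :] - gauge_xy[None, :, :], axis=2)
          weights = 1.0 / np.maximum(dist, eps) ** power
          weights /= weights.sum(axis=1, keepdims=True)
          return radar_grid.ravel() + weights @ bias             # bias-adjusted QPE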

  10. A formal concept analysis approach to consensus clustering of multi-experiment expression data

    PubMed Central

    2014-01-01

    Background Presently, with the increasing number and complexity of available gene expression datasets, the combination of data from multiple microarray studies addressing a similar biological question is gaining importance. The analysis and integration of multiple datasets are expected to yield more reliable and robust results since they are based on a larger number of samples and the effects of the individual study-specific biases are diminished. This is supported by recent studies suggesting that important biological signals are often preserved or enhanced by multiple experiments. An approach to combining data from different experiments is the aggregation of their clusterings into a consensus or representative clustering solution which increases the confidence in the common features of all the datasets and reveals the important differences among them. Results We propose a novel generic consensus clustering technique that applies Formal Concept Analysis (FCA) approach for the consolidation and analysis of clustering solutions derived from several microarray datasets. These datasets are initially divided into groups of related experiments with respect to a predefined criterion. Subsequently, a consensus clustering algorithm is applied to each group resulting in a clustering solution per group. These solutions are pooled together and further analysed by employing FCA which allows extracting valuable insights from the data and generating a gene partition over all the experiments. In order to validate the FCA-enhanced approach, two consensus clustering algorithms are adapted to incorporate the FCA analysis. Their performance is evaluated on gene expression data from a multi-experiment study examining the global cell-cycle control of fission yeast. The FCA results derived from both methods demonstrate that, although both algorithms optimize different clustering characteristics, FCA is able to overcome and diminish these differences and preserve some relevant biological signals. Conclusions The proposed FCA-enhanced consensus clustering technique is a general approach to the combination of clustering algorithms with FCA for deriving clustering solutions from multiple gene expression matrices. The experimental results presented herein demonstrate that it is a robust data integration technique able to produce a good-quality clustering solution that is representative for the whole set of expression matrices. PMID:24885407
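    The sketch below shows only the generic consensus-clustering step (a co-association matrix built from several clustering solutions, then cut with average linkage); the FCA-based analysis described in the paper is a separate stage not shown here. The clustering cut and parameter names are assumptions.

      import numpy as np
      from scipy.cluster.hierarchy import fcluster, linkage
      from scipy.spatial.distance import squareform

      def consensus_clustering(labelings, n_clusters):
          """Build a co-association matrix from several clustering solutions (one label
          vector per experiment) and derive a consensus partition by cutting an
          average-linkage tree on the resulting dissimilarities."""
          labelings = np.asarray(labelings)            # shape: (n_solutions, n_genes)
          n = labelings.shape[1]
          coassoc = np.zeros((n, n))
          for labels in labelings:
              coassoc += (labels[:, None] == labels[None, :])
          coassoc /= len(labelings)                    # fraction of solutions that agree
          dist = 1.0 - coassoc
          np.fill_diagonal(dist, 0.0)
          z = linkage(squareform(dist, checks=False), method="average")
          return fcluster(z, t=n_clusters, criterion="maxclust")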

  11. Classification and Compression of Multi-Resolution Vectors: A Tree Structured Vector Quantizer Approach

    DTIC Science & Technology

    2002-01-01

    their expression profile and for classification of cells into tumorous and non-tumorous classes. Then we will present a parallel tree method for... cancerous cells. We will use the same dataset and use tree structured classifiers with multi-resolution analysis for classifying cancerous from non-cancerous ...cells. We have the expressions of 4096 genes from 98 different cell types. Of these 98, 72 are cancerous while 26 are non-cancerous. We are interested

  12. BagMOOV: A novel ensemble for heart disease prediction bootstrap aggregation with multi-objective optimized voting.

    PubMed

    Bashir, Saba; Qamar, Usman; Khan, Farhan Hassan

    2015-06-01

    Conventional clinical decision support systems are based on individual classifiers, or on simple combinations of classifiers, which tend to show moderate performance. This research paper presents a novel classifier ensemble framework based on an enhanced bagging approach with a multi-objective weighted voting scheme for prediction and analysis of heart disease. The proposed model overcomes the limitations of conventional approaches by utilizing an ensemble of five heterogeneous classifiers: Naïve Bayes, linear regression, quadratic discriminant analysis, an instance-based learner and support vector machines. Five different datasets, obtained from publicly available data repositories, are used for experimentation, evaluation and validation. The effectiveness of the proposed ensemble is investigated by comparing its results with several classifiers. Prediction results of the proposed ensemble model are assessed by ten-fold cross-validation and ANOVA statistics. The experimental evaluation shows that the proposed framework handles all types of attributes and achieves a diagnostic accuracy of 84.16%, sensitivity of 93.29%, specificity of 96.70%, and F-measure of 82.15%. An F-ratio higher than the F-critical value and a p-value below 0.05 at the 95% confidence level indicate that the results are statistically significant for most of the datasets.
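
    For orientation, the sketch below assembles a comparable heterogeneous soft-voting ensemble in scikit-learn, weighting each base learner by its cross-validated accuracy. This is a simplified stand-in (with logistic regression in place of linear regression and a generic toy dataset), not the BagMOOV multi-objective optimization, and it makes no claim about reproducing the reported figures.

```python
# Minimal sketch of a heterogeneous weighted-voting ensemble in the spirit of
# the record above: five base learners, weights simply proportional to
# cross-validated accuracy (not the paper's multi-objective optimization).
from sklearn.datasets import load_breast_cancer
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)           # stand-in tabular dataset
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

base = [
    ("nb", GaussianNB()),
    ("lr", LogisticRegression(max_iter=5000)),        # linear-model stand-in
    ("qda", QuadraticDiscriminantAnalysis()),
    ("knn", KNeighborsClassifier()),                  # instance-based learner
    ("svm", SVC(probability=True)),
]

# Weight each classifier by its cross-validated accuracy on the training set
weights = [cross_val_score(clf, X_tr, y_tr, cv=5).mean() for _, clf in base]

ensemble = VotingClassifier(estimators=base, voting="soft", weights=weights)
ensemble.fit(X_tr, y_tr)
print("held-out accuracy:", ensemble.score(X_te, y_te))
```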

  13. Exploratory Climate Data Visualization and Analysis Using DV3D and UVCDAT

    NASA Technical Reports Server (NTRS)

    Maxwell, Thomas

    2012-01-01

    Earth system scientists are being inundated by an explosion of data generated by ever-increasing resolution in both global models and remote sensors. Advanced tools for accessing, analyzing, and visualizing very large and complex climate data are required to maintain rapid progress in Earth system research. To meet this need, NASA, in collaboration with the Ultra-scale Visualization Climate Data Analysis Tools (UVCDAT) consortium, is developing exploratory climate data analysis and visualization tools which provide data analysis capabilities for the Earth System Grid (ESG). This paper describes DV3D, a UVCDAT package that enables exploratory analysis of climate simulation and observation datasets. DV3D provides user-friendly interfaces for visualization and analysis of climate data at a level appropriate for scientists. It features workflow interfaces, interactive 4D data exploration, hyperwall and stereo visualization, automated provenance generation, and parallel task execution. DV3D's integration with CDAT's climate data management system (CDMS) and other climate data analysis tools provides a wide range of high performance climate data analysis operations. DV3D expands the scientists' toolbox by incorporating a suite of rich new exploratory visualization and analysis methods for addressing the complexity of climate datasets.

  14. Feasibility of using a large Clinical Data Warehouse to automate the selection of diagnostic cohorts.

    PubMed

    Stephen, Reejis; Boxwala, Aziz; Gertman, Paul

    2003-01-01

    Data from Clinical Data Warehouses (CDWs) can be used for retrospective studies and for benchmarking. However, automated identification of cases from large datasets containing data items in free text fields is challenging. We developed an algorithm for categorizing pediatric patients presenting with respiratory distress into bronchiolitis, bacterial pneumonia, and asthma using clinical variables from a CDW. A feasibility study of this approach indicates that case selection may be automated.

  15. Automated detection of prostate cancer in digitized whole-slide images of H&E-stained biopsy specimens

    NASA Astrophysics Data System (ADS)

    Litjens, G.; Ehteshami Bejnordi, B.; Timofeeva, N.; Swadi, G.; Kovacs, I.; Hulsbergen-van de Kaa, C.; van der Laak, J.

    2015-03-01

    Automated detection of prostate cancer in digitized H&E whole-slide images is an important first step for computer-driven grading. Most automated grading algorithms work on preselected image patches because they are too computationally expensive to run on multi-gigapixel whole-slide images. An automated multi-resolution cancer detection system could reduce the computational workload for subsequent grading and quantification in two ways: by excluding areas of definitely normal tissue within a single specimen or by excluding entire specimens which do not contain any cancer. In this work we present a multi-resolution cancer detection algorithm geared towards the latter. The algorithm methodology is as follows: at a coarse resolution the system uses superpixels, color histograms and local binary patterns in combination with a random forest classifier to assess the likelihood of cancer. The five most suspicious superpixels are identified and at a higher resolution more computationally expensive graph and gland features are added to refine classification for these superpixels. Our methods were evaluated in a dataset of 204 digitized whole-slide H&E-stained images of MR-guided biopsy specimens from 163 patients. A pathologist exhaustively annotated the specimens for areas containing cancer. The performance of our system was evaluated using ten-fold cross-validation, stratified according to patient. Image-based receiver operating characteristic (ROC) analysis was subsequently performed where a specimen containing cancer was considered positive and specimens without cancer negative. We obtained an area under the ROC curve of 0.96 and a specificity of 0.4 at a sensitivity of 1.0.
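
    The coarse-resolution stage can be pictured with the toy sketch below: SLIC superpixels, per-superpixel colour histograms and a random forest produce a cancer-likelihood score per superpixel, from which the five most suspicious are selected. The feature set, parameters and random data are illustrative assumptions rather than the authors' exact pipeline.

```python
# Sketch of a coarse-resolution superpixel classification stage only.
# Features, parameters and data are toy stand-ins.
import numpy as np
from skimage.segmentation import slic
from sklearn.ensemble import RandomForestClassifier

def superpixel_histograms(image, n_segments=300, bins=8):
    """Per-superpixel RGB histograms plus the superpixel label map."""
    segments = slic(image, n_segments=n_segments)
    feats = []
    for label in np.unique(segments):
        pixels = image[segments == label]            # (n_pixels, 3)
        hist = [np.histogram(pixels[:, c], bins=bins, range=(0, 1),
                             density=True)[0] for c in range(3)]
        feats.append(np.concatenate(hist))
    return np.asarray(feats), segments

rng = np.random.default_rng(0)
patch = rng.random((256, 256, 3))                    # stand-in tissue image
X, segments = superpixel_histograms(patch)
y = rng.integers(0, 2, size=len(X))                  # stand-in cancer labels

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
likelihood = clf.predict_proba(X)[:, 1]              # cancer score per superpixel
top5 = np.argsort(likelihood)[-5:]                   # five most suspicious superpixels
```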

  16. Fast multi-core based multimodal registration of 2D cross-sections and 3D datasets.

    PubMed

    Scharfe, Michael; Pielot, Rainer; Schreiber, Falk

    2010-01-11

    Solving bioinformatics tasks often requires extensive computational power. Recent trends in processor architecture combine multiple cores into a single chip to improve overall performance. The Cell Broadband Engine (CBE), a heterogeneous multi-core processor, provides power-efficient and cost-effective high-performance computing. One application area is image analysis and visualisation, in particular registration of 2D cross-sections into 3D image datasets. Such techniques can be used to put different image modalities into spatial correspondence, for example, 2D images of histological cuts into morphological 3D frameworks. We evaluate the CBE-driven PlayStation 3 as a high performance, cost-effective computing platform by adapting a multimodal alignment procedure to several characteristic hardware properties. The optimisations are based on partitioning, vectorisation, branch reducing and loop unrolling techniques with special attention to 32-bit multiplies and limited local storage on the computing units. We show how a typical image analysis and visualisation problem, the multimodal registration of 2D cross-sections and 3D datasets, benefits from the multi-core based implementation of the alignment algorithm. We discuss several CBE-based optimisation methods and compare our results to standard solutions. More information and the source code are available from http://cbe.ipk-gatersleben.de. The results demonstrate that the CBE processor in a PlayStation 3 accelerates computational intensive multimodal registration, which is of great importance in biological/medical image processing. The PlayStation 3 as a low cost CBE-based platform offers an efficient option to conventional hardware to solve computational problems in image processing and bioinformatics.

  17. Efficient, Multi-Scale Designs Take Flight

    NASA Technical Reports Server (NTRS)

    2003-01-01

    Engineers can solve aerospace design problems faster and more efficiently with a versatile software product that performs automated structural analysis and sizing optimization. Collier Research Corporation's HyperSizer Structural Sizing Software is a design, analysis, and documentation tool that increases productivity and standardization for a design team. Based on established aerospace structural methods for strength, stability, and stiffness, HyperSizer can be used all the way from conceptual design to in-service support. The software originated from NASA's efforts to automate its capability to perform aircraft strength analyses, structural sizing, and weight prediction and reduction. With a strategy to combine finite element analysis with an automated design procedure, NASA's Langley Research Center led the development of a software code known as ST-SIZE from 1988 to 1995. Collier Research employees were principal developers of the code along with Langley researchers. The code evolved into one that could analyze the strength and stability of stiffened panels constructed of any material, including lightweight, fiber-reinforced composites.

  18. Automated flow cytometric analysis across large numbers of samples and cell types.

    PubMed

    Chen, Xiaoyi; Hasan, Milena; Libri, Valentina; Urrutia, Alejandra; Beitz, Benoît; Rouilly, Vincent; Duffy, Darragh; Patin, Étienne; Chalmond, Bernard; Rogge, Lars; Quintana-Murci, Lluis; Albert, Matthew L; Schwikowski, Benno

    2015-04-01

    Multi-parametric flow cytometry is a key technology for characterization of immune cell phenotypes. However, robust high-dimensional post-analytic strategies for automated data analysis in large numbers of donors are still lacking. Here, we report a computational pipeline, called FlowGM, which minimizes operator input, is insensitive to compensation settings, and can be adapted to different analytic panels. A Gaussian Mixture Model (GMM)-based approach was utilized for initial clustering, with the number of clusters determined using Bayesian Information Criterion. Meta-clustering in a reference donor permitted automated identification of 24 cell types across four panels. Cluster labels were integrated into FCS files, thus permitting comparisons to manual gating. Cell numbers and coefficient of variation (CV) were similar between FlowGM and conventional gating for lymphocyte populations, but notably FlowGM provided improved discrimination of "hard-to-gate" monocyte and dendritic cell (DC) subsets. FlowGM thus provides rapid high-dimensional analysis of cell phenotypes and is amenable to cohort studies. Copyright © 2015. Published by Elsevier Inc.
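
    To make the clustering step concrete, the sketch below fits Gaussian mixtures over a range of component counts on synthetic event data and keeps the model with the lowest Bayesian Information Criterion, as described above. The event matrix and the candidate range are placeholder assumptions, not FlowGM itself.

```python
# Minimal sketch: GMM clustering with the number of components chosen by BIC.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
events = rng.normal(size=(5000, 6))        # stand-in for compensated FCS events

best_model, best_bic = None, np.inf
for k in range(2, 16):                     # candidate numbers of cell clusters
    gmm = GaussianMixture(n_components=k, covariance_type="full",
                          random_state=0).fit(events)
    bic = gmm.bic(events)
    if bic < best_bic:
        best_model, best_bic = gmm, bic

cluster_labels = best_model.predict(events)   # cluster index per event
print("selected components:", best_model.n_components, "BIC:", round(best_bic, 1))
```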

  19. Improving cervical region of interest by eliminating vaginal walls and cotton-swabs for automated image analysis

    NASA Astrophysics Data System (ADS)

    Venkataraman, Sankar; Li, Wenjing

    2008-03-01

    Image analysis for automated diagnosis of cervical cancer has attained high prominence in the last decade. Automated image analysis at all levels requires a basic segmentation of the region of interest (ROI) within a given image. The precision of the diagnosis is often reflected by the precision in detecting the initial region of interest, especially when some features outside the ROI mimic those within it. Work described here discusses algorithms that are used to improve the cervical region of interest as a part of automated cervical image diagnosis. A vital visual aid in diagnosing cervical cancer is the aceto-whitening of the cervix after the application of acetic acid. Color and texture are used to segment acetowhite regions within the cervical ROI. Vaginal walls along with cotton swabs sometimes mimic these essential features, leading to several false positives. The work presented here focuses on detecting in-focus vaginal wall boundaries and then extrapolating them to exclude vaginal walls from the cervical ROI. In addition, a marker-controlled watershed segmentation is used to detect cotton swabs within the cervical ROI. A dataset comprising 50 high-resolution images of the cervix acquired after 60 seconds of acetic acid application was used to test the algorithm. Out of the 50 images, 27 benefited from a new cervical ROI. Significant improvement in overall diagnosis was observed in these images as false positives caused by features outside the actual ROI mimicking acetowhite regions were eliminated.
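
    As a rough illustration of the marker-controlled watershed idea, the sketch below floods a gradient image from background and bright-object markers to isolate a synthetic swab-like region. The intensity thresholds and toy image are assumptions, not the paper's parameters.

```python
# Sketch of a marker-controlled watershed for bright-object (swab-like) regions.
import numpy as np
from skimage.filters import sobel
from skimage.segmentation import watershed

def marker_watershed(gray):
    """Segment bright objects from a darker background via marker flooding."""
    gradient = sobel(gray)                       # edge-strength image
    markers = np.zeros_like(gray, dtype=int)
    markers[gray < 0.3] = 1                      # confident background
    markers[gray > 0.8] = 2                      # confident bright object
    labels = watershed(gradient, markers)        # flood from the markers
    return labels == 2                           # boolean object mask

rng = np.random.default_rng(0)
img = rng.random((200, 200)) * 0.4
img[60:120, 80:140] = 0.9 + 0.05 * rng.random((60, 60))   # synthetic bright swab
swab_mask = marker_watershed(img)
print("swab pixels:", int(swab_mask.sum()))
```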

  20. SU-C-207B-04: Automated Segmentation of Pectoral Muscle in MR Images of Dense Breasts

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Verburg, E; Waard, SN de; Veldhuis, WB

    Purpose: To develop and evaluate a fully automated method for segmentation of the pectoral muscle boundary in Magnetic Resonance Imaging (MRI) of dense breasts. Methods: Segmentation of the pectoral muscle is an important part of automatic breast image analysis methods. Current methods for segmenting the pectoral muscle in breast MRI have difficulties delineating the muscle border correctly in breasts with a large proportion of fibroglandular tissue (i.e., dense breasts). Hence, an automated method based on dynamic programming was developed, incorporating heuristics aimed at shape, location and gradient features. To assess the method, the pectoral muscle was segmented in 91 randomly selected participants (mean age 56.6 years, range 49.5–75.2 years) from a large MRI screening trial in women with dense breasts (ACR BI-RADS category 4). Each MR dataset consisted of 178 or 179 T1-weighted images with voxel size 0.64 × 0.64 × 1.00 mm³. All images (n=16,287) were reviewed and scored by a radiologist. In contrast to volume overlap coefficients, such as DICE, the radiologist detected deviations in the segmented muscle border and determined whether the result would impact the ability to accurately determine the volume of fibroglandular tissue and detection of breast lesions. Results: According to the radiologist’s scores, 95.5% of the slices did not mask breast tissue in such a way that it could affect detection of breast lesions or volume measurements. In 13.1% of the slices a deviation in the segmented muscle border was present which would not impact breast lesion detection. In 70 datasets (78%) at least 95% of the slices were segmented in such a way that it would not affect detection of breast lesions, and in 60 (66%) datasets this was 100%. Conclusion: Dynamic programming with dedicated heuristics shows promising potential to segment the pectoral muscle in women with dense breasts.

  1. FluReF, an automated flu virus reassortment finder based on phylogenetic trees.

    PubMed

    Yurovsky, Alisa; Moret, Bernard M E

    2011-01-01

    Reassortments are events in the evolution of the genome of influenza (flu), whereby segments of the genome are exchanged between different strains. As reassortments have been implicated in major human pandemics of the last century, their identification has become a health priority. While such identification can be done "by hand" on a small dataset, researchers and health authorities are building up enormous databases of genomic sequences for every flu strain, so that it is imperative to develop automated identification methods. However, current methods are limited to pairwise segment comparisons. We present FluReF, a fully automated flu virus reassortment finder. FluReF is inspired by the visual approach to reassortment identification and uses the reconstructed phylogenetic trees of the individual segments and of the full genome. We also present a simple flu evolution simulator, based on the current, source-sink, hypothesis for flu cycles. On synthetic datasets produced by our simulator, FluReF, tuned for a 0% false positive rate, yielded false negative rates of less than 10%. FluReF corroborated two new reassortments identified by visual analysis of 75 Human H3N2 New York flu strains from 2005-2008 and gave partial verification of reassortments found using another bioinformatics method. FluReF finds reassortments by a bottom-up search of the full-genome and segment-based phylogenetic trees for candidate clades--groups of one or more sampled viruses that are separated from the other variants from the same season. Candidate clades in each tree are tested to guarantee confidence values, using the lengths of key edges as well as other tree parameters; clades with reassortments must have validated incongruencies among segment trees. FluReF demonstrates robustness of prediction for geographically and temporally expanded datasets, and is not limited to finding reassortments with previously collected sequences. The complete source code is available from http://lcbb.epfl.ch/software.html.

  2. How Do Academic Disciplines Use PowerPoint?

    ERIC Educational Resources Information Center

    Garrett, Nathan

    2016-01-01

    How do academic disciplines use PowerPoint? This project analyzed PowerPoint files created by an academic publisher to supplement textbooks. An automated analysis of 30,263 files revealed clear differences by disciplines. Single-paradigm "hard" disciplines used less complex writing but had more words than multi-paradigm "soft"…

  3. A multi-dataset data-collection strategy produces better diffraction data

    PubMed Central

    Liu, Zhi-Jie; Chen, Lirong; Wu, Dong; Ding, Wei; Zhang, Hua; Zhou, Weihong; Fu, Zheng-Qing; Wang, Bi-Cheng

    2011-01-01

    A multi-dataset (MDS) data-collection strategy is proposed and analyzed for macromolecular crystal diffraction data acquisition. The theoretical analysis indicated that the MDS strategy can reduce the standard deviation (background noise) of diffraction data compared with the commonly used single-dataset strategy for a fixed X-ray dose. In order to validate the hypothesis experimentally, a data-quality evaluation process, termed a readiness test of the X-ray data-collection system, was developed. The anomalous signals of sulfur atoms in zinc-free insulin crystals were used as the probe to differentiate the quality of data collected using different data-collection strategies. The data-collection results using home-laboratory-based rotating-anode X-ray and synchrotron X-ray systems indicate that the diffraction data collected with the MDS strategy contain more accurate anomalous signals from sulfur atoms than the data collected with a regular data-collection strategy. In addition, the MDS strategy offered more advantages with respect to radiation-damage-sensitive crystals and better usage of rotating-anode as well as synchrotron X-rays. PMID:22011470

  4. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jurrus, Elizabeth R.; Hodas, Nathan O.; Baker, Nathan A.

    Forensic analysis of nanoparticles is often conducted through the collection and identification of electron microscopy images to determine the origin of suspected nuclear material. Each image is carefully studied by experts for classification of materials based on texture, shape, and size. Manually inspecting large image datasets takes enormous amounts of time. However, automatic classification of large image datasets is a challenging problem due to the complexity involved in choosing image features, the lack of training data available for effective machine learning methods, and the availability of user interfaces to parse through images. Therefore, a significant need exists for automated and semi-automated methods to help analysts perform accurate image classification in large image datasets. We present INStINCt, our Intelligent Signature Canvas, as a framework for quickly organizing image data in a web-based canvas. Images are partitioned using small sets of example images, chosen by users, and presented in an optimal layout based on features derived from convolutional neural networks.

  5. Using statistical text classification to identify health information technology incidents

    PubMed Central

    Chai, Kevin E K; Anthony, Stephen; Coiera, Enrico; Magrabi, Farah

    2013-01-01

    Objective To examine the feasibility of using statistical text classification to automatically identify health information technology (HIT) incidents in the USA Food and Drug Administration (FDA) Manufacturer and User Facility Device Experience (MAUDE) database. Design We used a subset of 570 272 incidents including 1534 HIT incidents reported to MAUDE between 1 January 2008 and 1 July 2010. Text classifiers using regularized logistic regression were evaluated with both ‘balanced’ (50% HIT) and ‘stratified’ (0.297% HIT) datasets for training, validation, and testing. Dataset preparation, feature extraction, feature selection, cross-validation, classification, performance evaluation, and error analysis were performed iteratively to further improve the classifiers. Feature-selection techniques such as removing short words and stop words, stemming, lemmatization, and principal component analysis were examined. Measurements κ statistic, F1 score, precision and recall. Results Classification performance was similar on both the stratified (0.954 F1 score) and balanced (0.995 F1 score) datasets. Stemming was the most effective technique, reducing the feature set size to 79% while maintaining comparable performance. Training with balanced datasets improved recall (0.989) but reduced precision (0.165). Conclusions Statistical text classification appears to be a feasible method for identifying HIT reports within large databases of incidents. Automated identification should enable more HIT problems to be detected, analyzed, and addressed in a timely manner. Semi-supervised learning may be necessary when applying machine learning to big data analysis of patient safety incidents and requires further investigation. PMID:23666777
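
    A minimal sketch of this kind of regularized logistic-regression text classifier is shown below, using TF-IDF features in scikit-learn. The four toy incident narratives and labels are invented placeholders, not MAUDE records, and no claim is made about matching the reported performance.

```python
# Illustrative sketch: TF-IDF features + L2-regularized logistic regression
# for flagging health-IT incident narratives. Data below are placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

reports = [
    "software froze and the infusion record was lost",
    "interface error caused duplicate medication order",
    "battery housing cracked after the device was dropped",
    "tubing kinked during transport of the patient",
]
is_hit_incident = [1, 1, 0, 0]     # 1 = health-IT related, 0 = other

clf = make_pipeline(
    TfidfVectorizer(stop_words="english", sublinear_tf=True),
    LogisticRegression(C=1.0, max_iter=1000),   # L2-regularized by default
)
clf.fit(reports, is_hit_incident)

pred = clf.predict(["screen froze while charting the order"])
print("predicted HIT incident:", bool(pred[0]))
```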

  6. Preliminary performance assessment of biotoxin detection for UWS applications using a MicroChemLab device.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    VanderNoot, Victoria A.; Haroldsen, Brent L.; Renzi, Ronald F.

    2010-03-01

    In a multiyear research agreement with Tenix Investments Pty. Ltd., Sandia has been developing field deployable technologies for detection of biotoxins in water supply systems. The unattended water sensor or UWS employs microfluidic chip based gel electrophoresis for monitoring biological analytes in a small integrated sensor platform. This instrument collects, prepares, and analyzes water samples in an automated manner. Sample analysis is done using the µChemLab™ analysis module. This report uses analysis results of two datasets collected using the UWS to estimate performance of the device. The first dataset is made up of samples containing ricin at varying concentrations and is used for assessing instrument response and detection probability. The second dataset is comprised of analyses of water samples collected at a water utility which are used to assess the false positive probability. The analyses of the two sets are used to estimate the Receiver Operating Characteristic or ROC curves for the device at one set of operational and detection algorithm parameters. For these parameters and based on a statistical estimate, the ricin probability of detection is about 0.9 at a concentration of 5 nM for a false positive probability of 1 × 10⁻⁶.

  7. The sensitivity of the atmospheric branch of the global water cycle to temperature fluctuations at synoptic to decadal time-scales in different satellite- and model-based products

    NASA Astrophysics Data System (ADS)

    Nogueira, Miguel

    2018-02-01

    Spectral analysis of global-mean precipitation, P, evaporation, E, precipitable water, W, and surface temperature, Ts, revealed significant variability from sub-daily to multi-decadal time-scales, superposed on high-amplitude diurnal and yearly peaks. Two distinct regimes emerged from a transition in the spectral exponents, β: the weather regime covers time-scales < 10 days with β ≥ 1, and the macroweather regime extends from a few months to a few decades with 0 < β < 1. Additionally, the spectra showed a generally good statistical agreement amongst several different model- and satellite-based datasets. Detrended cross-correlation analysis (DCCA) revealed three important results which are robust across all datasets: (1) the Clausius-Clapeyron (C-C) relationship is the dominant mechanism of W non-periodic variability at multi-year time-scales; (2) C-C is not the dominant control of W, P or E non-periodic variability at time-scales below about 6 months, where the weather regime is approached and other mechanisms become important; (3) C-C is not a dominant control for P or E over land throughout the entire time-scale range considered. Furthermore, it is suggested that the atmosphere and oceans start to act as a single coupled system at time-scales > 1-2 years, while at time-scales < 6 months they are not the dominant drivers of each other. For global-ocean and full-globe averages, ρDCCA showed large spread of the C-C importance for P and E variability amongst different datasets at multi-year time-scales, ranging from negligible (< 0.3) to high (0.6-0.8) values. Hence, state-of-the-art climate datasets have significant uncertainties in the representation of macroweather precipitation and evaporation variability and its governing mechanisms.

  8. Fully automated macular pathology detection in retina optical coherence tomography images using sparse coding and dictionary learning

    NASA Astrophysics Data System (ADS)

    Sun, Yankui; Li, Shan; Sun, Zhongyang

    2017-01-01

    We propose a framework for automated detection of dry age-related macular degeneration (AMD) and diabetic macular edema (DME) from retina optical coherence tomography (OCT) images, based on sparse coding and dictionary learning. The study aims to improve the classification performance of state-of-the-art methods. First, our method presents a general approach to automatically align and crop retina regions; then it obtains global representations of images by using sparse coding and a spatial pyramid; finally, a multiclass linear support vector machine classifier is employed for classification. We apply two datasets for validating our algorithm: Duke spectral domain OCT (SD-OCT) dataset, consisting of volumetric scans acquired from 45 subjects-15 normal subjects, 15 AMD patients, and 15 DME patients; and clinical SD-OCT dataset, consisting of 678 OCT retina scans acquired from clinics in Beijing-168, 297, and 213 OCT images for AMD, DME, and normal retinas, respectively. For the former dataset, our classifier correctly identifies 100%, 100%, and 93.33% of the volumes with DME, AMD, and normal subjects, respectively, and thus performs much better than the conventional method; for the latter dataset, our classifier leads to a correct classification rate of 99.67%, 99.67%, and 100.00% for DME, AMD, and normal images, respectively.
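
    The representation and classification steps can be sketched as below: learn a sparse dictionary over patch descriptors, pool the absolute sparse codes per image (a crude single-level stand-in for the spatial pyramid), and train a linear SVM. All data, dimensions and parameters are toy assumptions rather than the authors' configuration.

```python
# Rough sketch: dictionary learning + sparse coding + pooling + linear SVM.
# Random data stands in for aligned retina OCT patch descriptors.
import numpy as np
from sklearn.decomposition import DictionaryLearning
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n_images, patches_per_image, patch_dim = 60, 50, 64
X_patches = rng.normal(size=(n_images, patches_per_image, patch_dim))
labels = rng.integers(0, 3, size=n_images)      # 0=normal, 1=AMD, 2=DME (toy)

# Learn a sparse dictionary on all patches pooled together
dico = DictionaryLearning(n_components=32, max_iter=20,
                          transform_algorithm="omp",
                          transform_n_nonzero_coefs=5, random_state=0)
dico.fit(X_patches.reshape(-1, patch_dim))

# Max-pool the absolute sparse codes per image (single-level pooling)
codes = dico.transform(X_patches.reshape(-1, patch_dim))
codes = np.abs(codes).reshape(n_images, patches_per_image, -1).max(axis=1)

clf = LinearSVC().fit(codes, labels)
print("training accuracy:", clf.score(codes, labels))
```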

  9. Fully automated macular pathology detection in retina optical coherence tomography images using sparse coding and dictionary learning.

    PubMed

    Sun, Yankui; Li, Shan; Sun, Zhongyang

    2017-01-01

    We propose a framework for automated detection of dry age-related macular degeneration (AMD) and diabetic macular edema (DME) from retina optical coherence tomography (OCT) images, based on sparse coding and dictionary learning. The study aims to improve the classification performance of state-of-the-art methods. First, our method presents a general approach to automatically align and crop retina regions; then it obtains global representations of images by using sparse coding and a spatial pyramid; finally, a multiclass linear support vector machine classifier is employed for classification. We apply two datasets for validating our algorithm: Duke spectral domain OCT (SD-OCT) dataset, consisting of volumetric scans acquired from 45 subjects—15 normal subjects, 15 AMD patients, and 15 DME patients; and clinical SD-OCT dataset, consisting of 678 OCT retina scans acquired from clinics in Beijing—168, 297, and 213 OCT images for AMD, DME, and normal retinas, respectively. For the former dataset, our classifier correctly identifies 100%, 100%, and 93.33% of the volumes with DME, AMD, and normal subjects, respectively, and thus performs much better than the conventional method; for the latter dataset, our classifier leads to a correct classification rate of 99.67%, 99.67%, and 100.00% for DME, AMD, and normal images, respectively.

  10. HYDRA: Hyperspectral Data Research Application

    NASA Astrophysics Data System (ADS)

    Rink, T.; Whittaker, T.

    2005-12-01

    HYDRA is a freely available, easy to install tool for visualization and analysis of large local or remote hyper/multi-spectral datasets. HYDRA is implemented on top of the open source VisAD Java library via Jython - the Java implementation of the user friendly Python programming language. VisAD provides data integration, through its generalized data model, user-display interaction and display rendering. Jython has an easy to read, concise, scripting-like, syntax which eases software development. HYDRA allows data sharing of large datasets through its support of the OpenDAP and OpenADDE server-client protocols. The users can explore and interrogate data, and subset in physical and/or spectral space to isolate key areas of interest for further analysis without having to download an entire dataset. It also has an extensible data input architecture to recognize new instruments and understand different local file formats, currently NetCDF and HDF4 are supported.

  11. Volumetric analysis of pelvic hematomas after blunt trauma using semi-automated seeded region growing segmentation: a method validation study.

    PubMed

    Dreizin, David; Bodanapally, Uttam K; Neerchal, Nagaraj; Tirada, Nikki; Patlas, Michael; Herskovits, Edward

    2016-11-01

    Manually segmented traumatic pelvic hematoma volumes are strongly predictive of active bleeding at conventional angiography, but the method is time intensive, limiting its clinical applicability. We compared volumetric analysis using semi-automated region growing segmentation to manual segmentation and diameter-based size estimates in patients with pelvic hematomas after blunt pelvic trauma. A 14-patient cohort was selected in an anonymous randomized fashion from a dataset of patients with pelvic binders at MDCT, collected retrospectively as part of a HIPAA-compliant IRB-approved study from January 2008 to December 2013. To evaluate intermethod differences, one reader (R1) performed three volume measurements using the manual technique and three volume measurements using the semi-automated technique. To evaluate interobserver differences for semi-automated segmentation, a second reader (R2) performed three semi-automated measurements. One-way analysis of variance was used to compare differences in mean volumes. Time effort was also compared. Correlation between the two methods as well as two shorthand appraisals (greatest diameter, and the ABC/2 method for estimating ellipsoid volumes) was assessed with Spearman's rho (r). Intraobserver variability was lower for semi-automated compared to manual segmentation, with standard deviations ranging between ±5-32 mL and ±17-84 mL, respectively (p = 0.0003). There was no significant difference in mean volumes between the two readers' semi-automated measurements (p = 0.83); however, means were lower for the semi-automated compared with the manual technique (manual: mean and SD 309.6 ± 139 mL; R1 semi-auto: 229.6 ± 88.2 mL, p = 0.004; R2 semi-auto: 243.79 ± 99.7 mL, p = 0.021). Despite differences in means, the correlation between the two methods was very strong and highly significant (r = 0.91, p < 0.001). Correlations with diameter-based methods were only moderate and nonsignificant. Mean semi-automated segmentation time effort was 2 min and 6 s and 2 min and 35 s for R1 and R2, respectively, vs. 22 min and 8 s for manual segmentation. Semi-automated pelvic hematoma volumes correlate strongly with manually segmented volumes. Since semi-automated segmentation can be performed reliably and efficiently, volumetric analysis of traumatic pelvic hematomas is potentially valuable at the point-of-care.
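
    A simplified, assumption-laden sketch of seeded region growing for volume estimation is given below: voxels within a tolerance of the seed intensity are kept, the connected component containing the seed is extracted, and the voxel count is converted to millilitres. Thresholds, voxel size and the synthetic volume are illustrative only, not the study's protocol.

```python
# Simplified sketch of seeded region growing and volume conversion.
import numpy as np
from scipy import ndimage as ndi

def grow_and_measure(volume, seed, tolerance, voxel_volume_ml):
    """Return the volume (mL) and mask of the connected region containing
    `seed` whose voxels lie within +/- tolerance of the seed intensity."""
    seed_value = volume[seed]
    within = np.abs(volume - seed_value) <= tolerance     # candidate voxels
    labels, _ = ndi.label(within)                         # connected components
    region = labels == labels[seed]                       # keep seed's component
    return region.sum() * voxel_volume_ml, region

# Toy CT-like volume with a blob of elevated attenuation around the seed
rng = np.random.default_rng(0)
ct = rng.normal(40, 5, size=(60, 128, 128))               # background ~40 HU
ct[20:40, 40:80, 40:80] += 30                             # "hematoma" ~70 HU
seed = (30, 60, 60)
voxel_ml = (0.7 * 0.7 * 3.0) / 1000.0                     # mm^3 -> mL
volume_ml, mask = grow_and_measure(ct, seed, tolerance=15, voxel_volume_ml=voxel_ml)
print(f"estimated hematoma volume: {volume_ml:.1f} mL")
```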

  12. Automated Geo/Co-Registration of Multi-Temporal Very-High-Resolution Imagery.

    PubMed

    Han, Youkyung; Oh, Jaehong

    2018-05-17

    For time-series analysis using very-high-resolution (VHR) multi-temporal satellite images, both accurate georegistration to the map coordinates and subpixel-level co-registration among the images should be conducted. However, applying well-known matching methods, such as scale-invariant feature transform and speeded up robust features for VHR multi-temporal images, has limitations. First, they cannot be used for matching an optical image to heterogeneous non-optical data for georegistration. Second, they produce a local misalignment induced by differences in acquisition conditions, such as acquisition platform stability, the sensor's off-nadir angle, and relief displacement of the considered scene. Therefore, this study addresses the problem by proposing an automated geo/co-registration framework for full-scene multi-temporal images acquired from a VHR optical satellite sensor. The proposed method comprises two primary steps: (1) a global georegistration process, followed by (2) a fine co-registration process. During the first step, two-dimensional multi-temporal satellite images are matched to three-dimensional topographic maps to assign the map coordinates. During the second step, a local analysis of registration noise pixels extracted between the multi-temporal images that have been mapped to the map coordinates is conducted to extract a large number of well-distributed corresponding points (CPs). The CPs are finally used to construct a non-rigid transformation function that enables minimization of the local misalignment existing among the images. Experiments conducted on five Kompsat-3 full scenes confirmed the effectiveness of the proposed framework, showing that the georegistration performance resulted in an approximately pixel-level accuracy for most of the scenes, and the co-registration performance further improved the results among all combinations of the georegistered Kompsat-3 image pairs by increasing the calculated cross-correlation values.
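
    The cross-correlation values used above to quantify co-registration quality can be illustrated with the small sketch below, which computes the normalized cross-correlation of corresponding patches before and after an assumed correction. Patch extraction and the non-rigid warp itself are outside the scope of the sketch.

```python
# Sketch: normalized cross-correlation (NCC) of corresponding image patches.
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation of two equally sized image patches."""
    a = (a - a.mean()) / (a.std() + 1e-12)
    b = (b - b.mean()) / (b.std() + 1e-12)
    return float((a * b).mean())

rng = np.random.default_rng(0)
reference = rng.random((64, 64))
misaligned = np.roll(reference, shift=2, axis=1) + 0.05 * rng.random((64, 64))
corrected = reference + 0.05 * rng.random((64, 64))        # after co-registration

print("NCC before correction:", round(ncc(reference, misaligned), 3))
print("NCC after correction: ", round(ncc(reference, corrected), 3))
```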

  13. The CMS dataset bookkeeping service

    NASA Astrophysics Data System (ADS)

    Afaq, A.; Dolgert, A.; Guo, Y.; Jones, C.; Kosyakov, S.; Kuznetsov, V.; Lueking, L.; Riley, D.; Sekhri, V.

    2008-07-01

    The CMS Dataset Bookkeeping Service (DBS) has been developed to catalog all CMS event data from Monte Carlo and Detector sources. It provides the ability to identify MC or trigger source, track data provenance, construct datasets for analysis, and discover interesting data. CMS requires processing and analysis activities at various service levels and the DBS system provides support for localized processing or private analysis, as well as global access for CMS users at large. Catalog entries can be moved among the various service levels with a simple set of migration tools, thus forming a loose federation of databases. DBS is available to CMS users via a Python API, Command Line, and a Discovery web page interfaces. The system is built as a multi-tier web application with Java servlets running under Tomcat, with connections via JDBC to Oracle or MySQL database backends. Clients connect to the service through HTTP or HTTPS with authentication provided by GRID certificates and authorization through VOMS. DBS is an integral part of the overall CMS Data Management and Workflow Management systems.

  14. KNIME4NGS: a comprehensive toolbox for next generation sequencing analysis.

    PubMed

    Hastreiter, Maximilian; Jeske, Tim; Hoser, Jonathan; Kluge, Michael; Ahomaa, Kaarin; Friedl, Marie-Sophie; Kopetzky, Sebastian J; Quell, Jan-Dominik; Mewes, H Werner; Küffner, Robert

    2017-05-15

    Analysis of Next Generation Sequencing (NGS) data requires the processing of large datasets by chaining various tools with complex input and output formats. In order to automate data analysis, we propose to standardize NGS tasks into modular workflows. This simplifies reliable handling and processing of NGS data, and corresponding solutions become substantially more reproducible and easier to maintain. Here, we present a documented, linux-based, toolbox of 42 processing modules that are combined to construct workflows facilitating a variety of tasks such as DNAseq and RNAseq analysis. We also describe important technical extensions. The high throughput executor (HTE) helps to increase the reliability and to reduce manual interventions when processing complex datasets. We also provide a dedicated binary manager that assists users in obtaining the modules' executables and keeping them up to date. As basis for this actively developed toolbox we use the workflow management software KNIME. See http://ibisngs.github.io/knime4ngs for nodes and user manual (GPLv3 license). robert.kueffner@helmholtz-muenchen.de. Supplementary data are available at Bioinformatics online.

  15. RepExplore: addressing technical replicate variance in proteomics and metabolomics data analysis.

    PubMed

    Glaab, Enrico; Schneider, Reinhard

    2015-07-01

    High-throughput omics datasets often contain technical replicates included to account for technical sources of noise in the measurement process. Although summarizing these replicate measurements by using robust averages may help to reduce the influence of noise on downstream data analysis, the information on the variance across the replicate measurements is lost in the averaging process and therefore typically disregarded in subsequent statistical analyses. We introduce RepExplore, a web service dedicated to exploiting the information captured in the technical replicate variance to provide more reliable and informative differential expression and abundance statistics for omics datasets. The software builds on previously published statistical methods, which have been applied successfully to biomedical omics data but are difficult to use without prior experience in programming or scripting. RepExplore facilitates the analysis by providing fully automated data processing and interactive ranking tables, whisker plots, heat maps and principal component analysis visualizations to interpret omics data and derived statistics. Freely available at http://www.repexplore.tk. Contact: enrico.glaab@uni.lu. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.

  16. iClimate: a climate data and analysis portal

    NASA Astrophysics Data System (ADS)

    Goodman, P. J.; Russell, J. L.; Merchant, N.; Miller, S. J.; Juneja, A.

    2015-12-01

    We will describe a new climate data and analysis portal called iClimate that facilitates direct comparisons between available climate observations and climate simulations. Modeled after the successful iPlant Collaborative Discovery Environment (www.iplantcollaborative.org) that allows plant scientists to trade and share environmental, physiological and genetic data and analyses, iClimate provides an easy-to-use platform for large-scale climate research, including the storage, sharing, automated preprocessing, analysis and high-end visualization of large and often disparate observational and model datasets. iClimate will promote data exploration and scientific discovery by providing: efficient and high-speed transfer of data from nodes around the globe (e.g. PCMDI and NASA); standardized and customized data/model metrics; efficient subsampling of datasets based on temporal period, geographical region or variable; and collaboration tools for sharing data, workflows, analysis results, and data visualizations with collaborators or with the community at large. We will present iClimate's capabilities, and demonstrate how it will simplify and enhance the ability to do basic or cutting-edge climate research by professionals, laypeople and students.

  17. Adverse drug events with hyperkalaemia during inpatient stays: evaluation of an automated method for retrospective detection in hospital databases

    PubMed Central

    2014-01-01

    Background Adverse drug reactions and adverse drug events (ADEs) are major public health issues. Many different prospective tools for the automated detection of ADEs in hospital databases have been developed and evaluated. The objective of the present study was to evaluate an automated method for the retrospective detection of ADEs with hyperkalaemia during inpatient stays. Methods We used a set of complex detection rules to take account of the patient’s clinical and biological context and the chronological relationship between the causes and the expected outcome. The dataset consisted of 3,444 inpatient stays in a French general hospital. An automated review was performed for all data and the results were compared with those of an expert chart review. The complex detection rules’ analytical quality was evaluated for ADEs. Results In terms of recall, 89.5% of ADEs with hyperkalaemia “with or without an abnormal symptom” were automatically identified (including all three serious ADEs). In terms of precision, 63.7% of the automatically identified ADEs with hyperkalaemia were true ADEs. Conclusions The use of context-sensitive rules appears to improve the automated detection of ADEs with hyperkalaemia. This type of tool may have an important role in pharmacoepidemiology via the routine analysis of large inter-hospital databases. PMID:25212108

  18. Adverse drug events with hyperkalaemia during inpatient stays: evaluation of an automated method for retrospective detection in hospital databases.

    PubMed

    Ficheur, Grégoire; Chazard, Emmanuel; Beuscart, Jean-Baptiste; Merlin, Béatrice; Luyckx, Michel; Beuscart, Régis

    2014-09-12

    Adverse drug reactions and adverse drug events (ADEs) are major public health issues. Many different prospective tools for the automated detection of ADEs in hospital databases have been developed and evaluated. The objective of the present study was to evaluate an automated method for the retrospective detection of ADEs with hyperkalaemia during inpatient stays. We used a set of complex detection rules to take account of the patient's clinical and biological context and the chronological relationship between the causes and the expected outcome. The dataset consisted of 3,444 inpatient stays in a French general hospital. An automated review was performed for all data and the results were compared with those of an expert chart review. The complex detection rules' analytical quality was evaluated for ADEs. In terms of recall, 89.5% of ADEs with hyperkalaemia "with or without an abnormal symptom" were automatically identified (including all three serious ADEs). In terms of precision, 63.7% of the automatically identified ADEs with hyperkalaemia were true ADEs. The use of context-sensitive rules appears to improve the automated detection of ADEs with hyperkalaemia. This type of tool may have an important role in pharmacoepidemiology via the routine analysis of large inter-hospital databases.

  19. Multi-scale pixel-based image fusion using multivariate empirical mode decomposition.

    PubMed

    Rehman, Naveed ur; Ehsan, Shoaib; Abdullah, Syed Muhammad Umer; Akhtar, Muhammad Jehanzaib; Mandic, Danilo P; McDonald-Maier, Klaus D

    2015-05-08

    A novel scheme to perform the fusion of multiple images using the multivariate empirical mode decomposition (MEMD) algorithm is proposed. Standard multi-scale fusion techniques make a priori assumptions regarding input data, whereas standard univariate empirical mode decomposition (EMD)-based fusion techniques suffer from inherent mode mixing and mode misalignment issues, characterized respectively by either a single intrinsic mode function (IMF) containing multiple scales or the same indexed IMFs corresponding to multiple input images carrying different frequency information. We show that MEMD overcomes these problems by being fully data adaptive and by aligning common frequency scales from multiple channels, thus enabling their comparison at a pixel level and subsequent fusion at multiple data scales. We then demonstrate the potential of the proposed scheme on a large dataset of real-world multi-exposure and multi-focus images and compare the results against those obtained from standard fusion algorithms, including the principal component analysis (PCA), discrete wavelet transform (DWT) and non-subsampled contourlet transform (NCT). A variety of image fusion quality measures are employed for the objective evaluation of the proposed method. We also report the results of a hypothesis testing approach on our large image dataset to identify statistically-significant performance differences.

  20. Multi-Scale Pixel-Based Image Fusion Using Multivariate Empirical Mode Decomposition

    PubMed Central

    Rehman, Naveed ur; Ehsan, Shoaib; Abdullah, Syed Muhammad Umer; Akhtar, Muhammad Jehanzaib; Mandic, Danilo P.; McDonald-Maier, Klaus D.

    2015-01-01

    A novel scheme to perform the fusion of multiple images using the multivariate empirical mode decomposition (MEMD) algorithm is proposed. Standard multi-scale fusion techniques make a priori assumptions regarding input data, whereas standard univariate empirical mode decomposition (EMD)-based fusion techniques suffer from inherent mode mixing and mode misalignment issues, characterized respectively by either a single intrinsic mode function (IMF) containing multiple scales or the same indexed IMFs corresponding to multiple input images carrying different frequency information. We show that MEMD overcomes these problems by being fully data adaptive and by aligning common frequency scales from multiple channels, thus enabling their comparison at a pixel level and subsequent fusion at multiple data scales. We then demonstrate the potential of the proposed scheme on a large dataset of real-world multi-exposure and multi-focus images and compare the results against those obtained from standard fusion algorithms, including the principal component analysis (PCA), discrete wavelet transform (DWT) and non-subsampled contourlet transform (NCT). A variety of image fusion quality measures are employed for the objective evaluation of the proposed method. We also report the results of a hypothesis testing approach on our large image dataset to identify statistically-significant performance differences. PMID:26007714

  1. Multi-Atlas Segmentation using Partially Annotated Data: Methods and Annotation Strategies.

    PubMed

    Koch, Lisa M; Rajchl, Martin; Bai, Wenjia; Baumgartner, Christian F; Tong, Tong; Passerat-Palmbach, Jonathan; Aljabar, Paul; Rueckert, Daniel

    2017-08-22

    Multi-atlas segmentation is a widely used tool in medical image analysis, providing robust and accurate results by learning from annotated atlas datasets. However, the availability of fully annotated atlas images for training is limited due to the time required for the labelling task. Segmentation methods requiring only a proportion of each atlas image to be labelled could therefore reduce the workload on expert raters tasked with annotating atlas images. To address this issue, we first re-examine the labelling problem common in many existing approaches and formulate its solution in terms of a Markov Random Field energy minimisation problem on a graph connecting atlases and the target image. This provides a unifying framework for multi-atlas segmentation. We then show how modifications in the graph configuration of the proposed framework enable the use of partially annotated atlas images and investigate different partial annotation strategies. The proposed method was evaluated on two Magnetic Resonance Imaging (MRI) datasets for hippocampal and cardiac segmentation. Experiments were performed aimed at (1) recreating existing segmentation techniques with the proposed framework and (2) demonstrating the potential of employing sparsely annotated atlas data for multi-atlas segmentation.

  2. ConTour: Data-Driven Exploration of Multi-Relational Datasets for Drug Discovery.

    PubMed

    Partl, Christian; Lex, Alexander; Streit, Marc; Strobelt, Hendrik; Wassermann, Anne-Mai; Pfister, Hanspeter; Schmalstieg, Dieter

    2014-12-01

    Large scale data analysis is nowadays a crucial part of drug discovery. Biologists and chemists need to quickly explore and evaluate potentially effective yet safe compounds based on many datasets that are in relationship with each other. However, there is a lack of tools that support them in these processes. To remedy this, we developed ConTour, an interactive visual analytics technique that enables the exploration of these complex, multi-relational datasets. At its core ConTour lists all items of each dataset in a column. Relationships between the columns are revealed through interaction: selecting one or multiple items in one column highlights and re-sorts the items in other columns. Filters based on relationships enable drilling down into the large data space. To identify interesting items in the first place, ConTour employs advanced sorting strategies, including strategies based on connectivity strength and uniqueness, as well as sorting based on item attributes. ConTour also introduces interactive nesting of columns, a powerful method to show the related items of a child column for each item in the parent column. Within the columns, ConTour shows rich attribute data about the items as well as information about the connection strengths to other datasets. Finally, ConTour provides a number of detail views, which can show items from multiple datasets and their associated data at the same time. We demonstrate the utility of our system in case studies conducted with a team of chemical biologists, who investigate the effects of chemical compounds on cells and need to understand the underlying mechanisms.

  3. Multi-class computational evolution: development, benchmark evaluation and application to RNA-Seq biomarker discovery.

    PubMed

    Crabtree, Nathaniel M; Moore, Jason H; Bowyer, John F; George, Nysia I

    2017-01-01

    A computational evolution system (CES) is a knowledge discovery engine that can identify subtle, synergistic relationships in large datasets. Pareto optimization allows CESs to balance accuracy with model complexity when evolving classifiers. Using Pareto optimization, a CES is able to identify a very small number of features while maintaining high classification accuracy. A CES can be designed for various types of data, and the user can exploit expert knowledge about the classification problem in order to improve discrimination between classes. These characteristics give CES an advantage over other classification and feature selection algorithms, particularly when the goal is to identify a small number of highly relevant, non-redundant biomarkers. Previously, CESs have been developed only for binary class datasets. In this study, we developed a multi-class CES. The multi-class CES was compared to three common feature selection and classification algorithms: support vector machine (SVM), random k-nearest neighbor (RKNN), and random forest (RF). The algorithms were evaluated on three distinct multi-class RNA sequencing datasets. The comparison criteria were run-time, classification accuracy, number of selected features, and stability of selected feature set (as measured by the Tanimoto distance). The performance of each algorithm was data-dependent. CES performed best on the dataset with the smallest sample size, indicating that CES has a unique advantage since the accuracy of most classification methods suffer when sample size is small. The multi-class extension of CES increases the appeal of its application to complex, multi-class datasets in order to identify important biomarkers and features.
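
    The stability criterion can be illustrated with a few lines of code: the Tanimoto (Jaccard) similarity of the feature subsets selected in different runs, with the corresponding distance being one minus this value. The gene names below are made up for illustration.

```python
# Small sketch of a feature-set stability measure between selection runs.
def tanimoto(a, b):
    """Tanimoto similarity of two feature sets (1.0 = identical selections).
    The Tanimoto distance is 1 - this value."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

run1 = {"gene_17", "gene_42", "gene_88"}
run2 = {"gene_17", "gene_42", "gene_113"}
print("stability:", round(tanimoto(run1, run2), 3))   # 2 shared / 4 total = 0.5
```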

  4. Phonetic Variation and Interactional Contingencies in Simultaneous Responses

    ERIC Educational Resources Information Center

    Walker, Gareth

    2016-01-01

    An auspicious but unexplored environment for studying phonetic variation in naturalistic interaction is where two or more participants say the same thing at the same time. Working with a core dataset built from the multimodal Augmented Multi-party Interaction corpus, the principles of conversation analysis were followed to analyze the sequential…

  5. CPTAC Develops LinkedOmics – Public Web Portal to Analyze Multi-Omics Data Within and Across Cancer Types | Office of Cancer Clinical Proteomics Research

    Cancer.gov

    Multi-omics analysis has grown in popularity among biomedical researchers given the comprehensive characterization of thousands of molecular attributes in addition to clinical attributes. Several data portals have been devised to make these datasets directly available to the cancer research community. However, none of the existing data portals allow systematic exploration and interpretation of the complex relationships between the vast amount of clinical and molecular attributes. CPTAC investigator Dr.

  6. Dynamic analysis of apoptosis using cyanine SYTO probes: From classical to microfluidic cytometry

    PubMed Central

    Wlodkowic, Donald; Skommer, Joanna; Faley, Shannon; Darzynkiewicz, Zbigniew; Cooper, Jonathan M.

    2013-01-01

    Cell death is a stochastic process, often initiated and/or executed in a multi-pathway/multi-organelle fashion. Therefore, high-throughput single-cell analysis platforms are required to provide detailed characterization of kinetics and mechanisms of cell death in heterogeneous cell populations. However, there is still a largely unmet need for inert fluorescent probes, suitable for prolonged kinetic studies. Here, we compare the use of an innovative adaptation of unsymmetrical SYTO dyes for dynamic real-time analysis of apoptosis in conventional as well as microfluidic chip-based systems. We show that cyanine SYTO probes allow non-invasive tracking of intracellular events over extended time. Easy handling and “stain–no wash” protocols open up new opportunities for high-throughput analysis and live-cell sorting. Furthermore, SYTO probes are easily adaptable for detection of cell death using automated microfluidic chip-based cytometry. Overall, the combined use of SYTO probes and a state-of-the-art Lab-on-a-Chip platform emerges as a cost-effective solution for automated drug screening compared to conventional Annexin V or TUNEL assays. In particular, it should allow for dynamic analysis of samples where low cell number has so far been an obstacle, e.g. primary cancer stem cells or circulating minimal residual tumors. PMID:19298813

  7. An automated, high-throughput plant phenotyping system using machine learning-based plant segmentation and image analysis.

    PubMed

    Lee, Unseok; Chang, Sungyul; Putra, Gian Anantrio; Kim, Hyoungseok; Kim, Dong Hwan

    2018-01-01

    A high-throughput plant phenotyping system automatically observes and grows many plant samples. Many plant sample images are acquired by the system to determine the characteristics of the plants (populations). Stable image acquisition and processing is very important to accurately determine the characteristics. However, hardware for acquiring plant images rapidly and stably, while minimizing plant stress, is lacking. Moreover, most software cannot adequately handle large-scale plant imaging. To address these problems, we developed a new, automated, high-throughput plant phenotyping system using simple and robust hardware, and an automated plant-imaging-analysis pipeline consisting of machine-learning-based plant segmentation. Our hardware acquires images reliably and quickly and minimizes plant stress. Furthermore, the images are processed automatically. In particular, large-scale plant-image datasets can be segmented precisely using a classifier developed using a superpixel-based machine-learning algorithm (Random Forest), and variations in plant parameters (such as area) over time can be assessed using the segmented images. We performed comparative evaluations to identify an appropriate learning algorithm for our proposed system, and tested three robust learning algorithms. We developed not only an automatic analysis pipeline but also a convenient means of plant-growth analysis that provides a learning data interface and visualization of plant growth trends. Thus, our system allows end-users such as plant biologists to analyze plant growth via large-scale plant image data easily.
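
    As a hedged illustration of the superpixel-based segmentation idea, the sketch below classifies SLIC superpixels as plant or background with a random forest trained on simple colour statistics. The toy image, heuristic labels and parameters are assumptions, not the system's training data or feature set.

```python
# Sketch: superpixel mean-colour features + random forest for plant/background.
import numpy as np
from skimage.segmentation import slic
from sklearn.ensemble import RandomForestClassifier

def superpixel_features(image, n_segments=200):
    """Mean RGB per SLIC superpixel, plus the superpixel label map."""
    segments = slic(image, n_segments=n_segments)
    feats = np.array([image[segments == s].mean(axis=0)
                      for s in np.unique(segments)])
    return feats, segments

rng = np.random.default_rng(0)
top_view = rng.random((240, 320, 3)) * 0.3
top_view[80:160, 120:220, 1] += 0.6                 # bright green "plant" patch
top_view = np.clip(top_view, 0.0, 1.0)

feats, segments = superpixel_features(top_view)
labels = (feats[:, 1] > feats[:, 0] + 0.2).astype(int)   # heuristic toy labels
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(feats, labels)

# Projected plant area: sum the pixel counts of superpixels predicted as plant
pred = clf.predict(feats)
plant_area_px = sum(int((segments == s).sum())
                    for s, p in zip(np.unique(segments), pred) if p)
print("projected plant area (pixels):", plant_area_px)
```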

  8. Automated Slicing for a Multi-Axis Metal Deposition System (Preprint)

    DTIC Science & Technology

    2006-09-01

    experimented with different materials like H13 tool steel to build the part. Following the same slicing and scanning toolpath result, there is a geometry reasoning and analysis tool, the centroidal axis. Similar to the medial axis, it contains geometry and topological information but is significantly computationally…

  9. a Multi-Data Source and Multi-Sensor Approach for the 3d Reconstruction and Visualization of a Complex Archaelogical Site: the Case Study of Tolmo de Minateda

    NASA Astrophysics Data System (ADS)

    Torres-Martínez, J. A.; Seddaiu, M.; Rodríguez-Gonzálvez, P.; Hernández-López, D.; González-Aguilera, D.

    2015-02-01

    The complexity of archaeological sites makes it difficult to obtain an integral model using current geomatic techniques (i.e. aerial and close-range photogrammetry and terrestrial laser scanning) individually, so a multi-sensor approach is proposed as the best solution to provide a 3D reconstruction and visualization of these complex sites. Sensor registration represents a key milestone when automation is required and when aerial and terrestrial datasets must be integrated. To this end, several problems must be solved: coordinate system definition, geo-referencing, co-registration of point clouds, geometric and radiometric homogeneity, etc. Last but not least, safeguarding of tangible archaeological heritage and its associated intangible expressions entails a multi-source data approach in which heterogeneous material (historical documents, drawings, archaeological techniques, habits of living, etc.) should be collected and combined with the resulting hybrid 3D models. The proposed multi-data source and multi-sensor approach is applied to the case study of the "Tolmo de Minateda" archaeological site. A total extension of 9 ha is reconstructed, with an adapted level of detail, by an ultralight aerial platform (paratrike), an unmanned aerial vehicle, a terrestrial laser scanner and terrestrial photogrammetry. In addition, the defensive nature of the site (i.e. with the presence of three different defensive walls) together with the considerable stratification of the archaeological site (i.e. with different archaeological surfaces and constructive typologies) requires that tangible and intangible archaeological heritage expressions be integrated with the hybrid 3D models obtained, so that different experts and heritage stakeholders can analyse, understand and exploit the archaeological site.

  10. A robust automated left ventricle region of interest localization technique using a cardiac cine MRI atlas

    NASA Astrophysics Data System (ADS)

    Ben-Zikri, Yehuda Kfir; Linte, Cristian A.

    2016-03-01

    Region of interest detection is a precursor to many medical image processing and analysis applications, including segmentation, registration and other image manipulation techniques. The optimal region of interest is often selected manually, based on empirical knowledge and features of the image dataset. However, if inconsistently identified, the selected region of interest may greatly affect the subsequent image analysis or interpretation steps, in turn leading to incomplete assessment during computer-aided diagnosis or incomplete visualization or identification of the surgical targets, if employed in the context of pre-procedural planning or image-guided interventions. Therefore, the need for robust, accurate and computationally efficient region of interest localization techniques is prevalent in many modern computer-assisted diagnosis and therapy applications. Here we propose a fully automated, robust, a priori learning-based approach that provides reliable estimates of the left and right ventricle features from cine cardiac MR images. The proposed approach leverages the temporal frame-to-frame motion extracted across a range of short axis left ventricle slice images with a small training set generated from less than 10% of the population. This approach is based on histogram of oriented gradients features weighted by local intensities to first identify an initial region of interest depicting the left and right ventricles that exhibits the greatest extent of cardiac motion. This region is correlated with the homologous region belonging to the training dataset that best matches the test image using feature vector correlation techniques. Lastly, the optimal left ventricle region of interest of the test image is identified based on the correlation of known ground truth segmentations associated with the training dataset deemed closest to the test image. The proposed approach was tested on a population of 100 patient datasets and was validated against the ground truth region of interest of the test images manually annotated by experts. This tool successfully identified a mask around the LV and RV, and furthermore the minimal region of interest around the LV that fully enclosed the left ventricle, in all testing datasets, yielding a 98% overlap with the corresponding ground truth. The mean absolute distance error between the two contours, normalized by the radius of the ground truth, is 0.20 +/- 0.09.

  11. Benchmark of Machine Learning Methods for Classification of a SENTINEL-2 Image

    NASA Astrophysics Data System (ADS)

    Pirotti, F.; Sunar, F.; Piragnolo, M.

    2016-06-01

    Thanks mainly to ESA and USGS, a large bulk of free images of the Earth is readily available nowadays. One of the main goals of remote sensing is to label images according to a set of semantic categories, i.e. image classification. This is a very challenging issue since land cover of a specific class may present a large spatial and spectral variability and objects may appear at different scales and orientations. In this study, we report the results of benchmarking 9 machine learning algorithms tested for accuracy and speed in training and classification of land-cover classes in a Sentinel-2 dataset. The following machine learning methods (MLM) have been tested: linear discriminant analysis, k-nearest neighbour, random forests, support vector machines, multi layered perceptron, multi layered perceptron ensemble, ctree, boosting, logarithmic regression. The validation is carried out using a control dataset which consists of an independent classification in 11 land-cover classes of an area of about 60 km2, obtained by manual visual interpretation of high resolution images (20 cm ground sampling distance) by experts. In this study five out of the eleven classes are used since the others have too few samples (pixels) for testing and validating subsets. The classes used are the following: (i) urban (ii) sowable areas (iii) water (iv) tree plantations (v) grasslands. Validation is carried out using three different approaches: (i) using pixels from the training dataset (train), (ii) using pixels from the training dataset and applying cross-validation with the k-fold method (kfold) and (iii) using all pixels from the control dataset. Five accuracy indices are calculated for the comparison between the values predicted with each model and control values over three sets of data: the training dataset (train), the whole control dataset (full) and with k-fold cross-validation (kfold) with ten folds. Results from validation of predictions of the whole dataset (full) show the random forests method with the highest values; the kappa index ranges from 0.55 to 0.42 with the most and the least number of pixels for training, respectively. The two neural networks (multi layered perceptron and its ensemble) and the support vector machines - with default radial basis function kernel - follow closely with comparable performance.
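
    The benchmarking protocol described above (several classifiers, ten-fold cross-validation, accuracy and kappa indices) maps onto a short scikit-learn sketch. The feature matrix X (per-pixel band values) and labels y are placeholders, and only a subset of the nine tested methods is shown.

    ```python
    # Hypothetical benchmark of several classifiers with 10-fold cross-validation,
    # reporting overall accuracy and Cohen's kappa, as in the study above.
    from sklearn.model_selection import cross_val_predict, StratifiedKFold
    from sklearn.metrics import accuracy_score, cohen_kappa_score
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.svm import SVC
    from sklearn.neural_network import MLPClassifier

    def benchmark(X, y):
        """X: (n_pixels, n_bands) reflectance values; y: land-cover class labels."""
        models = {
            "lda": LinearDiscriminantAnalysis(),
            "knn": KNeighborsClassifier(n_neighbors=5),
            "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
            "svm_rbf": SVC(kernel="rbf"),  # default radial basis function kernel
            "mlp": MLPClassifier(max_iter=1000, random_state=0),
        }
        cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
        results = {}
        for name, model in models.items():
            pred = cross_val_predict(model, X, y, cv=cv)
            results[name] = (accuracy_score(y, pred), cohen_kappa_score(y, pred))
        return results  # {name: (overall accuracy, kappa index)}
    ```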

  12. REBOCOL (Robotic Calorimetry): An automated NDA (Nondestructive assay) calorimetry and gamma isotopic system

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hurd, J.R.; Bonner, C.A.; Ostenak, C.A.

    1989-01-01

    ROBOCAL, which is presently being developed and tested at Los Alamos National Laboratory, is a full-scale, prototypical robotic system for remote calorimetric and gamma-ray analysis of special nuclear materials. It integrates a fully automated, multi-drawer, vertical stacker-retriever system for staging unmeasured nuclear materials, and a fully automated gantry robot for computer-based selection and transfer of nuclear materials to calorimetric and gamma-ray measurement stations. Since ROBOCAL is designed for minimal operator intervention, a completely programmed user interface and data-base system are provided to interact with the automated mechanical and assay systems. The assay system is designed to completely integrate calorimetric and gamma-ray data acquisition and to perform state-of-the-art analyses on both homogeneous and heterogeneous distributions of nuclear materials in a wide variety of matrices. 10 refs., 10 figs., 4 tabs.

  13. Towards human behavior recognition based on spatio temporal features and support vector machines

    NASA Astrophysics Data System (ADS)

    Ghabri, Sawsen; Ouarda, Wael; Alimi, Adel M.

    2017-03-01

    Security and surveillance are vital issues in today's world. Recent acts of terrorism have highlighted the urgent need for efficient surveillance. There is indeed a need for an automated video surveillance system which can detect the identity and activity of a person. In this article, we propose a new paradigm to recognize an aggressive human behavior such as a boxing action. Our proposed system for human activity detection uses a fusion of Spatio Temporal Interest Point (STIP) and Histogram of Oriented Gradient (HoG) features. The resulting novel feature is called Spatio Temporal Histogram of Oriented Gradient (STHOG). To evaluate the robustness of our proposed paradigm, with a local application of the HoG technique at STIP points, we conducted experiments on the KTH human action dataset using multi-class Support Vector Machine classification. The proposed scheme outperforms basic descriptors like HoG and STIP, achieving a classification accuracy of 82.26%.
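
    In essence, the fused STHOG descriptor pools HoG vectors computed locally around detected STIP locations and feeds a video-level feature to a multi-class SVM. The sketch below illustrates that idea under stated assumptions: the STIP detector itself is not implemented (detected points are taken as input), and the pooling choice (mean of patch HoGs) is illustrative rather than the paper's exact construction.

    ```python
    # Simplified sketch of fusing local HoG descriptors at spatio-temporal interest
    # points and classifying actions with a multi-class SVM (one-vs-one by default).
    import numpy as np
    from skimage.feature import hog
    from sklearn.svm import SVC

    def sthog_descriptor(frames, points, patch=32):
        """frames: list of grayscale images; points: [(frame_idx, row, col), ...].
        Returns the mean of HoG vectors computed on patches around each STIP."""
        vecs, half = [], patch // 2
        for t, r, c in points:
            img = frames[t]
            r0, c0 = max(r - half, 0), max(c - half, 0)
            window = img[r0:r0 + patch, c0:c0 + patch]
            if window.shape == (patch, patch):  # keep descriptor length fixed
                vecs.append(hog(window, orientations=9,
                                pixels_per_cell=(8, 8), cells_per_block=(2, 2)))
        return np.mean(vecs, axis=0)  # fixed-length video-level descriptor

    # X = np.array([sthog_descriptor(f, p) for f, p in training_videos])
    # clf = SVC(kernel="rbf").fit(X, action_labels)  # multi-class SVM classification
    ```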

  14. Automatic layout of structured hierarchical reports.

    PubMed

    Bakke, Eirik; Karger, David R; Miller, Robert C

    2013-12-01

    Domain-specific database applications tend to contain a sizable number of table-, form-, and report-style views that must each be designed and maintained by a software developer. A significant part of this job is the necessary tweaking of low-level presentation details such as label placements, text field dimensions, list or table styles, and so on. In this paper, we present a horizontally constrained layout management algorithm that automates the display of structured hierarchical data using the traditional visual idioms of hand-designed database UIs: tables, multi-column forms, and outline-style indented lists. We compare our system with pure outline and nested table layouts with respect to space efficiency and readability, the latter with an online user study on 27 subjects. Our layouts are 3.9 and 1.6 times more compact on average than outline layouts and horizontally unconstrained table layouts, respectively, and are as readable as table layouts even for large datasets.

  15. Re-identification of persons in multi-camera surveillance under varying viewpoints and illumination

    NASA Astrophysics Data System (ADS)

    Bouma, Henri; Borsboom, Sander; den Hollander, Richard J. M.; Landsmeer, Sander H.; Worring, Marcel

    2012-06-01

    The capability to track individuals in CCTV cameras is important for surveillance and forensics alike. However, it is laborious to do over multiple cameras. Therefore, an automated system is desirable. In the literature several methods have been proposed, but their robustness against varying viewpoints and illumination is limited. Hence performance in realistic settings is also limited. In this paper, we present a novel method for the automatic re-identification of persons in video from surveillance cameras in a realistic setting. The method is computationally efficient, robust to a wide variety of viewpoints and illumination, simple to implement and it requires no training. We compare the performance of our method to several state-of-the-art methods on a publicly available dataset that contains the variety of viewpoints and illumination needed to allow benchmarking. The results indicate that our method shows good performance and enables a human operator to track persons five times faster.

  16. Voxel-based morphometry and automated lobar volumetry: The trade-off between spatial scale and statistical correction

    PubMed Central

    Voormolen, Eduard H.J.; Wei, Corie; Chow, Eva W.C.; Bassett, Anne S.; Mikulis, David J.; Crawley, Adrian P.

    2011-01-01

    Voxel-based morphometry (VBM) and automated lobar region of interest (ROI) volumetry are comprehensive and fast methods to detect differences in overall brain anatomy on magnetic resonance images. However, VBM and automated lobar ROI volumetry have detected dissimilar gray matter differences within identical image sets in our own experience and in previous reports. To gain more insight into how diverging results arise and to attempt to establish whether one method is superior to the other, we investigated how differences in spatial scale and in the need to statistically correct for multiple spatial comparisons influence the relative sensitivity of either technique to group differences in gray matter volumes. We assessed the performance of both techniques on a small dataset containing simulated gray matter deficits and additionally on a dataset of 22q11-deletion syndrome patients with schizophrenia (22q11DS-SZ) vs. matched controls. VBM was more sensitive to simulated focal deficits compared to automated ROI volumetry, and could detect global cortical deficits equally well. Moreover, theoretical calculations of VBM and ROI detection sensitivities to focal deficits showed that at increasing ROI size, ROI volumetry suffers more from loss in sensitivity than VBM. Furthermore, VBM and automated ROI found corresponding GM deficits in 22q11DS-SZ patients, except in the parietal lobe. Here, automated lobar ROI volumetry found a significant deficit only after a smaller subregion of interest was employed. Thus, sensitivity to focal differences is impaired relatively more by averaging over larger volumes in automated ROI methods than by the correction for multiple comparisons in VBM. These findings indicate that VBM is to be preferred over automated lobar-scale ROI volumetry for assessing gray matter volume differences between groups. PMID:19619660

  17. Mapping moderate-scale land-cover over very large geographic areas within a collaborative framework: A case study of the Southwest Regional Gap Analysis Project (SWReGAP)

    USGS Publications Warehouse

    Lowry, J.; Ramsey, R.D.; Thomas, K.; Schrupp, D.; Sajwaj, T.; Kirby, J.; Waller, E.; Schrader, S.; Falzarano, S.; Langs, L.; Manis, G.; Wallace, C.; Schulz, K.; Comer, P.; Pohs, K.; Rieth, W.; Velasquez, C.; Wolk, B.; Kepner, W.; Boykin, K.; O'Brien, L.; Bradford, D.; Thompson, B.; Prior-Magee, J.

    2007-01-01

    Land-cover mapping efforts within the USGS Gap Analysis Program have traditionally been state-centered; each state having the responsibility of implementing a project design for the geographic area within their state boundaries. The Southwest Regional Gap Analysis Project (SWReGAP) was the first formal GAP project designed at a regional, multi-state scale. The project area comprises the southwestern states of Arizona, Colorado, Nevada, New Mexico, and Utah. The land-cover map/dataset was generated using regionally consistent geospatial data (Landsat ETM+ imagery (1999-2001) and DEM derivatives), similar field data collection protocols, a standardized land-cover legend, and a common modeling approach (decision tree classifier). Partitioning of mapping responsibilities amongst the five collaborating states was organized around ecoregion-based "mapping zones". Over the course of 2.5 field seasons approximately 93,000 reference samples were collected directly, or obtained from other contemporary projects, for the land-cover modeling effort. The final map was made public in 2004 and contains 125 land-cover classes. An internal validation of 85 of the classes, representing 91% of the land area, was performed. Agreement between withheld samples and the validated dataset was 61% (KHAT = 0.60, n = 17,030). This paper presents an overview of the methodologies used to create the regional land-cover dataset and highlights issues associated with large-area mapping within a coordinated, multi-institutional management framework. © 2006 Elsevier Inc. All rights reserved.

  18. Automated brainstem co-registration (ABC) for MRI.

    PubMed

    Napadow, Vitaly; Dhond, Rupali; Kennedy, David; Hui, Kathleen K S; Makris, Nikos

    2006-09-01

    Group data analysis in brainstem neuroimaging is predicated on accurate co-registration of anatomy. As the brainstem is comprised of many functionally heterogeneous nuclei densely situated adjacent to one another, relatively small errors in co-registration can manifest in increased variance or decreased sensitivity (or significance) in detecting activations. We have devised a 2-stage automated, reference mask guided registration technique (Automated Brainstem Co-registration, or ABC) for improved brainstem co-registration. Our approach utilized a brainstem mask dataset to weight an automated co-registration cost function. Our method was validated through measurement of RMS error at 12 manually defined landmarks. These landmarks were also used as guides for a secondary manual co-registration option, intended for outlier individuals that may not adequately co-register with our automated method. Our methodology was tested on 10 healthy human subjects and compared to traditional co-registration techniques (Talairach transform and automated affine transform to the MNI-152 template). We found that ABC had a significantly lower mean RMS error (1.22 +/- 0.39 mm) than Talairach transform (2.88 +/- 1.22 mm, mu +/- sigma) and the global affine (3.26 +/- 0.81 mm) method. Improved accuracy was also found for our manual-landmark-guided option (1.51 +/- 0.43 mm). Visualizing individual brainstem borders demonstrated more consistent and uniform overlap for ABC compared to traditional global co-registration techniques. Improved robustness (lower susceptibility to outliers) was demonstrated with ABC through lower inter-subject RMS error variance compared with traditional co-registration methods. The use of easily available and validated tools (AFNI and FSL) for this method should ease adoption by other investigators interested in brainstem data group analysis.

  19. Automating data analysis for two-dimensional gas chromatography/time-of-flight mass spectrometry non-targeted analysis of comparative samples.

    PubMed

    Titaley, Ivan A; Ogba, O Maduka; Chibwe, Leah; Hoh, Eunha; Cheong, Paul H-Y; Simonich, Staci L Massey

    2018-03-16

    Non-targeted analysis of environmental samples, using comprehensive two-dimensional gas chromatography coupled with time-of-flight mass spectrometry (GC × GC/ToF-MS), poses significant data analysis challenges due to the large number of possible analytes. Non-targeted data analysis of complex mixtures is prone to human bias and is laborious, particularly for comparative environmental samples such as contaminated soil pre- and post-bioremediation. To address this research bottleneck, we developed OCTpy, a Python™ script that acts as a data reduction filter to automate GC × GC/ToF-MS data analysis from LECO ® ChromaTOF ® software and facilitates selection of analytes of interest based on peak area comparison between comparative samples. We used data from polycyclic aromatic hydrocarbon (PAH) contaminated soil, pre- and post-bioremediation, to assess the effectiveness of OCTpy in facilitating the selection of analytes that have formed or degraded following treatment. Using datasets from the soil extracts pre- and post-bioremediation, OCTpy selected, on average, 18% of the initial suggested analytes generated by the LECO ® ChromaTOF ® software Statistical Compare feature. Based on this list, 63-100% of the candidate analytes identified by a highly trained individual were also selected by OCTpy. This process was accomplished in several minutes per sample, whereas manual data analysis took several hours per sample. OCTpy automates the analysis of complex mixtures of comparative samples, reduces the potential for human error during heavy data handling and decreases data analysis time by at least tenfold. Copyright © 2018 Elsevier B.V. All rights reserved.
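
    OCTpy itself is not reproduced here, but the core idea, filtering exported peak tables by comparing peak areas between pre- and post-treatment samples, can be sketched with pandas. The column names ('name', 'area') and the fold-change threshold are assumptions, not the script's actual interface.

    ```python
    # Hypothetical pandas sketch of a peak-area comparison filter for
    # GC x GC/ToF-MS peak tables exported from comparative samples.
    import pandas as pd

    def compare_peak_tables(pre_csv, post_csv, fold_change=5.0):
        """Flag analytes whose peak area changed by more than `fold_change`
        between pre- and post-bioremediation peak tables (CSV format assumed)."""
        pre = pd.read_csv(pre_csv)    # expected columns: 'name', 'area'
        post = pd.read_csv(post_csv)
        merged = pre.merge(post, on="name", how="outer",
                           suffixes=("_pre", "_post")).fillna(0.0)
        eps = 1.0  # avoid division by zero for peaks absent in one sample
        merged["ratio"] = (merged["area_post"] + eps) / (merged["area_pre"] + eps)
        formed = merged[merged["ratio"] >= fold_change]           # appeared/increased
        degraded = merged[merged["ratio"] <= 1.0 / fold_change]   # decreased/lost
        return formed, degraded
    ```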

  20. Annual Changes of Paddy Rice Planting Areas in Northeastern Asia from MODIS images in 2000-2014

    NASA Astrophysics Data System (ADS)

    Xiao, X.; Zhang, G.; Dong, J.; Menarguez, M. A.; Kou, W.; Jin, C.; Qin, Y.; Zhou, Y.; Wang, J.; Moore, B., III

    2014-12-01

    Knowledge of the area and spatial distribution of paddy rice is important for assessment of food security, management of water resources, estimation of greenhouse gas (methane) emissions, and understanding avian influenza virus transmission. Over the past two decades, paddy rice cultivation has expanded northward in temperate and cold temperate zones, particularly in Northeastern China. There is a need to quantify and map changes in paddy rice planting areas in Northeastern Asia (Japan, North and South Korea, and northeast China) at annual intervals. We developed a pixel- and phenology-based image analysis system, MODIS-RICE, to map paddy rice in Northeastern Asia using multi-temporal MODIS thermal and surface reflectance imagery. Paddy rice fields during the flooding and transplanting phases have unique physical and spectral characteristics, which makes it possible to develop an automated and robust algorithm to track the flooding and transplanting phases of paddy rice fields over time. In this presentation, we will show the MODIS-based annual maps of paddy rice planting area in Northeastern Asia from 2000 to 2014 (500-m spatial resolution). Accuracy assessments using high-resolution images show that the resultant paddy rice map of Northeastern Asia had an accuracy comparable to existing products, including the 2010 Landsat-based National Land Cover Dataset (NLCD) of China, the 2010 RapidEye-based paddy rice map of North Korea, and the 2010 AVNIR-2-based National Land Cover Dataset of Japan, in terms of both area and spatial pattern of paddy rice. This study demonstrates that our novel MODIS-RICE system, which uses both thermal and optical MODIS data over a full year, is a simple and robust tool to identify and map paddy rice fields in temperate and cold temperate zones.

  1. Automated quantification of renal interstitial fibrosis for computer-aided diagnosis: A comprehensive tissue structure segmentation method.

    PubMed

    Tey, Wei Keat; Kuang, Ye Chow; Ooi, Melanie Po-Leen; Khoo, Joon Joon

    2018-03-01

    Interstitial fibrosis in renal biopsy samples is a scarring tissue structure that may be visually quantified by pathologists as an indicator of the presence and extent of chronic kidney disease. The standard method of quantification by visual evaluation presents reproducibility issues in the diagnoses due to the uncertainties in human judgement. This study proposes an automated quantification system for measuring the amount of interstitial fibrosis in renal biopsy images as a consistent basis of comparison among pathologists. The system identifies the renal tissue structures through knowledge-based rules employing colour space transformations and structural feature extraction from the images. In particular, the renal glomerulus identification is based on a multiscale textural feature analysis and a support vector machine. The regions in the biopsy representing interstitial fibrosis are deduced through the elimination of non-interstitial fibrosis structures from the biopsy area and quantified as a percentage of the total area of the biopsy sample. The experiments conducted evaluate the system in terms of quantification accuracy, intra- and inter-observer variability in visual quantification by pathologists, and the effect introduced by the automated quantification system on the pathologists' diagnosis. A 40-image ground truth dataset was manually prepared by consulting an experienced pathologist for the validation of the segmentation algorithms. The results from experiments involving experienced pathologists demonstrated an average error of 9 percentage points between the automated system's quantification and the pathologists' visual evaluation. Experiments investigating the variability in pathologists, involving samples from 70 kidney patients, also showed the automated quantification error rate to be on par with the average intra-observer variability in pathologists' quantification. The accuracy of the proposed quantification system has been validated with the ground truth dataset and compared against the pathologists' quantification results. It has been shown that the correlation between different pathologists' estimation of interstitial fibrosis area significantly improved, demonstrating the effectiveness of the quantification system as a diagnostic aid. Copyright © 2017 Elsevier B.V. All rights reserved.
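
    Once the tissue structures have been segmented, the quantification step reduces to an area ratio. A minimal sketch, assuming binary masks for the biopsy area and for the detected non-fibrosis structures, is given below; it is an illustration of the measurement, not the paper's implementation.

    ```python
    # Minimal sketch of interstitial fibrosis quantification from binary masks:
    # fibrosis area = biopsy area minus all non-fibrosis structures, expressed
    # as a percentage of the total biopsy area (inputs assumed, not the paper's code).
    import numpy as np

    def fibrosis_percentage(biopsy_mask, structure_masks):
        """biopsy_mask: bool array of the tissue region.
        structure_masks: list of bool arrays (glomeruli, tubules, vessels, ...)."""
        non_fibrosis = np.zeros_like(biopsy_mask, dtype=bool)
        for m in structure_masks:
            non_fibrosis |= m
        fibrosis = biopsy_mask & ~non_fibrosis
        return 100.0 * fibrosis.sum() / biopsy_mask.sum()
    ```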

  2. Engine Icing Data - An Analytics Approach

    NASA Technical Reports Server (NTRS)

    Fitzgerald, Brooke A.; Flegel, Ashlie B.

    2017-01-01

    Engine icing researchers at the NASA Glenn Research Center use the Escort data acquisition system in the Propulsion Systems Laboratory (PSL) to generate and collect a tremendous amount of data every day. Currently these researchers spend countless hours processing and formatting their data, selecting important variables, and plotting relationships between variables, all by hand, generally analyzing data in a spreadsheet-style program (such as Microsoft Excel). Though spreadsheet-style analysis is familiar and intuitive to many, processing data in spreadsheets is often unreproducible and small mistakes are easily overlooked. Spreadsheet-style analysis is also time inefficient. The same formatting, processing, and plotting procedure has to be repeated for every dataset, which leads to researchers performing the same tedious data munging process over and over instead of making discoveries within their data. This paper documents a data analysis tool written in Python hosted in a Jupyter notebook that vastly simplifies the analysis process. From the file path of any folder containing time series datasets, this tool batch loads every dataset in the folder, processes the datasets in parallel, and ingests them into a widget where users can search for and interactively plot subsets of columns in a number of ways with a click of a button, easily and intuitively comparing their data and discovering interesting dynamics. Furthermore, comparing variables across data sets and integrating video data (while extremely difficult with spreadsheet-style programs) is quite simplified in this tool. This tool has also gathered interest outside the engine icing branch, and will be used by researchers across NASA Glenn Research Center. This project exemplifies the enormous benefit of automating data processing, analysis, and visualization, and will help researchers move from raw data to insight in a much smaller time frame.
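
    The batch-loading pattern described above is straightforward to sketch with pandas: load every time-series file in a folder and expose a simple multi-dataset column plotter. The file format (CSV with a 'time' column) is an assumption, and the interactive widget layer of the actual tool is omitted.

    ```python
    # Hypothetical sketch of batch-loading a folder of time-series CSV files
    # and plotting selected columns across datasets, in the spirit of the tool above.
    import glob, os
    import pandas as pd
    import matplotlib.pyplot as plt

    def load_folder(path, time_column="time"):
        """Return {file name: DataFrame} for every CSV file in `path`."""
        datasets = {}
        for f in sorted(glob.glob(os.path.join(path, "*.csv"))):
            datasets[os.path.basename(f)] = pd.read_csv(f).set_index(time_column)
        return datasets

    def plot_columns(datasets, columns):
        """Overlay the chosen columns from every loaded dataset on one figure."""
        fig, ax = plt.subplots()
        for name, df in datasets.items():
            for col in columns:
                if col in df.columns:
                    ax.plot(df.index, df[col], label=f"{name}:{col}")
        ax.set_xlabel("time")
        ax.legend()
        return fig
    ```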

  3. TU-F-CAMPUS-I-05: Semi-Automated, Open Source MRI Quality Assurance and Quality Control Program for Multi-Unit Institution

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yung, J; Stefan, W; Reeve, D

    2015-06-15

    Purpose: Phantom measurements allow for the performance of magnetic resonance (MR) systems to be evaluated. American Association of Physicists in Medicine (AAPM) Report No. 100 Acceptance Testing and Quality Assurance Procedures for MR Imaging Facilities, American College of Radiology (ACR) MR Accreditation Program MR phantom testing, and ACR MRI quality control (QC) program documents help to outline specific tests for establishing system performance baselines as well as system stability over time. Analyzing and processing tests from multiple systems can be time-consuming for medical physicists. Besides determining whether tests are within predetermined limits or criteria, monitoring longitudinal trends can also help prevent costly downtime of systems during clinical operation. In this work, a semi-automated QC program was developed to analyze and record measurements in a database that allowed for easy access to historical data. Methods: Image analysis was performed on 27 different MR systems of 1.5T and 3.0T field strengths from GE and Siemens manufacturers. Recommended measurements involved the ACR MRI Accreditation Phantom, spherical homogenous phantoms, and a phantom with an uniform hole pattern. Measurements assessed geometric accuracy and linearity, position accuracy, image uniformity, signal, noise, ghosting, transmit gain, center frequency, and magnetic field drift. The program was designed with open source tools, employing Linux, Apache, the MySQL database and the Python programming language for the front and backend. Results: Processing time for each image is <2 seconds. Figures are produced to show regions of interest (ROIs) for analysis. Historical data can be reviewed to compare previous year data and to inspect for trends. Conclusion: A MRI quality assurance and QC program is necessary for maintaining high quality, ACR MRI Accredited MR programs. A reviewable database of phantom measurements assists medical physicists with processing and monitoring of large datasets. Longitudinal data can reveal trends that, although within passing criteria, indicate underlying system issues.
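
    A small piece of the kind of analysis such a QC program performs, ROI-based signal, noise and uniformity measurements, can be sketched with NumPy. The ROI placement, the ACR-specific tests and the database layer are omitted; the uniformity formula follows the usual percent-integral-uniformity definition and all names here are illustrative.

    ```python
    # Hedged sketch of simple phantom QC metrics from ROI statistics:
    # SNR from a signal ROI and a background ROI, and percent integral uniformity
    # from the mean intensities of a bright and a dark ROI.
    import numpy as np

    def roi(image, row, col, size):
        """Square ROI of side `size` centred at (row, col)."""
        h = size // 2
        return image[row - h:row + h, col - h:col + h]

    def snr(image, signal_center, background_center, size=20):
        """Ratio of mean signal in the phantom to the standard deviation of air."""
        sig = roi(image, *signal_center, size).mean()
        noise = roi(image, *background_center, size).std()
        return sig / noise

    def percent_integral_uniformity(mean_high, mean_low):
        # PIU = 100 * (1 - (high - low) / (high + low))
        return 100.0 * (1.0 - (mean_high - mean_low) / (mean_high + mean_low))
    ```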

  4. Automated Melanoma Recognition in Dermoscopy Images via Very Deep Residual Networks.

    PubMed

    Yu, Lequan; Chen, Hao; Dou, Qi; Qin, Jing; Heng, Pheng-Ann

    2017-04-01

    Automated melanoma recognition in dermoscopy images is a very challenging task due to the low contrast of skin lesions, the huge intraclass variation of melanomas, the high degree of visual similarity between melanoma and non-melanoma lesions, and the existence of many artifacts in the image. In order to meet these challenges, we propose a novel method for melanoma recognition by leveraging very deep convolutional neural networks (CNNs). Compared with existing methods employing either low-level hand-crafted features or CNNs with shallower architectures, our substantially deeper networks (more than 50 layers) can acquire richer and more discriminative features for more accurate recognition. To take full advantage of very deep networks, we propose a set of schemes to ensure effective training and learning under limited training data. First, we apply the residual learning to cope with the degradation and overfitting problems when a network goes deeper. This technique can ensure that our networks benefit from the performance gains achieved by increasing network depth. Then, we construct a fully convolutional residual network (FCRN) for accurate skin lesion segmentation, and further enhance its capability by incorporating a multi-scale contextual information integration scheme. Finally, we seamlessly integrate the proposed FCRN (for segmentation) and other very deep residual networks (for classification) to form a two-stage framework. This framework enables the classification network to extract more representative and specific features based on segmented results instead of the whole dermoscopy images, further alleviating the insufficiency of training data. The proposed framework is extensively evaluated on ISBI 2016 Skin Lesion Analysis Towards Melanoma Detection Challenge dataset. Experimental results demonstrate the significant performance gains of the proposed framework, ranking the first in classification and the second in segmentation among 25 teams and 28 teams, respectively. This study corroborates that very deep CNNs with effective training mechanisms can be employed to solve complicated medical image analysis tasks, even with limited training data.

  5. An Automated Method to Identify Mesoscale Convective Complexes (MCCs) Implementing Graph Theory

    NASA Astrophysics Data System (ADS)

    Whitehall, K. D.; Mattmann, C. A.; Jenkins, G. S.; Waliser, D. E.; Rwebangira, R.; Demoz, B.; Kim, J.; Goodale, C. E.; Hart, A. F.; Ramirez, P.; Joyce, M. J.; Loikith, P.; Lee, H.; Khudikyan, S.; Boustani, M.; Goodman, A.; Zimdars, P. A.; Whittell, J.

    2013-12-01

    Mesoscale convective complexes (MCCs) are convectively-driven weather systems with a duration of ~10-12 hours that contribute large amounts to daily and monthly rainfall totals. More than 400 MCCs occur annually over various locations on the globe. In West Africa, ~170 MCCs occur annually during the 180 days representing the summer months (June-November), and contribute ~75% of the annual wet season rainfall. The main objective of this study is to improve automatic identification of MCCs over West Africa. The spatial expanse of MCCs and the spatio-temporal variability in their convective characteristics make them difficult to characterize even in dense networks of radars and/or surface gauges. As such, there exist criteria for identifying MCCs with satellite images, mostly using infrared (IR) data. Automated MCC identification methods are based on forward and/or backward in time spatial-temporal analysis of the IR satellite data and characteristically incorporate a manual component, as these algorithms routinely falter with merging and splitting cloud systems between satellite images. However, these algorithms are not readily transferable to voluminous data or other satellite-derived datasets (e.g. TRMM), thus hindering comprehensive studies of these features at both weather and climate timescales. Recognizing the existing limitations of automated methods, this study explores the applicability of graph theory to creating a fully automated method for deriving a West African MCC dataset from hourly infrared satellite images between 2001 and 2012. Graph theory, though not heavily implemented in the atmospheric sciences, has been used for predicting (nowcasting) thunderstorms from radar and satellite data by considering the relationship between atmospheric variables at a given time, or for the spatial-temporal analysis of cloud volumes. From these few studies, graph theory appears to be innately applicable to the complexity, non-linearity and inherent chaos of the atmospheric system. Our preliminary results show that the use of graph theory improves data management, thus allowing longer periods to be studied, and creates a transferable method that allows other data to be utilized.
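
    The graph formulation can be illustrated compactly with networkx: cloud elements detected in consecutive satellite images become nodes, and edges connect elements whose footprints overlap, so merges and splits appear naturally as nodes with several predecessors or successors. The overlap criterion and data layout below are illustrative assumptions, not the study's algorithm.

    ```python
    # Illustrative networkx sketch: build a directed graph of cloud elements across
    # consecutive satellite images, linking elements whose pixel footprints overlap.
    # Weakly connected components then correspond to candidate MCC life cycles.
    import networkx as nx

    def build_cloud_graph(frames):
        """frames: list (per time step) of dicts {element_id: set of (row, col) pixels}."""
        g = nx.DiGraph()
        for t, elements in enumerate(frames):
            for eid, pixels in elements.items():
                g.add_node((t, eid), npixels=len(pixels))
        for t in range(len(frames) - 1):
            for eid, pixels in frames[t].items():
                for nid, npix in frames[t + 1].items():
                    overlap = len(pixels & npix)
                    if overlap > 0:  # merges/splits show up as multiple edges
                        g.add_edge((t, eid), (t + 1, nid), overlap=overlap)
        return g

    # Candidate systems are the weakly connected components of the graph:
    # systems = [g.subgraph(c) for c in nx.weakly_connected_components(g)]
    ```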

  6. Multi-Centrality Graph Spectral Decompositions and Their Application to Cyber Intrusion Detection

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chen, Pin-Yu; Choudhury, Sutanay; Hero, Alfred

    Many modern datasets can be represented as graphs and hence spectral decompositions such as graph principal component analysis (PCA) can be useful. Distinct from previous graph decomposition approaches based on subspace projection of a single topological feature, e.g., the centered graph adjacency matrix (graph Laplacian), we propose spectral decomposition approaches to graph PCA and graph dictionary learning that integrate multiple features, including graph walk statistics, centrality measures and graph distances to reference nodes. In this paper we propose a new PCA method for single graph analysis, called multi-centrality graph PCA (MC-GPCA), and a new dictionary learning method for ensembles of graphs, called multi-centrality graph dictionary learning (MC-GDL), both based on spectral decomposition of multi-centrality matrices. As an application to cyber intrusion detection, MC-GPCA can be an effective indicator of anomalous connectivity pattern and MC-GDL can provide discriminative basis for attack classification.
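
    A toy version of the multi-centrality idea can be written with networkx and scikit-learn: stack several per-node centrality measures into a matrix and apply standard PCA to it. This is only a schematic of the feature construction, not the MC-GPCA algorithm itself.

    ```python
    # Schematic of building a multi-centrality matrix for a graph (degree,
    # closeness, betweenness, PageRank per node) and applying standard PCA to it.
    import numpy as np
    import networkx as nx
    from sklearn.decomposition import PCA

    def multi_centrality_matrix(g):
        nodes = list(g.nodes())
        measures = [
            nx.degree_centrality(g),
            nx.closeness_centrality(g),
            nx.betweenness_centrality(g),
            nx.pagerank(g),
        ]
        # rows = nodes, columns = centrality measures
        return np.array([[m[n] for m in measures] for n in nodes]), nodes

    g = nx.karate_club_graph()            # small example graph
    X, nodes = multi_centrality_matrix(g)
    scores = PCA(n_components=2).fit_transform(X)  # per-node scores on first 2 PCs
    ```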

  7. A simple rapid process for semi-automated brain extraction from magnetic resonance images of the whole mouse head.

    PubMed

    Delora, Adam; Gonzales, Aaron; Medina, Christopher S; Mitchell, Adam; Mohed, Abdul Faheem; Jacobs, Russell E; Bearer, Elaine L

    2016-01-15

    Magnetic resonance imaging (MRI) is a well-developed technique in neuroscience. Limitations in applying MRI to rodent models of neuropsychiatric disorders include the large number of animals required to achieve statistical significance, and the paucity of automation tools for the critical early step in processing, brain extraction, which prepares brain images for alignment and voxel-wise statistics. This novel timesaving automation of template-based brain extraction ("skull-stripping") is capable of quickly and reliably extracting the brain from large numbers of whole head images in a single step. The method is simple to install and requires minimal user interaction. This method is equally applicable to different types of MR images. Results were evaluated with Dice and Jacquard similarity indices and compared in 3D surface projections with other stripping approaches. Statistical comparisons demonstrate that individual variation of brain volumes are preserved. A downloadable software package not otherwise available for extraction of brains from whole head images is included here. This software tool increases speed, can be used with an atlas or a template from within the dataset, and produces masks that need little further refinement. Our new automation can be applied to any MR dataset, since the starting point is a template mask generated specifically for that dataset. The method reliably and rapidly extracts brain images from whole head images, rendering them useable for subsequent analytical processing. This software tool will accelerate the exploitation of mouse models for the investigation of human brain disorders by MRI. Copyright © 2015 Elsevier B.V. All rights reserved.

  8. Automated diagnosis of Alzheimer's disease with multi-atlas based whole brain segmentations

    NASA Astrophysics Data System (ADS)

    Luo, Yuan; Tang, Xiaoying

    2017-03-01

    Voxel-based analysis is widely used in quantitative analysis of structural brain magnetic resonance imaging (MRI) and automated disease detection, such as Alzheimer's disease (AD). However, noise at the voxel level may cause low sensitivity to AD-induced structural abnormalities. This can be addressed with the use of a whole brain structural segmentation approach which greatly reduces the dimension of features (the number of voxels). In this paper, we propose an automatic AD diagnosis system that combines such whole brain segmentations with advanced machine learning methods. We used a multi-atlas segmentation technique to parcellate T1-weighted images into 54 distinct brain regions and extract their structural volumes to serve as the features for principal-component-analysis-based dimension reduction and support-vector-machine-based classification. The relationship between the number of retained principal components (PCs) and the diagnosis accuracy was systematically evaluated, in a leave-one-out fashion, based on 28 AD subjects and 23 age-matched healthy subjects. Our approach yielded good classification results, with 96.08% overall accuracy achieved using the three foremost PCs. In addition, our approach yielded 96.43% specificity, 100% sensitivity, and 0.9891 area under the receiver operating characteristic curve.
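
    The classification stage described above (regional volumes, PCA dimension reduction, SVM, leave-one-out evaluation) corresponds to a few lines of scikit-learn. The volume matrix X (subjects by 54 regions) and labels y are placeholders, and the multi-atlas segmentation step is assumed to have been performed elsewhere.

    ```python
    # Sketch of the PCA + SVM classification stage with leave-one-out evaluation,
    # assuming X holds one row of 54 regional volumes per subject and y the diagnosis.
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import PCA
    from sklearn.svm import SVC
    from sklearn.model_selection import LeaveOneOut, cross_val_score

    def loo_accuracy(X, y, n_components=3):
        model = make_pipeline(StandardScaler(),
                              PCA(n_components=n_components),
                              SVC(kernel="linear"))
        scores = cross_val_score(model, X, y, cv=LeaveOneOut())
        return scores.mean()  # fraction of correctly classified subjects

    # accuracy = loo_accuracy(volumes, labels, n_components=3)
    ```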

  9. Automated image analysis reveals the dynamic 3-dimensional organization of multi-ciliary arrays

    PubMed Central

    Galati, Domenico F.; Abuin, David S.; Tauber, Gabriel A.; Pham, Andrew T.; Pearson, Chad G.

    2016-01-01

    ABSTRACT Multi-ciliated cells (MCCs) use polarized fields of undulating cilia (ciliary array) to produce fluid flow that is essential for many biological processes. Cilia are positioned by microtubule scaffolds called basal bodies (BBs) that are arranged within a spatially complex 3-dimensional geometry (3D). Here, we develop a robust and automated computational image analysis routine to quantify 3D BB organization in the ciliate, Tetrahymena thermophila. Using this routine, we generate the first morphologically constrained 3D reconstructions of Tetrahymena cells and elucidate rules that govern the kinetics of MCC organization. We demonstrate the interplay between BB duplication and cell size expansion through the cell cycle. In mutant cells, we identify a potential BB surveillance mechanism that balances large gaps in BB spacing by increasing the frequency of closely spaced BBs in other regions of the cell. Finally, by taking advantage of a mutant predisposed to BB disorganization, we locate the spatial domains that are most prone to disorganization by environmental stimuli. Collectively, our analyses reveal the importance of quantitative image analysis to understand the principles that guide the 3D organization of MCCs. PMID:26700722

  10. a Fully Automated Pipeline for Classification Tasks with AN Application to Remote Sensing

    NASA Astrophysics Data System (ADS)

    Suzuki, K.; Claesen, M.; Takeda, H.; De Moor, B.

    2016-06-01

    Deep learning has recently been in the spotlight owing to its victories in major competitions, which has pushed 'shallow' machine learning methods, relatively simple and handy algorithms commonly used by industrial engineers, into the background despite their advantages, such as the small amount of training time and data they require. We, from a practical point of view, utilized shallow learning algorithms to construct a learning pipeline such that operators can use machine learning without any special knowledge, an expensive computation environment, or a large amount of labelled data. The proposed pipeline automates the whole classification process, namely feature selection, feature weighting and the selection of the most suitable classifier with optimized hyperparameters. The configuration uses particle swarm optimization, a well-known metaheuristic algorithm, for generally fast and fine-grained optimization, which enables us not only to optimize (hyper)parameters but also to determine appropriate features and classifiers for the problem; this choice has conventionally been made a priori based on domain knowledge, left untouched, or handled with naïve algorithms such as grid search. Through experiments with the MNIST and CIFAR-10 datasets, common computer-vision datasets for character recognition and object recognition respectively, our automated learning approach provides high performance considering its simple, non-specialized setting (i.e. no tuning to the dataset), small amount of training data, and practical learning time. Moreover, compared to deep learning the performance stays robust with almost no modification even on a remote sensing object recognition problem, which in turn indicates a high possibility that our approach contributes to general classification problems.

  11. Use of In-Situ and Remotely Sensed Snow Observations for the National Water Model in Both an Analysis and Calibration Framework.

    NASA Astrophysics Data System (ADS)

    Karsten, L. R.; Gochis, D.; Dugger, A. L.; McCreight, J. L.; Barlage, M. J.; Fall, G. M.; Olheiser, C.

    2017-12-01

    Since version 1.0 of the National Water Model (NWM) went operational in Summer 2016, several upgrades to the model have occurred to improve hydrologic prediction for the continental United States. Version 1.1 of the NWM (Spring 2017) includes upgrades to parameter datasets impacting land surface hydrologic processes. These parameter datasets were upgraded using an automated calibration workflow that utilizes the Dynamically Dimensioned Search (DDS) algorithm to adjust parameter values using observed streamflow. These upgrades to parameter values also took advantage of various observations collected for snow analysis: in-situ SNOTEL observations in the Western US, volunteer in-situ observations across the entire US, gamma-derived snow water equivalent (SWE) observations courtesy of the NWS NOAA Corps program, gridded snow depth and SWE products from the Jet Propulsion Laboratory (JPL) Airborne Snow Observatory (ASO), gridded remotely sensed satellite-based snow products (MODIS, AMSR2, VIIRS, ATMS), and gridded SWE from the NWS Snow Data Assimilation System (SNODAS). This study explores the use of these observations to quantify NWM error and improvements from version 1.0 to version 1.1, along with subsequent work since then. In addition, this study explores the use of snow observations within the automated calibration workflow. Gridded parameter fields impacting the accumulation and ablation of snow states in the NWM were adjusted and calibrated using gridded remotely sensed snow states, SNODAS products, and in-situ snow observations. This calibration adjustment took place over various ecological regions in snow-dominated parts of the US for a retrospective period of time to capture a variety of climatological conditions. Specifically, the latest calibrated parameters impacting streamflow were held constant and only parameters impacting snow physics were tuned using snow observations and analyses. The adjusted parameter datasets were then used to force the model over an independent period for analysis against both snow and streamflow observations to assess whether improvements took place. The goal of this work is to further improve snow physics in the NWM, and to identify areas where further work will take place in the future, such as data assimilation or further forcing improvements.

  12. Associative image analysis: a method for automated quantification of 3D multi-parameter images of brain tissue

    PubMed Central

    Bjornsson, Christopher S; Lin, Gang; Al-Kofahi, Yousef; Narayanaswamy, Arunachalam; Smith, Karen L; Shain, William; Roysam, Badrinath

    2009-01-01

    Brain structural complexity has confounded prior efforts to extract quantitative image-based measurements. We present a systematic ‘divide and conquer’ methodology for analyzing three-dimensional (3D) multi-parameter images of brain tissue to delineate and classify key structures, and compute quantitative associations among them. To demonstrate the method, thick (~100 μm) slices of rat brain tissue were labeled using 3 – 5 fluorescent signals, and imaged using spectral confocal microscopy and unmixing algorithms. Automated 3D segmentation and tracing algorithms were used to delineate cell nuclei, vasculature, and cell processes. From these segmentations, a set of 23 intrinsic and 8 associative image-based measurements was computed for each cell. These features were used to classify astrocytes, microglia, neurons, and endothelial cells. Associations among cells and between cells and vasculature were computed and represented as graphical networks to enable further analysis. The automated results were validated using a graphical interface that permits investigator inspection and corrective editing of each cell in 3D. Nuclear counting accuracy was >89%, and cell classification accuracy ranged from 81–92% depending on cell type. We present a software system named FARSIGHT implementing our methodology. Its output is a detailed XML file containing measurements that may be used for diverse quantitative hypothesis-driven and exploratory studies of the central nervous system. PMID:18294697

  13. Biogeochemical Typing of Paddy Field by a Data-Driven Approach Revealing Sub-Systems within a Complex Environment - A Pipeline to Filtrate, Organize and Frame Massive Dataset from Multi-Omics Analyses

    PubMed Central

    Ogawa, Diogo M. O.; Moriya, Shigeharu; Tsuboi, Yuuri; Date, Yasuhiro; Prieto-da-Silva, Álvaro R. B.; Rádis-Baptista, Gandhi; Yamane, Tetsuo; Kikuchi, Jun

    2014-01-01

    We propose the technique of biogeochemical typing (BGC typing) as a novel methodology to set forth the sub-systems of organismal communities associated with correlated chemical profiles working within a larger complex environment. Given the intricate character of both the organismal and chemical consortia inherent in nature, many environmental studies employ the holistic approach of multi-omics analyses, mining as much information as possible. Due to the massive amount of data produced by multi-omics analyses, the results are hard to visualize and to process. The BGC typing analysis is a pipeline built using integrative statistical analysis that can treat such huge datasets, filtering, organizing and framing the information based on the strength of the various mutual trends of the organismal and chemical fluctuations occurring simultaneously in the environment. To test our technique of BGC typing, we chose a rich environment abounding in chemical nutrients and organismal diversity: the surficial freshwater from Japanese paddy fields and surrounding waters. To identify the community consortia profile we employed metagenomics in the form of high-throughput sequencing (HTS) of the fragments amplified from Archaea rRNA, universal 16S rRNA and 18S rRNA; to assess the elemental content we employed ionomics by inductively coupled plasma optical emission spectroscopy (ICP-OES); and for the organic chemical profile we employed metabolomics using both Fourier transform infrared (FT-IR) spectroscopy and proton nuclear magnetic resonance (1H-NMR); all these analyses comprised our multi-omics dataset. Similar trends between the community consortia and the chemical profiles were connected through correlation. The result was then filtered, organized and framed according to correlation strengths and peculiarities. The output gave us four BGC types displaying uniqueness in community and chemical distribution, diversity and richness. We conclude therefore that BGC typing is a successful technique for elucidating the sub-systems of organismal communities with associated chemical profiles in complex ecosystems. PMID:25330259

  14. Biogeochemical typing of paddy field by a data-driven approach revealing sub-systems within a complex environment--a pipeline to filtrate, organize and frame massive dataset from multi-omics analyses.

    PubMed

    Ogawa, Diogo M O; Moriya, Shigeharu; Tsuboi, Yuuri; Date, Yasuhiro; Prieto-da-Silva, Álvaro R B; Rádis-Baptista, Gandhi; Yamane, Tetsuo; Kikuchi, Jun

    2014-01-01

    We propose the technique of biogeochemical typing (BGC typing) as a novel methodology to set forth the sub-systems of organismal communities associated with correlated chemical profiles working within a larger complex environment. Given the intricate character of both the organismal and chemical consortia inherent in nature, many environmental studies employ the holistic approach of multi-omics analyses, mining as much information as possible. Due to the massive amount of data produced by multi-omics analyses, the results are hard to visualize and to process. The BGC typing analysis is a pipeline built using integrative statistical analysis that can treat such huge datasets, filtering, organizing and framing the information based on the strength of the various mutual trends of the organismal and chemical fluctuations occurring simultaneously in the environment. To test our technique of BGC typing, we chose a rich environment abounding in chemical nutrients and organismal diversity: the surficial freshwater from Japanese paddy fields and surrounding waters. To identify the community consortia profile we employed metagenomics in the form of high-throughput sequencing (HTS) of the fragments amplified from Archaea rRNA, universal 16S rRNA and 18S rRNA; to assess the elemental content we employed ionomics by inductively coupled plasma optical emission spectroscopy (ICP-OES); and for the organic chemical profile we employed metabolomics using both Fourier transform infrared (FT-IR) spectroscopy and proton nuclear magnetic resonance (1H-NMR); all these analyses comprised our multi-omics dataset. Similar trends between the community consortia and the chemical profiles were connected through correlation. The result was then filtered, organized and framed according to correlation strengths and peculiarities. The output gave us four BGC types displaying uniqueness in community and chemical distribution, diversity and richness. We conclude therefore that BGC typing is a successful technique for elucidating the sub-systems of organismal communities with associated chemical profiles in complex ecosystems.

  15. Development and validation of an automated and marker-free CT-based spatial analysis method (CTSA) for assessment of femoral hip implant migration: In vitro accuracy and precision comparable to that of radiostereometric analysis (RSA).

    PubMed

    Scheerlinck, Thierry; Polfliet, Mathias; Deklerck, Rudi; Van Gompel, Gert; Buls, Nico; Vandemeulebroucke, Jef

    2016-01-01

    We developed a marker-free automated CT-based spatial analysis (CTSA) method to detect stem-bone migration in consecutive CT datasets and assessed the accuracy and precision in vitro. Our aim was to demonstrate that in vitro accuracy and precision of CTSA is comparable to that of radiostereometric analysis (RSA). Stem and bone were segmented in 2 CT datasets and both were registered pairwise. The resulting rigid transformations were compared and transferred to an anatomically sound coordinate system, taking the stem as reference. This resulted in 3 translation parameters and 3 rotation parameters describing the relative amount of stem-bone displacement, and it allowed calculation of the point of maximal stem migration. Accuracy was evaluated in 39 comparisons by imposing known stem migration on a stem-bone model. Precision was estimated in 20 comparisons based on a zero-migration model, and in 5 patients without stem loosening. Limits of the 95% tolerance intervals (TIs) for accuracy did not exceed 0.28 mm for translations and 0.20° for rotations (largest standard deviation of the signed error (SD(SE)): 0.081 mm and 0.057°). In vitro, limits of the 95% TI for precision in a clinically relevant setting (8 comparisons) were below 0.09 mm and 0.14° (largest SD(SE): 0.012 mm and 0.020°). In patients, the precision was lower, but acceptable, and dependent on CT scan resolution. CTSA allows detection of stem-bone migration with an accuracy and precision comparable to that of RSA. It could be valuable for evaluation of subtle stem loosening in clinical practice.

  16. Xray: N-dimensional, labeled arrays for analyzing physical datasets in Python

    NASA Astrophysics Data System (ADS)

    Hoyer, S.

    2015-12-01

    Efficient analysis of geophysical datasets requires tools that both preserve and utilize metadata, and that transparently scale to process large datasets. Xray is such a tool, in the form of an open source Python library for analyzing the labeled, multi-dimensional array (tensor) datasets that are ubiquitous in the Earth sciences. Xray's approach pairs Python data structures based on the data model of the netCDF file format with the proven design and user interface of pandas, the popular Python data analysis library for labeled tabular data. On top of the NumPy array, xray adds labeled dimensions (e.g., "time") and coordinate values (e.g., "2015-04-10"), which it uses to enable a host of operations powered by these labels: selection, aggregation, alignment, broadcasting, split-apply-combine, interoperability with pandas and serialization to netCDF/HDF5. Many of these operations are enabled by xray's tight integration with pandas. Finally, to allow for easy parallelism and to enable its labeled data operations to scale to datasets that do not fit into memory, xray integrates with the parallel processing library dask.
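
    A few lines illustrate the label-based operations listed above (selection, aggregation, split-apply-combine, netCDF serialization). The package was later renamed xarray, so the import below uses the current name; the example data are synthetic.

    ```python
    # Label-based selection, aggregation and serialization with xarray
    # (the successor name of the xray package described above).
    import numpy as np
    import pandas as pd
    import xarray as xr

    temps = xr.DataArray(
        np.random.rand(365, 4, 5),
        dims=("time", "lat", "lon"),
        coords={"time": pd.date_range("2015-01-01", periods=365),
                "lat": [10, 20, 30, 40],
                "lon": [100, 110, 120, 130, 140]},
        name="temperature",
    )

    april = temps.sel(time="2015-04")                  # label-based selection
    monthly_mean = temps.groupby("time.month").mean()  # split-apply-combine
    zonal = temps.mean(dim="lon")                      # aggregation over a named dim
    temps.to_dataset().to_netcdf("temperature.nc")     # serialization to netCDF
    ```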

  17. SYRIAC: The SYstematic Review Information Automated Collection System A Data Warehouse for Facilitating Automated Biomedical Text Classification

    PubMed Central

    Yang, Jianji J.; Cohen, Aaron M.; McDonagh, Marian S.

    2008-01-01

    Automatic document classification can be valuable for increasing the efficiency of updating systematic reviews (SR). For the machine learning process to work well, it is critical to create and maintain high-quality training datasets consisting of expert SR inclusion/exclusion decisions. This task can be laborious, especially when the number of topics is large and the source data format is inconsistent. To address this problem, we built an automated system that streamlines the required steps, from the initial notification of an update in the source annotation files to loading the data warehouse, along with a web interface to monitor the status of each topic. In our current collection of 26 SR topics, we were able to standardize almost all of the relevance judgments and recovered PMIDs for over 80% of all articles. Of those PMIDs, over 99% were correct in a manual random-sample study. Our system performs an essential function in creating training and evaluation datasets for SR text mining research. PMID:18999194
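
    A core step in such a system is normalizing heterogeneous expert judgments before loading the warehouse. The toy sketch below is not the SYRIAC implementation; the label vocabulary and field names are hypothetical and only illustrate the kind of standardization involved.

```python
# Toy sketch: standardizing heterogeneous inclusion/exclusion judgments
# before loading a training-data warehouse (labels and field names are hypothetical).
RELEVANCE_MAP = {
    "include": 1, "included": 1, "yes": 1, "i": 1,
    "exclude": 0, "excluded": 0, "no": 0, "e": 0,
}

def standardize(record):
    """Return (pmid, label) or None if the judgment cannot be standardized."""
    raw = str(record.get("decision", "")).strip().lower()
    pmid = record.get("pmid")
    if pmid is None or raw not in RELEVANCE_MAP:
        return None   # flagged for manual review instead of silently loaded
    return int(pmid), RELEVANCE_MAP[raw]

records = [{"pmid": "18999194", "decision": "Included"},
           {"pmid": None, "decision": "exclude"}]
print([standardize(r) for r in records])
```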

  18. Performance evaluation of automated segmentation software on optical coherence tomography volume data

    PubMed Central

    Tian, Jing; Varga, Boglarka; Tatrai, Erika; Fanni, Palya; Somfai, Gabor Mark; Smiddy, William E.

    2016-01-01

    Over the past two decades, a significant number of OCT segmentation approaches have been proposed in the literature. Each methodology has been conceived for and/or evaluated using specific datasets that do not reflect the complexity of the retinal features widely observed in clinical settings, and no appropriate OCT dataset with ground truth reflecting those everyday clinical realities exists. While the need for unbiased performance evaluation of automated segmentation algorithms is obvious, validation has usually been performed by comparison with manual labelings specific to each study, without a common ground truth. As a result, a performance comparison of different algorithms using the same ground truth has never been performed. This paper reviews research-oriented tools for automated segmentation of retinal tissue on OCT images, and evaluates and compares the performance of these software tools against a common ground truth. PMID:27159849
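
    Evaluating several tools against one common ground truth typically reduces to simple per-boundary error metrics. The sketch below assumes, purely for illustration, that each tool reports a boundary depth (in pixels) per A-scan and that a shared manual ground truth is available; it is not tied to any of the reviewed software packages.

```python
# Sketch: comparing layer-boundary outputs of several tools against one ground truth.
# Inputs are hypothetical arrays of boundary depths (in pixels) per A-scan.
import numpy as np

def boundary_errors(predicted, ground_truth):
    diff = np.asarray(predicted, float) - np.asarray(ground_truth, float)
    return {"mean_signed": diff.mean(),
            "mean_absolute": np.abs(diff).mean(),
            "rmse": np.sqrt((diff ** 2).mean())}

ground_truth = np.array([120.0, 121.0, 119.5, 122.0])
tool_outputs = {"tool_A": [121.0, 120.0, 120.0, 123.0],
                "tool_B": [118.0, 122.5, 119.0, 121.0]}
for name, pred in tool_outputs.items():
    print(name, boundary_errors(pred, ground_truth))
```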

  19. Automated anatomical labeling of bronchial branches extracted from CT datasets based on machine learning and combination optimization and its application to bronchoscope guidance.

    PubMed

    Mori, Kensaku; Ota, Shunsuke; Deguchi, Daisuke; Kitasaka, Takayuki; Suenaga, Yasuhito; Iwano, Shingo; Hasegawa, Yoshinori; Takabatake, Hirotsugu; Mori, Masaki; Natori, Hiroshi

    2009-01-01

    This paper presents a method for the automated anatomical labeling of bronchial branches extracted from 3D CT images, based on machine learning and combination optimization, and shows its application in a bronchoscopy guidance system. The procedure consists of four steps: (a) extraction of tree structures from the bronchus regions extracted from CT images, (b) construction of AdaBoost classifiers, (c) computation of candidate names for all branches using the classifiers, and (d) selection of the best combination of anatomical names. We applied the proposed method to 90 3D CT datasets. The experimental results showed that the proposed method assigned correct anatomical names to 86.9% of the bronchial branches up to the sub-segmental lobe branches. We also overlaid the anatomical names of bronchial branches onto real bronchoscopic views to guide actual bronchoscopy.
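
    Steps (b) through (d) amount to scoring candidate names per branch with a classifier and then selecting the combination with the highest joint score. The sketch below is a strongly simplified stand-in for the paper's approach: the branch features and anatomical names are synthetic, a single multi-class AdaBoost classifier replaces the per-branch classifiers, and an exhaustive search replaces the combination optimization.

```python
# Simplified sketch of candidate-name scoring plus combination selection.
# Branch features and anatomical names are hypothetical stand-ins.
from itertools import product
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

rng = np.random.default_rng(1)
names = ["RB1", "RB2", "RB3"]

# (b): train a multi-class classifier on labeled branch features (synthetic here).
X_train = rng.random((60, 4))
y_train = rng.integers(0, len(names), 60)
clf = AdaBoostClassifier(n_estimators=50).fit(X_train, y_train)

# (c): candidate-name probabilities for each branch of a new tree (3 branches here).
X_new = rng.random((3, 4))
proba = clf.predict_proba(X_new)          # shape: (branches, candidate names)

# (d): pick the combination with the highest joint score, requiring distinct names.
best = max(
    (c for c in product(range(len(names)), repeat=len(X_new)) if len(set(c)) == len(c)),
    key=lambda c: np.prod([proba[b, n] for b, n in enumerate(c)]),
)
print([names[n] for n in best])
```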

  20. Automated detection and analysis of particle beams in laser-plasma accelerator simulations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ushizima, Daniela Mayumi; Geddes, C.G.; Cormier-Michel, E.

    Numerical simulations of laser-plasma wakefield (particle) accelerators model the acceleration of electrons trapped in plasma oscillations (wakes) left behind when an intense laser pulse propagates through the plasma. The goal of these simulations is to better understand the processes involved in plasma wake generation and in how electrons are trapped and accelerated by the wake. Improved understanding supports the development of these accelerators, which offer high accelerating gradients and could reduce the size and cost of new accelerators. One operating regime of interest is where a trapped subset of electrons loads the wake and forms an isolated group of accelerated particles with low spread in momentum and position, desirable characteristics for many applications. The electrons trapped in the wake may be accelerated to high energies, with the gradient in the wake reaching up to a gigaelectronvolt per centimeter. High-energy electron accelerators power intense radiation sources ranging from X-ray to terahertz, and are used in many applications including medical radiotherapy and imaging. To extract information from a simulation about the quality of the beam, a typical approach is to examine plots of the entire dataset, visually determining the parameters necessary to select a subset of particles, which is then further analyzed. This procedure requires laborious examination of massive datasets over many time steps using several plots, a routine that is infeasible for large data collections. Demand for automated analysis is growing along with the volume and size of simulations: current 2D LWFA simulation datasets are typically between 1 GB and 100 GB in size, while 3D simulations are on the order of terabytes. The increase in the number and size of datasets leads to a need for automatic routines that recognize particle patterns as particle bunches (beams of electrons) for subsequent analysis. Because of this growth in dataset size, the application of machine learning techniques to scientific data mining is increasingly considered. In plasma simulations, Bagherjeiran et al. presented a comprehensive report on applying graph-based techniques for orbit classification; they used the KAM classifier to label points and components in single and multiple orbits. Love et al. conducted an image-space analysis of coherent structures in plasma simulations, using a number of segmentation and region-growing techniques to isolate regions of interest in orbit plots. Both approaches analyzed particle accelerator data, targeting the system dynamics in terms of particle orbits; however, they did not address particle dynamics as a function of time, nor did they inspect the behavior of bunches of particles. Ruebel et al. addressed the visual analysis of massive laser wakefield acceleration (LWFA) simulation data using interactive procedures to query the data, providing sophisticated visualization tools to inspect the data manually; these tools were integrated into the visualization and analysis system VisIt, together with efficient data management based on HDF5, H5Part, and the index/query tool FastBit. In later work, Ruebel et al. proposed automatic beam path analysis using a suite of methods to classify particles in simulation data and to analyze their temporal evolution. To enable researchers to accurately define particle beams, the method computes a set of measures based on the paths of the particles and their distance to a beam.
    To achieve good performance, this framework uses an analysis pipeline designed to quickly reduce the amount of data that needs to be considered in the actual path-distance computation. As part of this process, region-growing methods are used to detect particle bunches at single time steps. Efficient data reduction is essential to enable automated analysis of large datasets, as described in the next section, where the data reduction methods are tailored to the particular requirements of our clustering analysis. Previously, we described the application of a set of algorithms that automate the data analysis and classification of particle beams in LWFA simulation data by identifying locations with a high density of high-energy particles. These algorithms detected high-density locations (nodes) in each time step, i.e., maxima of the particle distribution along a single spatial variable. Each node was then associated with nodes in previous or later time steps by linking the nodes according to a pruned minimum spanning tree (PMST). We call the PMST representation a 'lifetime diagram': a graphical tool that shows the temporal evolution of dense groups of particles in the longitudinal direction across the time series. Electron bunch compactness was characterized in a further processing step that partitions each time step into a fixed number of clusters using fuzzy clustering.
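
    The node-detection and linking steps summarized above can be sketched as follows. The histogram binning, peak criterion, and nearest-neighbor linking below are arbitrary stand-ins for the pruned-minimum-spanning-tree construction used in the cited work; the particle positions are synthetic.

```python
# Sketch: per-time-step density "nodes" from longitudinal particle positions,
# then naive nearest-neighbor linking across time steps (bins/thresholds arbitrary).
import numpy as np
from scipy.signal import find_peaks

def density_nodes(x_positions, bins=100, min_count=100):
    counts, edges = np.histogram(x_positions, bins=bins)
    peaks, _ = find_peaks(counts, height=min_count, distance=10)
    return 0.5 * (edges[peaks] + edges[peaks + 1])   # node centers

def link_nodes(nodes_t0, nodes_t1, max_jump=0.5):
    links = []
    for i, n0 in enumerate(nodes_t0):
        j = int(np.argmin(np.abs(nodes_t1 - n0))) if len(nodes_t1) else -1
        if j >= 0 and abs(nodes_t1[j] - n0) <= max_jump:
            links.append((i, j))
    return links

rng = np.random.default_rng(2)
step0 = rng.normal(loc=0.0, scale=0.1, size=5000)   # synthetic longitudinal positions
step1 = rng.normal(loc=0.2, scale=0.1, size=5000)   # same bunch, one time step later
print(link_nodes(density_nodes(step0), density_nodes(step1)))
```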
